Hi!
First of all, thanks for providing this nice work!
While looking into the code, I found the squared_distance function a little confusing. If Y is not provided (so Y = X), the function computes X - X and then takes the sum. So isn't the return value zero?

    def squared_distance(X, Y=None, W=None):
        '''
        Calculates the pairwise distance between points in X and Y

        X: n x d matrix
        Y: m x d matrix
        W: affinity -- if provided, we normalize the distance

        returns: n x m matrix of all pairwise squared Euclidean distances
        '''
        if Y is None:
            Y = X
        # distance = squaredDistance(X, Y)
        sum_dimensions = list(range(2, K.ndim(X) + 1))
        X = K.expand_dims(X, axis=1)
        if W is not None:
            # if W provided, we normalize X and Y by W
            D_diag = K.expand_dims(K.sqrt(K.sum(W, axis=1)), axis=1)
            X /= D_diag
            Y /= D_diag
        squared_difference = K.square(X - Y)
        distance = K.sum(squared_difference, axis=sum_dimensions)
        return distance

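To check my reading, I mirrored the same steps in NumPy. This is only a sketch under my own assumptions: `squared_distance_np` is a hypothetical stand-in, it leaves out the W normalization, and it assumes the Keras backend ops broadcast the same way NumPy does. Under those assumptions the result does not look like all zeros, because X is expanded to shape (n, 1, d) before the subtraction, so X - Y broadcasts to every pair of rows rather than subtracting each row from itself.

```python
import numpy as np


def squared_distance_np(X, Y=None):
    """Hypothetical NumPy analogue of squared_distance, without W."""
    if Y is None:
        Y = X
    # Sum over every axis after the first two, as sum_dimensions does.
    sum_dimensions = tuple(range(2, X.ndim + 1))
    # (n, d) -> (n, 1, d), so the subtraction below pairs every row of X
    # with every row of Y instead of subtracting element by element.
    X = np.expand_dims(X, axis=1)
    squared_difference = np.square(X - Y)  # broadcasts to (n, m, d)
    return np.sum(squared_difference, axis=sum_dimensions)


X = np.array([[0.0, 0.0], [3.0, 4.0]])
D = squared_distance_np(X)
# Diagonal entries are 0; off-diagonal entries are 3^2 + 4^2 = 25.
print(D)
```

So only the diagonal would be zero when Y defaults to X, if the backend behaves like this.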
Another question, about the number of clusters K: can I use a relatively large number (for example, over 1000) when my dataset contains about 1 million samples?
Thanks!
Fan
Snippet source: SpectralNet/src/core/costs.py, lines 11 to 33 in 43b0fca