Differences in latent semantic analysis in Python

I'm trying to follow the Wikipedia article on latent semantic indexing in Python, using the code below:
from numpy import array, diag, dot, transpose, linalg
from numpy.linalg import inv

documentTermMatrix = array([[ 0., 1., 0., 1., 1., 0., 1.],
[ 0., 1., 1., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 1., 1.],
[ 0., 0., 0., 1., 0., 0., 0.],
[ 0., 1., 1., 0., 0., 0., 0.],
[ 1., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 1., 1., 0.],
[ 0., 0., 1., 1., 0., 0., 0.],
[ 1., 0., 0., 1., 0., 0., 0.]])
u,s,vt = linalg.svd(documentTermMatrix, full_matrices=False)
sigma = diag(s)
## remove extra dimensions...
numberOfDimensions = 4
for i in range(numberOfDimensions, len(sigma)):
    sigma[i][i] = 0
queryVector = array([[ 0.], # same as first column in documentTermMatrix
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 1.],
[ 0.],
[ 0.],
[ 1.]])
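cosineDistance isn't shown above; a minimal sketch of the kind of helper assumed here (despite the name, it returns the cosine similarity, so two vectors pointing the same way give 1, which matches the expected results below):

from numpy import dot, ravel
from numpy.linalg import norm

def cosineDistance(a, b):
    # Sketch of the helper assumed above (not from the original post):
    # flatten column vectors such as the (n, 1) queryVector, then return
    # the cosine similarity, so identical directions give 1.
    a, b = ravel(a), ravel(b)
    return dot(a, b) / (norm(a) * norm(b))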
How the math says it should work:
dtMatrixToQueryAgainst = dot(u, dot(sigma, vt))
queryVector = dot(inv(sigma), dot(transpose(u), queryVector))
similarityToFirst = cosineDistance(queryVector, dtMatrixToQueryAgainst[:,0])
# gives 'matrices are not aligned' error. should be 1 because they're the same
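For context on that error: documentTermMatrix is 9x7, so with full_matrices=False the factors come back as u (9, 7), sigma (7, 7), vt (7, 7); the reconstructed matrix therefore has columns of length 9, while the query mapped through transpose(u) has length 7. A quick shape check (plain numpy, reusing the variables above; documentTermMatrix[:, [0]] stands in for the original query vector, which is the same thing per the comment above):

print(u.shape, sigma.shape, vt.shape)                        # (9, 7) (7, 7) (7, 7)
print(dot(u, dot(sigma, vt))[:, 0].shape)                    # (9,)  -- a column in the original 9-term space
print(dot(transpose(u), documentTermMatrix[:, [0]]).shape)   # (7, 1) -- a query folded into the 7-concept space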
What works, but doesn't look mathematically correct to me (from here):
dtMatrixToQueryAgainst = dot(sigma, vt)
queryVector = dot(transpose(u), queryVector)
similarityToFirst = cosineDistance(queryVector, dtMatrixToQueryAgainst[:,0])
# gives 1, which is correct
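Since queryVector is literally the first column of documentTermMatrix, dot(transpose(u), queryVector) equals column 0 of dot(sigma, vt) exactly before any truncation of sigma, which is presumably where the similarity of 1 comes from. A quick numerical check of that, using the untruncated diag(s) (my own sketch, not from the linked page):

from numpy import allclose, array, diag, dot, ravel, transpose

# The query as originally defined above (column 0 of documentTermMatrix),
# restated here because queryVector is overwritten in the snippets above.
originalQuery = array([[0.], [0.], [0.], [0.], [0.], [1.], [0.], [0.], [1.]])
fullSigma = diag(s)  # untruncated singular values
# Folding the query into concept space with u^T reproduces column 0 of
# sigma * vt, up to floating-point error, hence the cosine similarity of 1.
print(allclose(ravel(dot(transpose(u), originalQuery)), dot(fullSigma, vt)[:, 0]))  # True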
Why doesn't the first route work, when everything I can find about the math of LSA says the first one is correct? I feel like I'm missing something obvious...
What does '## remove extra dimensions...' involve? – Avaris 2012-04-25 23:59:04
Edited to show the rank reduction – Jmjmh 2012-04-26 21:06:04
In 'u,s,vt = linalg.svd(a, full_matrices=False)', where does 'a' come from? – Oerd 2012-05-01 16:50:41