2011-03-27 86 views
2

I'm translating the following SVD recommendation system, written in Ruby, into Mathematica, and I've run into an assignment problem inside a loop:

require 'linalg' 

users = { 1 => "Ben", 2 => "Tom", 3 => "John", 4 => "Fred" } 
m = Linalg::DMatrix[ 
      #Ben, Tom, John, Fred 
      [5,5,0,5], # season 1 
      [5,0,3,4], # season 2 
      [3,4,0,3], # season 3 
      [0,0,5,3], # season 4 
      [5,4,4,5], # season 5 
      [5,4,5,5] # season 6 
      ] 

# Compute the SVD Decomposition 
u, s, vt = m.singular_value_decomposition 
vt = vt.transpose 

# Take the 2-rank approximation of the Matrix 
# - Take first and second columns of u (6x2) 
# - Take first and second columns of vt (4x2) 
# - Take the first two eigen-values (2x2) 
u2 = Linalg::DMatrix.join_columns [u.column(0), u.column(1)] 
v2 = Linalg::DMatrix.join_columns [vt.column(0), vt.column(1)] 
eig2 = Linalg::DMatrix.columns [s.column(0).to_a.flatten[0,2], s.column(1).to_a.flatten[0,2]] 

# Here comes Bob, our new user 
bob = Linalg::DMatrix[[5,5,0,0,0,5]] 
bobEmbed = bob * u2 * eig2.inverse 

# Compute the cosine similarity between Bob and every other User in our 2-D space 
user_sim, count = {}, 1 
v2.rows.each { |x| 
    user_sim[count] = (bobEmbed.transpose.dot(x.transpose))/(x.norm * bobEmbed.norm) 
    count += 1 
    } 

# Remove all users who fall below the 0.90 cosine similarity cutoff and sort by similarity 
similar_users = user_sim.delete_if {|k,sim| sim < 0.9 }.sort {|a,b| b[1] <=> a[1] } 
similar_users.each { |u| printf "%s (ID: %d, Similarity: %0.3f) \n", users[u[0]], u[0], u[1] } 

# We'll use a simple strategy in this case: 
# 1) Select the most similar user 
# 2) Compare all items rated by this user against your own and select items that you have not yet rated 
# 3) Return the ratings for items I have not yet seen, but the most similar user has rated 
similarUsersItems = m.column(similar_users[0][0]-1).transpose.to_a.flatten 
myItems = bob.transpose.to_a.flatten 

not_seen_yet = {} 
myItems.each_index { |i| 
    not_seen_yet[i+1] = similarUsersItems[i] if myItems[i] == 0 and similarUsersItems[i] != 0 
} 

printf "\n %s recommends: \n", users[similar_users[0][0]] 
not_seen_yet.sort {|a,b| b[1] <=> a[1] }.each { |item| 
    printf "\tSeason %d .. I gave it a rating of %d \n", item[0], item[1] 
} 

print "We've seen all the same seasons, bugger!" if not_seen_yet.size == 0 

Here is the corresponding Mathematica code:

Clear[s, u, v, s2, u2, v2, m, n, testdata, trainingdata, user, user2d]; 
find1nn[trainingdata_, user_] := { 
    {u , s, v} = SingularValueDecomposition[Transpose[trainingdata]]; 
    (* Reduce to 2 dimensions. *) 
    u2 = u[[All, {1, 2}]]; 
    s2 = s[[{1, 2}, {1, 2}]]; 
    v2 = v[[All, {1, 2}]]; 
    user2d = user.u2.Inverse[s2]; 
    {m, n} = Dimensions[v2]; 
    closest = -1; 
    index = -1; 
    For[a = 1, a < m, a++, 
    {distance = 1 - CosineDistance[v2[[a, {1, 2}]], user2d];, 
     If[distance > closest, {closest = distance, index = a}];}]; 
    closestuserratings = trainingdata[[index]]; 
    closestuserratings 
    } 
rec[closest_, userx_] := { 
    d = Dimensions[closest]; 
    For[b = 1, b <= d[[2]], b++, 
    If[userx[[b]] == 0., userx[[b]] = closest[[1, b]]] 
    ] 
    userx 
    } 
finalrec[td_, user_] := rec[find1nn[td, user], user] 
(*Clear[s,u,v,s2,u2,v2,m,n,testdata,trainingdata,user,user2d]*) 
testdata = {{5., 5., 3., 0., 5., 5.}, {5., 0., 4., 1., 4., 4.}, {0., 
    3., 0., 5., 4., 5.}, {5., 4., 3., 3., 5., 5.}}; 
bob = {5., 0., 4., 0., 4., 5.}; 
(*recommend[testdata,bob]*) 
find1nn[testdata, bob] 
finalrec[testdata, bob] 

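For some reason it does not assign to the elements of user inside the function, although the same assignment works outside it. What could be causing this?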

+0

Could you edit your question and just post the original snippet? If the link breaks, the _entire_ context of this question will be lost. – 2011-03-27 12:40:48

+0

I tried my own translation, but the signs I get don't match those in http://farm1.static.flickr.com/133/358494623_db22603640_o.png. What would cause that? – 2011-03-27 23:31:12

+0

@Mr. It's an odd error; it adds up from these two parts. – 2011-03-28 04:41:12

Answers

3

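Please see the tutorial on localizing variables in the Mathematica documentation. The problem is in your rec function: you cannot normally modify input variables in Mathematica. The pattern variable userx is replaced by the list it was called with before the body is evaluated, so userx[[b]] = ... tries to assign to a part of a literal list rather than to a variable. (You could do this if your function had a Hold attribute, so that the argument in question is passed to it unevaluated, but that is not the case here.) Copy the argument into a local variable and modify the copy instead: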

rec[closest_, userxi_] := 
Block[{d, b, userx = userxi}, {d = Dimensions[closest]; 
    For[b = 1, b <= d[[2]], b++, 
    If[userx[[b]] == 0., userx[[b]] = closest[[1, b]]]]; 
    userx}]
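
For comparison with the Ruby original in the question, the same copy-then-fill step might look roughly like the sketch below. The method name rec mirrors the Mathematica function, but the plain-array signature is an assumption made for illustration and is not taken from the question's Ruby code. Ruby would allow the in-place assignment; the copy is kept only to parallel the Block version above.

```ruby
# Hypothetical helper, not from the original post: fill in the items the
# user has not rated (rating 0) with the nearest neighbour's ratings.
def rec(closest, user)
  filled = user.dup # work on a copy, as the Block version does with userx
  filled.each_index do |i|
    filled[i] = closest[i] if filled[i].zero?
  end
  filled
end

bob     = [5.0, 0.0, 4.0, 0.0, 4.0, 5.0] # bob from the question
closest = [5.0, 0.0, 4.0, 1.0, 4.0, 4.0] # second row of testdata, assumed to be the neighbour
p rec(closest, bob)
p bob # the caller's array is left untouched
```

An item stays at 0 when the neighbour has not rated it either.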
1

Without trying to understand what you want to achieve, here is a more Mathematica-ish, but equivalent (I hope), working version of the code.

The explicit loops are gone, and many unnecessary variables have been eliminated. All variables are now local, so there is no need for Clear[].

find1nn[trainingdata_, user_] := 
    Module[{u, s, v, v2, user2d, m, distances}, 
    {u, s, v} = SingularValueDecomposition[Transpose[trainingdata]]; 
    v2 = v[[All, {1, 2}]]; 
    user2d = user.u[[All, {1, 2}]].Inverse[s[[{1, 2}, {1, 2}]]]; 
    m = First@Dimensions[v2]; 
    distances = (1 - CosineDistance[v2[[#, {1, 2}]], user2d]) & /@ Range[m - 1]; 
    {trainingdata[[Ordering[distances][[-1]]]]}]; 

rec[closest_, userxi_] := userxi[[#]] /. {0. -> closest[[1, #]]} & /@ 
          Range[Dimensions[closest][[2]]]; 

finalrec[td_, user_] := rec[find1nn[td, user], user]; 

I'm sure it can still be optimized quite a bit.

+1

Yes. Position[distances, #][[1, 1]] &@Max[distances] can be written as Ordering[distances][[-1]], which is more concise and 30% faster. – 2011-03-27 20:12:15

+0

@Sjoerd Done. Edited with that (and a few more) mods. Thanks! – 2011-03-27 20:22:20

+0

You left a piece of old code lingering. – 2011-03-27 20:26:20

1

Here is my shot at this, based on belisarius's code and with Sjoerd's improvements.

find1nn[trainingdata_, user_] := 
    Module[{u, s, v, user2d, distances}, 
    {u, s, v} = SingularValueDecomposition[trainingdata\[Transpose], 2]; 
    user2d = user . u . Inverse@s; 
    distances = # ~CosineDistance~ user2d & /@ Most@v; 
    trainingdata[[ distances ~Ordering~ 1 ]] 
    ] 

rec[closest_, userxi_] := If[# == 0, #2, #] & ~MapThread~ {userxi, closest[[1]]} 
+0

+1 for Ordering[distances, -1] instead of Ordering[distances][[-1]]. That is faster still. – 2011-03-27 22:06:19

+0

Wizard, can't we just drop the 1 - part of 1 - CosineDistance and test for the _smallest_ value (Ordering[distances, 1]) instead of the largest? – 2011-03-27 22:09:17
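
A small Ruby sketch of why this works: for real vectors, 1 - CosineDistance is the cosine similarity, so the row with the largest similarity is also the row with the smallest distance. The helper names and the 2-D data below are made up for illustration, and the indices are 0-based, unlike Mathematica's.

```ruby
# Cosine distance for real vectors: 1 - (a . b) / (|a| |b|).
def cosine_distance(a, b)
  dot  = a.zip(b).sum { |x, y| x * y }
  norm = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  1.0 - dot / (norm.(a) * norm.(b))
end

# Index of the row with the smallest cosine distance to user.
def nearest_by_distance(rows, user)
  rows.each_index.min_by { |i| cosine_distance(rows[i], user) }
end

# Index of the row with the largest cosine similarity (1 - distance) to user.
def nearest_by_similarity(rows, user)
  rows.each_index.max_by { |i| 1.0 - cosine_distance(rows[i], user) }
end

# Made-up 2-D embeddings, for illustration only.
rows   = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
user2d = [0.5, 0.9]
p nearest_by_distance(rows, user2d)
p nearest_by_similarity(rows, user2d)
```

Both calls print the same index, since subtracting from 1 only reverses the ordering.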

+0

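@Sjoerd, I don't know; I'd have to think about it. But I have another concern: `Ordering` will only return one position, whereas `Position` may return several. Since I haven't worked out what this code is doing, I don't know whether that is a problem. I need to look at the original code now. – 2011-03-27 22:39:33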
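
The concern about ties can be illustrated in Ruby with made-up similarity scores. The helper names are hypothetical; the sketch only contrasts "every position of the maximum" with "a single position" and does not try to reproduce which tied position Ordering would return.

```ruby
# Every position holding the maximum, analogous to Position with Max.
def all_best(similarities)
  best = similarities.max
  similarities.each_index.select { |i| similarities[i] == best }
end

# Only the first position holding the maximum.
def first_best(similarities)
  similarities.index(similarities.max)
end

sims = [0.42, 0.97, 0.97, 0.10] # made-up scores with a tie
p all_best(sims)   # [1, 2]
p first_best(sims) # 1
```

If two users are equally similar, a single-position lookup silently picks one of them, which matters only if the recommendation should combine all tied neighbours.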

0
Clear[s, u, v, s2, u2, v2, m, n, testdata, trainingdata, user, user2d]; 
recommend[trainingdata_, user_] := { 
    {u , s, v} = SingularValueDecomposition[Transpose[trainingdata]]; 
    (* Reduce to 2 dimensions. *) 
    u2 = u[[All, {1, 2}]]; 
    s2 = s[[{1, 2}, {1, 2}]]; 
    v2 = v[[All, {1, 2}]]; 
    user2d = user.u2.Inverse[s2]; 
    {m, n} = Dimensions[v2]; 
    closest = -1; 
    index = -1; 
    For[a = 1, a < m, a++, 
    {distance = 1 - CosineDistance[v2[[a, {1, 2}]], user2d];, 
     If[distance > closest, {closest = distance, index = a}];}]; 
    closestuserratings = trainingdata[[index]]; 
    d = Dimensions[closestuserratings]; 
    updateduser = Table[0, {i, 1, d[[1]]}]; 
    For[b = 1, b <= d[[1]], b++, 
    If[user[[b]] == 0., updateduser[[b]] = closestuserratings[[b]], 
    updateduser[[b]] = user[[b]]] 
    ] 
    updateduser 
    } 
testdata = {{5., 5., 3., 0., 5., 5.}, {5., 0., 4., 1., 4., 4.}, {0., 
    3., 0., 5., 4., 5.}, {5., 4., 3., 3., 5., 5.}}; 
bob = {5., 0., 4., 0., 4., 5.}; 
recommend[testdata, bob] 

{{5. Null, 0. Null, 4. Null, 1. Null, 4. Null, 5. Null}}

It works now, but why the Null values?

+0

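That's because you are missing a ; after the For loop. No offense, but have you looked at the other contributions? The question has already been answered, resulting in much-improved code. – 2011-03-29 14:12:30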