2015-12-08 166 views

I have the following Python code for matrix factorization:

###############################################################################

import numpy

"""
@INPUT:
    R     : a matrix to be factorized, dimension N x M
    P     : an initial matrix of dimension N x K
    Q     : an initial matrix of dimension M x K
    K     : the number of latent features
    steps : the maximum number of steps to perform the optimisation
    alpha : the learning rate
    beta  : the regularization parameter
@OUTPUT:
    the final matrices P and Q
"""
def matrix_factorization(R, P, Q, K, steps=5000, alpha=0.0002, beta=0.02):
    Q = Q.T  # work with Q as K x M, so Q[:, j] is the feature vector of column j
    for step in xrange(steps):
        # Gradient step for every observed (non-zero) entry of R
        for i in xrange(len(R)):
            for j in xrange(len(R[i])):
                if R[i][j] > 0:
                    eij = R[i][j] - numpy.dot(P[i,:], Q[:,j])
                    for k in xrange(K):
                        P[i][k] = P[i][k] + alpha * (2 * eij * Q[k][j] - beta * P[i][k])
                        Q[k][j] = Q[k][j] + alpha * (2 * eij * P[i][k] - beta * Q[k][j])
        eR = numpy.dot(P, Q)
        # Regularized squared error over the observed entries
        e = 0
        for i in xrange(len(R)):
            for j in xrange(len(R[i])):
                if R[i][j] > 0:
                    e = e + pow(R[i][j] - numpy.dot(P[i,:], Q[:,j]), 2)
                    for k in xrange(K):
                        e = e + (beta/2) * (pow(P[i][k], 2) + pow(Q[k][j], 2))
        # Stop early once the error is small enough
        if e < 0.001:
            break
    return P, Q.T

###############################################################################

The code comes from this tutorial: http://www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/

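The code works fine for small matrices, but I have two large matrices, P (15715, 203) and Q (203, 16384), and when I try to run the code on P and Q it gives me the following error: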

K=203 

matrix_factorization(R, P, Q, K) 
--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
<ipython-input-3-00b8211f2507> in <module>() 
----> 1 matrix_factorization(R, P, Q, K) 

/Users/ajinkyachandrakantbobade/Desktop/random_choicefile/trial.py in matrix_factorization(R, P, Q, K, steps, alpha, beta) 
    52    for j in xrange(len(R[i])): 
    53     if R[i][j] > 0: 
---> 54      eij = R[i][j] - numpy.dot(P[i,:],Q[:,j]) 
    55      for k in xrange(K): 
    56       P[i][k] = P[i][k] + alpha * (2 * eij * Q[k][j] - beta * P[i][k]) 

/Users/ajinkyachandrakantbobade/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key) 
    1967    return self._getitem_multilevel(key) 
    1968   else: 
-> 1969    return self._getitem_column(key) 
    1970 
    1971  def _getitem_column(self, key): 

/Users/ajinkyachandrakantbobade/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key) 
    1974   # get column 
    1975   if self.columns.is_unique: 
-> 1976    return self._get_item_cache(key) 
    1977 
    1978   # duplicate columns & possible reduce dimensionality 

/Users/ajinkyachandrakantbobade/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item) 
    1087   """ return the cached item, item represents a label indexer """ 
    1088   cache = self._item_cache 
-> 1089   res = cache.get(item) 
    1090   if res is None: 
    1091    values = self._data.get(item) 

TypeError: unhashable type 

Can anyone please help with this error?
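A note on the traceback: the failing frames are in pandas/core/frame.pyc, which suggests that the object indexed on line 54 (P or Q) is a pandas DataFrame rather than a NumPy array. On a DataFrame, `P[i,:]` is passed to `__getitem__` as a column label, and the tuple containing a slice cannot be hashed, which would explain `TypeError: unhashable type`. The sketch below assumes that this is the cause and converts the inputs to plain NumPy arrays before calling `matrix_factorization`. The helper `as_ndarray` and the small stand-in matrices are hypothetical and only illustrate the idea.

```python
import numpy


def as_ndarray(matrix):
    """Return `matrix` as a 2-D float NumPy array (hypothetical helper).

    numpy.asarray accepts nested lists, NumPy arrays and pandas DataFrames,
    so tuple indexing such as arr[i, :] works on the result.
    """
    arr = numpy.asarray(matrix, dtype=float)
    if arr.ndim != 2:
        raise ValueError("expected a 2-D matrix, got %d dimension(s)" % arr.ndim)
    return arr


# Small stand-ins for the real data (assumed shapes: R is N x M, P is N x K,
# Q is M x K, as described in the docstring of matrix_factorization).
R = as_ndarray([[5, 3, 0], [4, 0, 1]])
P = as_ndarray(numpy.random.rand(2, 2))
Q = as_ndarray(numpy.random.rand(3, 2))

# The expression from line 54 of the traceback now indexes NumPy arrays.
row_dot = numpy.dot(P[0, :], Q.T[:, 1])
```

With the real data, the call would then be `matrix_factorization(as_ndarray(R), as_ndarray(P), as_ndarray(Q), K)`.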

Answer


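The matrices you are trying to multiply are too large, and you do not have enough memory to complete the computation. A few things can help: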

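  • Get more memory.
  • If your matrices contain a lot of zeros, you can try sparse matrices. They behave like regular matrices but store only the non-zero elements. The SciPy documentation has more information on this.
  • Use 64-bit Python rather than 32-bit Python. Judging by the directory name (Canopy_64bit), you already seem to be doing so.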
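The sparse-matrix suggestion can be sketched with `scipy.sparse.coo_matrix`, which is built from (row, column, value) triplets so that the full dense matrix never has to exist in memory. The triplets below are made-up sample data, and `sparse_sgd_epoch` is a hypothetical helper that applies the update rule from the question only to the stored entries; treat it as an illustration under those assumptions rather than a tuned implementation.

```python
import numpy
from scipy import sparse

# Hypothetical observed entries as (row, column, value) triplets. In practice
# these would be read from the data source instead of a dense matrix.
rows = numpy.array([0, 0, 1, 2])
cols = numpy.array([0, 3, 1, 2])
vals = numpy.array([5.0, 1.0, 4.0, 2.0])

# Only the 4 non-zero entries are stored, not all 3 x 4 cells.
R_sparse = sparse.coo_matrix((vals, (rows, cols)), shape=(3, 4))


def sparse_sgd_epoch(R_coo, P, Q, alpha=0.0002, beta=0.02):
    """One pass of the question's update rule over the stored entries only.

    P is N x K and Q is M x K (float NumPy arrays); both are updated in place.
    """
    for i, j, r in zip(R_coo.row, R_coo.col, R_coo.data):
        if r > 0:
            eij = r - numpy.dot(P[i, :], Q[j, :])
            # Same order as the element-wise loop: update P first, then
            # update Q using the freshly updated row of P.
            P[i, :] += alpha * (2 * eij * Q[j, :] - beta * P[i, :])
            Q[j, :] += alpha * (2 * eij * P[i, :] - beta * Q[j, :])
    return P, Q
```

Looping over `R_coo.row`, `R_coo.col` and `R_coo.data` visits only the stored entries, so the cost of one pass scales with the number of non-zero values instead of with N x M.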

Well, the OP is already using 64-bit Python.


OK, I will edit the answer.


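Thank you for your comment. The resulting matrix has a lot of zeros. I have read the documentation but could not find a way to create a sparse matrix for such a huge matrix. Is there any other documentation I should refer to that would solve the problem?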