IndexError: index 5688 is out of bounds for axis 0 with size 3706
I am practicing building a recommender system. I get the data from http://grouplens.org/datasets/movielens/
import numpy as np
import pandas as pd
header = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv('ml-1m/ratings.dat', sep='::', names=header)
n_users = df.user_id.unique().shape[0]
n_items = df.item_id.unique().shape[0]
print ('Number of users = ' + str(n_users) + ' | Number of movies = ' + str(n_items))
Number of users = 6040 | Number of movies = 3706
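# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection provides train_test_split
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(df, test_size=0.25)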
Then I try to build two user-item matrices, one for training and one for testing:
train_data_matrix = np.zeros((n_users, n_items))
for line in train_data.itertuples():
    train_data_matrix[line[1]-1, line[2]-1] = line[3]
test_data_matrix = np.zeros((n_users, n_items))
for line in test_data.itertuples():
    test_data_matrix[line[1]-1, line[2]-1] = line[3]
and I get this (full traceback):
IndexError Traceback (most recent call last)
<ipython-input-39-180dea01cdf8> in <module>()
2 train_data_matrix = np.zeros((n_users, n_items))
3 for line in train_data.itertuples():
----> 4 train_data_matrix[line[1]-1, line[2]-1] = line[3]
5
6 test_data_matrix = np.zeros((n_users, n_items))
IndexError: index 5688 is out of bounds for axis 0 with size 3706
What is wrong?
P.S.
train_data.head()
user_id item_id rating timestamp
483019 2968 2268 5 971107926
943582 5689 3615 3 963719230
116153 752 1147 5 975458000
103250 686 1704 5 975601762
235333 1425 3752 4 1023560349
P.P.S.
for line in train_data.itertuples():
    print(line)
Pandas(Index=483019, user_id=2968, item_id=2268, rating=5, timestamp=971107926)
Pandas(Index=943582, user_id=5689, item_id=3615, rating=3, timestamp=963719230)
Pandas(Index=116153, user_id=752, item_id=1147, rating=5, timestamp=975458000)
Pandas(Index=103250, user_id=686, item_id=1704, rating=5, timestamp=975601762)
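train_data_matrix is a matrix sized by the number of unique user and movie ids. 5689 is a user id from train_data.head() – Edward
I answered my own question – Edward
But the rows of the matrix are indexed by row number, not by user ID. – hpaulj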
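The point in the last comment can be illustrated with a small sketch. It assumes the cause is that the ids are not contiguous: in the head() output above, item_id 3752 is already larger than the 3706 distinct movies, so `id - 1` cannot be a valid position in a matrix sized by the unique counts. The toy DataFrame and the names `user_index` and `item_index` are made up for illustration, and this is one possible approach rather than necessarily the fix the asker used.

```python
import numpy as np
import pandas as pd

# Toy ratings with gaps in the ids (made-up values, not MovieLens data).
df = pd.DataFrame({
    'user_id': [1, 1, 4, 9],
    'item_id': [10, 30, 10, 70],
    'rating':  [5, 3, 4, 2],
})

n_users = df.user_id.nunique()  # 3 distinct users
n_items = df.item_id.nunique()  # 3 distinct items

# The largest id exceeds the number of distinct ids, so id - 1 is not a safe index.
print(df.user_id.max(), n_users)  # 9 vs 3
print(df.item_id.max(), n_items)  # 70 vs 3

# Map every id to a position in 0..n-1 (hypothetical helper dicts).
user_index = {uid: i for i, uid in enumerate(np.sort(df.user_id.unique()))}
item_index = {iid: j for j, iid in enumerate(np.sort(df.item_id.unique()))}

# Fill the matrix using the mapped positions instead of the raw ids.
matrix = np.zeros((n_users, n_items))
for row in df.itertuples(index=False):
    matrix[user_index[row.user_id], item_index[row.item_id]] = row.rating

print(matrix)
```

If this approach is used with a train/test split, the two mappings would need to be built from the full DataFrame before splitting, so that both matrices share the same row and column positions. An alternative is to size the matrix by the maximum id instead of the unique count, at the cost of some empty rows and columns.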