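Re-reading a CSV file in Python without loading it again

I wrote the code below, but I want to improve it. I don't want to re-read the file, but if I remove sales_input.seek(0), it no longer iterates through every row in sales. How can I improve this?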
import csv
import pickle

def computeCritics(mode, cleaned_sales_input = "data/cleaned_sales.csv"):
    if mode == 1:
        print "creating customer.critics.recommendations"
        critics_output = open("data/customer/customer.critics.recommendations",
                              "wb")
        ID = getCustomerSet(cleaned_sales_input)
        sales_dict = pickle.load(open("data/customer/books.dict.recommendations",
                                      "r"))
    else:
        print "creating books.critics.recommendations"
        critics_output = open("data/books/books.critics.recommendations",
                              "wb")
        ID = getBookSet(cleaned_sales_input)
        sales_dict = pickle.load(open("data/books/users.dict.recommendations",
                                      "r"))
    critics = {}
    # make critics dict and pickle it
    for i in ID:
        # the whole file is reopened and scanned once per ID
        with open(cleaned_sales_input, 'rb') as sales_input:
            sales = csv.reader(sales_input) # read new
            for j in sales:
                if mode == 1:
                    if int(i) == int(j[2]):
                        sales_dict[int(j[6])] = 1
                else:
                    if int(i) == int(j[6]):
                        sales_dict[int(j[2])] = 1
        critics[int(i)] = sales_dict
    pickle.dump(critics, critics_output)
    print "done"
cleaned_sales_input looks like
6042772,2723,3546414,9782072488887,1,9.99,314968
6042769,2723,3546414,9782072488887,1,9.99,314968
...
where number 6 is the book ID and number 0 is the customer ID.
I want to get a dictionary which looks like
critics = {
    CustomerID1: {
        BookID1: 1,
        BookID2: 0,
        ........
        BookIDX: 0
    },
    CustomerID2: {
        BookID1: 0,
        BookID2: 1,
        ...
    }
}
or
critics = {
    BookID1: {
        CustomerID1: 1,
        CustomerID2: 0,
        ........
        CustomerIDX: 0
    },
    BookID2: {
        CustomerID1: 0,
        CustomerID2: 1,
        ...
        CustomerIDX: 0
    }
}
I hope this isn't too much information.
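For illustration, one way to avoid scanning the file once per ID is to read it a single time and group the rows in memory. The sketch below is only an assumption about what would fit here, not a drop-in replacement: `build_critics`, `key_col` and `item_col` are hypothetical names, the column indices 2 and 6 follow the `j[2]` / `j[6]` indexing in the code above, the third sample row is made up so that there are two customers, and the set of all item IDs is derived from the file itself rather than from the pickled dictionaries. It is written for Python 3.

```python
import csv
from collections import defaultdict


def build_critics(rows, key_col, item_col):
    """Build {key: {item: 0 or 1}} from a single pass over the rows."""
    bought = defaultdict(set)  # key -> set of items seen together with it
    all_items = set()          # every item ID that appears in the data
    for row in rows:
        if not row:
            continue  # skip blank lines
        key, item = int(row[key_col]), int(row[item_col])
        bought[key].add(item)
        all_items.add(item)

    critics = {}
    for key, items in bought.items():
        scores = dict.fromkeys(all_items, 0)     # fresh dict for every key
        scores.update(dict.fromkeys(items, 1))   # mark the items that occur
        critics[key] = scores
    return critics


# csv.reader accepts any iterable of lines, so a list stands in for the file.
sample = [
    "6042772,2723,3546414,9782072488887,1,9.99,314968",
    "6042769,2723,3546414,9782072488887,1,9.99,314968",
    "6042770,2723,3546415,9782072488888,1,9.99,314969",  # made-up row
]
critics = build_critics(csv.reader(sample), key_col=2, item_col=6)
```

Swapping `key_col` and `item_col` would give the book-keyed variant. Each key gets its own inner dictionary here, so entries for different keys do not share state.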
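Have you profiled this to see whether the csv reading is actually the bottleneck? – RickyA
Sorry, what is profiling? I've never heard of it. –
A [profiler](http://docs.python.org/2/library/profile.html) is used to see how much time each part of your code takes. You can use it to identify the bottlenecks in your code. Optimizing things before profiling is almost useless, because you don't know what the bottleneck is. So maybe your file reading is not the bottleneck here. – RickyA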
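As a rough illustration of the profiling suggestion, the standard-library `cProfile` and `pstats` modules can time a function call and report where the time went. This is a minimal sketch for Python 3; `slow_sum` is a hypothetical stand-in workload, not part of the question's code, and would be replaced by the call you actually want to measure.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Stand-in workload; replace with the function you want to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100000)
profiler.disable()

# Collect the report as text, sorted by cumulative time, top 10 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

A whole script can also be profiled from the command line with `python -m cProfile -s cumulative script.py`, where `script.py` is a placeholder for your own file.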