2013-01-21 79 views
3

numpy.genfromtxt: delimiter=',' fails to split string

I don't understand why numpy.genfromtxt fails to split the following string correctly with delimiter=",", when it works for most of my other strings.

chunk[12968] 
Out[143]: '2901869281,3279442095,2012-12-15T23:00:00.003Z,Sacramento,CA,R#3817874,United States,38.583,-121.498,11, 8, 6, 5, 1, 0, 2, 3, 3, 5, 3, 3, 2, 2, 6, 6, 1, 2, 3, 0, 1, 1, 0, 0, 2, 2, 2, 2, 1, 0, 0, 2, 1, 0, 1, 1, 2, 0, 3, 1, 1, 1, 1, 0, 0, 4, 0, 0, 0, 1, 3, 1, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 0, 9, 0, 0, 0, 2, 3, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0,130\n' 

I expected an array of shape (110,), but this is what I get:

genfromtxt([chunk[12968]],delimiter=",",dtype=np.int64) 
Out[142]: 
array([2901869281, 3279442095,   -1,   -1,   -1, 
       -1], dtype=int64) 

Note: I read a large *.csv file in chunks using izip_longest from itertools, like this:

from itertools import izip_longest

with open('events.csv','r') as f:
    # each chunk is a tuple of up to 50000 consecutive lines
    for chunk in izip_longest(*[f] * 50000):
        ...

Thanks for your help.

Answer

7

The comments parameter of genfromtxt() defaults to '#', so everything after the # in your input is being ignored:

2901869281,3279442095,2012-12-15T23:00:00.003Z,Sacramento,CA,R#3817874,United States,... 
                                                              ^ start of comment
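
One way to avoid this is to disable comment handling by passing comments=None. The sketch below uses a shortened, made-up line (not the full row from the question) and assumes a NumPy version that accepts str input, so treat it as an illustration rather than a drop-in fix. Fields that cannot be parsed as integers still come back as -1.

```python
import numpy as np

# Shortened, hypothetical sample line; it only needs to contain a '#'.
line = '2901869281,3279442095,Sacramento,CA,R#3817874,11,8,6\n'

# Default behaviour: comments='#', so the line is cut off at "R".
truncated = np.genfromtxt([line], delimiter=",", dtype=np.int64)

# comments=None turns off comment stripping, so every field is kept.
fixed = np.genfromtxt([line], delimiter=",", dtype=np.int64, comments=None)

print(truncated.shape)  # 5 fields survive before the '#'
print(fixed.shape)      # all 8 fields are parsed
```

If some of your data really does contain comments, passing a different marker string that never occurs in the fields is an alternative to None.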