I want to read a CSV file with 1000 rows, so I decided to read it in chunks, but I am running into a problem. How can I use pandas to read 10 records at a time from a CSV file?
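On the 1st iteration I want to read the first 10 records and convert specific columns into a Python dictionary. On the 2nd iteration I want to skip those first 10 records and read the next 10, and so on.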
Input.csv:
time,line_id,high,low,avg,total,split_counts
1468332421098000,206,50879,50879,50879,2,"[50000,2]"
1468332421195000,206,39556,39556,39556,2,"[30000,2]"
1468332421383000,206,61636,61636,61636,2,"[60000,2]"
1468332423568000,206,47315,38931,43123,4,"[30000,2][40000,2]"
1468332423489000,206,38514,38445,38475,6,"[30000,6]"
1468332421672000,206,60079,60079,60079,2,"[60000,2]"
1468332421818000,206,44664,44664,44664,2,"[40000,2]"
1468332422164000,206,48500,48500,48500,2,"[40000,2]"
1468332423490000,206,39469,37894,38206,12,"[30000,12]"
1468332422538000,206,44023,44023,44023,2,"[40000,2]"
1468332423491000,206,38813,38813,38813,2,"[30000,2]"
1468332423528000,206,75970,75970,75970,2,"[70000,2]"
1468332423533000,206,42546,42470,42508,4,"[40000,4]"
1468332423536000,206,41065,40888,40976,4,"[40000,4]"
1468332423566000,206,66401,62453,64549,6,"[60000,6]"
Program code:
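import pandas

if __name__ == '__main__':
    s = 0
    while True:
        n = 10
        # an integer skiprows=s skips the first s lines of the file, header included
        df = pandas.read_csv('Input.csv', skiprows=s, nrows=n)
        # map each row's time to its split_counts string
        d = dict(zip(df.time, df.split_counts))
        print(d)
        s += n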
This fails with the following error:
AttributeError: 'DataFrame' object has no attribute 'time'
I know that on the second iteration pandas can no longer find the time and split_counts attributes, but is there any way to do what I want?
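The error comes from `skiprows=s`: an integer skips the first `s` lines of the file, and the header is line 0, so from the second pass onward a data row is parsed as the header and the `time` column no longer exists. A minimal sketch of the same loop that keeps the header, assuming pandas is installed: a list-like `skiprows=range(1, s + 1)` skips only data lines. `read_dict_batches` is a hypothetical helper name, and the `df.empty` check is an added stopping condition, since the original loop never ends.

```python
import pandas as pd


def read_dict_batches(path, n=10):
    """Yield one {time: split_counts} dict per batch of n data rows.

    Hypothetical helper. skiprows=range(1, s + 1) skips s data lines but
    keeps line 0 (the header), so every batch still has the 'time' and
    'split_counts' columns.
    """
    s = 0
    while True:
        df = pd.read_csv(path, skiprows=range(1, s + 1), nrows=n)
        if df.empty:  # every data row was skipped: stop instead of looping forever
            break
        yield dict(zip(df.time, df.split_counts))
        s += n
```

For the question's file this would be used as `for d in read_dict_batches('Input.csv'): print(d)`. Note that every call still rescans the file from the top to reach row `s`.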
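You can also use the chunksize parameter of read_csv. The whole job is then O(n) rather than O(n^2), because the file is read only once instead of being rescanned from the start on every iteration.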
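A minimal sketch of the `chunksize` approach, assuming pandas is installed. `read_dict_chunks` is a hypothetical helper name, and `usecols` is an optional addition that parses only the two columns the dictionary needs. With `chunksize=n`, `read_csv` returns an iterator of DataFrames, each of which keeps the header's column names, so `chunk.time` works on every iteration.

```python
import pandas as pd


def read_dict_chunks(path, n=10):
    """Yield one {time: split_counts} dict per chunk of n rows.

    Hypothetical helper. read_csv(..., chunksize=n) parses the file once,
    sequentially, handing back n rows at a time.
    """
    reader = pd.read_csv(path, usecols=["time", "split_counts"], chunksize=n)
    for chunk in reader:
        yield dict(zip(chunk.time, chunk.split_counts))
```

Usage is the same as a plain loop: `for d in read_dict_chunks('Input.csv'): print(d)`. The last chunk simply holds whatever rows remain, so no explicit stopping condition is needed.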