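Using @Anton Protopopov's sample file: reading the tail of the file and the header in separate operations is much cheaper than reading the entire file.
Just read the final rows directly: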
In [22]: df = read_csv("file.csv", nrows=10000, skiprows=990001, header=None, index_col=0)
In [23]: df
Out[23]:
               1         2         3
0
990000 -0.902507 -0.274718  1.155361
990001 -0.591442 -0.318853 -0.089092
990002 -1.461444 -0.070372  0.946964
990003  0.608169 -0.076891  0.431654
990004  1.149982  0.661430  0.456155
...          ...       ...       ...
999995  0.057719  0.370591  0.081722
999996  0.157751 -1.204664  1.150288
999997 -2.174867 -0.578116  0.647010
999998 -0.668920  1.059817 -2.091019
999999 -0.263830 -1.195737 -0.571498
[10000 rows x 3 columns]
This is pretty fast:
In [24]: %timeit read_csv("file.csv", nrows=10000, skiprows=990001, header=None, index_col=0)
1 loop, best of 3: 262 ms per loop
It is also pretty cheap to determine the length of the file (its line count) a priori:
In [25]: %timeit sum(1 for l in open('file.csv'))
10 loops, best of 3: 104 ms per loop
Read in the header:
In [26]: df.columns = read_csv('file.csv', header=0, nrows=1, index_col=0).columns
In [27]: df
Out[27]:
               a         b         c
0
990000 -0.902507 -0.274718  1.155361
990001 -0.591442 -0.318853 -0.089092
990002 -1.461444 -0.070372  0.946964
990003  0.608169 -0.076891  0.431654
990004  1.149982  0.661430  0.456155
...          ...       ...       ...
999995  0.057719  0.370591  0.081722
999996  0.157751 -1.204664  1.150288
999997 -2.174867 -0.578116  0.647010
999998 -0.668920  1.059817 -2.091019
999999 -0.263830 -1.195737 -0.571498
[10000 rows x 3 columns]
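The steps above (count the lines, skip to the tail, then attach the header) could be wrapped up roughly like this. It is only a sketch: `read_tail` is a hypothetical helper name, and it assumes the file has exactly one header line, at least one data row, and no trailing blank lines.

```python
import pandas as pd


def read_tail(path, n_rows):
    """Read the last n_rows data rows of a CSV with a one-line header.

    Hypothetical helper; assumes one header line and no trailing blank lines.
    """
    if n_rows < 1:
        raise ValueError("n_rows must be at least 1")

    # Count the lines up front; the count includes the header line.
    with open(path) as fh:
        n_lines = sum(1 for _ in fh)

    n_data = n_lines - 1              # data rows, excluding the header
    n_rows = min(n_rows, n_data)      # do not ask for more rows than exist
    skip = n_lines - n_rows           # header plus the leading data rows

    # Read only the tail, with no header row.
    df = pd.read_csv(path, skiprows=skip, nrows=n_rows,
                     header=None, index_col=0)

    # Read the header separately and attach the column names.
    df.columns = pd.read_csv(path, header=0, nrows=1, index_col=0).columns
    return df
```

With the sample file, `read_tail("file.csv", 10000)` should correspond to the two `read_csv` calls shown above, without hard-coding `skiprows=990001`.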
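Are you on a Linux or OS X system? If so, using 'tail -n 10000 file > file2' is probably the easiest... – Carpetsmoker
Building on @Carpetsmoker's idea: if you insist on using 'Python', you can run it through 'subprocess.call()' :P – Mai
@Carpetsmoker, but he also needs the header. It should be 'head -n 1 file > file2; tail -n 10000 file >> file2' –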
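For what it's worth, the head/tail idea from these comments might look like this when driven from Python. This is a sketch only: `tail_with_header` is a made-up name, it uses `subprocess.run` rather than `subprocess.call`, and it assumes `head` and `tail` are on the PATH (Linux or OS X) and that the file has more than `n_rows` lines, otherwise the header line ends up in the output twice.

```python
import subprocess


def tail_with_header(src, dst, n_rows):
    """Write the header line of src plus its last n_rows lines to dst.

    Hypothetical helper; relies on the external head and tail commands.
    """
    with open(dst, "wb") as out:
        # head -n 1 copies the header line.
        subprocess.run(["head", "-n", "1", src], stdout=out, check=True)
        # tail -n N appends the last N lines after the header.
        subprocess.run(["tail", "-n", str(n_rows), src], stdout=out, check=True)
```

The resulting file can then be loaded with an ordinary `read_csv(dst, index_col=0)`.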