Read the first N lines of a file in Python

135

with open("datafile") as myfile: 
    head = [next(myfile) for x in xrange(N)] 
print head

Here's another way:

from itertools import islice 
with open("datafile") as myfile: 
    head = list(islice(myfile, N)) 
print head


2009-11-20 00:27:18

+1

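Thanks, that is very helpful indeed. What is the difference between the two (in terms of performance, required libraries, compatibility, etc.)? – Russell 2009-11-20 00:34:34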

+1

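I expect the performance to be similar, with the first perhaps slightly faster. But the first one won't work if the file doesn't have at least N lines. You are best off measuring the performance against some typical data you will be using. – 2009-11-20 00:47:33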
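To make the short-file difference concrete, here is a minimal sketch. It is written for Python 3 (the answers in this thread use Python 2 syntax) and uses an in-memory `io.StringIO` with made-up sample text in place of a real file, so the names below are illustrative only.

```python
import io
from itertools import islice

N = 5
short_text = "one\ntwo\nthree\n"  # made-up sample: only 3 lines, fewer than N

# Option 1: next() raises StopIteration once the lines run out.
f = io.StringIO(short_text)
try:
    head_next = [next(f) for _ in range(N)]
except StopIteration:
    head_next = None  # the file had fewer than N lines

# Option 2: islice simply stops early and returns what it found.
head_islice = list(islice(io.StringIO(short_text), N))

print(head_next)
print(head_islice)
```

If the first form is preferred, wrapping it in a try/except as above (or passing a default to `next`) is one way to handle short files.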

+1

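The with statement works in Python 2.6, and requires an extra import statement in 2.5. For 2.4 or earlier, you would need to rewrite the code with a try ... except block. Stylistically, I prefer the first option, although as mentioned the second is more robust for short files. – Alasdair 2009-11-20 01:21:07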
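For reference, the extra import on Python 2.5 is `from __future__ import with_statement`. On versions without `with`, a rough equivalent can be sketched with try/finally, which guarantees the file is closed. The helper name `head_without_with` and the temporary demo file are assumptions for illustration; the sketch below runs on Python 3.

```python
import os
import tempfile
from itertools import islice

def head_without_with(path, n):
    """Return up to n lines, closing the file even if reading fails."""
    f = open(path)
    try:
        return list(islice(f, n))
    finally:
        f.close()  # runs whether or not an exception was raised

# Demo on a throwaway file (hypothetical contents).
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("a\nb\nc\nd\n")

result = head_without_with(demo_path, 2)
os.remove(demo_path)
print(result)
```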

5

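There is no specific method exposed by the file object for reading a given number of lines.

I guess the easiest way would be the following: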

lines =[] 
with open(file_name) as f: 
    lines.extend(f.readline() for i in xrange(N))
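One thing worth knowing about this approach: `readline()` returns an empty string at end of file rather than raising, so a short file pads the list with empty strings. A small Python 3 sketch, using `io.StringIO` and made-up text as a stand-in for a real file:

```python
import io

N = 5
f = io.StringIO("one\ntwo\n")  # stand-in for a file with only 2 lines

# readline() yields '' after the last line instead of raising an error.
lines = [f.readline() for _ in range(N)]

# A blank line inside a file is '\n', not '', so dropping falsy entries
# removes only the end-of-file padding.
trimmed = [line for line in lines if line]

print(lines)
print(trimmed)
```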


2009-11-20 00:27:39 artdanil

+0

This is what I actually intended, although I had thought of adding each line to the list. Thank you. – artdanil 2009-11-20 02:11:18

2

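If you want something that obviously (without looking up esoteric stuff in manuals) works without imports and try/except, and works on a fair range of Python 2.x versions (2.2 to 2.6), based on gnibbler's answer: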

def headn(file_name, n): 
    """Like *x head -N command""" 
    result = [] 
    nlines = 0 
    assert n >= 1 
    for line in open(file_name):
        result.append(line)
        nlines += 1
        if nlines >= n:
            break
    return result 

if __name__ == "__main__": 
    import sys 
    rval = headn(sys.argv[1], int(sys.argv[2])) 
    print rval 
    print len(rval)


2009-11-20 02:00:36

15

N=10 
f=open("file") 
for i in range(N): 
    line=f.next().strip() 
    print line 
f.close()


2009-11-20 02:04:36 ghostdog74

+14

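Every time I see `f = open("file")` without exception handling to close the file, I cringe. The Pythonic way to handle files is with a context manager, i.e. using the with statement. This is covered in the [Input Output Python tutorial](http://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects): "It is good practice to use the with keyword when dealing with file objects. This has the advantage that the file is properly closed after its suite finishes, even if an exception is raised on the way." – 2013-06-18 17:33:00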
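The quoted claim is easy to check. The following Python 3 sketch deliberately reads past the end of a one-line temporary file (a made-up example) and then confirms that the with statement still closed it:

```python
import os
import tempfile

# Create a throwaway file containing a single line.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("only one line\n")

try:
    with open(demo_path) as f:
        for _ in range(3):
            next(f)  # the second call raises StopIteration
except StopIteration:
    pass

# The with block closed the file on the way out, despite the exception.
closed_after_error = f.closed
os.remove(demo_path)
print(closed_after_error)
```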

3

Based on the top-voted answer (Nov 20 '09 at 0:27): this class adds head() and tail() methods to the file object.

# Python 2 only: subclasses the built-in file type.
class File(file):
    def head(self, lines_2find=1):
        self.seek(0)  # rewind file
        return [self.next() for x in xrange(lines_2find)]

    def tail(self, lines_2find=1):
        self.seek(0, 2)  # go to end of file
        bytes_in_file = self.tell()
        lines_found, total_bytes_scanned = 0, 0
        while (lines_2find + 1 > lines_found and
               bytes_in_file > total_bytes_scanned):
            byte_block = min(1024, bytes_in_file - total_bytes_scanned)
            self.seek(-(byte_block + total_bytes_scanned), 2)
            total_bytes_scanned += byte_block
            lines_found += self.read(1024).count('\n')
        self.seek(-total_bytes_scanned, 2)
        line_list = list(self.readlines())
        return line_list[-lines_2find:]

Usage:

f = File('path/to/file', 'r') 
f.head(3) 
f.tail(3)


2011-01-20 19:42:58 fdb

2

The way I find most convenient:

LINE_COUNT = 3 
print [s for (i, s) in enumerate(open('test.txt')) if i < LINE_COUNT]

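The solution is based on a list comprehension. The open() function supports the iteration interface. enumerate() wraps open() and returns tuples (index, item); we then check that we are inside the accepted range (if i < LINE_COUNT) and simply print the result.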
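One caveat with the filtering comprehension is that it keeps iterating after the first LINE_COUNT lines, so the whole file is still read. The sketch below (Python 3) makes that visible with a hypothetical `counting` wrapper around a made-up list of 1000 lines, and compares it with `itertools.islice`, which stops pulling lines early.

```python
from itertools import islice

def counting(lines, counter):
    """Yield lines unchanged while counting how many were pulled."""
    for line in lines:
        counter[0] += 1
        yield line

data = ["line %d\n" % i for i in range(1000)]  # stand-in for a file
LINE_COUNT = 3

# Filtering with enumerate: every line is still consumed.
seen_filter = [0]
head_filter = [s for i, s in enumerate(counting(data, seen_filter))
               if i < LINE_COUNT]

# islice: stops after LINE_COUNT lines have been pulled.
seen_islice = [0]
head_islice = list(islice(counting(data, seen_islice), LINE_COUNT))

print(seen_filter[0], seen_islice[0])
```

Both approaches return the same three lines; they differ only in how much of the input they touch, which matters for large files.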

Enjoy Python. ;)


2011-12-07 08:26:17

3

Starting with Python 2.6, you can take advantage of more sophisticated functions in the IO base class.


2012-12-06 18:02:26

+23

According to the [docs](http://docs.python.org/2/library/stdtypes.html#file.readlines), N is the number of _bytes_ to read, **not** the number of _lines_. – 2013-06-18 17:41:42

+2

N is the number of bytes! – qed 2014-06-01 14:19:02

+4

Wow. Talk about poor naming. The function name mentions `lines`, but the argument refers to `bytes`. – ArtOfWarfare 2015-04-27 18:22:45
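The point raised in these comments can be illustrated with a short Python 3 sketch. It uses `io.StringIO` with made-up text, where the hint counts characters; on a real Python 2 file object the size hint may additionally be rounded up to an internal buffer size, so the exact number of lines returned can differ.

```python
import io
from itertools import islice

text = "a\nb\nc\nd\ne\n"  # five 2-character lines, made-up sample
N = 3

# readlines(N) treats N as a size hint, not as a line count.
by_hint = io.StringIO(text).readlines(N)

# islice(f, N) really does return the first N lines.
by_islice = list(islice(io.StringIO(text), N))

print(by_hint)
print(by_islice)
```

Here the hint of 3 is exceeded after two lines, so `readlines(3)` does not return three lines.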

7

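The top-rated answer above can be rewritten as:

with open("datafile") as myfile:
    head = myfile.readlines(N)
print head

(You don't have to worry about your file having fewer than N lines, since no StopIteration exception is thrown.) Note, as the comments above point out, that the argument to readlines() is a size hint in bytes, not a number of lines.

If you want to read the first lines quickly and don't care about performance, you can use .readlines(), which returns a list object, and then slice the list.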

E.g. for the first 5 lines:

with open("pathofmyfileandfileandname") as myfile: 
    firstNlines=myfile.readlines()[0:5] #put here the interval you want

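Side note: the whole file is read, so this is not the best from a performance point of view, but it is easy to use, quick to write and easy to remember, so it is very convenient if you just want to perform some one-off calculation.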

print firstNlines


2013-12-07 12:59:02

+2

The top answer is probably the more efficient way, but this one works like a charm for small files. – 2015-11-07 12:53:29

+1

Note that this actually reads the entire file into a list (myfile.readlines()) and then slices off its first 5 lines. – AbdealiJK 2016-10-25 09:07:36

0

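If you have a really big file, and assuming you want the output to be a numpy array, using np.genfromtxt will freeze your computer. In my experience this works much better: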

import numpy as np

def load_big_file(fname, maxrows):
    '''only works for well-formed text file of space-separated doubles'''

    rows = []  # unknown number of lines, so use list

    with open(fname) as f:
        j = 0
        for line in f:
            if j == maxrows:
                break
            else:
                line = [float(s) for s in line.split()]
                rows.append(np.array(line, dtype=np.double))
                j += 1
    return np.vstack(rows)  # convert list of vectors to array


2014-11-25 05:00:10 cacosomoza

2

For the first 5 lines, simply do:

N=5 
with open("data_file", "r") as file: 
    for i in range(N): 
        print file.next()


2016-10-28 02:36:25 Surya

3

What I do is read the N lines using pandas. I think the performance is not the best, but for example, if N=1000:

import pandas as pd 
yourfile = pd.read_csv('path/to/your/file.csv', nrows=1000)


2017-04-11 14:54:59

+1

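A better way would be to use the `nrows` option, which can be set to 1000 so that the entire file isn't loaded. http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html In general, pandas has this and other memory-saving techniques for big files. – philshem 2017-04-11 15:03:40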

+0

Yes, you are right. I have just corrected it. Sorry for the mistake. – 2017-04-11 15:06:52

+1

You may also want to add `sep` to define the column delimiter (which shouldn't occur in a non-csv file) – philshem 2017-04-11 15:09:11

0

#!/usr/bin/python 

import subprocess 

p = subprocess.Popen(["tail", "-n 3", "passlist"], stdout=subprocess.PIPE) 

output, err = p.communicate() 

print output

This method worked for me.


2017-07-12 16:25:03

0

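The two most intuitive ways of doing this would be:

1. Iterate over the file line by line, and break after N lines.
2. Iterate over the file line by line using the next() method N times. (This is essentially just a different syntax for what the top answer does.)

Here is the code: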

# Method 1:
with open("fileName", "r") as f:
    counter = 0
    for line in f:
        print line
        counter += 1
        if counter == N: break

# Method 2:
with open("fileName", "r") as f:
    for i in xrange(N):
        line = f.next()
        print line

The bottom line is that, as long as you don't use readlines() or otherwise load the whole file into memory, you have plenty of options.


2018-03-02 23:42:23 FatihAkici
