python - Reading a file starting from a specific line of text

I am not talking about a specific line number, because I am reading multiple files that share the same format but differ in length.
Say I have this text file:

Something here... 
... ... ... 
Start      #I want this block of text 
a b c d e f g 
h i j k l m n 
End      #until this line of the file 
something here... 
... ... ...

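I hope you see what I mean. I was thinking of iterating through the file, using a regular expression to find the line numbers of "Start" and "End", and then using linecache to read from the Start line to the End line. But how do I get the line numbers? What function can I use?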


2011-09-26 BPm

This question is very similar to this one: http://stackoverflow.com/questions/7098530/repeatedly-extract-a-line-between-two-delimiters-in-a-text-file-python – salomonvh

If you just want the block of text between Start and End, you can do something simple like this:

with open('test.txt') as input_data:
    # Skips text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Start':  # Or whatever test is needed
            break
    # Reads text until the end of the block:
    for line in input_data:  # This keeps reading the file
        if line.strip() == 'End':
            break
        print(line, end='')  # Line is extracted (or block_of_lines.append(line), etc.)

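In fact, you do not need to work with line numbers at all in order to read the data between the Start and End markers.

The "read until..." logic is repeated in two loops, but it is very clear and efficient. Other methods typically involve checking some state on every line (before the block / inside the block / end of block reached), which incurs a time penalty.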
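The two-loop idea can also be wrapped in a reusable function. What follows is a minimal sketch, not part of the original answer: the name `extract_block` and its parameters are hypothetical, and it assumes each marker sits alone on its line. It relies on both loops sharing a single iterator, so the second loop resumes where the first one stopped.

```python
def extract_block(lines, start="Start", end="End"):
    """Return the lines strictly between the start and end marker lines.

    `lines` is any iterable of strings (an open file works).
    The function name and parameters are illustrative assumptions.
    """
    it = iter(lines)  # one shared iterator: the second loop continues where the first stopped
    # Skip everything up to and including the start marker.
    for line in it:
        if line.strip() == start:
            break
    block = []
    # Collect lines until the end marker (or the end of the input).
    for line in it:
        if line.strip() == end:
            break
        block.append(line.rstrip("\n"))
    return block


sample = [
    "Something here...\n",
    "Start\n",
    "a b c d e f g\n",
    "h i j k l m n\n",
    "End\n",
    "something here...\n",
]
print(extract_block(sample))
```

With a real file, the same function could be called as `extract_block(open_file)` inside a `with open(...)` block. If the start marker never appears, the result is simply an empty list.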


2011-09-26 18:29:28 EOL

This should get you started:

started = False
collected_lines = []
with open(path, "r") as fp:  # path is the name of the file to read
    for i, line in enumerate(fp):
        if line.rstrip() == "Start":
            started = True
            print("started at line", i)  # counts from zero!
            continue
        if started and line.rstrip() == "End":
            print("end at line", i)
            break
        if started:
            # process line (only lines after "Start" are collected)
            collected_lines.append(line.rstrip())

enumerate takes an iterable and yields (index, item) pairs as it iterates. For example,

print(list(enumerate("a b c".split())))

prints

[(0, 'a'), (1, 'b'), (2, 'c')]

UPDATE:

The asker wanted to use a regex to match lines such as "===" and "==":

import re
print(re.match("^=+$", "===") is not None)     # True
print(re.match("^=+$", "======") is not None)  # True
print(re.match("^=+$", "=") is not None)       # True
print(re.match("^=+$", "=abc") is not None)    # False
print(re.match("^=+$", "abc=") is not None)    # False
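To tie this back to the original question about line numbers: a pattern like the one above can be combined with enumerate to record where the marker lines occur. This is only a sketch. The helper name `find_marker_lines` is made up for illustration, and it treats any line consisting solely of '=' characters as a marker.

```python
import re

MARKER = re.compile(r"^=+$")  # a line made up only of '=' characters


def find_marker_lines(lines):
    """Return the zero-based indices of the lines that match MARKER.

    Hypothetical helper for illustration; `lines` is any iterable of strings.
    """
    return [i for i, line in enumerate(lines) if MARKER.match(line.rstrip("\n"))]


sample = ["header\n", "===\n", "a b c\n", "d e f\n", "======\n", "footer\n"]
marks = find_marker_lines(sample)
print(marks)
first, second = marks[0], marks[1]
print(sample[first + 1:second])  # the lines between the two markers
```

Once the indices are known, slicing a list of lines (as here) or calling linecache.getline with the one-based line number are both options for reading the block.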


2011-09-26 18:22:51 rocksportrocker

Here is something that will work:

block = ""
found = False

with open("test.txt") as data_file:
    for line in data_file:
        if found:
            block += line
            if line.strip() == "End":
                break
        elif line.strip() == "Start":
            found = True
            block = line  # keep the newline so the next line is not glued to "Start"
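The flag in the loop above can also be spelled out as named states, which may be easier to extend if more markers are added later. This is a sketch under assumptions: the state names and the function `read_block` are invented for illustration, and unlike the code above it leaves the marker lines out of the result.

```python
BEFORE, INSIDE, DONE = "before", "inside", "done"  # illustrative state names


def read_block(lines, start="Start", end="End"):
    """Collect the text between the start and end markers using explicit states.

    Returns the collected text and the final state, so the caller can tell
    whether the end marker was actually reached.
    """
    state = BEFORE
    block = []
    for line in lines:
        stripped = line.strip()
        if state == BEFORE:
            if stripped == start:
                state = INSIDE  # marker found: start collecting on the next line
        elif state == INSIDE:
            if stripped == end:
                state = DONE
                break  # nothing after the end marker is needed
            block.append(line)
    return "".join(block), state


text, final_state = read_block(["x\n", "Start\n", "a b\n", "c d\n", "End\n", "y\n"])
print(repr(text), final_state)
```

Returning the final state is a design choice of this sketch: a result of "inside" would indicate that the file ended before an End line was seen.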


2011-09-26 18:23:48 orlp

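That's a neat trick – BPm

@BPm: This is an example of a "finite state machine" (http://en.wikipedia.org/wiki/Finite_state_machine): the machine starts in the "block not yet found" state (found == False), then runs in the "inside the block" state (found == True), and in this case stops running when "End" is found. State machines can be slightly inefficient (here, 'found' has to be checked for every line in the block), but they generally let you express the logic of more complex algorithms clearly. – EOL

+1, because this is a good example of the perfectly valid state machine approach. – EOL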

You can do this quite easily with a regular expression. You can make it as robust as you need; below is a simple example.

>>> import re 
>>> START = "some" 
>>> END = "Hello" 
>>> test = "this is some\nsample text\nthat has the\nwords Hello World\n" 
>>> m = re.compile(r'%s.*?%s' % (START,END),re.S) 
>>> m.search(test).group(0) 
'some\nsample text\nthat has the\nwords Hello'


2011-09-26 20:23:02 pyInTheSky

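+1: Very nice idea: this is compact, and probably very efficient, since the 're' module is fast. However, the regex should force the START and END tags to be on a line by themselves ('^...$'). – EOL

Thanks :) I don't think you can use ^ or $ when you use the re.S flag, since it makes '.' match newlines; I think you would need to say explicitly '%s\n.*?%s\n' – pyInTheSky

In this case you can definitely use ^...$; just add the re.MULTILINE flag (http://docs.python.org/dev/library/re.html#module-contents). – EOL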
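As an illustration of that last comment, here is a hedged sketch of how the two flags can be combined so that the markers only match when they occupy a whole line. The sample text and the use of re.escape are additions made for this example, not part of the original answer.

```python
import re

START = "Start"
END = "End"
text = "Something here\nStart\na b c d e f g\nh i j k l m n\nEnd\nsomething here\n"

# re.MULTILINE makes ^ and $ match at line boundaries;
# re.DOTALL (same as re.S) lets '.' match newlines so the block can span lines.
pattern = re.compile(
    r"^%s$\n(.*?)^%s$" % (re.escape(START), re.escape(END)),
    re.MULTILINE | re.DOTALL,
)

match = pattern.search(text)
if match:
    print(match.group(1))  # the text between the two marker lines
```

Because of the anchors, a line such as "Started" or "The End" would not be taken for a marker, which the unanchored pattern in the answer would accept.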
