How can I read a file in parts separated by 'markers'?

For example, say I have some text/log files with a very simple structure: a few different parts, each with its own layout, separated by marker lines, like this:

0x23499 0x234234 0x234234 
... 
0x34534 0x353454 0x345464 
$$$NEW_SECTION$$$ 
4345-34534-345-345345-3453 
3453-34534-346-766788-3534 
...

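So how can I read the file by these parts? For example, read everything before the $$$NEW_SECTION$$$ marker into one variable and everything after it into another (without using regular expressions and the like). Is there a simple solution for this?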


2013-11-25 Anton Kochkov

Here is a tested solution that does not read the whole file into memory:

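data1 = []
pos = 0
with open('data.txt', 'r') as f:
    # Use readline() rather than "for line in f" so that f.tell() stays accurate.
    line = f.readline()
    while line and not line.startswith('$$$'):
        data1.append(line)
        line = f.readline()

    # Position just past the marker line (or end of file if no marker was found).
    pos = f.tell()

data2 = []
with open('data.txt', 'r') as f:
    f.seek(pos)
    for line in f:
        data2.append(line)

print(data1)
print(data2)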

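The first loop cannot use for line in f, so as not to spoil the exact position in the file: iterating over a file object reads ahead in a buffer, so f.tell() would not reliably report where the last returned line ended.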
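As a variation on the approach above, the split can also be done in a single pass, without f.tell() or reopening the file. This is only a sketch and not the answerer's code: split_at_marker is a hypothetical helper, and an in-memory io.StringIO stands in for the real file. It relies on itertools.takewhile consuming the marker line and leaving the rest of the iterator untouched.

```python
import io
from itertools import takewhile


def split_at_marker(lines, marker="$$$NEW_SECTION$$$"):
    """Return (before, after) lists of lines around the first marker line.

    `lines` is any iterable of lines, e.g. an open file object.
    split_at_marker is a hypothetical helper, not part of the original answer.
    """
    it = iter(lines)
    # takewhile stops at (and consumes) the first line that starts with the marker.
    before = list(takewhile(lambda line: not line.startswith(marker), it))
    # Whatever is left in the iterator comes after the marker.
    after = list(it)
    return before, after


# In-memory stand-in for a real file, using the sample data from the question.
sample = io.StringIO(
    "0x23499 0x234234 0x234234\n"
    "0x34534 0x353454 0x345464\n"
    "$$$NEW_SECTION$$$\n"
    "4345-34534-345-345345-3453\n"
    "3453-34534-346-766788-3534\n"
)
data1, data2 = split_at_marker(sample)
print(data1)
print(data2)
```

With a real file, the same call would be made inside a with open(...) block, passing the file object as lines. Only the first marker is handled here; any later marker lines would end up in the second list.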


2013-11-25 10:27:13 BartoszKP

The simplest way is str.split:

>>> s = filecontents.split("$$$NEW_SECTION$$$") 
>>> s[0] 
'0x23499 0x234234 0x234234\n\n0x34534 0x353454 0x345464\n' 
>>> s[1] 
'\n4345-34534-345-345345-3453\n3453-34534-346-766788-3534'
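When there is exactly one marker, str.partition may be a slightly closer fit than str.split, since it always returns three pieces (text before, the marker itself, text after). The following is a minimal sketch under that assumption; filecontents is built by hand here from the question's sample data instead of being read from a file.

```python
# Sample data from the question, standing in for the result of f.read().
filecontents = (
    "0x23499 0x234234 0x234234\n"
    "0x34534 0x353454 0x345464\n"
    "$$$NEW_SECTION$$$\n"
    "4345-34534-345-345345-3453\n"
    "3453-34534-346-766788-3534"
)

# partition splits at the first occurrence of the separator only.
before, marker, after = filecontents.partition("$$$NEW_SECTION$$$")

# If the marker is missing, `marker` and `after` are empty strings
# and `before` holds the whole text.
if marker:
    print(repr(before))
    print(repr(after))
else:
    print("marker not found")
```

As with str.split, this assumes the whole file already fits in memory.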


2013-11-25 10:20:25 beerbajay

That is, if you have already read the whole file into memory. –

Yes, correct. If. – beerbajay

Solution 1:

If the file is not very large:

with open('your_log.txt') as f:
    # Read everything at once and split on the marker text.
    parts = f.read().split('$$$NEW_SECTION$$$')
    if len(parts) > 0:
        part1 = parts[0]
        ...

Solution 2:

def FileParser(filepath):
    with open(filepath) as f:
        part = ''
        for line in f:
            if line.strip() == '$$$NEW_SECTION$$$':
                # Marker reached: emit the section collected so far.
                yield part
                part = ''
            else:
                part += line
        # Emit the final section, which has no marker after it.
        yield part


for segment in FileParser('your_log.txt'):
    print(segment)

Note: this code is untested, so use it with care.


2013-11-25 10:25:18 Chandan

Solution:

def sec(file_, sentinel):
    with open(file_) as f:
        section = []
        for i in iter(f.readline, ''):
            if i.rstrip() == sentinel:
                yield section
                section = []
            else:
                section.append(i)
        yield section

And usage:

>>> from pprint import pprint
>>> pprint(list(sec('file.txt', '$$$NEW_SECTION$$$')))
[['0x23499 0x234234 0x234234\n', '0x34534 0x353454 0x345464\n'],
 ['4345-34534-345-345345-3453\n',
  '3453-34534-346-766788-3534\n',
  '3453-34534-346-746788-3534\n']]
>>>

To put the sections into variables or, better, into a dict:

>>> sections = {}
>>> for n, section in enumerate(sec('file.txt', '$$$NEW_SECTION$$$')):
...     sections[n] = section
>>>
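The same idea can be sketched so that it works on any iterable of lines, which makes it easy to try without a file on disk. This is an illustration under stated assumptions, not the answerer's code: iter_sections is a hypothetical name, io.StringIO stands in for the open file, and dict(enumerate(...)) replaces the explicit loop.

```python
import io


def iter_sections(lines, sentinel):
    """Yield lists of lines, starting a new list at each sentinel line.

    iter_sections is a hypothetical helper; `lines` can be an open file
    or any other iterable of strings.
    """
    section = []
    for line in lines:
        # rstrip() ignores the newline and any trailing spaces after the marker.
        if line.rstrip() == sentinel:
            yield section
            section = []
        else:
            section.append(line)
    # The last section has no sentinel after it.
    yield section


# In-memory stand-in for file.txt, using the sample data from the question.
sample = io.StringIO(
    "0x23499 0x234234 0x234234\n"
    "0x34534 0x353454 0x345464\n"
    "$$$NEW_SECTION$$$ \n"
    "4345-34534-345-345345-3453\n"
    "3453-34534-346-766788-3534\n"
)

# Number the sections 0, 1, ... and store them in a dict.
sections = dict(enumerate(iter_sections(sample, "$$$NEW_SECTION$$$")))
print(sections[0])
print(sections[1])
```

Because the generator yields one section at a time, only the current section needs to be held in memory until it is stored in the dict.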


2013-11-25 17:01:59 SmartElectron
