2013-11-25

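How do I read a file section by section, split by 'marker' lines?

For example, suppose I have a text/log file with a very simple structure. It consists of several parts, each with a different structure, separated by marker lines, like this: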

0x23499 0x234234 0x234234 
... 
0x34534 0x353454 0x345464 
$$$NEW_SECTION$$$ 
4345-34534-345-345345-3453 
3453-34534-346-766788-3534 
... 

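So how can I read the file section by section? For example, I want to read the part of the file before the $$$NEW_SECTION$$$ marker into one variable and the part after it into another, without using regular expressions or the like. Is there a simple solution?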

Answers


Here is a solution that does not read the whole file into memory first (tested):

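data1 = []
pos = 0
with open('data.txt', 'r') as f:
    # Read line by line until the marker line (or EOF)
    line = f.readline()
    while line and not line.startswith('$$$'):
        data1.append(line)
        line = f.readline()

    # readline() has consumed the marker line, so this is the start of section 2
    pos = f.tell()

data2 = []
with open('data.txt', 'r') as f:
    # Jump straight to the second section
    f.seek(pos)
    for line in f:
        data2.append(line)

print(data1)
print(data2)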

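The first loop cannot use for line in f, because that breaks the exact file position: iterating over a file reads ahead in blocks, so f.tell() no longer points just past the marker line (in Python 3, f.tell() raises an OSError after breaking out of such a loop). readline() does not have this problem.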
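Because the readline() loop stops right after consuming the marker line, the same file object is already positioned at the start of the second section, so the tell()/seek() round trip and the second open() are not strictly needed. A minimal sketch, assuming the marker sits alone on its line (surrounding whitespace ignored, rather than the startswith('$$$') test above) and using an in-memory io.StringIO in place of data.txt; read_two_parts and MARKER are hypothetical names:

```python
import io

MARKER = '$$$NEW_SECTION$$$'  # marker text taken from the question


def read_two_parts(f, marker=MARKER):
    """Split an open text file into the lines before and after the marker line.

    Reads line by line instead of calling f.read() on the whole file.
    """
    before = []
    # readline() consumes exactly one line, so after this loop the file
    # position is just past the marker line (or at EOF if there is none).
    line = f.readline()
    while line and line.strip() != marker:
        before.append(line)
        line = f.readline()
    # Keep using the same file object: no tell()/seek() or reopening needed.
    after = list(f)
    return before, after


sample = io.StringIO(
    "0x23499 0x234234 0x234234\n"
    "0x34534 0x353454 0x345464\n"
    "$$$NEW_SECTION$$$\n"
    "4345-34534-345-345345-3453\n"
    "3453-34534-346-766788-3534\n"
)
part1, part2 = read_two_parts(sample)
print(part1)
print(part2)
```

If the marker never appears, part2 is simply an empty list.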


The simplest way is str.split:

>>> s = filecontents.split("$$$NEW_SECTION$$$") 
>>> s[0] 
'0x23499 0x234234 0x234234\n\n0x34534 0x353454 0x345464\n' 
>>> s[1] 
'\n4345-34534-345-345345-3453\n3453-34534-346-766788-3534' 

Only if you have already read the whole file into memory. –


Yes, that's right. If. – beerbajay
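When the file is already in memory and only the "before" and "after" parts are needed, str.partition splits on the first occurrence of the marker and always returns exactly three pieces. A small sketch, assuming filecontents is the string returned by f.read() (stubbed here with sample data) and that the marker line ends with a plain "\n", which is included in the separator so the second part does not start with a blank line:

```python
# Hypothetical stand-in for the file contents obtained with f.read()
filecontents = (
    "0x23499 0x234234 0x234234\n"
    "0x34534 0x353454 0x345464\n"
    "$$$NEW_SECTION$$$\n"
    "4345-34534-345-345345-3453\n"
)

# partition() splits on the first occurrence only and always returns a 3-tuple;
# if the marker is missing, `sep` and `after` are empty strings.
before, sep, after = filecontents.partition("$$$NEW_SECTION$$$\n")
print(repr(before))
print(repr(after))
```

filecontents.split(marker, 1) would be the split-based equivalent, but it returns a one-element list when the marker is absent, so the caller has to check its length before unpacking.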


Solution 1:

If the file is not very large:

with open('your_log.txt') as f:
    parts = f.read().split('$$$NEW_SECTION$$$')

# split() always returns at least one element
part1 = parts[0]
if len(parts) > 1:
    part2 = parts[1]
...

Solution 2:

def FileParser(filepath):
    with open(filepath) as f:
        part = ''
        for line in f:
            # A marker line closes the current section
            if line.strip() == '$$$NEW_SECTION$$$':
                yield part
                part = ''
            else:
                part += line
        # Yield the last section (everything after the final marker)
        yield part


for segment in FileParser('your_log.txt'):
    print(segment)

Note: this is untested code, so use it with care.
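A different technique for the same job is itertools.groupby, which collects runs of consecutive lines that share a key; keying on "is this a marker line?" makes the marker lines separate the data runs. A sketch, assuming any iterable of lines (an io.StringIO stands in for the log file); split_sections is a hypothetical name. Unlike the generator above, this version drops empty sections, for example when the file starts with a marker or two markers are adjacent:

```python
import io
from itertools import groupby

MARKER = '$$$NEW_SECTION$$$'


def split_sections(lines, marker=MARKER):
    """Group consecutive non-marker lines into sections (empty sections are dropped)."""
    def is_marker(line):
        return line.strip() == marker

    # groupby() yields (key, run) pairs; runs with key True are marker lines,
    # runs with key False are the data lines between them.
    return [list(group) for key, group in groupby(lines, key=is_marker) if not key]


log = io.StringIO(
    "0x23499 0x234234 0x234234\n"
    "$$$NEW_SECTION$$$\n"
    "4345-34534-345-345345-3453\n"
    "3453-34534-346-766788-3534\n"
)
for section in split_sections(log):
    print(section)
```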


Solution:

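def sec(file_, sentinel='$$$NEW_SECTION$$$'):
    with open(file_) as f:
        section = []
        # iter(f.readline, '') calls readline() until it returns '' (EOF)
        for i in iter(f.readline, ''):
            if i.rstrip() == sentinel:
                yield section
                section = []
            else:
                section.append(i)
        # The last section has no closing marker
        yield section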

Usage:

>>> from pprint import pprint 
>>> pprint(list(sec('file.txt'))) 
[['0x23499 0x234234 0x234234\n', '0x34534 0x353454 0x345464\n'],
 ['4345-34534-345-345345-3453\n',
  '3453-34534-346-766788-3534\n',
  '3453-34534-346-746788-3534\n']]
>>> 

To put the sections into variables, or better, into a dict:

>>> sections = {} 
>>> for n, section in enumerate(sec('file.txt')): 
...  sections[n] = section 
>>>
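The loop that fills sections is equivalent to dict(enumerate(...)), and when the file is known to contain exactly one marker, the generator can be unpacked straight into two variables, which is what the question asks for. A sketch, assuming a variant of sec that accepts any iterable of lines (here an io.StringIO) instead of a file name, so it can run without a file on disk:

```python
import io


def sec(lines, sentinel='$$$NEW_SECTION$$$'):
    """Yield lists of lines separated by sentinel lines (takes an iterable, not a path)."""
    section = []
    for line in lines:
        if line.rstrip() == sentinel:
            yield section
            section = []
        else:
            section.append(line)
    # The last section has no closing marker
    yield section


text = (
    "0x23499 0x234234 0x234234\n"
    "0x34534 0x353454 0x345464\n"
    "$$$NEW_SECTION$$$\n"
    "4345-34534-345-345345-3453\n"
)

# Exactly one marker expected: unpack into two variables
# (raises ValueError if the number of sections is not exactly two).
before, after = sec(io.StringIO(text))

# Any number of sections: a dict keyed by section index
sections = dict(enumerate(sec(io.StringIO(text))))
print(before)
print(after)
print(sections)
```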