2013-03-14 198 views
1

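Loop within a loop in Python

I have a log file containing hundreds of thousands of lines.

I am looping through these lines looking for any line that contains some specific text, for example: !!event!!
Then, once a !!event!! line is found, I need to keep looping from that !!event!! line until I find the next 3 lines that each contain their own specific text ('flag1', 'flag2', and 'flag3').
Once I find the third line ('flag3'), I want to continue looping to the next !!event!! line and repeat the previous process until there are no more events.

Does anyone have suggestions for how to structure my code to accomplish this?

For example: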

f = open('samplefile.log', 'r')
for line in f:
    if '!!event!!' in line:
        L0 = line
        # then get the lines after L0 containing: 'flag1', 'flag2', and 'flag3'
        # below is a sample log file
        # I am not sure how to accomplish this
        # (I am thinking a loop within the current loop)
        # I know the following is incorrect, but the
        # intended result would be able to yield something like this:
        if "flag1" in line:
            L1 = line.split()
        if "flag2" in line:
            L2 = line.split()
        if "flag3" in line:
            L3 = line.split()
print('Event and flag times: ', L0[0], L1[0], L2[0], L3[0])

samplefile.log

8:41:05 asdfa 32423 
8:41:06 dasd 23423 
8:41:07 dfsd 342342 
8:41:08 !!event!! 23423 
8:41:09 asdfs 2342 
8:41:10 asdfas flag1 
8:41:11 asda 42342 
8:41:12 sdfs flag2 
8:41:13 sdafsd 2342 
8:41:14 asda 3443 
8:41:15 sdfs 2323 
8:41:16 sdafsd flag3 
8:41:17 asda 2342 
8:41:18 sdfs 3443 
8:41:19 sdafsd 2342 
8:41:20 asda 3443 
8:41:21 sdfs 4544 
8:41:22 !!event!! 5645 
8:41:23 sdfs flag1 
8:41:24 sadfs flag2 
8:41:25 dsadf 32423 
8:41:26 sdfa 23423 
8:41:27 sdfsa flag3 
8:41:28 sadfa 23423 
8:41:29 sdfas 2342 
8:41:30 dfsdf 2342 

For this sample, the code should print:

Event and flag times: 8:41:08 8:41:10 8:41:12 8:41:16 
Event and flag times: 8:41:22 8:41:23 8:41:24 8:41:27 
+1

Suggestion: feed the lines into an FSM (finite state machine) with states such as find_event, find_flag1, and so on. – Ber 2013-03-14 15:28:43
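
A minimal sketch of the state-machine idea from this comment, assuming the four markers always appear in order within an event. The names `scan_events` and `MARKERS` are illustrative and not from the thread; the state is simply the index of the marker currently being waited for.

```python
# Hypothetical helper illustrating the FSM suggestion; names are not from the thread.
MARKERS = ('!!event!!', 'flag1', 'flag2', 'flag3')

def scan_events(lines):
    """Return one list of four timestamps per completed event."""
    events = []
    state = 0      # index into MARKERS: which marker we are waiting for
    current = []
    for line in lines:
        if MARKERS[state] in line:
            current.append(line.split()[0])  # timestamp is the first field
            state += 1
            if state == len(MARKERS):        # flag3 seen: event complete
                events.append(current)
                current = []
                state = 0
    return events

sample = [
    "8:41:07 dfsd 342342",
    "8:41:08 !!event!! 23423",
    "8:41:10 asdfas flag1",
    "8:41:12 sdfs flag2",
    "8:41:16 sdafsd flag3",
]
for times in scan_events(sample):
    print('Event and flag times:', ' '.join(times))
```

Unlike a plain "any flag" test, this version ignores a flag that shows up before the one it is waiting for, which may or may not be what the log format calls for.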

+0

You should use regular expressions for this. If you show me some sample input and what you want to do with it, I can show you how. – 2013-03-14 15:32:22
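
For readers wondering what the regular-expression route might look like, here is a rough sketch. It assumes every line starts with an `H:MM:SS` timestamp as in the sample log; the pattern `LINE_RE` and the function `classify` are made up for illustration. For fixed marker strings like these, plain substring tests do the same job.

```python
import re

# Assumed line layout: "<time> <anything> <marker> ..." as in samplefile.log.
LINE_RE = re.compile(r'^(\d{1,2}:\d{2}:\d{2})\s.*?(!!event!!|flag[123])')

def classify(line):
    """Return (timestamp, marker) for a line of interest, else None."""
    m = LINE_RE.match(line)
    return (m.group(1), m.group(2)) if m else None

print(classify("8:41:08 !!event!! 23423"))
print(classify("8:41:09 asdfs 2342"))
```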

Answers

3

Sure, you can keep consuming the same file in an inner loop, break out of it when you reach flag3, and the outer loop will then resume:

for line in f:
    if '!!event!!' in line:
        L0 = line.split()
        for line in f:  # inner loop keeps reading from the same file
            if "flag1" in line:
                L1 = line.split()
            elif "flag2" in line:
                L2 = line.split()
            elif "flag3" in line:
                L3 = line.split()
                break  # continue outer loop
        print('Event and flag times: ', L0[0], L1[0], L2[0], L3[0])

# Event and flag times: 8:41:08 8:41:10 8:41:12 8:41:16 
# Event and flag times: 8:41:22 8:41:23 8:41:24 8:41:27 
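
The nested loops work because a file object is its own iterator: the inner `for` picks up where the outer one stopped, and the outer one resumes after the `break`. A small sketch of that behaviour, using `io.StringIO` as a stand-in for the open log file (the sample text and the START/END markers are made up):

```python
import io

f = io.StringIO("a\nb START\nc\nd END\ne\n")  # stand-in for an open log file
seen_outer, seen_inner = [], []
for line in f:
    seen_outer.append(line.strip())
    if 'START' in line:
        for line in f:  # same iterator: continues after 'b START'
            seen_inner.append(line.strip())
            if 'END' in line:
                break   # hand control back to the outer loop
print(seen_outer)  # lines consumed by the outer loop
print(seen_inner)  # lines consumed by the inner loop
```

Note that this relies on iterating the file object itself; looping over a list of lines would restart from the beginning in the inner loop.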
+0

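Thank you, and thanks to everyone who has replied so far! This is the first reply I saw, and it is very simple and it works. I will still look at the other replies to see whether there is more to learn on this topic. – teachamantofish 2013-03-14 16:26:33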

0

Here you go:

with open("in6.txt") as f:
    flag = False  # True while we are inside an event, looking for flags
    d = []        # completed events
    data = []     # timestamps of the event currently being collected
    for line in f:
        if flag:
            if "flag1" in line or "flag2" in line:
                data.append(line.split()[0])
            elif "flag3" in line:
                data.append(line.split()[0])
                flag = False
                d.append(data)
            continue
        if "!!event!!" in line:
            flag = True
            data = [line.split()[0]]

for l in d:
    print("Event and flag times: ", l[0], l[1], l[2], l[3])

Output

>>> 
Event and flag times: 8:41:08 8:41:10 8:41:12 8:41:16 
Event and flag times: 8:41:22 8:41:23 8:41:24 8:41:27 
+0

You never actually check the lines for the correct 'flag' text. Assuming here that it is simply the next 3 lines is incorrect. – 2013-03-14 15:38:00

+0

@MartijnPieters Thanks, updated... – ATOzTOA 2013-03-14 15:43:10

0

Keep a flag to track what you are looking for:

with open('samplefile.log') as f:
    events = []
    current_event = []
    for line in f:
        if not current_event:
            # not inside an event yet: only an event line is of interest
            if '!!event!!' in line:
                current_event.append(line.split()[0])
        elif 'flag1' in line or 'flag2' in line or 'flag3' in line:
            current_event.append(line.split()[0])
            if 'flag3' in line:  # could also be `if len(current_event) == 4:`
                events.append(current_event)
                current_event = []

for event in events:
    print('Event and flag times:', ' '.join(event))

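Here I use current_event as the flag; once the time of the !!event!! line has been added to it, it becomes non-empty and we start looking for the flags.

I collect the individual event times into the events list, but you could also print the event data as soon as the flag3 line is found.

Output: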

Event and flag times: 8:41:08 8:41:10 8:41:12 8:41:16 
Event and flag times: 8:41:22 8:41:23 8:41:24 8:41:27 
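
The alternative mentioned above, handling each event as soon as its flag3 line is found instead of collecting everything first, could be sketched as a generator. The name `iter_events` is hypothetical and the sample lines are abbreviated from the question's log; with hundreds of thousands of lines this keeps only one event in memory at a time.

```python
def iter_events(lines):
    """Yield a list of timestamps each time a flag3 line closes an event."""
    current_event = []
    for line in lines:
        if not current_event:
            # waiting for the start of an event
            if '!!event!!' in line:
                current_event.append(line.split()[0])
        elif any(flag in line for flag in ('flag1', 'flag2', 'flag3')):
            current_event.append(line.split()[0])
            if 'flag3' in line:
                yield current_event
                current_event = []

sample = """8:41:22 !!event!! 5645
8:41:23 sdfs flag1
8:41:24 sadfs flag2
8:41:25 dsadf 32423
8:41:27 sdfsa flag3
""".splitlines()

for event in iter_events(sample):
    print('Event and flag times:', ' '.join(event))
```

Passing an open file object instead of `sample` should work the same way, since the function only iterates over its argument.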
0

Just loop over each line; when you find !!event!!, start looking for the flags, and once all the flags have been found, carry on...

Something like:

def get_time(line):
    # the timestamp is the first whitespace-separated field
    return line.split()[0]

data = []
look_for_flags = False
with open('samplefile.log') as lines:
    for line in lines:
        if '!!event!!' in line:
            look_for_flags = True
            data.append([get_time(line)])
        elif look_for_flags:
            if 'flag1' in line or 'flag2' in line or 'flag3' in line:
                data[-1].append(get_time(line))
                if 'flag3' in line:
                    look_for_flags = False  # event complete; wait for the next one
print(data)
0

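The clearest way to do this is with a generator function, which saves you from tracking the state in explicit variables. Whenever you need to build a state machine (as you are doing here), think of a generator.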

def find_target_lines(file_handle):
    # send() a target string in; the next line containing it is yielded back
    target = yield
    for line in file_handle:
        if target in line:
            target = yield line

targets = ['!!event!!', 'flag1', 'flag2', 'flag3']

with open('samplefile.log', 'r') as f:
    finder = find_target_lines(f)
    next(finder)  # prime the generator so it is waiting at the first yield
    while True:
        found = []
        try:
            for target in targets:
                found.append(finder.send(target))
            print(found)
        except StopIteration:
            # end of file reached while searching: stop
            break
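
The `next(finder)` call before the first `send()` is needed because a generator has to be advanced to its first `yield` before it can receive a value. A stripped-down sketch of the same protocol, using a made-up `echo_matches` generator over a list of words instead of a file:

```python
def echo_matches(words):
    """Each send(target) returns the next word containing target."""
    target = yield               # paused here after priming
    for word in words:
        if target in word:
            target = yield word  # hand back the match, wait for the next target

gen = echo_matches(['alpha', 'beta', 'gamma', 'delta'])
next(gen)                        # prime: run up to the first yield
first = gen.send('et')           # 'beta'
second = gen.send('lt')          # 'delta'
print(first, second)
try:
    gen.send('zz')               # nothing left: the generator finishes
except StopIteration:
    print('exhausted')
```

The search position is kept inside the suspended generator, which is why no separate state variable is needed between calls.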