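The following shows how to define a circular vector class that keeps track of only the data you might need while processing the file from top to bottom. It is commented fairly heavily so that it can be understood rather than being a mere code dump. The parsing details of course depend heavily on what your input looks like, and my code makes assumptions based on the example file that you may need to change. For example, depending on your input, startswith() may be too rigid and you may need to use find() instead.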
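To illustrate why startswith() can be too rigid: it only matches at index 0, so any leading whitespace defeats it, while find() (or the `in` operator) matches anywhere in the line. The sample line below is hypothetical, with three leading spaces added on purpose.

```python
line = "   test1(OK) test2(OK) *** Condition Met ***\n"

# startswith() only matches at index 0, so the leading spaces make it fail
print(line.startswith("test1(OK) test2(OK)"))
# find() returns the index of the first match, or -1 if there is none
print(line.find("test1(OK) test2(OK)"))
# The `in` operator is the idiomatic way to ask "does it occur anywhere?"
print("test1(OK) test2(OK)" in line)
# Stripping first also works if only the surrounding whitespace varies
print(line.strip().startswith("test1(OK) test2(OK)"))
```

With this line, the first check is False, find() returns 3, and the last two checks are True.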
Code
from __future__ import print_function
import sys
from collections import defaultdict
from itertools import chain


class circ_vec(object):
    """A fixed-capacity circular vector that keeps only the most recent elements."""
    # __slots__ reduces per-instance memory: instances get no per-object
    # __dict__ (a hash table), only these three attributes.
    __slots__ = ['end', 'elems', 'capacity']
    # end tracks where the next element will be written
    # elems holds the last `capacity` elements that were added
    # capacity is how many elements we will hold

    def __init__(self, capacity):
        # Only the capacity is needed up front; elems starts empty
        self.end = 0
        self.elems = []
        self.capacity = capacity

    def add(self, e):
        if self.end < len(self.elems):
            # Buffer is full: overwrite the oldest element
            self.elems[self.end] = e
        else:
            # Fewer than `capacity` elements seen so far: just append
            self.elems.append(e)
        self.end = (self.end + 1) % self.capacity

    def __len__(self):
        return len(self.elems)

    # This magic method allows square-bracket [ ] indexing; index 0 is the oldest element
    def __getitem__(self, index):
        if index < 0:
            # Support negative indices the way lists do (-1 is the newest element)
            index += len(self.elems)
        if not 0 <= index < len(self.elems):
            raise IndexError('circ_vec index out of range')
        if len(self.elems) < self.capacity:
            # Not yet wrapped around: storage order is insertion order
            return self.elems[index]
        # Full: the oldest element sits at self.end
        return self.elems[(self.end + index) % self.capacity]

    # This magic method allows iteration, from the oldest element to the newest
    def __iter__(self):
        if len(self.elems) < self.capacity:
            first = 0
        else:
            first = self.end
        return chain(self.elems[first:], self.elems[:first])


# One circ_vec(4) per identifier. defaultdict creates one on first use, so
# identifiers other than 'A' and 'B' do not raise KeyError.
string_group_last_four = defaultdict(lambda: circ_vec(4))

with open(sys.argv[1], 'r') as f:
    string_group_context = None
    # We iterate through the file manually, so get an iterator using iter().
    it = iter(f)
    # As in the example, the file we're reading groups lines in twos.
    buf = circ_vec(2)
    try:
        while True:
            line = next(it)
            # strip() removes surrounding whitespace and the newline character
            buf.add(line.strip())
            # Lines beginning with 'String Group' are recorded in case we need them later.
            if line.startswith('String Group'):
                # This is the benefit of manual iteration: we can call next() more than
                # once per loop iteration. Once we've read this line, we immediately
                # want the next one as well.
                buf.add(next(it).strip())
                # How exactly you parse your lines depends on your needs. Here I assume
                # the last word on the line is the identifier we are interested in.
                string_group = line.strip().split()[-1]
                # Add the buffered pair of lines to the circular vector for that identifier.
                string_group_last_four[string_group].add(list(buf))
                buf = circ_vec(2)
            # For lines beginning with 'Other Main String for', we only need to
            # remember the identifier; there is nothing else to record.
            elif line.startswith('Other Main String for'):
                string_group_context = line.strip().split()[-1]
            # Use find() instead of startswith() because the 'test1(OK) test2(OK)'
            # lines can begin with whitespace, and startswith() would then depend on
            # the exact whitespace characters. Hits that appear before any
            # 'Other Main String for' line are skipped, since there is no identifier yet.
            elif (line.find('test1(OK) test2(OK)') != -1
                  and string_group_context is not None):
                print('String group' + string_group_context + ' has a test hit!')
                # Print out the test lines.
                for l in buf:
                    print(l)
                print('Four most recent "String Group ' + string_group_context + '" lines:')
                # Use the identifier dict to get the last 4 relevant groups of lines
                for cv in string_group_last_four[string_group_context]:
                    for l in cv:
                        print(l)
    # Manual iteration ends with a StopIteration exception in Python; catch and swallow it.
    except StopIteration:
        pass
print("Done!")
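If you don't need a hand-rolled class, the standard library's collections.deque with a maxlen gives the same "keep only the last N" behaviour: appending to a full deque discards the element at the opposite end, and iteration runs from oldest to newest. This is a swapped-in alternative, not what the code above uses; a minimal sketch:

```python
from collections import defaultdict, deque

# A bounded deque behaves like circ_vec(4): once full, each append
# silently drops the oldest element.
last_four = deque(maxlen=4)
for n in range(1, 7):
    last_four.append("String Group %d A" % n)

print(list(last_four))  # groups 3..6, oldest first
print(last_four[0])     # oldest element
print(last_four[-1])    # newest element

# defaultdict creates a fresh bounded deque the first time an identifier is
# seen, so identifiers other than 'A' and 'B' do not raise KeyError.
groups = defaultdict(lambda: deque(maxlen=4))
groups["C"].append(["String Group 1 C", "Useful information for C"])
print(len(groups["C"]))
```

Unlike circ_vec, deque also supports negative indexing and appendleft() out of the box, so there is less custom code to get wrong.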
Contents of the test file. I tried to make it a bit odd so that it exercises the code.
Other Main String for A
test1(OK) test2(OK) *** Condition Met *** #Now go back and collect the last 4 entries of "Useful information for A" from "String Group A"
String Group 1 A
Useful information for A
String Group 2 A
Useful information for A
Other Main String for A
test1(OK) test2(OK) *** Condition Met *** #Now go back and collect the last 4 entries of "Useful information for A" from "String Group A"
String Group 1 B
Useful information for B
String Group 3 A
Useful information for A
String Group 2 B
Useful information for B
String Group 4 A
Useful information for A
String Group 5 A
Useful information for A
String Group 6 A
Useful information for A
String Group 3 B
Useful information for B
Other Main String for A
test1(OK) test2(OK) *** Condition Met *** #Now go back and collect the last 4 entries of "Useful information for A" from "String Group A"
Other Main String for B
test1(OK) test2(OK) *** Condition Met *** #Now go back and collect the last 4 entries of "Useful information for A" from "String Group A"
Other Main String for B
test1(OK) test2(OK) *** Condition Met *** #Now go back and collect the last 4 entries of "Useful information for A" from "String Group A"
String Group 4 B
Useful information for B
Other Main String for B
test1(OK) test2(OK) *** Condition Met *** #Now go back and collect the last 4 entries of "Useful information for A" from "String Group A"
String Group 7 A
Useful information for A
Other Main String for A
test1(OK) test2(OK) *** Condition Met *** #Now go back and collect the last 4 entries of "Useful information for A" from "String Group A"
Output
String groupA has a test hit!
Other Main String for A
test1(OK) test2(OK) *** Condition Met *** #Now go back and collect the last 4 entries of "Useful information for A" from "String Group A"
Four most recent "String Group A" lines:
String groupA has a test hit!
Other Main String for A
test1(OK) test2(OK) *** Condition Met *** #Now go back and collect the last 4 entries of "Useful information for A" from "String Group A"
Four most recent "String Group A" lines:
String Group 1 A
Useful information for A
String Group 2 A
Useful information for A
String groupA has a test hit!
Other Main String for A
test1(OK) test2(OK) *** Condition Met *** #Now go back and collect the last 4 entries of "Useful information for A" from "String Group A"
Four most recent "String Group A" lines:
String Group 3 A
Useful information for A
String Group 4 A
Useful information for A
String Group 5 A
Useful information for A
String Group 6 A
Useful information for A
String groupB has a test hit!
Other Main String for B
test1(OK) test2(OK) *** Condition Met *** #Now go back and collect the last 4 entries of "Useful information for A" from "String Group A"
Four most recent "String Group B" lines:
String Group 1 B
Useful information for B
String Group 2 B
Useful information for B
String Group 3 B
Useful information for B
String groupB has a test hit!
Other Main String for B
test1(OK) test2(OK) *** Condition Met *** #Now go back and collect the last 4 entries of "Useful information for A" from "String Group A"
Four most recent "String Group B" lines:
String Group 1 B
Useful information for B
String Group 2 B
Useful information for B
String Group 3 B
Useful information for B
String groupB has a test hit!
Other Main String for B
test1(OK) test2(OK) *** Condition Met *** #Now go back and collect the last 4 entries of "Useful information for A" from "String Group A"
Four most recent "String Group B" lines:
String Group 1 B
Useful information for B
String Group 2 B
Useful information for B
String Group 3 B
Useful information for B
String Group 4 B
Useful information for B
String groupA has a test hit!
Other Main String for A
test1(OK) test2(OK) *** Condition Met *** #Now go back and collect the last 4 entries of "Useful information for A" from "String Group A"
Four most recent "String Group A" lines:
String Group 4 A
Useful information for A
String Group 5 A
Useful information for A
String Group 6 A
Useful information for A
String Group 7 A
Useful information for A
Done!
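Because the script reads sys.argv[1] directly, it is awkward to test. Below is a sketch of the same scan wrapped in a function that accepts any iterable of lines (so an io.StringIO works in place of a file) and returns the hits instead of printing them. The name find_hits and its return shape are hypothetical; it swaps in deque for circ_vec, strips each line before matching, and, like the original, assumes the identifier is the last word on the 'String Group' and 'Other Main String for' lines.

```python
import io
from collections import defaultdict, deque


def find_hits(lines, keep=4):
    """Hypothetical helper: return (identifier, recent_groups) for each test hit.

    recent_groups holds up to `keep` [header, info] pairs, oldest first.
    """
    recent = defaultdict(lambda: deque(maxlen=keep))
    context = None
    hits = []
    it = iter(lines)
    for line in it:
        text = line.strip()
        if text.startswith("String Group"):
            # The line after a 'String Group' header carries the useful information;
            # next(it, "") avoids StopIteration if the header is the last line.
            info = next(it, "").strip()
            recent[text.split()[-1]].append([text, info])
        elif text.startswith("Other Main String for"):
            context = text.split()[-1]
        elif "test1(OK) test2(OK)" in text and context is not None:
            # Copy the deque so later groups don't change an earlier hit.
            hits.append((context, list(recent[context])))
    return hits


sample = io.StringIO(
    "String Group 1 A\n"
    "Useful information for A\n"
    "String Group 1 B\n"
    "Useful information for B\n"
    "Other Main String for A\n"
    "  test1(OK) test2(OK) *** Condition Met ***\n"
)
for ident, groups in find_hits(sample):
    print(ident, groups)
```

Returning data rather than printing it makes it easy to assert on the last four groups per identifier, and the same function can be called with an open file object.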
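Are you interested in every occurrence of 'Useful information for' before 'test1(OK) test2(OK)' and 'Other Main String for A', or only the ones immediately before it? – Praxeolitic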
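Hi Praxeolitic, I'm only looking for the 4 most recent 'String Group A' entries once 'test1(OK) test2(OK)' is hit for 'A', and likewise I need to repeat this for the 'B' condition. – MikG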