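Counting lines that match different patterns in a single pass

I have a Python script that, given a pattern, goes over a file and, for each line that matches the pattern, counts how many times that line appears in the file.

The script is as follows: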

#!/usr/bin/env python

import time
fnamein = 'Log.txt'

def filter_and_count_matches(fnamein, fnameout, match):
    fin = open(fnamein, 'r')
    curr_matches = {}
    order_in_file = []  # need this because dict has no particular order
    for line in (l for l in fin if l.find(match) >= 0):
        line = line.strip()
        if line in curr_matches:
            curr_matches[line] += 1
        else:
            curr_matches[line] = 1
            order_in_file.append(line)
    fin.close()
    #
    fout = open(fnameout, 'w')
    #for line in order_in_file:
    # most frequent lines first; ties are broken by the line text
    for line, _dummy in sorted(curr_matches.items(),
                               key=lambda kv: (kv[1], kv[0]), reverse=True):
        fout.write(line + '\n')
        fout.write(' = {}\n'.format(curr_matches[line]))
    fout.close()

def main():
    # one pattern per line in staffs.txt; each pattern triggers a full pass over the log
    for idx, match in enumerate(open('staffs.txt', 'r').readlines()):
        curr_time = time.time()
        match = match.strip()
        fnameout = 'm{}.txt'.format(idx+1)
        filter_and_count_matches(fnamein, fnameout, match)
        print('Processed {}. Time = {}'.format(match, time.time() - curr_time))

main()

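So right now I go over the file once for each pattern I want to check. Is it possible to do this in a single pass? (The file is quite large, so it takes a while to process.) It would be nice to be able to do this in an elegant, "simple" way. Thanks!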


2013-04-11 skeept

This doesn't answer your question, but 'grep' would very likely be more useful here, if that is actually the end goal of your problem. – 2013-04-11 15:36:03

It looks like Counter would do what you need:

from collections import Counter
# strip() drops the trailing newline so identical lines share one key
lines = Counter(line.strip() for line in myfile if match_string in line)

For example, if myfile contains

123abc 
abc456 
789 
123abc 
abc456

and match_string is "abc", then the above code gives you

>>> lines 
Counter({'123abc': 2, 'abc456': 2})
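To get output ordered like the script in the question (most frequent lines first), `Counter.most_common()` might be enough. The following is only a minimal sketch: the list `myfile` stands in for an open file and reuses the sample data above, and the `report` name is made up for illustration. Lines with equal counts may come out in a different order than with the `(count, line)` sort used in the question.

```python
from collections import Counter

# Sample data from the example above; in practice this would be an open file.
myfile = ["123abc\n", "abc456\n", "789\n", "123abc\n", "abc456\n"]
match_string = "abc"

# Count each matching line, ignoring surrounding whitespace/newlines.
lines = Counter(line.strip() for line in myfile if match_string in line)

# most_common() yields (line, count) pairs, highest count first.
report = ["{}\n = {}".format(line, count) for line, count in lines.most_common()]
print("\n".join(report))
```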

For multiple patterns, how about this:

patterns = ["abc", "123"]
# initialize one Counter for each pattern
results = {pattern: Counter() for pattern in patterns}
for line in myfile:
    for pattern in patterns:
        if pattern in line:
            results[pattern][line] += 1
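Putting this together with the output format from the question, a single-pass version could look roughly like the sketch below. The function names (`count_matches_single_pass`, `write_report`, `process`) are hypothetical and chosen only for illustration; the `m{}.txt` naming and the sort order are taken from the original script. Each line is still tested against every pattern, but the file itself is read only once.

```python
from collections import Counter

def count_matches_single_pass(lines, patterns):
    """Return {pattern: Counter of stripped matching lines}, iterating lines once."""
    results = {pattern: Counter() for pattern in patterns}
    for line in lines:
        for pattern in patterns:
            if pattern in line:
                results[pattern][line.strip()] += 1
    return results

def write_report(counter, fnameout):
    """Write each line followed by its count, most frequent first (ties by line text)."""
    with open(fnameout, 'w') as fout:
        for line, count in sorted(counter.items(),
                                  key=lambda kv: (kv[1], kv[0]), reverse=True):
            fout.write(line + '\n')
            fout.write(' = {}\n'.format(count))

def process(fnamein, patterns_file):
    """Hypothetical driver: one pattern per line in patterns_file, one output file each."""
    with open(patterns_file) as f:
        patterns = [p.strip() for p in f]
    with open(fnamein) as fin:
        results = count_matches_single_pass(fin, patterns)  # single pass over the log
    for idx, pattern in enumerate(patterns):
        write_report(results[pattern], 'm{}.txt'.format(idx + 1))
    return results
```

With the file names from the question, this would presumably be called as `process('Log.txt', 'staffs.txt')`. All counters are kept in memory until the pass finishes, which should be fine as long as the number of distinct matching lines is manageable.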


2013-04-11 15:36:59

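I tried it, and Counter seems to be much faster than my approach. The problem is that, because I have different patterns I want to count, I still need to go over the file once for each of those patterns. That is the main thing I want to avoid now. Any ideas about this? – skeept 2013-04-11 15:46:40

@skeept: Do you want a separate counter for each pattern, or one counter for all matches? – 2013-04-11 16:00:37

One counter per pattern. That is what the code does now (it saves the counts for each pattern in the file corresponding to that pattern), but it goes over the file once per pattern, and I would like to avoid that. – skeept 2013-04-11 16:06:42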
