2016-05-10 63 views
1

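Output matching lines from 3 text files, plus the line below each matching line

I have a question about some plain data. I have three text files containing data in the following format: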

cli= 111 
    mon= 45 

    cli= 584 
    mon= 21 

    cli= 23 
    mon= 417 

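Now I have the following program which, when I execute it, gives me all the matching cli lines. In other words, it gives me the cli lines that appear in all 3 text files.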

with open('/home/user/Desktop/text1.txt', 'r') as file1:
    with open('/home/user/Desktop/text2.txt', 'r') as file2:
        with open('/home/user/Desktop/text3.txt', 'r') as file3:
            same = set(file1).intersection(file2).intersection(file3)
same.discard('\n')

with open('/home/user/Desktop/common.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)

My question is: can I also output the value (mon= 45) that goes with cli= 111? Suppose cli= 111 exists in all 3 text files. I would like a result like this:

cli= 111 
    mon= 45 
    mon= 98 
    mon= 32 

Thanks in advance. PS: the sample data above is for one text file only. Assume there are 3 text files. Thanks!

+0

So you want the corresponding mon line after each cli that appears in all of the files? –

+0

@Padraic Cunningham Exactly! – starshine

+0

OK, that's easy with a dict. I'll throw something together –

Answers

0

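It looks like you are throwing away data that you want to access later. Rather than parsing the files a second time, you need to capture that data somehow on the first pass. One way to do that (assuming each 'cli' has only one corresponding 'mon' per file) is with a dictionary.

I've provided a function that builds a dictionary whose keys are the 'cli' lines and whose values are the 'mon' lines. From there, you can create a set() from the dictionary keys and find the intersection that way. Everything in the intersection must be a key in all three dictionaries, so you can simply concatenate the matching values into the "out" string and write that to your output file: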

def buildDict(f):
    # map each "cli" line to the line that follows it (its "mon" line)
    dic = {}
    # stop one short of the end so f[i + 1] always exists
    for i in range(len(f) - 1):
        if "cli" in f[i]:
            dic[f[i]] = f[i + 1]
    return dic

with open('1.txt', 'r') as file1:
    f1_dic = buildDict(file1.readlines())
with open('2.txt', 'r') as file2:
    f2_dic = buildDict(file2.readlines())
with open('3.txt', 'r') as file3:
    f3_dic = buildDict(file3.readlines())

# cli lines that are present in all three files
same = set(f1_dic.keys()).intersection(f2_dic.keys()).intersection(f3_dic.keys())

out = ''
for i in same:
    out += i
    out += f1_dic[i]
    out += f2_dic[i]
    out += f3_dic[i]

with open('common.txt', 'w') as file_out:
    file_out.write(out)
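
The same idea can be sketched for any number of inputs. This is only an illustration under a few assumptions: `build_dict` and `common_blocks` are hypothetical names, the inputs are lists of lines rather than open files, whitespace around each line is stripped before comparing, and the sample values other than cli= 111 with mon= 45, 98 and 32 are made up. Because sets are unordered, the shared keys are sorted (as strings) just to make the output order repeatable.

```python
def build_dict(lines):
    """Map each cli line to the line that follows it."""
    pairs = {}
    for current, following in zip(lines, lines[1:]):
        if current.strip().startswith("cli="):
            pairs[current.strip()] = following.strip()
    return pairs


def common_blocks(*line_lists):
    """Return the text blocks for cli lines present in every input."""
    dicts = [build_dict(lines) for lines in line_lists]
    shared = set(dicts[0]).intersection(*dicts[1:])
    blocks = []
    for cli in sorted(shared):  # sorted only to make the order repeatable
        blocks.append(cli + "\n")
        blocks.extend("    " + d[cli] + "\n" for d in dicts)
    return "".join(blocks)


# Made-up sample data standing in for the three files.
a = ["cli= 111 \n", "mon= 45 \n", "\n", "cli= 584 \n", "mon= 21 \n"]
b = ["cli= 111 \n", "mon= 98 \n", "\n", "cli= 23 \n", "mon= 417 \n"]
c = ["cli= 584 \n", "mon= 7 \n", "\n", "cli= 111 \n", "mon= 32 \n"]
result = common_blocks(a, b, c)
print(result)
```

With real files, each argument could be the result of `readlines()` on one file.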
0

You can use a dict to group the lines that follow each cli, after first pulling out the lines that appear in all of the files:

with open('text1.txt', 'r') as file1, open('text2.txt', 'r') as file2, open('text3.txt', 'r') as file3:
    # lines that appear in all three files
    inter = set(file1).intersection(file2).intersection(file3)

    # create a dict using lists as values to group the mons and remove empty lines
    d = {k: [] for k in inter if k.strip()}
    # don't need set anymore, dict lookups are also O(1)
    del inter
    # reset pointers
    for f in (file1, file2, file3):
        f.seek(0)

    # iterate over files again
    for f in [file1, file2, file3]:
        for line in f:
            if line in d:
                # pull next line if we get a match ('' if the file ends here)
                d[line].append(next(f, ''))

Then just write out the dict contents:

with open('/home/user/Desktop/common.txt', 'w') as file_out:
    for k, v in d.items():
        file_out.write(k)
        for line in v:
            file_out.write(line)

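If you are only looking for specific lines, i.e. ones starting with cli=, then another approach is to build the dict from the file1 data first and then iterate over the remaining files. When you come to write, only write the entries whose value (the list) has a length of 3: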

with open('text1.txt', 'r') as file1, open('text2.txt', 'r') as file2, open('text3.txt', 'r') as file3:
    # create dict from the initial file, storing the line that follows each cli= line inside a list as the value
    d = {k: [next(file1, '')] for k in file1 if k.startswith("cli=")}

    for f in [file2, file3]:
        for line in f:
            if line in d:
                d[line].append(next(f, ''))

with open('/home/user/Desktop/common.txt', 'w') as file_out:
    for k, v in d.items():
        # if len is three we have one from each
        if len(v) == 3:
            file_out.write(k)
            for line in v:
                file_out.write(line)

The only way this would fail is if one or more of the files contains a repeated cli= ... line.
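
If repeated cli lines are a real possibility, one option is to group the mon lines per file and only then intersect the keys, so that a duplicate in one file cannot be mistaken for a match in another. The following is a rough sketch, not part of the approach above: `group_mons` and `common_with_duplicates` are hypothetical names, lines are stripped before comparing, and the sample data is made up.

```python
from collections import defaultdict


def group_mons(lines):
    """Collect every mon line that directly follows each cli line."""
    grouped = defaultdict(list)
    stripped = [line.strip() for line in lines]
    for current, following in zip(stripped, stripped[1:]):
        if current.startswith("cli=") and following.startswith("mon="):
            grouped[current].append(following)
    return grouped


def common_with_duplicates(*line_lists):
    """Keep only cli lines seen in every input, with all of their mon lines."""
    per_file = [group_mons(lines) for lines in line_lists]
    shared = set(per_file[0]).intersection(*per_file[1:])
    return {cli: [mon for g in per_file for mon in g[cli]] for cli in shared}


# Made-up data: "cli= 584" is repeated in b but missing from c,
# and "cli= 111" is repeated in c.
a = ["cli= 111\n", "mon= 45\n", "\n", "cli= 584\n", "mon= 21\n"]
b = ["cli= 111\n", "mon= 98\n", "\n", "cli= 584\n", "mon= 5\n", "\n",
     "cli= 584\n", "mon= 6\n"]
c = ["cli= 111\n", "mon= 32\n", "\n", "cli= 111\n", "mon= 33\n"]
result = common_with_duplicates(a, b, c)
print(result)
```

Here cli= 584 collects three mon values across a and b, which a length check alone would accept, but it is dropped because it never appears in c.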

0

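Building a set of lines on the fly is an interesting hack, but as you have seen, it is a bit too clever, because the mon lines get separated from the cli lines. So let's try reading the files more carefully so that this doesn't happen: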

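import re

def getfile(fname):
    with open(fname) as file1:
        text = file1.read()
    records = text.split("\n\n")
    # \s* tolerates trailing spaces and indentation around the line break
    matches = (re.search(r"cli= *(\d+)\s*\n\s*mon= *(\d+)", rec) for rec in records)
    # skip chunks that contain no cli/mon pair (e.g. a trailing blank chunk)
    return dict(m.groups() for m in matches if m)

d1 = getfile('/home/user/Desktop/text1.txt')
d2 = getfile('/home/user/Desktop/text2.txt')
d3 = getfile('/home/user/Desktop/text3.txt')
same = set(d1).intersection(d2).intersection(d3)

# same is a set of cli values, so print one block per value
for cli in same:
    print("cli=" + cli)
    print("mon=" + d1[cli])
    print("mon=" + d2[cli])
    print("mon=" + d3[cli])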

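I turn each file into a dictionary mapping cli values to mon values, since they come in pairs. Then we intersect the cli values and use them to look up the mon values.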
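
As a quick way to check the cli-to-mon mapping idea without touching the disk, here is a small sketch that parses in-memory strings instead of files. It is a variation rather than the code above: `parse` and `PAIR` are hypothetical names, it uses `re.findall` over the whole text instead of splitting into records, and the contents of `text2` and `text3` are made up (only `text1` mirrors the sample in the question).

```python
import re

# One cli line followed by its mon line; \s* absorbs stray spaces and indentation.
PAIR = re.compile(r"cli= *(\d+)\s*\n\s*mon= *(\d+)")


def parse(text):
    """Return {cli: mon} for every cli/mon pair found in text."""
    return dict(PAIR.findall(text))


text1 = "cli= 111 \n    mon= 45 \n\n    cli= 584 \n    mon= 21 \n\n    cli= 23 \n    mon= 417 \n"
text2 = "cli= 111 \n    mon= 98 \n\n    cli= 23 \n    mon= 5 \n"
text3 = "cli= 584 \n    mon= 3 \n\n    cli= 111 \n    mon= 32 \n"

d1, d2, d3 = parse(text1), parse(text2), parse(text3)
same = set(d1) & set(d2) & set(d3)

# For each shared cli value, list its mon value from each text in order.
report = {cli: [d1[cli], d2[cli], d3[cli]] for cli in same}
print(report)
```

Note that the captured values are strings, so convert them with `int()` if you need to compare or sort them numerically.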