Reading two text files simultaneously in Python 3.0 and extracting the required strings

-2

I have two text files with data. file_1:

data1 data_1 1 
data2 data_2 2 
data3 data_2 2 
data2 data_4 1 
data3 data_3 1 and so on....

etc.

file_2:

data1 
data2 
data1 
data3 
data2

I would like to get output like this:

data1: 
     > data1 data_1 1 
     > data1 data_3 2 

data2: 
     > data2 data_2 2 
     > data2 data_4 1 

data3: 
     > data3 data_3 1

and so on...

What I get from my current code:

data1: 
     > data1 data_1 1 

data2: 
     > data2 data_2 2 

data3: 
     > data3 data_2 2 
     > data2 data_4 1 
     > data3 data_3 1

Code:

first_occurance = {}
with open("folder_1/file_1", "r") as file_1:
    with open("folder_1/file_2", "r") as file_2:
        for line_1, line_2 in zip(file_1, file_2):
            only_command = line_1.split()[0]
            if only_command in line_2:
                if only_command not in first_occurance:
                    print("\n " + only_command + " :\n")
                    print("  > " + line_1.strip())
                else:
                    print("  > " + line_1.strip())
                first_occurance[only_command] = only_command

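However, this does not work, because the data is not grouped under the right heading; for example, lines that belong to data2 also show up under data3. Any guidance on this problem would be very helpful.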

Source

2015-01-16 user89

Can you describe what is supposed to happen? – user3467349

I edited the question. I hope it is clearer now. – user89

Not quite. What do you expect to happen with 'data3'? Should it be printed under the data2 block? – fnl

Here is what I think you might be trying to do:

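from collections import defaultdict

# Stand-in for the contents of file_1
data = """data1 data_1 1
data2 data_2 2
data1 data_3 2
data3 data_4 1
data2 data_3 1"""

# Stand-in for the contents of file_2
commands = """data1
data2
data1
data3
data2"""

# Maps each command to the list of data lines stored for it
store = defaultdict(list)

# Pair up the lines of both inputs by position
for line, cmd in zip(data.split('\n'), commands.split('\n')):
    if line.startswith(cmd):
        store[cmd].append(line.strip())

# Print the groups in sorted command order
for command in sorted(store):
    print("\n{}:".format(command))
    for l in store[command]:
        print("  >", l)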

This produces the following output:

data1: 
     > data1 data_1 1 
     > data1 data_3 2 

data2: 
     > data2 data_2 2 
     > data2 data_3 1 

data3: 
     > data3 data_4 1

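For each line in commands (what you read from file_2), the line at the same position in data (file_1) is stored if it starts with that same "command". By the way, you keep changing your data, and I am not sure we understand what you want. It seems that file_2 is not even needed, or perhaps you want to regroup your data?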
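One caveat with the prefix test: `startswith` also accepts a longer command that merely begins with the same characters. The sketch below compares it with an exact match on the first field. The helper name `first_field_matches` and the `data10` line are made up for illustration and do not come from the question's files.

```python
def first_field_matches(line, cmd):
    """Return True only when the first whitespace-separated field equals cmd."""
    fields = line.split()
    return bool(fields) and fields[0] == cmd


# Hypothetical line: its command is "data10", not "data1".
line = "data10 data_1 1"

print(line.startswith("data1"))            # prefix test
print(first_field_matches(line, "data1"))  # exact first-field test
```

If every command in the real files is distinct as a prefix, both tests behave the same and the simpler `startswith` is enough.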

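In any case, once the grouped data is stored, you can print the groups in sorted order (data1, 2, 3, ...). You have to store all the groups; otherwise you would have to read the file again and again, once for each (data) group. If you do not, you get your current output, because you print each line as soon as you read it.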
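To make the point about printing while reading concrete, here is a minimal sketch that uses the sample lines of file_1 from the question. The helper names `stream_headings` and `grouped` are made up for illustration, and the check against file_2 is left out for brevity, so this is a simplification of the question's code rather than a copy of it.

```python
from collections import defaultdict

# Sample lines, copied from file_1 in the question.
lines = [
    "data1 data_1 1",
    "data2 data_2 2",
    "data3 data_2 2",
    "data2 data_4 1",
    "data3 data_3 1",
]


def stream_headings(lines):
    """Emit a heading the first time a command is seen, then every line as it arrives."""
    seen = set()
    out = []
    for line in lines:
        cmd = line.split()[0]
        if cmd not in seen:
            seen.add(cmd)
            out.append(cmd + ":")
        out.append("  > " + line)
    return out


def grouped(lines):
    """Collect every line under its command first, then emit group by group."""
    store = defaultdict(list)
    for line in lines:
        store[line.split()[0]].append(line)
    out = []
    for cmd in sorted(store):
        out.append(cmd + ":")
        out.extend("  > " + item for item in store[cmd])
    return out


print("\n".join(stream_headings(lines)))
print()
print("\n".join(grouped(lines)))
```

With this input, the first function places `data2 data_4 1` under the `data3` heading, because a heading is emitted only the first time a command is seen. The second keeps each command's lines together.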

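However, it looks like your file_2 data is not needed at all, at least judging by the output you ask for in the question. So here is a file-reading version that produces the output you want; note that it does not need to read file_2: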

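from collections import defaultdict

# Maps each command to the list of lines that start with it
store = defaultdict(list)

with open("folder_1/file_1", "r") as data:
    for line in data:
        if not line.strip():
            continue  # skip blank lines, which have no command field
        cmd = line.split()[0]  # the first field is the "command"
        store[cmd].append(line.strip())

# Print the groups in sorted command order
for cmd in sorted(store):
    print("\n{}:".format(cmd))
    for line in store[cmd]:
        print("  >", line)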

Source

2015-01-16 16:03:15 fnl

I get an error: **line_1, line_2 in zip(file_1.split('\n'), file_2.split('\n')): AttributeError: '_io.TextIOWrapper' object has no attribute 'split'** – user89

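That is because you are calling split on the file handle/stream. Please check the update, which brings the code in line with your file-reading version. – fnl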

Also, note that I do not understand why you even need to check 'file_2'. It seems you keep changing how your data is laid out? – fnl
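For readers who hit the same AttributeError, here is a small sketch of the two usual ways to get lines out of an open text file. It assumes `io.StringIO` as a stand-in for a real file handle so that it runs without any files on disk; the sample content is copied from the question.

```python
import io

# io.StringIO stands in for an open text file (an assumption made for this sketch).
handle = io.StringIO("data1 data_1 1\ndata2 data_2 2\n")

# A file-like object is a stream, not a string, so it has no .split() method.
has_split = hasattr(handle, "split")

# Option 1: iterate over the handle; each item is one line, newline included.
lines_by_iteration = [line.rstrip("\n") for line in handle]

# Option 2: rewind, read everything into one string, then split that string.
handle.seek(0)
lines_by_read = handle.read().splitlines()

print(has_split)
print(lines_by_iteration == lines_by_read)
```

Iterating over the handle reads one line at a time, which is usually the better fit for large files; reading everything at once is convenient when the file is small.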
