How to combine lines from two files with a condition in Python?

I need to combine lines from two files, based on the condition that a line in one of the files contains part of a line in the second file.

A part of the first file:

 
12319000 -64,7357668067227 -0,1111052148685535 
12319000 -79,68527661064425 -0,13231739777754026 
12319000 -94,69642857142858 -0,15117839559513543  
12319000 -109,59301470588237 -0,18277783185642743 
12319001 99,70264355742297 0,48329515727315125 
12319001 84,61113445378152 0,4060446341409862 
12319001 69,7032037815126 0,29803063228455073 
12319001 54,93886554621849 0,20958105041136763 
12319001 39,937394957983194 0,13623056582981297 
12319001 25,05574229691877 0,07748669438398018 
12319001 9,99716386554622 0,028110643107892755

A part of the second file:

 
12319000.abf mutant 1 
12319001.abf mutant 2 
12319002.abf mutant 3

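I need to create a file whose lines consist of the whole line from the first file followed by everything from the matching line of the second file, except the filename in its first column.

As you can see, more than one line in the first file corresponds to a single line in the second. I need this done for every line, so the output should look like this: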

 
12319000 -94,69642857142858 -0,15117839559513543 mutant 1 
12319000 -109,59301470588237 -0,18277783185642743 mutant 1 
12319001 99,70264355742297 0,48329515727315125 mutant 2 
12319001 84,61113445378152 0,4060446341409862 mutant 2

I wrote this code:

oocytes = open(file_with_oocytes, 'r') 
results = open(os.path.join(path, 'results.csv'), 'r') 
results_new = open(os.path.join(path, 'results_with_oocytes.csv'), 'w') 
for line in results: 
    for lines in oocytes: 
     if lines[0:7] in line: 
      print line + lines[12:]

But it prints only this and nothing more, although the first file has 45 lines:

 
12319000 99,4952380952381 0,3011778623990699 
    mutant 1 

12319000 99,4952380952381 0,3011778623990699 
    mutant 2 

12319000 99,4952380952381 0,3011778623990699 
    mutant 3

What is wrong with the code? Or should it be done in a completely different way?

2012-03-30 Phlya

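+1 for including the code you tried – bernie 2012-03-30 21:26:30

Is the first column of the files in sorted order? Reliably? – MattH 2012-03-30 21:32:14

Are the files "small"? That is, can they be read in all at once and kept in memory? – 2012-03-30 21:33:47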

Note that this solution doesn't depend on the length of any field except the length of the file extension in the second file.

# make a dict keyed on the filename before the extension
# with the other two fields as its value
file2dict = dict((row[0][:-4], row[1:])
                 for row in (line.split() for line in file2))

# then add to the end of each row
# the values matching its first column
output = [row + file2dict[row[0]] for row in (line.split() for line in file1)]

For testing purposes only, I used:

# I just use this to emulate a file object, as iterating over it yields lines 
# just use file1 = open(whatever_the_filename_is_for_this_data) 
# and the rest of the program is the same 
file1 = """12319000 -64,7357668067227 -0,1111052148685535 
12319000 -79,68527661064425 -0,13231739777754026 
12319000 -94,69642857142858 -0,15117839559513543 
12319000 -109,59301470588237 -0,18277783185642743 
12319001 99,70264355742297 0,48329515727315125 
12319001 84,61113445378152 0,4060446341409862 
12319001 69,7032037815126 0,29803063228455073 
12319001 54,93886554621849 0,20958105041136763 
12319001 39,937394957983194 0,13623056582981297 
12319001 25,05574229691877 0,07748669438398018 
12319001 9,99716386554622 0,028110643107892755""".splitlines() 

# again, use file2 = open(whatever_the_filename_is_for_this_data) 
# and the rest of the program will work the same 
file2 = """12319000.abf mutant 1 
12319001.abf mutant 2 
12319002.abf mutant 3""".splitlines()

In real use you should just use normal file objects. The output for the test data is:

[['12319000', '-64,7357668067227', '-0,1111052148685535', 'mutant', '1'], 
    ['12319000', '-79,68527661064425', '-0,13231739777754026', 'mutant', '1'], 
    ['12319000', '-94,69642857142858', '-0,15117839559513543', 'mutant', '1'], 
    ['12319000', '-109,59301470588237', '-0,18277783185642743', 'mutant', '1'], 
    ['12319001', '99,70264355742297', '0,48329515727315125', 'mutant', '2'], 
    ['12319001', '84,61113445378152', '0,4060446341409862', 'mutant', '2'], 
    ['12319001', '69,7032037815126', '0,29803063228455073', 'mutant', '2'], 
    ['12319001', '54,93886554621849', '0,20958105041136763', 'mutant', '2'], 
    ['12319001', '39,937394957983194', '0,13623056582981297', 'mutant', '2'], 
    ['12319001', '25,05574229691877', '0,07748669438398018', 'mutant', '2'], 
    ['12319001', '9,99716386554622', '0,028110643107892755', 'mutant', '2']]
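Since the question asks for a new file and not a list, the same lookup can be written straight to disk. This is a rough sketch under assumptions: the function name `merge_files` and its three path parameters are hypothetical, fields are taken to be whitespace-separated as in the samples, and `os.path.splitext` stands in for the fixed `[:-4]` slice so that the extension length no longer matters. Lines whose id has no entry in the second file are skipped here, which is a choice the question does not specify.

```python
import os


def merge_files(data_path, labels_path, out_path):
    """Append the fields of the matching labels line to every data line."""
    # Map "12319000" -> ["mutant", "1"], dropping the extension from the name.
    labels = {}
    with open(labels_path) as labels_file:
        for line in labels_file:
            fields = line.split()
            if fields:
                labels[os.path.splitext(fields[0])[0]] = fields[1:]

    written = 0
    with open(data_path) as data_file, open(out_path, "w") as out_file:
        for line in data_file:
            fields = line.split()
            # Skip blank lines and ids that have no entry in the labels file.
            if not fields or fields[0] not in labels:
                continue
            out_file.write(" ".join(fields + labels[fields[0]]) + "\n")
            written += 1
    return written
```

It could be called as `merge_files('results.csv', file_with_oocytes, 'results_with_oocytes.csv')`, reusing the names from the question; the return value is the number of lines written.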

2012-03-30 21:35:31 agf

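I don't fully understand: how should this work with whole files? Should I change the first part to file1 = file1_old.splitlines() file2 = file2_old.splitlines() and then run the second part? – Phlya 2012-03-30 21:41:14

@Ilya I added a couple of comments, but basically just use 'fileX = open(filename)' instead of what I did for that file. – agf 2012-03-30 21:44:50

Thanks! I'll try it now. – Phlya 2012-03-30 21:51:38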

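File handles in Python have state; that is, they don't work like lists. You can iterate over a list repeatedly and get all of its values each time. A file, on the other hand, has a position at which the next read() will occur. When you iterate over a file, you read() each line in turn. Once you reach the last line, the file pointer is at the end of the file, and a read() from the end of the file returns the empty string ''!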
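This behaviour can be seen without touching the disk. Here is a minimal sketch, assuming `io.StringIO` as a stand-in for a real file opened with `open()`; the sample text is taken from the question's second file.

```python
import io

# io.StringIO stands in for a real file object here (an assumption made for
# the demo); a file returned by open() keeps a read position in the same way.
oocytes = io.StringIO("12319000.abf mutant 1\n12319001.abf mutant 2\n")

first_pass = [line for line in oocytes]   # consumes the file up to the end
second_pass = [line for line in oocytes]  # position is at the end: nothing left
at_end = oocytes.read()                   # reading at the end gives ''

oocytes.seek(0)                           # rewind to the beginning
third_pass = [line for line in oocytes]   # all lines are available again

print(len(first_pass), len(second_pass), repr(at_end), len(third_pass))
```

This is why the inner loop in the question only does any work for the first line of results: after that, the oocytes file is already exhausted.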

What you need to do is read the oocytes file once at the beginning and store its values, perhaps something like this:

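oodict = {}
for line in oocytes:
    # key on the 8-digit id before ".abf"; keep the rest of the line
    oodict[line[0:8]] = line[12:].rstrip()

for line in results:
    results_key = line[0:8]
    if results_key in oodict:
        # data line first, then the annotation from the oocytes file
        print(line.rstrip() + oodict[results_key])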

2012-03-30 21:39:40 Cuadue

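OK, simple things first: you're printing the newline at the end of line - you'll want to drop it with line[0:-1].

Next, "lines[0:7]" only tests the first 7 characters of the line - you want to test 8 characters. That is why the same value of "line" is printed with 3 different mutant values.

Finally, you need to close and reopen oocytes for each line in results. If you don't, your output ends after the first line of results.

Actually, the other answer is better: don't open and close oocytes for every line of results. Open it and read it in (to a list) once, then iterate over that list for each line of results.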
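That last suggestion, reading once into a list and then looping, might look like the sketch below. The helper name `combine` and the in-memory sample lists are illustrative assumptions; with real files, `list(open(...))` would supply the lists. It compares the first 8 characters and strips the trailing newline, in line with the two points made earlier in this answer.

```python
def combine(result_lines, oocyte_lines):
    """Pair every results line with the oocytes line sharing its 8-digit id."""
    combined = []
    for line in result_lines:
        for oocyte in oocyte_lines:  # a list can be iterated again and again
            if oocyte[0:8] == line[0:8]:
                # oocyte[12:] drops "12319000.abf"; rstrip() drops the newline
                combined.append(line.rstrip() + oocyte[12:].rstrip())
    return combined


# Sample lines copied from the question's two files.
results = ["12319000 -64,7357668067227 -0,1111052148685535 \n",
           "12319001 99,70264355742297 0,48329515727315125 \n"]
oocytes = ["12319000.abf mutant 1 \n", "12319001.abf mutant 2 \n"]
for row in combine(results, oocytes):
    print(row)
```

The nested loop is quadratic in the worst case, so for large files the dictionary lookup shown in the other answers is likely the better fit.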

2012-03-30 21:40:41

Why close and reopen when you can seek(0)? – 2012-03-30 21:42:09
