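Compare two files line by line in Python
I want to compare two different files in Python. Each line contains a number of probabilities followed by an ID at the end of the line, and I need to compute the ratio for each ID. The problem is that each line may contain a different number of probabilities, and the two files can also have different numbers of lines. I managed to write a script that compares the two files when each contains a single line, but I don't know how to do this for every line in the file. Here is my script so far: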
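The script below works in log space, which turns the per-ID ratio of probability products into a difference of sums. Take $p_1, \dots, p_n$ as the probabilities on one line of the first file and $q_1, \dots, q_m$ as those on the line with the same ID in the second file. These symbols are introduced here only for illustration, and $n$ and $m$ may differ:

$$
\log_{10}\frac{\prod_{i=1}^{n} p_i}{\prod_{j=1}^{m} q_j} = \sum_{i=1}^{n}\log_{10} p_i - \sum_{j=1}^{m}\log_{10} q_j
$$

A positive value means the first file assigns the utterance a higher probability. Summing logarithms instead of multiplying raw values also avoids floating-point underflow when many small numbers such as 2.506201e-08 are combined.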
#!/usr/bin/env python3
import math

# Read both files as flat lists of whitespace-separated tokens.
with open("test.ppx1") as file1, open("test.prob1") as file2:
    words = file1.read().split()
    words2 = file2.read().split()

# The last token of each file is the utterance ID; take it off the list.
id1 = words.pop()
id2 = words2.pop()

# Keep at most 12 characters per token (e.g. "2.506201e-08"),
# convert to float and sum the base-10 logarithms.
logsum1 = sum(math.log10(float(x[:12])) for x in words)
logsum2 = sum(math.log10(float(x[:12])) for x in words2)

# Difference of log sums = log10 of the probability ratio.
ratio = logsum1 - logsum2

with open("output.txt", "w") as f:
    print(id1, logsum1, logsum2, ratio, file=f)
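A minimal sketch of a per-line version, assuming every non-empty line is a run of probabilities followed by exactly one ID token. The helper names `parse_line`, `scores_from_lines` and `compare_files` are hypothetical. Lines are matched by ID rather than by position, so the two files may differ in line count and order; IDs found in only one file are skipped. Unlike the script above, it does not truncate tokens to 12 characters, since `float` parses strings such as `2.506201e-08` directly.

```python
import math


def parse_line(line):
    """Return (utterance ID, sum of log10 probabilities) for one line.

    Assumes the last whitespace-separated token is the ID and every
    other token is a probability such as "2.506201e-08".
    """
    tokens = line.split()
    utt_id = tokens[-1]
    log_sum = sum(math.log10(float(tok)) for tok in tokens[:-1])
    return utt_id, log_sum


def scores_from_lines(lines):
    """Map each utterance ID to its log10 sum, skipping blank lines.

    If an ID occurs more than once, the last occurrence wins.
    """
    scores = {}
    for line in lines:
        if line.strip():
            utt_id, log_sum = parse_line(line)
            scores[utt_id] = log_sum
    return scores


def compare_files(path1, path2, out_path):
    """Write "ID sum1 sum2 difference" for every ID found in both files."""
    with open(path1) as f1:
        scores1 = scores_from_lines(f1)
    with open(path2) as f2:
        scores2 = scores_from_lines(f2)
    with open(out_path, "w") as out:
        # Iterate over the IDs shared by both files, in sorted order.
        for utt_id in sorted(scores1.keys() & scores2.keys()):
            s1, s2 = scores1[utt_id], scores2[utt_id]
            # Difference of log10 sums = log10 of the probability ratio.
            print(utt_id, s1, s2, s1 - s2, file=out)
```

Calling `compare_files("test.ppx1", "test.prob1", "output.txt")` would write one line per shared ID. Keying by ID with a dictionary is what lets lines with different numbers of probabilities, and files with different numbers of lines, be compared safely.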
Could you show a sample of both files? – NTAWolf
First file:
2.506201e-08 2.346253e-02 1.282699e-02 3.336181e-05 1.821797e-07 1.424501e-07 utt-0000000001
2.506201e-08 2.346253e-02 1.282699e-02 3.336181e-05 1.821797e-07 1.424501e-07 utt-0000000002
2.506201e-08 2.346253e-02 1.282699e-02 3.336181e-05 1.821797e-07 1.424501e-07 utt-0000000003
– oezlem
Second file:
2.506201e-08 2.346253e-02 1.282699e-02 3.336181e-05 1.821797e-07 1.424501e-07 2.506201e-08 1.821797e-07 1.424501e-07 utt-0000000001
2.506201e-08 2.346253e-02 1.282699e-02 3.336181e-05 1.821797e-07 1.424501e-07 utt-0000000002
2.506201e-08 2.346253e-02 1.282699e-02 3.336181e-05 1.821797e-07 1.424501e-07 1.424501e-07 1.424501e-07 utt-0000000003
– oezlem
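When a whole file is read as one flat list of tokens, as `read().split()` in the original script does, the line structure is lost and the IDs are the only separators left. A sketch for that case, assuming (as in the samples) that every ID starts with the prefix `utt-` and directly follows its own probabilities; `group_by_id`, `log_sums` and `log_ratios` are illustrative names, and the `id_prefix` parameter is a hypothetical convenience.

```python
import math


def group_by_id(tokens, id_prefix="utt-"):
    """Split a flat token list into {ID: [probabilities]}.

    Assumes every utterance is a run of probabilities followed by one
    token that starts with id_prefix (an assumption based on the samples).
    """
    groups = {}
    current = []
    for tok in tokens:
        if tok.startswith(id_prefix):
            groups[tok] = current  # the ID closes the current run
            current = []
        else:
            current.append(float(tok))
    if current:
        raise ValueError("trailing probabilities without an ID")
    return groups


def log_sums(groups):
    """Sum the base-10 logarithms of each utterance's probabilities."""
    return {uid: sum(math.log10(p) for p in probs) for uid, probs in groups.items()}


def log_ratios(text1, text2):
    """Per-ID difference of log10 sums for IDs present in both texts."""
    sums1 = log_sums(group_by_id(text1.split()))
    sums2 = log_sums(group_by_id(text2.split()))
    return {uid: sums1[uid] - sums2[uid] for uid in sums1 if uid in sums2}
```

With the two samples as input, utt-0000000002 is identical in both files and gives 0.0, while utt-0000000001 and utt-0000000003 give positive values because their lines in the second file carry extra probabilities below 1.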