我想比較兩個文件(從第一個文件中取出一行,然後在整個第二個文件中查找)以查看它們之間的差異,並將fileA.txt中缺失的行寫入fileB.txt結尾。我是新來的Python因此在第一次我以爲安博這樣簡單的程序:比較兩個文件在python中的差異
import difflib
file1 = "fileA.txt"
file2 = "fileB.txt"
diff = difflib.ndiff(open(file1).readlines(),open(file2).readlines())
print ''.join(diff),
但結果我有兩個文件組合爲每個行合適的變量。我知道我可以用標籤「 - 」查找行開頭,然後將其寫入文件fileB.txt的結尾,但是對於大文件(〜100 MB),此方法效率不高。有人可以幫助我改進計劃嗎?
文件結構將是這樣的:
輸入:
fileA.txt
Oct 9 13:25:31 user sshd[12844]: Accepted password for root from 213.XXX.XXX.XX7 port 33254 ssh2
Oct 9 13:25:31 user sshd[12844]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 9 13:35:48 user sshd[12868]: Accepted password for root from 213.XXX.XXX.XX7 port 33574 ssh2
Oct 9 13:35:48 user sshd[12868]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 9 13:46:58 user sshd[12844]: Received disconnect from 213.XXX.XXX.XX7: 11: disconnected by user
Oct 9 13:46:58 user sshd[12844]: pam_unix(sshd:session): session closed for user root
Oct 9 15:47:58 user sshd[12868]: pam_unix(sshd:session): session closed for user root
Oct 11 22:17:31 user sshd[2655]: Accepted password for root from 17X.XXX.XXX.X19 port 5567 ssh2
Oct 11 22:17:31 user sshd[2655]: pam_unix(sshd:session): session opened for user root by (uid=0)
fileB.txt
Oct 9 12:19:16 user sshd[12744]: Accepted password for root from 213.XXX.XXX.XX7 port 60554 ssh2
Oct 9 12:19:16 user sshd[12744]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 9 13:24:42 user sshd[12744]: Received disconnect from 213.XXX.XXX.XX7: 11: disconnected by user
Oct 9 13:24:42 user sshd[12744]: pam_unix(sshd:session): session closed for user root
Oct 9 13:25:31 user sshd[12844]: Accepted password for root from 213.XXX.XXX.XX7 port 33254 ssh2
Oct 9 13:25:31 user sshd[12844]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 9 13:35:48 user sshd[12868]: Accepted password for root from 213.XXX.XXX.XX7 port 33574 ssh2
Oct 9 13:35:48 user sshd[12868]: pam_unix(sshd:session): session opened for user root by (uid=0)
輸出:
FILEB _after.txt
Oct 9 12:19:16 user sshd[12744]: Accepted password for root from 213.XXX.XXX.XX7 port 60554 ssh2
Oct 9 12:19:16 user sshd[12744]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 9 13:24:42 user sshd[12744]: Received disconnect from 213.XXX.XXX.XX7: 11: disconnected by user
Oct 9 13:24:42 user sshd[12744]: pam_unix(sshd:session): session closed for user root
Oct 9 13:25:31 user sshd[12844]: Accepted password for root from 213.XXX.XXX.XX7 port 33254 ssh2
Oct 9 13:25:31 user sshd[12844]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 9 13:35:48 user sshd[12868]: Accepted password for root from 213.XXX.XXX.XX7 port 33574 ssh2
Oct 9 13:35:48 user sshd[12868]: pam_unix(sshd:session): session opened for user root by (uid=0)
Oct 9 13:46:58 user sshd[12844]: Received disconnect from 213.XXX.XXX.XX7: 11: disconnected by user
Oct 9 13:46:58 user sshd[12844]: pam_unix(sshd:session): session closed for user root
Oct 9 15:47:58 user sshd[12868]: pam_unix(sshd:session): session closed for user root
Oct 11 22:17:31 user sshd[2655]: Accepted password for root from 17X.XXX.XXX.X19 port 5567 ssh2
Oct 11 22:17:31 user sshd[2655]: pam_unix(sshd:session): session opened for user root by (uid=0)
所以基本上要合併兩個文本文件但不保留重複? – MooingRawr