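How to get the deleted and newly added lines from two files?

I have two files that are updated daily from some online feeds. The files contain entries, and every day some new lines are added and some are removed. In addition, the order of the lines in the file changes every day. So I want to extract the lines added today, and I also want to know how many of yesterday's lines were removed.

The approach I followed:

Suppose there are 3 files, 2017-07-17.txt, 2017-07-18.txt and 2017-07-19.txt, with the following data.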

2017-07-17.txt

a 
b 
c

2017-07-18.txt

a 
b 
d 
e 
f

2017-07-19.txt

f 
e 
a 
c 
b 
d 
g

I ran diff on the first two files.

3d2 
< c 
4a4,5 
> e 
> f

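From this output it is easy to extract the data and see what was deleted and what was added. But my input ranges from 100,000 to 2 million lines per day, so using diff does not work.

The problem I face with this approach:

When, say, 2017-07-19.txt arrives with its lines in a different order, the diff logic works strictly linearly, because it scans line by line.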

$ diff 2017-07-18.txt 2017-07-19.txt 
0a1,2 
> f 
> e 
1a4 
> c 
4,5c7 
< e 
< f 
--- 
> g

Is there any solution I can use to get output like this?

Expected output:

$ diff 2017-07-18.txt 2017-07-19.txt 
    Added : c 
      g 

    Deleted : None

2017-07-19 Backdoor Cipher

What does this have to do with python? – Rahul

$ cat awk-script 
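NR==FNR { a[$0]; next }            # first file: remember every line
{
    if ($0 in a)
        a[$0] = 1                  # line is present in both files
    else
        add = add "\t" $0 "\n"     # line exists only in the second file
}
END {
    for (i in a)
        if (a[i] != 1)             # never seen in the second file
            del = del "\t" i "\n"
    printf "Added:%s\n", (add) ? add : "None\n"
    printf "Deleted:%s", (del) ? del : "None\n"
}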

$ awk -f awk-script 2017-07-18.txt 2017-07-19.txt 
Added: c 
     g 

Deleted:None

2017-07-19 07:07:02 CWLiu

Quite nice, 'None's and all. ^ –

Thanks @JamesBrown – CWLiu

I agree (I copied it into my solution for a nicer presentation) – NeronLeVelu

This should do it. But note that this solution reads both files entirely into memory.

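# Read both files into sets of lines; everything is held in memory.
# Stripping the newline keeps a final line without "\n" comparable.
with open("2017-07-18.txt") as f1, open("2017-07-19.txt") as f2:
    lines1 = {line.rstrip("\n") for line in f1}
    lines2 = {line.rstrip("\n") for line in f2}

print(lines2 - lines1)  # added today

print(lines1 - lines2)  # deleted today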
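
A possible lower-memory variant, as a rough sketch only: it keeps just yesterday's file in memory and streams today's file, much like the awk answer above. The function name `diff_lines` and its return shape are assumptions made for illustration; they are not part of the original answer.

```python
def diff_lines(old_path, new_path):
    """Return (added, deleted) lines between two files, ignoring line order."""
    # Map every line of the old file to a "seen in the new file" flag.
    with open(old_path) as old_file:
        seen = {line.rstrip("\n"): False for line in old_file}

    added = {}  # dict used as an ordered set, so repeated lines count once
    with open(new_path) as new_file:
        for line in new_file:
            line = line.rstrip("\n")
            if line in seen:
                seen[line] = True      # present in both files
            else:
                added[line] = None     # present only in the new file

    # Old lines that never showed up in the new file were deleted.
    deleted = [line for line, found in seen.items() if not found]
    return list(added), deleted
```

Calling `diff_lines("2017-07-18.txt", "2017-07-19.txt")` on the sample data from the question should give `c` and `g` as added and nothing as deleted; `len(deleted)` then answers how many lines were removed.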

2017-07-19 06:41:19 abc

findstr /v /x /L /g:filename1 filename2 |find /c /v ""

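might yield the count of differences between the two files (I do not know whether 200K lines would hit any limit).

Find the lines in filename2 that /v do not /x exactly match, /L literally, /g: the lines in this file. The result is piped to find, which /c counts the lines that /v do not match "" (i.e., it counts the lines output by the previous command).

To assign this to a variable, use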

for /f %%a in ('findstr /v /x /L /g:filename1 filename2 ^|find /c /v "" ') do set count=%%a

(note the added single quotes and the caret before the pipe)
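
For readers not on Windows, the same count can be sketched in Python. This is only an approximation of what the findstr/find pipeline above does, under the assumption that matching is exact and case-sensitive; the function name `count_unmatched` is hypothetical.

```python
def count_unmatched(patterns_path, target_path):
    """Count lines of target_path that exactly match no line of patterns_path."""
    # Load the "pattern" file (filename1 in the command above) into a set.
    with open(patterns_path) as patterns_file:
        patterns = {line.rstrip("\n") for line in patterns_file}

    # Stream the second file and count the lines with no exact match.
    with open(target_path) as target_file:
        return sum(1 for line in target_file
                   if line.rstrip("\n") not in patterns)
```

With yesterday's file as `patterns_path` and today's as `target_path` this counts added lines; swapping the two arguments counts deleted lines.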

2017-07-19 06:46:35 Magoo

In awk:

$ awk ' 
NR==FNR{ a[$1]; next }    # hash first file contents to a 
{ 
    if($1 in a)      # if second file item is found in a 
     delete a[$1]    # delete it 
    else b[$1]      # otherwise add it to b hash 
} 
END {        # in the end 
    print "Added:" 
    for(i in b)      # added are in b 
     print i 
    print "Deleted:" 
    for(i in a)      # deleted are in a 
    print i 
}' 2017-07-18.txt 2017-07-19.txt # mind the order 
Added: 
c 
g 
Deleted:

2017-07-19 06:48:41

awk ' 
    # +1 for each line of the first file, -1 for each line of the second 
    { A[$1] += (FNR==NR) * 2 - 1 } 

    END { 
     # group lines by their final count and build a human-readable list 
     for(a in A){ T[A[a]] = T[A[a]] "\n " a } 

     # display result (thanks to @CWLiu very nice code) 
     printf "Added: %s\n", (T[1]) ? T[1] : "None" 
     printf "Deleted: %s\n", (T[-1]) ? T[-1] : "None" 
     } 
    ' 2017-07-19.txt 2017-07-18.txt

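To save a little memory, we could delete the A[x] element as soon as it reaches 0 in the first part, or at least discard it in the for loop of the END section.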
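
The memory remark can be illustrated with a small Python sketch of the same counting idea. It is not a translation of the awk code: the function name `diff_by_count` is hypothetical, and the sketch assumes each line occurs at most once per file (a repeated line would otherwise leave a non-zero balance).

```python
def diff_by_count(new_path, old_path):
    """Return (added, deleted) using a +1/-1 balance per line."""
    balance = {}
    # Lines of the new file count +1, lines of the old file count -1.
    for path, step in ((new_path, 1), (old_path, -1)):
        with open(path) as handle:
            for line in handle:
                key = line.rstrip("\n")
                count = balance.get(key, 0) + step
                if count == 0:
                    del balance[key]      # present in both files: drop it now
                else:
                    balance[key] = count

    added = [key for key, count in balance.items() if count > 0]
    deleted = [key for key, count in balance.items() if count < 0]
    return added, deleted
```

Note that the dictionary still holds the whole first file while it is being read; entries are only released during the second pass, so the saving is partial.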

2017-07-19 10:13:40 NeronLeVelu
