How should I count the number of duplicate lines in each file?

This is what I have tried so far:

dirs=$1 
for dir in $dirs 
do 
     ls -R $dir    
done

Source

2017-05-04 Bianca Chiorean

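With multiple files in a directory, can a duplicate be spread across many files in the directory, or do you want the number of duplicate lines within each separate file? Also, what do you consider a duplicate? Some sample data and the count you expect from it would be very helpful here. – JNevill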

Actually, the maximum number of duplicate lines contained across the many files in the directory –

A duplicate line means a line that is repeated –

Use sort together with uniq to find the duplicate lines.

#!/bin/bash 
dirs=("$@")
for dir in "${dirs[@]}" ; do 
    cat "$dir"/* 
done | sort | uniq -c | sort -n | tail -n1

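uniq -c prepends the number of occurrences to each line.
sort -n sorts the lines by that number.
tail -n1 outputs only the last line, i.e. the one with the maximum count.

If you want to see all the lines that share that highest count, use the following instead of tail: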
```
perl -ane 'if ($F[0] == $n) { push @buff, $_ } 
      else { @buff = $_ } 
      $n = $F[0]; 
      END { print for @buff }' 
```
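For readers without perl, the same "keep every line tied for the highest count" idea can be sketched in awk. This is only a sketch under assumptions: `most_repeated` is a hypothetical helper name, it takes files rather than directories as arguments, and it swaps awk in for the perl filter above rather than reproducing it exactly.

```shell
#!/bin/sh
# most_repeated: hypothetical helper, not part of the original answer.
# Prints every "count line" pair that shares the highest count
# across all files given as arguments.
most_repeated() {
    sort -- "$@" | uniq -c | sort -n |
        awk '
            $1 != n { k = 0 }             # count changed: start a new buffer
            { buf[++k] = $0; n = $1 }     # remember the line and its count
            END { for (i = 1; i <= k; i++) print buf[i] }
        '
}
```

Because the input arrives sorted by count, the buffer that survives to the END block holds exactly the lines with the largest count.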

Source

2017-05-04 13:35:01 choroba

Like this?

$ cat > foo 
this 
nope 
$ cat > bar 
neither 
this 

$ sort *|uniq -c 
    1 neither 
    1 nope 
    2 this

And to filter out the lines that occur only once:

... | awk '$1>1' 
     2 this
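Putting the two steps together, a minimal sketch might look like the following. The function name `list_dups` is made up for illustration; it simply wraps the pipeline shown above and takes the files to scan as arguments.

```shell
#!/bin/sh
# list_dups: hypothetical helper name, not from the answer above.
# Prints "count line" for every line that occurs more than once
# across all files given as arguments.
list_dups() {
    sort -- "$@" | uniq -c | awk '$1 > 1'
}
```

With the foo and bar files from the example, `list_dups foo bar` should print only the `2 this` line.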

Source

2017-05-04 13:41:08

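You can use awk. If you just want to "count the number of duplicate lines", we can infer that you are after "all lines that have already appeared earlier in the same file". The following produces those counts: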

#!/bin/sh 

for file in "$@"; do
    if [ -s "$file" ]; then
        awk '$0 in a {c++} {a[$0]} END {printf "%s: %d\n", FILENAME, c}' "$file"
    fi
done

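The awk script first checks whether the current line is already stored in the array a, and if it is, increments a counter. It then adds the line to the array. At the end of the file, we print the total.

Note that this may be a problem for very large files, because the array ends up holding every distinct line of the input in memory (in the worst case, the whole file).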
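If memory is a concern, one possible alternative is to let sort do the heavy lifting, since sort implementations typically spill to temporary files for large inputs. The sketch below is an assumption-laden variant, not the answer's method: `count_dups_sorted` is a hypothetical name, and `LC_ALL=C` is added so that lines are compared byte by byte, as awk's exact string match does.

```shell
#!/bin/sh
# count_dups_sorted: hypothetical helper, not part of the original answer.
# Prints how many lines of file "$1" repeat an earlier line,
# i.e. the sum over all distinct lines of (occurrences - 1).
count_dups_sorted() {
    LC_ALL=C sort -- "$1" | LC_ALL=C uniq -c |
        awk '{ c += $1 - 1 } END { print c + 0 }'   # c + 0 prints 0 for empty input
}
```

For a file containing foo, bar, this, bar, that, bar on separate lines, `count_dups_sorted` prints 2, the same figure the awk array version reports.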

Example:

$ printf 'foo\nbar\nthis\nbar\nthat\nbar\n' > inp.txt 
$ awk '$0 in a {c++} {a[$0]} END {printf "%s: %d\n", FILENAME, c}' inp.txt 
inp.txt: 2

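The word "bar" occurs three times in the file, so there are two duplicates.

To aggregate over multiple files, you can simply feed multiple files to awk: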

$ printf 'foo\nbar\nthis\nbar\n' > inp1.txt 
$ printf 'red\nblue\ngreen\nbar\n' > inp2.txt 
$ awk '$0 in a {c++} {a[$0]} END {print c}' inp1.txt inp2.txt 
2

Here the word "bar" appears twice in the first file and once in the second file, three times in total, so we still have two duplicates.

Source

2017-05-04 13:59:31 ghoti
