Using the shell to find the frequency of each item in a column

I am very inexperienced with the shell/Mac terminal, so any help or advice would be greatly appreciated.

I have a very large tab-delimited data set. Here is an example of the data.

0001 User1 Tweet1 
0002 User2 Tweet2 
0003 User3 Tweet3 
0004 User2 Tweet4 
0005 User2 Tweet5

I have been trying to export a CSV that lists each unique user and how many times they appear (that is, how many tweets they made).

Here is my current attempt:

cut -f 2 Twitter_Data_1 |sort | uniq -c | wc -l > TweetFreq.csv

Ideally, I would like the exported CSV to look like this:

User1 1 
User2 3 
User3 1

Source

2017-10-20 Liam

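You are already counting with 'uniq'. What is the purpose of 'wc'?

Good point, but even if I remove it, I only get 1 output rather than the whole column – Liam

Update your question to show your current code and output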
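To make the point about 'wc' concrete, here is a minimal sketch. It recreates the five sample rows from the question in a hypothetical file called sample.tsv (an assumed name, not part of the original post) and runs the pipeline with and without the trailing 'wc -l'.

```shell
# Recreate the sample rows from the question as a tab-separated file.
# sample.tsv is a hypothetical file name used only for this illustration.
printf '0001\tUser1\tTweet1\n0002\tUser2\tTweet2\n0003\tUser3\tTweet3\n0004\tUser2\tTweet4\n0005\tUser2\tTweet5\n' > sample.tsv

# With wc -l at the end, only the number of lines printed by uniq -c survives,
# which is the number of distinct users, not a per-user count.
distinct_users=$(cut -f 2 sample.tsv | sort | uniq -c | wc -l)

# Without wc -l, each distinct user is printed with its count in front.
per_user=$(cut -f 2 sample.tsv | sort | uniq -c)

echo "distinct users: $distinct_users"
echo "$per_user"
```

In other words, 'wc -l' collapses the whole frequency table into a single number, so it has to be dropped to get one line per user.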

$ awk -F '\t' '{ print $2 }' tweet | sort | uniq -c

Output:

    1 User1 
    3 User2 
    1 User3
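The output above has the count first and is padded with spaces, while the question asks for the user first. One possible way to reshape it, sketched under the assumptions that user names contain no whitespace and that a comma is the wanted separator, is a final awk step. The file sample.tsv is a hypothetical stand-in for Twitter_Data_1.

```shell
# Hypothetical sample file with the rows from the question (tab-separated).
printf '0001\tUser1\tTweet1\n0002\tUser2\tTweet2\n0003\tUser3\tTweet3\n0004\tUser2\tTweet4\n0005\tUser2\tTweet5\n' > sample.tsv

# uniq -c prints "<count> <user>"; the last awk swaps the two fields
# and joins them with a comma. Assumes user names contain no whitespace.
cut -f 2 sample.tsv | sort | uniq -c | awk '{ print $2 "," $1 }' > TweetFreq.csv

cat TweetFreq.csv
# User1,1
# User2,3
# User3,1
```

Replacing "," with " " in the awk program should give the space-separated layout shown in the question instead.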

Source

2017-10-20 12:20:17 mathB

Since my third column is not the same as in my example, how can I replace the word tweet? – Liam

'tweet' is the file name; in your case, it should be 'Twitter_Data_1' – mathB

Not clean, but it works:

#!/bin/bash
mkdir -p tmptweet # create the temp directory
while IFS= read -r line; do
    user=$(printf '%s\n' "$line" | cut -f 2) # extract the username (second tab-separated field)
    printf '%s\n' "$line" >> "tmptweet/$user" # append the line to that user's file, which acts as a counter
done < Twitter_Data_1

for file in tmptweet/*; do
    i=$(wc -l < "$file") # count the lines for each user ...
    echo "${file##*/} $i" >> TweetFreq.csv # ... and append the result to the final file
done
rm -rf tmptweet # remove the temp directory

The values are stored in temporary files inside a temporary directory, which is easier than juggling them otherwise.

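Each line of your Twitter_Data_1 is appended to a file named after the username, and then the number of lines in each of those files is counted to create the TweetFreq.csv file.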
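The loop starts new processes for every input line, which adds up on a very large file. As an alternative sketch (a different technique from the script above, not part of the original answer), the same per-user tally can be kept in an awk associative array in a single pass. The file sample.tsv is a hypothetical stand-in for Twitter_Data_1, and the comma separator is an assumption.

```shell
# Hypothetical sample file with the rows from the question (tab-separated).
printf '0001\tUser1\tTweet1\n0002\tUser2\tTweet2\n0003\tUser3\tTweet3\n0004\tUser2\tTweet4\n0005\tUser2\tTweet5\n' > sample.tsv

# count[$2]++ tallies the second tab-separated field for every line.
# The END block prints one "user,count" line per distinct user.
# awk does not guarantee the iteration order of "for (user in count)",
# so the result is piped through sort to make it stable.
awk -F '\t' '{ count[$2]++ } END { for (user in count) print user "," count[user] }' sample.tsv \
    | sort > TweetFreq.csv

cat TweetFreq.csv
# User1,1
# User2,3
# User3,1
```

This keeps one counter per distinct user in memory and needs no temporary directory.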

Test:

Will /home/will # ls 
script.sh  Twitter_Data_1 
Will /home/will # ./script.sh 
Will /home/will # ls 
script.sh  Twitter_Data_1  TweetFreq.csv 
Will /home/will # cat TweetFreq.csv 
User1  1 
User2  3 
User3  1 
Will /home/will #

Source

2017-10-20 12:07:00 Will

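Running the code now, and it is taking a long time. It is a lot of data and it is a Mac... I will keep you posted – Liam

You can check that the files are being created in 'tmptweet' using 'cd', and you can also use 'tail -f TweetFreq.csv' to watch a live feed of the second part :-) – Will

TweetFreq.csv was not created for me, so I stopped the code – Liam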
