Grepping logs for IP addresses

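I am quite bad at using "basic" unix commands, and this question puts my knowledge even more to the test. What I would like to do is grep all IP addresses from a log (e.g. access.log from apache) and count how often they occur. Can I do that with one command, or do I need to write a script for that?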

BR, Paul Peelen

Source

2011-04-20 Paul Peelen

Take a look at my answer on UNIX stackexchange: https://unix.stackexchange.com/a/389565/249079 – Ganapathy 2017-09-29 05:36:56

You'll need a short pipeline at least.

sed -e 's/\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*$/\1/' -e t -e d access.log | sort | uniq -c

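This prints each IP (IPv4 only), sorted and prefixed with its count. I tested it with apache2's access.log (the log format is configurable, so you'll need to check yours), and it worked for me. It assumes the IP address is the first thing on each line.

The sed collects the IP addresses (actually it looks for 4 sets of digits with periods in between) and replaces the entire line with the match. -e t continues to the next line if the substitution succeeded, and -e d deletes the line (if there was no IP address on it). sort sorts :) and uniq -c counts instances of consecutive identical lines (which, since we've sorted them, corresponds to the total count).

Source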
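To make the role of each stage concrete, here is a minimal sketch that runs the same pipeline on a few made-up log lines. The sample addresses, the `sample_log` and `count_ips` names, and the trailing `sort -rn` (which orders the result by count) are assumptions added for illustration. The pattern is spelled `[0-9][0-9]*` instead of `[0-9]\+`, since `\+` is a GNU sed extension.

```shell
#!/bin/sh
# Hypothetical sample log; the addresses are documentation-range examples.
sample_log() {
    cat <<'EOF'
192.0.2.10 - - [20/Apr/2011:18:00:01 +0000] "GET / HTTP/1.1" 200 512
198.51.100.7 - - [20/Apr/2011:18:00:02 +0000] "GET /a HTTP/1.1" 200 128
192.0.2.10 - - [20/Apr/2011:18:00:03 +0000] "GET /b HTTP/1.1" 404 64
this line has no address and is dropped by -e d
192.0.2.10 - - [20/Apr/2011:18:00:04 +0000] "GET /c HTTP/1.1" 200 256
EOF
}

# Reads a log on stdin and prints "count address", most frequent first.
count_ips() {
    # s///  keeps the leading address and discards the rest of the line
    # t     skips to the next line when the substitution succeeded
    # d     deletes lines on which nothing was substituted
    sed -e 's/\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*$/\1/' -e t -e d |
        sort |      # group identical addresses together
        uniq -c |   # count each group
        sort -rn    # assumption: order by count, highest first
}

sample_log | count_ips
```

On this sample the pipeline reports a count of 3 for 192.0.2.10 and 1 for 198.51.100.7. The line without an address does not appear at all.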

2011-04-20 18:28:28 falstro

You can do the following (where datafile is the name of the log file):

egrep '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' datafile | sort | uniq -c

Edit: I missed the part about counting the addresses; that is now added.

Source

2011-04-20 18:27:36

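This will fail, because egrep prints the entire line including the timestamp, so every line is unique. You need to output the IP address alone and drop the rest of the line (or consider only the IP when checking for uniqueness). – falstro 2014-01-09 07:35:47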
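The difference the comment describes can be seen in a small sketch. It assumes a grep that supports `-o` (GNU and BSD grep both do, although POSIX does not specify it); the sample lines and the names `sample_log`, `whole_lines` and `only_ips` are made up for illustration, and the pattern is anchored with `^` on the assumption that the address starts each line.

```shell
#!/bin/sh
# Hypothetical sample log; the addresses are documentation-range examples.
sample_log() {
    cat <<'EOF'
192.0.2.10 - - [20/Apr/2011:18:00:01 +0000] "GET / HTTP/1.1" 200 512
198.51.100.7 - - [20/Apr/2011:18:00:02 +0000] "GET /a HTTP/1.1" 200 128
192.0.2.10 - - [20/Apr/2011:18:00:03 +0000] "GET /b HTTP/1.1" 404 64
EOF
}

# Four groups of 1-3 digits at the start of the line.
ip_re='^[[:digit:]]{1,3}(\.[[:digit:]]{1,3}){3}'

# Without -o grep passes whole lines on; the timestamps make each one
# unique, so every count is 1.
whole_lines() {
    grep -E "$ip_re" | sort | uniq -c
}

# With -o only the matched address reaches sort and uniq.
only_ips() {
    grep -oE "$ip_re" | sort | uniq -c
}

echo "whole lines:"; sample_log | whole_lines
echo "addresses only:"; sample_log | only_ips
```

The first variant lists three distinct lines, while the second collapses them into two addresses with their counts.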

-1

Using sed:

$ sed 's/.*\(<regex_for_ip_address>\).*/\1/' <filename> | sort | uniq -c

You can search the Internet for a usable IP address regex and substitute it for <regex_for_ip_address>, e.g. from the answers to a related question on stackoverflow.

Source

2011-04-20 18:55:30 sahaj

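As Dave Tarsi points out, this might fail, because it will capture things such as browser versions as valid IP addresses. You need to know where on the line the IP address is (at the beginning) and select only that. – falstro 2014-01-09 07:32:28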

egrep '[[:digit:]]{1,3}(\.[[:digit:]]{1,3}){3}' | awk '{print $1}' | sort | uniq -c

Source

2013-11-04 05:53:52 Snowwolf

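As Dave Tarsi points out, this might actually fail, because it will capture things such as browser versions as valid IP addresses. You need to know where on the line the IP address is (at the beginning) and select only that. – falstro 2014-01-09 07:33:01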
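One way to follow that advice is to look only at the first field of each line and check that it is a plausible IPv4 address before counting it. The sketch below is an illustration under assumptions, not any poster's method: it assumes the client address is the first whitespace-separated field, and the sample lines and the names `sample_log` and `count_client_ips` are made up.

```shell
#!/bin/sh
# Hypothetical sample log; note the browser versions that look like addresses.
sample_log() {
    cat <<'EOF'
192.0.2.10 - - [20/Apr/2011:18:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBrowser/4.0.0.42"
203.0.113.5 - - [20/Apr/2011:18:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBrowser/5.0.2.6"
localhost - - [20/Apr/2011:18:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBrowser/5.0.2.6"
192.0.2.10 - - [20/Apr/2011:18:00:04 +0000] "GET /x HTTP/1.1" 200 99 "-" "ExampleBrowser/4.0.0.42"
EOF
}

# Reads a log on stdin and prints "count address", most frequent first.
count_client_ips() {
    awk '
        {
            # Only the first field is inspected, so version strings
            # elsewhere on the line are never considered.
            n = split($1, octet, ".")
            if (n != 4) next
            for (i = 1; i <= 4; i++)
                if (octet[i] !~ /^[0-9]+$/ || octet[i] > 255) next
            hits[$1]++
        }
        END { for (ip in hits) print hits[ip], ip }
    ' | sort -rn
}

sample_log | count_client_ips
```

Here 4.0.0.42 and 5.0.2.6 never show up in the result, and the line whose first field is a hostname is skipped.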

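Here is a script I wrote several years ago. It greps addresses out of Apache access logs. I just tried it on Ubuntu 11.10 (oneiric) 3.0.0-32-generic #51-Ubuntu SMP Thu Mar 21 15:51:26 UTC 2013 i686 i686 i386 GNU/Linux and it works fine. Use Gvim or Vim to read the resulting file, which is called unique_visits and lists the unique IPs in a column. The key to this is in the lines that use grep: those expressions extract the IP address numbers. IPv4 only. You may need to go through and update the browser version numbers. Another similar script that I wrote for a Slackware system is here: http://www.perpetualpc.net/srtd_bkmrk.html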

#!/bin/sh
# Collect search engine referrals and zombie hunters. combined_log is the original file.
egrep '(google)|(yahoo)|(mamma)|(query)|(msn)|(ask.com)|(search)|(altavista)|(images.google)|(xb1)|(cmd.exe)|(trexmod)|(robots.txt)|(copernic.com)|(POST)' combined_log > search
# Now sort them to eliminate duplicates and put them in order.
sort -un search > search_sort
# Do the same with the original file.
sort -un combined_log > combined_log_sort
# Now get all the IP addresses, only the numbers.
grep -o '[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' search_sort > search_sort_ip
grep -o '[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' combined_log_sort > combined_log_sort_ip
# Keep only the lines that differ between the two lists.
sdiff -s combined_log_sort_ip search_sort_ip > final_result_ip
# Get rid of the extra column.
grep -o '^\|[0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*[.][0-9][0-9]*' final_result_ip > bookmarked_ip
# Remove stuff like browser versions and system versions.
egrep -v '(4.4.2.0)|(1.6.3.1)|(0.9.2.1)|(4.0.0.42)|(4.1.8.0)|(1.305.2.109)|(1.305.2.12)|(0.0.43.45)|(5.0.0.0)|(1.6.2.0)|(4.4.5.0)|(1.305.2.137)|(4.3.5.0)|(1.2.0.7)|(4.1.5.0)|(5.0.2.6)|(4.4.9.0)|(6.1.0.1)|(4.4.9.0)|(5.0.8.6)|(5.0.2.4)|(4.4.8.0)|(4.4.6.0)' bookmarked_ip > unique_visits

exit 0
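For comparison, the general idea (drop search engine and bot traffic, then list each remaining visitor once) could be sketched without intermediate files. This is not the script above and may not give identical results: it assumes the client address is the first field of each line, uses only a subset of the script's filter words, and the names `sample_log` and `unique_visitors` and the sample lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample log; the addresses are documentation-range examples.
sample_log() {
    cat <<'EOF'
192.0.2.10 - - [17/Nov/2013:16:00:01 +0000] "GET / HTTP/1.1" 200 512 "http://www.google.com/search?q=x" "ExampleBrowser/4.0.0.42"
198.51.100.7 - - [17/Nov/2013:16:00:02 +0000] "GET /index.html HTTP/1.1" 200 128 "-" "ExampleBrowser/5.0.2.6"
203.0.113.5 - - [17/Nov/2013:16:00:03 +0000] "GET /robots.txt HTTP/1.1" 200 20 "-" "ExampleBot/1.0"
198.51.100.7 - - [17/Nov/2013:16:00:04 +0000] "GET /about.html HTTP/1.1" 200 64 "-" "ExampleBrowser/5.0.2.6"
EOF
}

# Reads a log on stdin and prints each remaining client address once.
unique_visitors() {
    # Drop lines that mention a search engine, a crawler file or a POST.
    grep -Ev 'google|yahoo|msn|robots\.txt|cmd\.exe|POST' |
        # Assumption: the first field is the client address, so browser
        # versions later on the line need no separate clean-up pass.
        awk '{ print $1 }' |
        sort -u
}

sample_log | unique_visitors
```

On this sample only 198.51.100.7 remains. Because only the first field is kept, no list of browser version numbers has to be maintained.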

Source

2013-11-17 16:41:35

-1

egrep -o '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' access.log | sort | uniq -c

Source

2014-03-26 17:47:01 cint
