2015-05-21
0

This is my Apache access log file. I want a count of the unique URLs in the Apache access log. How can I filter the access log with awk, sed, or cut?

"2011-09-07 17:00:00" "GET /abc/index.php/contentapi/discontent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/" 
"2011-09-07 17:00:17" "GET /abc/index.php/contentapi/discontent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:00:21" "GET /abc/index.php/contentapi/discontent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:00:00" "GET /abc/index.php/data/dataContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:00:00" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:00:16" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:00:29" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:00:22" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:00:38" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:00:44" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:00:33" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:00:04" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:00:06" "GET /abc/index.php/data/dataContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:00:14" "GET /abc/index.php/data/dataContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http 
"2011-09-07 17:00:51" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:00:33" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:00:45" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:00:59" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:02:00" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:02:09" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:00:00" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 
"2011-09-07 17:00:09" "GET /abc/index.php/htmlrequest/htmlContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/ 

The file above is just a sample. The real log file keeps growing.
Expected output:

/abc/index.php/contentapi/discontent/ - 3 
/abc/index.php/data/dataContent/ - 3 
/abc/index.php/Api/ApiContent/ - 5 
/abc/index.php/site/siteContent/ - 6 
/abc/index.php/htmlrequest/htmlContent/ - 5 
+2

What have you already tried, and how did it fail? – NeronLeVelu

Answers

1

I think there may be some typos in the Apache log, but how about this:

$ grep -o 'abc/[^ 0-9]*/' apache.log | sort | uniq -c | sort -r 
6 abc/index.php/site/siteContent/ 
5 abc/index.php/htmlrequest/htmlContent/ 
5 abc/index.php/Api/ApiContent/ 
3 abc/index.php/data/dataContent/ 
2 abc/index.php/contentapi/discontent/ 
1 abc/index.php/contentapi/ 
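Since the question also mentions cut, the same pipeline can be sketched with cut in place of grep. This is only a sketch under assumptions: `sample_cut.log` is a hypothetical sample file, and it relies on the first `/` on each line being the one that starts the URL, so that fields 2 to 5 are the first four path components.

```shell
# Hypothetical sample file standing in for the real access log.
cat > sample_cut.log <<'EOF'
"2011-09-07 17:00:00" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:16" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:04" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
EOF

# Split each line on "/" and keep fields 2-5 (the first four path components),
# then count identical lines and list the most frequent first.
cut_counts=$(cut -d/ -f2-5 sample_cut.log | sort | uniq -c | sort -rn)

printf '%s\n' "$cut_counts"
```

If a `/` can appear before the URL (for example in a differently formatted timestamp), the field numbers would need adjusting.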
0

This extracts the fourth field, assuming it is the URL:

cat logfile | awk -F' ' '{print $4}' | awk -F'/' '{print $2"/"$3"/"$4"/"$5}' | sort | uniq -c

+1

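Whenever you see a solution that starts with 'cat file', you know the poster doesn't know shell. Any time you see a solution containing 'awk ... | awk ...', you know the poster doesn't know awk. Whenever you see a solution that uses the same hard-coded string (e.g. '"/"') several times instead of using ',' with 'OFS' to separate the output fields, you again know the poster doesn't know awk. Any time you see a solution containing 'sort | uniq', you again know the poster doesn't know shell. So I'd be wary of this solution, because it contains many common red flags. –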

+0

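Then you'll also know that there are hundreds of ways to solve a simple problem like this. The point is to strike a balance between "keep it simple, stupid" and being accurate enough to work in every case. If the user is asking for a command like this, keeping it simple is probably best –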
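For reference, the objections in the first comment above can be illustrated with a single awk process that reads the file directly and does the counting itself. This is a minimal sketch, not the commenter's own code: `sample_awk.log` is a hypothetical sample file, and it assumes (as the answer does) that the URL is the fourth whitespace-separated field.

```shell
# Hypothetical sample file standing in for the real access log.
cat > sample_awk.log <<'EOF'
"2011-09-07 17:00:00" "GET /abc/index.php/data/dataContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:06" "GET /abc/index.php/data/dataContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:04" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
EOF

# One awk process, no cat and no uniq: split the URL on "/", rebuild the
# first four path components, and count them in an associative array.
counts=$(awk '
    {
        split($4, part, "/")   # part[1] is empty because the URL starts with "/"
        key = ""
        for (i = 2; i <= 5; i++)
            key = key "/" part[i]
        count[key "/"]++
    }
    END {
        for (key in count)
            print key " - " count[key]
    }
' sample_awk.log | sort)       # sort only makes the output order stable

printf '%s\n' "$counts"
```

The trailing `sort` is optional; it is there because awk does not guarantee any particular order for `for (key in count)`.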

0

With GNU awk for gensub():

$ awk '{cnt[gensub(/(([/][^/]+){4}[/]).*/,"\\1","",$4)]++} END{for (url in cnt) print url " - " cnt[url]}' file 
/abc/index.php/contentapi/discontent/ - 3 
/abc/index.php/data/dataContent/ - 3 
/abc/index.php/site/siteContent/ - 6 
/abc/index.php/Api/ApiContent/ - 5 
/abc/index.php/htmlrequest/htmlContent/ - 5
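gensub() is a GNU awk extension, and awk leaves the order of `for (url in cnt)` unspecified. Where another awk is in use, or where a ranked list is wanted, something along these lines might work. It is a sketch under assumptions: `sample_rank.log` is a hypothetical sample file, the URL is taken to be field 4, and the output is printed as count, tab, URL (rather than the `url - count` form above) so that `sort` can order it numerically.

```shell
# Hypothetical sample file standing in for the real access log.
cat > sample_rank.log <<'EOF'
"2011-09-07 17:00:04" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:33" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:45" "GET /abc/index.php/site/siteContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:00" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:16" "GET /abc/index.php/Api/ApiContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http/
"2011-09-07 17:00:14" "GET /abc/index.php/data/dataContent/4fd590d1762eb/ALL/allowed/1/all/all/1/http
EOF

# match() sets RLENGTH to the length of the matched prefix, so substr()
# can cut out the first four path components without gensub().
ranked=$(awk '
    BEGIN {
        seg = "/[^/]+"                  # one path component
        re  = "^" seg seg seg seg "/"   # four components plus the trailing slash
    }
    match($4, re) { count[substr($4, RSTART, RLENGTH)]++ }
    END {
        for (url in count)
            print count[url] "\t" url
    }
' sample_rank.log | sort -k1,1nr -k2)   # highest count first, ties ordered by URL

printf '%s\n' "$ranked"
```

Lines whose fourth field does not contain four path components are skipped rather than counted.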