2012-12-27 24 views
0

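I want to print the words found between "ctr{words}" and count how many times each identical word occurs in the file. I need code in awk (Unix), or something using substr.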

I tried:

sed -n 's/.*ctr{\(.[^}]*\).*/\1/p' file 

but it does not find all the words, only one word.

The file is:

962796057604|mar0101|0|00000107A20E00000A6C331650B920340C00|0|0|400019FD7DBFBF7F|1001|962796057604|0 |01001|||-1|795971936| 00962795971936|16||-1| 00962795971936|-1|0|2|0|416019000659493|0||||||0|0|2012.12.01 00:07:09|12|30|0|516|16|1|2012.12.01 00:06:39|1|0||202|20001||0B12F1001104697209100300000000000000|1|1|11000|0|0||0881006972091003F000||0 714F610045584E6|000000000000|3|1|0000000000000000|0|140|0|0|0|0|0|0|||0|2|||||||||||||||||||||0|||0| |0|1|143|acf{0}cif{0}fcf{0}con{0}cuf{0}ctr{**Mo7afazat**}cgpa{962796057604}vlr{0096279001300}cff{0}roaf{0}mpty{0}ftksn{JMT}ftksr{0001}ftktp{CallTicketCPOCS} || 
1|34|2012.12.01 00:08:35|12|4|921-*203-0000000000-962796298894|mar0101|0|000001028225AE4AD868A8B750B900980C00|1|0|4000018001000002||962796298894|||||-1|||||-1||-1|0||||0||||||-1|-1|||-1|0|-1|-1|-1|2012.12.01 00:08:35|1|0||-1|0|||||||||||||0|0|||3797|0|12|-2147483648|-2147483648|-2147483648|-2147483648|||||||||||||||||||||||||0|||0||1|6|244|tid{111210532409329884}pfid{20}gob{1}rid{globitel} afid{}uid1{962796298894}aid1{1}ar1{0}uid2{globitel}aid2{-1}pid{1234}pur{!GDRC COMMIT AMOUNT 0}ratinf{}rec{0}rots{0}tda{}mid{}exd{0}reqa{0}ctr{**JaishanaIN**}ftksn{JMT}ftksr{0001}ftktp{PayCallTicket}|| 
1|34|2012.12.01 00:08:35|12|4|100-50-0-962796605155|mar0101|0|00000102A20400000A6A439D50B920520C00|0|0|400019FD7DBFBF7F|1001|962796605155|1 6||||-1|b116c||16||-1||-1|0|0|0|416017002233360|0||||||0|0|1970.01.01 02:00:00|0|0|0|220|0|1|1970.01.01 02:00:00|1|0||194|0||000000000000000000000000000000000000|0|0||0|0||00000000000000000000||0000000000 000000|000000000000|0|0|0000000000000000|0|370|0|0|0|0|0|0|||0|0|||||||||||||||||||||0|||0||0|1|70|a cf{3}ussd{1}ctr{**ZainElKul**}ftksn{JMT}ftksr{0001}ftktp{CallTicketCPOCS}|| 
1|34|2012.12.01 00:08:35|12|4|100-10-0 
1|34|2012.12.01 00:08:35|12|4|921-*203-0000000000-962797611253|mar0101|0|0000010282B54BD015FF4C4B50B8F96E0C00|1|0|4000018001000002||962797611253|||||-1|||||-1||-1|0||||0||||||-1|-1|||-1|0|-1|-1|-1|2012.12.01 00:08:35|1|0||-1|0|||||||||||||0|0|||885|0|12|-2147483648|-2147483648|-2147483648|-2147483648|||||||||||||||||||||||||0|||0||1|6|243|tid{111220371293561120}pfid{20}gob{1}rid{globitel} afid{}uid1{962797611253}aid1{1}ar1{0}uid2{globitel}aid2{-1}pid{1234}pur{!GDRC COMMIT AMOUNT 0}ratinf{}rec{0}rots{0}tda{}mid{}exd{0}reqa{0}ctr{**ZainElKul**}ftksn{JMT}ftksr{0001}ftktp{PayCallTicket}|| 

-962795292027|mar0101|0|00000101A20200000A6A96B750B920300C00|0|0|400019FD7DBFBF7F|1001|962795292027|0 |01004|||-1|797196452| 00962797196452|16||-1| 00962797196452|-1|0|2|0|416018002276781|0||||||0|0|2012.12.01 00:07:09|12|12|23|516|16|1|2012.12.01 00:06:34|1|0||202|1||0B12F1001104697209100300000000000000|1|1|11000|0|0||0881006972091003F000||0714F 6100455AD67|000000000000|3|1|0000000000000000|0|30|0|0|0|0|0|0|||0|0|||||||||||||||||||||0|||0||0|1| 171|acf{0}cif{0}fcf{0}con{0}cuf{0}ctr{ZainUnlimited}cgpa{962795292027}vlr{0096279001300}cff{0}roaf{0}mpty{0}cacc{1;0;30}cquo{1;230;}ftksn{JMT}ftksr{000 1}ftktp{CallTicketCPOCS}|| 
1|34|2012.12.01 00:08:35|12|4|921-*203-0000000000-962796012818|mar0101|0|0000010882218115085D5F9150B920520C00|0|0|4000018001000002||962796012818|||||-1|||||-1||-1|0||||0||||||-1|-1|||-1|0|-1|-1|-1|2012.12.01 00:08:35|1|0||-1|1|||||||||||||0|0|||70|0|0|-2147483648|-2147483648|-2147483648|-2147483648|||||||||||||||||||||||||0|||0||1|6|258|tid{111221366974701289}pfid{17}gob{1}rid{globitel} afid{}uid1{962796012818}aid1{1}ar1{-2147483648}uid2{}aid2{-1}pid{DEFAULT_DECISION}pur{!GDRC Balance Check}ratinf{}rec{0}rots{0}tda{}mid{}exd{0}reqa{0}ctr{**AlBarakehNew**}ftksn{JMT}ftksr{0001}ftktp{PayCallTicket}|| 
1|34|2012.12.01 00:08:35|12|4|921-*203-0000000000-962797251349|mar0101|0|0000010282A451483EDFCFD350B920400C00|1|0|4000018001000002||962797251349|||||-1|||||-1||-1|0||||0||||||-1|-1|||-1|0|-1|-1|-1|2012.12.01 00:08:35|1|0||-1|0|||||||||||||0|0|||440|0|12|-2147483648|-2147483648|-2147483648|-2147483648|||||||||||||||||||||||||0|||0||1|6|245|tid{111211342745325133}pfid{20}gob{1}rid{globitel} afid{}uid1{962797251349}aid1{1}ar1{0}uid2{globitel}aid2{-1}pid{1234}pur{!GDRC COMMIT AMOUNT 0}ratinf{}rec{0}rots{0}tda{}mid{}exd{0}reqa{0}ctr{**ZainElKulSN**}ftksn{JMT}ftksr{0001}ftktp{PayCallTicket}|| 
1|34|2012.12.01 00:08:35|12|4|921-*203-0000000000- 
+0

Please post your actual and expected output. Your command looks fine to me. – dogbane

+0

Mo7afazat 1 JaishanaIN 1 ZainElKul 2 AlBarakehNew 1 ZainElKulSN – eyadgh

Answers

1

It looks like you are missing the count. The simplest way to get it is to pipe your output through sort and then uniq -c:

$ sed -n 's/.*ctr{\(.[^}]*\).*/\1/p' file | sort | uniq -c 
     1 **Mo7afazat** 
     1 **JaishanaIN** 
     2 **ZainElKul** 
     1 ZainUnlimited 
     1 **AlBarakehNew** 
     1 **ZainElKulSN** 
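If the output has to be in the "word count" order shown in the comment above (uniq -c prints the count first), one possible sketch is to swap the two columns with awk at the end of the pipeline. The function name ctr_word_counts is made up for illustration, and the sed pattern is a slightly tightened variant of the one in the question (it requires a non-empty word and the closing brace), not the original command. Like the original, it reports at most one ctr{...} per line.

```shell
# Hypothetical helper: print "word count" pairs for the values inside ctr{...}.
# Reads the files given as arguments, or stdin if none are given.
ctr_word_counts() {
  # Keep only the text inside ctr{...}, one word per matching line
  sed -n 's/.*ctr{\([^}][^}]*\)}.*/\1/p' "$@" |
    sort |                      # uniq only merges adjacent lines, so sort first
    uniq -c |                   # prefix each distinct word with its count
    awk '{ print $2, $1 }'      # swap columns: word first, then count
}

# Made-up sample input, not the real file from the question
printf '%s\n' 'a|ctr{ZainElKul}ftksn{JMT}' 'b|ctr{Mo7afazat}x{1}' 'c|ctr{ZainElKul}' |
  ctr_word_counts
```

This assumes the words contain no whitespace, as in the sample data; otherwise the awk column swap would need adjusting.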

Another way, using only awk (GNU awk is required for the third argument to match()):

$ awk 'match($0,".*ctr{([^}]*)}.*",m){a[m[1]]++}END{for(i in a) print i,a[i]}' file 
ZainUnlimited 1 
**ZainElKulSN** 1 
**Mo7afazat** 1 
**ZainElKul** 2 
**JaishanaIN** 1 
**AlBarakehNew** 1 
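Since the question also mentions substr, here is a rough sketch of the same counting idea that avoids the gawk-only third argument to match(). It relies on RSTART and RLENGTH, which POSIX awk sets after a successful match(), and it loops so that several ctr{...} fields on one line would all be counted. The function name count_ctr is hypothetical.

```shell
# Hypothetical helper: count the values inside ctr{...} using POSIX awk features only.
# Reads the files given as arguments, or stdin if none are given.
count_ctr() {
  awk '
    {
      line = $0
      # match() sets RSTART (start position) and RLENGTH (match length)
      while (match(line, /ctr\{[^}]*\}/)) {
        # skip the 4 characters of "ctr{" and drop the closing "}"
        word = substr(line, RSTART + 4, RLENGTH - 5)
        count[word]++
        # continue scanning after this match
        line = substr(line, RSTART + RLENGTH)
      }
    }
    END { for (w in count) print w, count[w] }
  ' "$@"
}

# Made-up sample input; the trailing sort is only there to get a stable order,
# because "for (w in count)" visits keys in an unspecified order
printf '%s\n' 'x|ctr{ZainElKul}y{1}' 'ctr{Mo7afazat}' 'z|ctr{ZainElKul}' |
  count_ctr | sort
```

As with the gawk version, the output order of the END loop is not guaranteed, which is why the example pipes through sort.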
+0

My friend, your code works, but how can I count how many "mo7afzat" or how many "zainlkul" there are, and so on? – eyadgh

+2

Just be aware that this is GNU awk only, because of the third argument to match(). Also, change 'print i" "a[i]' to 'print i,a[i]'; that is what OFS is for. –

+0

@ed-morton Done, thanks. – dogbane

0

When looking for matches in a file, grep is the best choice more often than not.

Using grep with a positive lookbehind (this needs a grep that supports -P, such as GNU grep) and uniq -c:

$ grep -Po "(?<=ctr{)[^}]+" file | uniq -c 
1 Mo7afazat 
1 JaishanaIN 
2 ZainElKul 
1 ZainUnlimited 
1 AlBarakehNew 
1 ZainElKulSN 

man uniq

Note: 'uniq' does not detect repeated lines unless they are adjacent.

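For duplicates that are not adjacent, pipe the matches through sort first. Be aware that the order in which each match was found in the original file will then be lost: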

grep -Po "(?<=ctr{)[^}]+" file | sort | uniq -c 
1 AlBarakehNew 
1 JaishanaIN 
1 Mo7afazat 
2 ZainElKul 
1 ZainElKulSN 
1 ZainUnlimited 
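If the first-seen order matters and the duplicates are not adjacent, one alternative to sort is to do the counting in awk and remember the order in which each word first appeared. This is only a sketch: the function name count_in_order is made up, and it assumes a grep that supports -o (GNU and BSD grep do) reading a single file or stdin, so that no file name prefix is added to the matches.

```shell
# Hypothetical helper: count ctr{...} values, printing them in first-seen order.
# Intended for a single file argument or stdin.
count_in_order() {
  # -o prints each match on its own line, e.g. "ctr{ZainElKul}"
  grep -o 'ctr{[^}]*}' "$@" |
    awk '
      {
        # strip the leading "ctr{" (4 chars) and the trailing "}"
        word = substr($0, 5, length($0) - 5)
        # remember the order in which new words show up
        if (!(word in count)) order[++n] = word
        count[word]++
      }
      END { for (i = 1; i <= n; i++) print count[order[i]], order[i] }
    '
}

# Made-up sample input: ZainElKul appears first, so it is printed first
printf '%s\n' 'a|ctr{ZainElKul}b{1}' 'ctr{Mo7afazat}' 'c|ctr{ZainElKul}' |
  count_in_order
```

The output keeps the "count word" layout of uniq -c, just without the leading padding.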
+0

As an example, if the file above contained the word "zainlkul" three times and the word "mo7afazat" ten times, I would want the output to be: zainlkul 3 mo7afzat 10 ... and so on. – eyadgh

+0

@eyadgh Yes, that is what this code does. –