2017-06-05 135 views



awk -v RS= -F '<connection name="|<hostPort>' '
{
sub(/".*/, "", $2)
split($3, tokens, /[:<]/)
printf "%-6s %s %s\n", $2, tokens[1], tokens[2]
}'


<connection name="boing_ny__Primary__" transport="tcp"> 
<connection name="boing_ny__Backup__" transport="tcp"> 
<connection name="boy_ny__Primary__" transport="tcp"> 
<connection name="boy_ny__Backup__" transport="tcp"> 
<connection name="song_ny__Primary__" transport="tcp"> 
<connection name="song_ny__Backup__" transport="tcp"> 
<connection name="bob_ny__Primary__" transport="tcp"> 
<connection name="bob_ny__Backup__" transport="tcp"> 


srv2 33333 


boing_ny__Primary__ srv1 33333 
boing_ny__Backup__ srv2 33333 
boy_ny__Primary__ srv1 6666
boy_ny__Backup__ srv2 6666 
song_ny__Primary__ srv1 55555 
song_ny__Backup__ srv2 55555 
bob_ny__Primary__ srv3 33333 
bob_ny__Backup__ srv4 33333 

Use an XML parser, not awk/sed. – anubhava


Up to you to test/use/throw away: awk -v RS= '{$1=$1; match($0, /connection name="([^"]+).* (.*)/, a); gsub(/:/, "", a[2]);} length(a[1]){print a[1], a[2]}' –


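Please describe the purpose of your little script in prose. That will help readers understand the code (or the errors in it), and it may help you find the problem yourself. Look at your own code and its misbehaviour from several angles. This is similar to https://ericlippert.com/2014/03/05/how-to-debug-small-programs/ – Yunnosch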




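As an experiment, this GNU awk command seems to do the job with the input data provided, but it is not guaranteed to be a robust solution, because the XML in your file may vary.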

awk '/connection name=/{a=$0;getline; \ 
print gensub(/(.*connection name=["])(.[^"]*)(["].*)/,"\\2","g",a), \ 
gensub(/(<.*>)(.[^:]*)([:])(.[^<]*)(<[/].*>)/,"\\2 \\4","g",$0)}' file1 

boing_ny__Primary__ srv1 33333 
boing_ny__Backup__ srv2 33333 
boy_ny__Primary__ srv1 6666 
boy_ny__Backup__ srv2 6666 
song_ny__Primary__ srv1 55555 
song_ny__Backup__ srv2 55555 
bob_ny__Primary__ srv3 33333 
bob_ny__Backup__ srv4 33333 

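When we find a record matching /connection name=/, we store that record ($0) in a variable a and fetch the next line with getline. We then print the results of two sed-like substitutions done with gensub: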

gensub(/(.*connection name=["])(.[^"]*)(["].*)/,"\\2","g",a)
#all chars up to first " --|       |      |       |    |  |
#after " and up to the next "------|      |       |    |  |
#after last " up to the end of $0 --------|       |    |  |
#replace with group 2 ----------------------------|    |  |
#global replacement------------------------------------|  |
#target = a = previous record-----------------------------|

#With a = <connection name="boing_ny__Primary__" transport="tcp"> 
#Above gensub will return group2 = boing_ny__Primary__ 

gensub             (/(<.*>)(.[^:]*)([:])(.[^<]*)(<[/].*>)/,"\\2 \\4","g",$0)
#all chars between < >--|       |     |     |        |          |      |  |
#all chars up to : -------------|     |     |        |          |      |  |
#literal : ---------------------------|     |        |          |      |  |
#the part after : and before < -------------|        |          |      |  |
#the last < > part ----------------------------------|          |      |  |
#use group 2 and 4 ---------------------------------------------|      |  |
#global replacement ---------------------------------------------------|  |
#target = $0 current record ----------------------------------------------|

#With $0 = <hostPort>srv2:33333</hostPort> 
#Above gensub will return group 2 = srv2 and group 4 = 33333 --> srv2 33333 

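The general GNU awk gensub syntax is gensub(regexp, replacement, how [, target]). It returns the string that results from applying the replacement to the target; the target itself is left unchanged. See the man page of gensub.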
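gensub is a GNU awk extension, so for other awks the same pairing idea can be sketched with plain sub() and gsub(). This is only a sketch under assumptions: the sample input and the function name pair_with_getline are made up for illustration, and each hostPort line is expected to directly follow its connection line.

```shell
# Assumed sample input: every <connection> line is directly followed by its <hostPort> line.
sample='<connection name="boing_ny__Primary__" transport="tcp">
<hostPort>srv1:33333</hostPort>
<connection name="boing_ny__Backup__" transport="tcp">
<hostPort>srv2:33333</hostPort>'

# Hypothetical helper: pairs each connection name with the host and port on the next line.
pair_with_getline() {
  awk '/connection name=/ {
      name = $0
      sub(/.*connection name="/, "", name)   # drop everything up to the opening quote
      sub(/".*/, "", name)                   # drop the closing quote and the rest
      if ((getline) <= 0) exit               # read the next line into $0; stop at end of input
      hp = $0
      gsub(/<[^>]*>/, "", hp)                # strip the hostPort tags
      sub(/:/, " ", hp)                      # srv1:33333 becomes srv1 33333
      print name, hp
  }'
}

printf '%s\n' "$sample" | pair_with_getline
```

Unlike gensub, sub() and gsub() modify their target in place, which is why the record is copied into name and hp first.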


@theuniverseisflat Updated again. I had the wrong idea before; it should be fine now. –


+ve for the nice explanation :) – RavinderSingh13



awk '/connection/{match($0,/"[^"]*/);VAL=substr($0,RSTART+1,RLENGTH-1);next} /hostPort/ && VAL{match($0,/>.*</);print VAL FS substr($0,RSTART+1,RLENGTH-2)}' Input_file 



awk '/connection/{             #### Look for a line that contains the string connection.
         match($0,/"[^"]*/);       #### match() finds the first " and every following character up to, but not including, the next ".
         VAL=substr($0,RSTART+1,RLENGTH-1);   #### Store the matched text without its leading " in VAL. RSTART and RLENGTH are built-in awk variables that match() sets: RSTART is the index where the match starts and RLENGTH is the length of the match.
         next           #### Skip the remaining rules and move on to the next line.
    }
    /hostPort/ && VAL{            #### Two conditions: the line contains the string hostPort and VAL is not empty. If both hold, run the following actions.
         match($0,/>.*</);        #### match() again, this time to find everything from the first > to the last <.
         print VAL FS substr($0,RSTART+1,RLENGTH-2) #### Print VAL (set by the previous rule), the field separator, and the matched text without the surrounding > and <.
    }
    ' Input_file              #### Input_file is the file to read.
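To see what match() leaves in RSTART and RLENGTH, here is a small sketch that prints both values for one sample connection line, followed by the substring taken from them. The function name show_match is made up for illustration.

```shell
line='<connection name="boing_ny__Primary__" transport="tcp">'

# Hypothetical helper: prints RSTART, RLENGTH and the extracted name.
show_match() {
  awk '{
      if (match($0, /"[^"]*/)) {
          # RSTART is the 1-based position of the opening quote; RLENGTH counts that quote too
          print RSTART, RLENGTH, substr($0, RSTART + 1, RLENGTH - 1)
      }
  }'
}

printf '%s\n' "$line" | show_match
```

For this line the output is 18 20 boing_ny__Primary__: the opening quote is the 18th character, and the match covers that quote plus the 19 characters of the name, which is why the script adds 1 to RSTART and subtracts 1 from RLENGTH.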

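Please use line breaks to improve readability, so that less scrolling is needed. If the only difference between the two versions is the comments, then why not delete the first (and second) version? – Yunnosch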


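@Yunnosch: They are separate because the commented version is not meant to be executed; it is there only to aid understanding. I usually say so when I post an answer like this, but this time I forgot to mention it. – RavinderSingh13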


Ah. I was not aware that awk has no comment feature. That is certainly a good reason. Thanks for the explanation. – Yunnosch


Robust: cat input | awk -F'\"|>' '{print $2}' | awk -F'<' '{print $1}' | sed -z 's/_\n/_ /g' | grep -v ^srv | tr ":" " "


You could cut a lot out of that command. – RavinderSingh13


plz let me know – moni


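awk, sed and grep can all read files by themselves, so what you have here is a UUOC (useless use of cat). And as a starting point, the whole job can be done with awk or sed alone. – RavinderSingh13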

$ awk -F'[":<>]' '/hostPort/{if (n!="") print n, $3, $4; n=""; next} {n=$3}' file 
boing_ny__Primary__ srv1 33333 
boing_ny__Backup__ srv2 33333 
boy_ny__Primary__ srv1 6666 
boy_ny__Backup__ srv2 6666 
song_ny__Primary__ srv1 55555 
song_ny__Backup__ srv2 55555 
bob_ny__Primary__ srv3 33333 
bob_ny__Backup__ srv4 33333
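
The one-liner above works because the field separator [":<>] splits on every quote, colon and angle bracket. A small sketch that prints the numbered fields of one line makes this visible; the function name fields_of is made up for illustration.

```shell
# Hypothetical helper: prints every field of each input line as index=[value], one per line.
fields_of() {
  awk -F'[":<>]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

printf '%s\n' '<connection name="boing_ny__Primary__" transport="tcp">' | fields_of
printf '%s\n' '<hostPort>srv1:33333</hostPort>' | fields_of
```

Field 1 is empty in both cases because each line starts with a separator. On the connection line the name lands in field 3; on the hostPort line the host is field 3 and the port is field 4.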