2016-03-17 95 views
-1

I have a file in the following format, which I want to reformat with awk:

adaptable adapt:stem<>able:suffix 
addiction addict:stem<>ion:suffix 
adornment adorn:stem<>ment:suffix 
advertisement advertise:stem<>ment:suffix 
aggravation aggravate:stem<>ion:suffix 
aggregation aggregate:stem<>ion:suffix 
agreeable agree:stem<>able:suffix 

I need to convert it to the following form:

(adaptable ((adapt:stem)able:suffix)) 
(addiction ((addict:stem)ion:suffix)) 
(adornment ((adorn:stem)ment:suffix)) 
(advertisement ((advertise:stem)ment:suffix)) 
(aggravation ((aggravate:stem)ion:suffix)) 
(aggregation ((aggregate:stem)ion:suffix)) 
(agreeable ((agree:stem)able:suffix)) 
where most complex ones are 
(imperialistic (((imperialism:stem)ist:suffix)ic:suffix)) 

I tried to do this with awk. The command I used was awk '{print $0")"}' restof120.txt; running it added ) to the end of every line.

awk '{print "("$0")"}' 

My question: is there a way to convert the format automatically, using any package?

There are also complex cases, for example:

indecipherable in:prefix<>decipher:stem<>able:suffix 
(indecipherable (((in:prefix)decipher:stem)able:suffix)) 

Update: some patterns I have seen:

inactive in:prefix<>active:stem 
    (inactive ((in:prefix)active:stem)) 
+3

Your input doesn't match your output - there are 4 extra lines in the output. –

+0

What is the input for _altruistic_? Is that the most complex case, or are there more complex ones? –

+0

There is imperialistic (((imperialism:stem)ist:suffix)ic:suffix) – Karun

Answers

1

This may be what you're looking for:

$ cat tst.awk 
{ 
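    n = gsub(/<>|$/,")",$2)      # turn each "<>" and the end of $2 into ")"; n counts them
    s = sprintf("%*s",n,"")      # build a string of n blanks ...
    gsub(/ /,"(",s)              # ... and turn it into n opening parentheses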
    print "(" $1, s $2 ")" 
} 

$ awk -f tst.awk file 
(adaptable ((adapt:stem)able:suffix)) 
(addiction ((addict:stem)ion:suffix)) 
(adornment ((adorn:stem)ment:suffix)) 
(advertisement ((advertise:stem)ment:suffix)) 
(aggravation ((aggravate:stem)ion:suffix)) 
(aggregation ((aggregate:stem)ion:suffix)) 
(agreeable ((agree:stem)able:suffix)) 
(indecipherable (((in:prefix)decipher:stem)able:suffix)) 
+0

The number of parentheses must match – Karun

+0

Do you see any lines where they don't? –

2

Now that the question has been edited to include the complex case, I would modify my sed command to use a loop:

sed -r -e ':loop' -e 's/([^ ]+)<>/(\1)/' -e 't loop' -e 's/(.*) (.*)/(\1 (\2))/' 

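It substitutes from the right until the substitution can no longer match anything, so for the "indecipherable" test case the replacements proceed as follows: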

indecipherable in:prefix<>decipher:stem<>able:suffix  # original text 
indecipherable (in:prefix<>decipher:stem)able:suffix  # after 1st iteration 
indecipherable ((in:prefix)decipher:stem)able:suffix  # after 2nd iteration 
(indecipherable (((in:prefix)decipher:stem)able:suffix)) # after loop: add the outer parentheses 
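
To watch the iterations listed above happen one at a time, the same two substitutions can be driven from a small shell loop. This is only a sketch: trace_loop is a hypothetical helper name, and it assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do) in place of -r.

```shell
# Hypothetical helper: applies the inner substitution one pass at a time
# so that each intermediate string can be inspected.
trace_loop() {
    line=$1
    while :; do
        # One pass of the inner substitution: wrap everything before the last "<>".
        next=$(printf '%s\n' "$line" | sed -E 's/([^ ]+)<>/(\1)/')
        [ "$next" = "$line" ] && break    # no "<>" left, so the loop ends
        line=$next
        printf '%s\n' "$line"             # show the result of this iteration
    done
    # Final step: wrap the second field, then the whole line.
    printf '%s\n' "$line" | sed -E 's/(.*) (.*)/(\1 (\2))/'
}

trace_loop 'indecipherable in:prefix<>decipher:stem<>able:suffix'
```

Each pass consumes the rightmost remaining <>, which is why the parentheses end up nested from the inside out.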

Test run:

$ echo """adaptable adapt:stem<>able:suffix 
addiction addict:stem<>ion:suffix 
adornment adorn:stem<>ment:suffix 
advertisement advertise:stem<>ment:suffix 
aggravation aggravate:stem<>ion:suffix 
aggregation aggregate:stem<>ion:suffix 
agreeable agree:stem<>able:suffix 
indecipherable in:prefix<>decipher:stem<>able:suffix""" | sed -r -e ':loop' -e 's/([^ ]+)<>/(\1)/' -e 't loop' -e 's/(.*) (.*)/(\1 (\2))/' 
(adaptable ((adapt:stem)able:suffix)) 
(addiction ((addict:stem)ion:suffix)) 
(adornment ((adorn:stem)ment:suffix)) 
(advertisement ((advertise:stem)ment:suffix)) 
(aggravation ((aggravate:stem)ion:suffix)) 
(aggregation ((aggregate:stem)ion:suffix)) 
(agreeable ((agree:stem)able:suffix)) 
(indecipherable (((in:prefix)decipher:stem)able:suffix)) 

I would use the following sed command:

sed -r 's/(\w+) (\w+:stem)<>(\w+:suffix)/(\1 ((\2)\3))/' 

Example:

$ echo """adaptable adapt:stem<>able:suffix 
addiction addict:stem<>ion:suffix 
adornment adorn:stem<>ment:suffix 
advertisement advertise:stem<>ment:suffix 
aggravation aggravate:stem<>ion:suffix 
aggregation aggregate:stem<>ion:suffix 
agreeable agree:stem<>able:suffix""" | sed -r 's/(\w+) (\w+:stem)<>(\w+:suffix)/(\1 ((\2)\3))/' 
(adaptable ((adapt:stem)able:suffix)) 
(addiction ((addict:stem)ion:suffix)) 
(adornment ((adorn:stem)ment:suffix)) 
(advertisement ((advertise:stem)ment:suffix)) 
(aggravation ((aggravate:stem)ion:suffix)) 
(aggregation ((aggregate:stem)ion:suffix)) 
(agreeable ((agree:stem)able:suffix)) 
+0

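The loop works well for the complex case. Because you are missing a set of parentheses around the last word, change the final command to 's/(.*) (.*)/(\1 (\2))/'. The "inner" substitution can be simplified to 's/([^ ]+)<>/(\1)/' –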

+0

@glennjackman Thanks, edited! – Aaron

2

awk to the rescue!

$ awk -F'[ <>]' '{print "(" $1, "((" $2 ")" $4 "))" }' file 

(adaptable ((adapt:stem)able:suffix)) 
(addiction ((addict:stem)ion:suffix)) 
(adornment ((adorn:stem)ment:suffix)) 
(advertisement ((advertise:stem)ment:suffix)) 
(aggravation ((aggravate:stem)ion:suffix)) 
(aggregation ((aggregate:stem)ion:suffix)) 
(agreeable ((agree:stem)able:suffix)) 

For the extra case, it is better to delegate to a function than to place the parentheses by hand:

$ awk -F'[ <>]' 'function wrap(a) {return "("a")"}; 
     {w=wrap(wrap($2)$4)} 
    NF>5{w=wrap(w$6)} 
     {print wrap($1" "w)}' file_with_complex_case 

(adaptable ((adapt:stem)able:suffix)) 
(addiction ((addict:stem)ion:suffix)) 
(adornment ((adorn:stem)ment:suffix)) 
(advertisement ((advertise:stem)ment:suffix)) 
(aggravation (((aggravate:stem)ion:suffix))) 
(aggregation (((aggregate:stem)ion:suffix))) 
(agreeable (((agree:stem)able:suffix))) 
(indecipherable (((in:prefix)decipher:stem)able:suffix)) 
+0

This works perfectly for the simple ones, but fails on the complex case narcissistic narcissism:stem<>ist:suffix<>ic:suffix – Karun

+0

Please update the original post with this case. I don't see it there, so I can't verify. – karakfa

+0

I have updated the question with the complex case – Karun

1

Try this:

awk -F ' |<>' '{ 
    parts = "" 
    for (i=2; i<=NF; i++) parts = "(" parts $i ")" 
    print "(" $1, parts ")" 
}' <<END 
adaptable adapt:stem<>able:suffix 
indecipherable in:prefix<>decipher:stem<>able:suffix 
END 
(adaptable ((adapt:stem)able:suffix)) 
(indecipherable (((in:prefix)decipher:stem)able:suffix)) 

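It uses either a space or the string <> as the field separator (this may require GNU awk). It accumulates the parts, wrapping each one in parentheses as it goes.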
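
One caveat with a field separator that includes a single space: a trailing blank at the end of a line produces an empty extra field, and the loop then adds an extra pair of parentheses. A possible variant, sketched here under the assumption that the input may carry such trailing blanks, keeps awk's default field splitting and applies split() to the second field only. The function name convert is hypothetical.

```shell
# Hypothetical wrapper; reads the files given as arguments, or standard input.
convert() {
    awk 'NF >= 2 {
        # Default field splitting ignores leading and trailing blanks,
        # so $2 is the whole analysis, e.g. "adapt:stem<>able:suffix".
        n = split($2, part, "<>")
        nested = ""
        # Wrap the accumulated text plus the next part in one more pair each time.
        for (i = 1; i <= n; i++) nested = "(" nested part[i] ")"
        print "(" $1 " " nested ")"
    }' "$@"
}

printf '%s\n' 'aggravation aggravate:stem<>ion:suffix  ' \
              'imperialistic imperialism:stem<>ist:suffix<>ic:suffix' | convert
```

Lines with fewer than two fields, such as blank lines, are skipped rather than printed as empty parentheses.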

+2

It doesn't need GNU awk; every awk supports a regexp for FS. It's only when you need a regexp for RS that you need gawk. –