2016-03-02 27 views
2

Using awk in a loop between two patterns

I have a large database (database.csv) in the following format:

SOME_ID_NUMBER 
Some delimited columns of data here 
More delimited columns of data here 
Tonsof delimited columns of data here 
######### 
SOME_ID_NUMBER_2 
Other delimited columns of data here 
Cool delimited columns of data here 
Awesome delimited columns of data here 
Extra delimited columns of data here 
######### 
OTHER_ID_NAMES 
Lame delimited columns of data here 
Boring delimited columns of data here 
Okay delimited columns of data here 
######### 

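Each entry starts with the entry name, followed by a varying number of lines of delimited data, and is terminated by a line of "#" characters.

I also have a large list of patterns in another file (patterns.csv) containing entries such as: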

Some_ID_NUMBER 
OTHER_ID_NAMES 
ID_NOT_IN_LIST 

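I want to extract the entries from the database file that match the patterns in the patterns file. This is the desired output for the sample data above: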

SOME_ID_NUMBER 
Some delimited columns of data here 
More delimited columns of data here 
Tonsof delimited columns of data here 
######### 
OTHER_ID_NAMES 
Lame delimited columns of data here 
Boring delimited columns of data here 
Okay delimited columns of data here 
######### 

Or, even better, this output:

SOME_ID_NUMBER Some delimited columns of data here 
SOME_ID_NUMBER More delimited columns of data here 
SOME_ID_NUMBER Tonsof delimited columns of data here 
OTHER_ID_NAMES Lame delimited columns of data here 
OTHER_ID_NAMES Boring delimited columns of data here 
OTHER_ID_NAMES Okay delimited columns of data here 
ID_NOT_IN_LIST 

Here is my attempt:

while read line 
do 
awk -v start="$line" -v last="#" '/^"start"/,/^"last"/' database.csv >>matches.csv 
done<patterns.csv 

Answers

2

With GNU awk for multi-char RS and ENDFILE:

$ cat tst.awk 
NR==FNR { patterns[toupper($0)]; next } 
ENDFILE { RS=ORS="\n#########\n"; FS="\n" } 
toupper($1) in patterns 

$ gawk -f tst.awk patterns.csv database.csv 
SOME_ID_NUMBER 
Some delimited columns of data here 
More delimited columns of data here 
Tonsof delimited columns of data here 
######### 
OTHER_ID_NAMES 
Lame delimited columns of data here 
Boring delimited columns of data here 
Okay delimited columns of data here 
######### 

$ cat tst.awk 
NR==FNR { patterns[toupper($0)]; next } 
ENDFILE { RS="\n#########\n"; FS="\n" } 
toupper($1) in patterns { 
    patterns[toupper($1)]++   # count under the same upper-cased key used when loading patterns
    for (i=2;i<=NF;i++) { 
     print $1, $i 
    } 
} 
END { 
    for (pat in patterns) { 
     if (patterns[pat] == 0) { 
      print pat 
     } 
    } 
} 

$ gawk -f tst.awk patterns.csv database.csv 
SOME_ID_NUMBER Some delimited columns of data here 
SOME_ID_NUMBER More delimited columns of data here 
SOME_ID_NUMBER Tonsof delimited columns of data here 
OTHER_ID_NAMES Lame delimited columns of data here 
OTHER_ID_NAMES Boring delimited columns of data here 
OTHER_ID_NAMES Okay delimited columns of data here 
ID_NOT_IN_LIST 
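
For readers without GNU awk, the same result can probably be reached with a line-by-line approach in any POSIX awk. The following is only a sketch, not part of the original answer. It assumes that every entry ends with a line starting with "#", that the first line after a terminator (or the first line of the file) is the entry ID, and that IDs should match case-insensitively as in the scripts above. The function name `extract_entries` is made up for illustration.

```shell
# Sketch: same extraction without gawk-only features (multi-char RS, ENDFILE).
extract_entries() {   # hypothetical helper: extract_entries PATTERNS DATABASE
    awk '
        { sub(/[ \t\r]+$/, "") }                              # drop trailing blanks
        NR == FNR { if ($0 != "") want[toupper($0)]; next }   # 1st file: wanted IDs
        /^#/      { id = ""; next }                           # terminator: entry ends
        id == ""  { id = $0; keep = (toupper(id) in want)     # first line: entry ID
                    if (keep) seen[toupper(id)]
                    next }
        keep      { print id, $0 }                            # data line of a wanted entry
        END       { for (p in want) if (!(p in seen)) print p }   # IDs never found
    ' "$1" "$2"
}
```

It could be called as `extract_entries patterns.csv database.csv > matches.csv`. Unmatched patterns are printed in upper case, and in no guaranteed order when there is more than one of them.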

See https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice if you are ever again tempted to write a shell loop just to process text.
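
Apart from the efficiency concern, the loop in the question most likely matches nothing, because awk does not expand variables inside `/.../` regex literals: `/^"start"/` looks for the literal text `"start"`. A minimal sketch of a range pattern driven by a variable follows. It assumes an exact, case-sensitive match on the ID line is wanted (so that `SOME_ID_NUMBER` does not also select `SOME_ID_NUMBER_2`) and that IDs contain no backslashes, since `-v` interprets escape sequences. The function name `print_entry` is hypothetical.

```shell
# Sketch: print one entry, from its ID line through the terminating "#" line.
print_entry() {   # hypothetical helper: print_entry ID DATABASE
    awk -v start="$1" '
        { line = $0; sub(/[ \t\r]+$/, "", line) }   # compare without trailing blanks
        (line "") == (start ""), /^#/               # string compare; range ends at "#" line
    ' "$2"
}
```

Calling this once per pattern would still reread the whole database each time, so a single awk invocation that loads all patterns first remains the better fit for large files.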

+0

Hi, thanks for your response. Could you explain your code (in tst.awk)? I am new to shell scripting and can't follow it. – cloud7

+0

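I'm sorry, but I don't think learning a tool or language from the bottom up by reading lots of examples is a good way to learn it. Please read Effective Awk Programming, 4th Edition, by Arnold Robbins. Skip ahead to the sections on the language constructs I used first if that helps, but at least skim the whole book to get the main ideas of the language. If you then have specific questions about the script, I'll be happy to answer them. –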

+1

Thanks for the advice! I'll do that. – cloud7