2012-08-08 108 views

I have a file with the following data, which I need to split based on a criterion (the .set value):

.domain bag 
.set bag1 
bag1 
abc1 
.set bag2 
bag2 
abc2 
.domain cat 
.set bag1:cat 
bag1:cat 
abc1:cat 
.set bag2:cat 
bag2:cat 
abc2:cat 

I want to split this file into two files (bag1.txt and bag2.txt).

bag1.txt should look like this:

.domain bag 
.set bag1 
bag1 
abc1 
.domain cat 
.set bag1:cat 
bag1:cat 
abc1:cat 

bag2.txt should look like this:

.domain bag 
.set bag2 
bag2 
abc2 
.domain cat 
.set bag2:cat 
bag2:cat 
abc2:cat 

The .domain lines are common to both files.

I tried the following command, but it does not work.

nawk '{if($0~/.set/){split($2,a,":");filename=a[1]".text"}if(filename=".text"){print|"tee *.text"}else{print >filename}}' file.txt 

Answers


One way:

awk ' 
    BEGIN { 
     ## Split fields with spaces and colon. 
     FS = "[ :]+"; 

     ## Extension of output files. 
     ext = ".txt"; 
    } 

    ## Write lines that begin with ".domain" to all known output files (saved 
    ## in "processed_bags"). Also save them in the "domain" array to copy them 
    ## later to all files not processed yet. 
    $1 == ".domain" { 

     for (b in processed_bags) { 
      print $0 >> sprintf("%s%s", b, ext); 
     } 

     domain[ i++ ] = $0; 

     next; 
    } 

    ## Select output file to write. If not found previously, copy all 
    ## domains saved until now. 
    $1 == ".set" { 
     bag = $2; 
     if (! (bag in processed_bags)) { 
      for (j = 0; j < i; j++) { 
       print domain[j] >> sprintf("%s%s", bag, ext); 
      } 
      processed_bags[ bag ] = 1;    
     } 
    } 

    ## A normal line of data (neither ".domain" nor ".set"). Copy 
    ## to the file saved in "bag" variable. 
    bag { 
     print $0 >> sprintf("%s%s", bag, ext); 
    } 
' file.txt 

Run the following command to see the output:

head bag[12].txt 

Output:

==> bag1.txt <==
.domain bag
.set bag1
bag1
abc1
.domain cat
.set bag1:cat
bag1:cat 
abc1:cat 

==> bag2.txt <== 
.domain bag 
.set bag2 
bag2 
abc2 
.domain cat 
.set bag2:cat 
bag2:cat 
abc2:cat 
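
If there are many bags (say bag1 through bag1000), some awk implementations may run into a limit on the number of files open at the same time. The following is only a sketch of one possible workaround, not part of the script above: it uses the same splitting logic but calls close() after every write, so at most one output file is open at any moment. The names seen, out and n are chosen here for illustration, and the sample input is recreated from the question so that the snippet runs on its own.

```shell
# Recreate the sample input from the question.
cat > file.txt <<'EOF'
.domain bag
.set bag1
bag1
abc1
.set bag2
bag2
abc2
.domain cat
.set bag1:cat
bag1:cat
abc1:cat
.set bag2:cat
bag2:cat
abc2:cat
EOF

# Start from empty outputs, because the script appends with ">>".
rm -f bag1.txt bag2.txt

awk '
    BEGIN { FS = "[ :]+"; ext = ".txt" }

    # ".domain" line: append it to every bag file seen so far and
    # remember it for bags that show up later.
    $1 == ".domain" {
        for (b in seen) {
            out = b ext
            print $0 >> out
            close(out)          # assumption: keep only one file open at a time
        }
        domain[n++] = $0
        next
    }

    # ".set" line: pick the output file; a new bag first receives
    # all ".domain" lines remembered so far.
    $1 == ".set" {
        bag = $2
        if (!(bag in seen)) {
            out = bag ext
            for (j = 0; j < n; j++) print domain[j] >> out
            close(out)
            seen[bag] = 1
        }
    }

    # Every remaining line (including the ".set" line itself)
    # goes to the current bag file.
    bag != "" {
        out = bag ext
        print $0 >> out
        close(out)
    }
' file.txt
```

Reopening a file with ">>" after close() appends to it, so the content is the same as before. Opening and closing on every line is slower, so this trade-off only makes sense when the number of bags is large.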
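
This is OK, but can we generalize the part that handles the common lines? What if there are many bags, like bag1 ... bag1000? How can I do this? I have many files from bag1 to bag1000; instead of print >> bag1, can we simply use print > *.txt? (Many empty files are already present in the directory, from bag1.txt to bag.txt.) – Vijay 2012-08-08 12:06:30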

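
@peter: I have edited the answer to generalize it. It is fully commented, so you can check whether it fits your needs, because I don't understand what you mean by 'print >> *.txt'. – Birei 2012-08-08 13:41:08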
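
On the 'print > *.txt' idea: as far as I know, awk treats the target of a ">" or ">>" redirection as a plain file name and does not expand wildcards, so the pattern would not reach the existing bag files. A small sketch to check this, using a scratch directory named glob_demo (a name made up for this example):

```shell
# Scratch directory with two pre-existing empty files, as in the comment.
mkdir -p glob_demo
rm -f -- 'glob_demo/*.txt'
: > glob_demo/bag1.txt
: > glob_demo/bag2.txt

# The redirection target is just a string; awk does no glob expansion on it.
awk 'BEGIN { print ".domain bag" > "glob_demo/*.txt" }'

# Quoted so the shell does not expand the pattern either: this lists the
# file literally named "*.txt" next to the two bag files.
ls -l -- 'glob_demo/*.txt' glob_demo/bag1.txt glob_demo/bag2.txt
```

The line ends up in a file literally named "*.txt" and the bag files stay empty, which is why the answer keeps track of the bag names and writes to each file by name.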