2014-06-15

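Using awk to extract only consecutive lines that have distinct values in one column and the same value in another column

Is it possible to use awk to extract only the consecutive lines that have distinct values in column 3 (21 of them in this case, from C to H13) and the same value in column 5, from a file structured as follows: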

.............................. ..........................................

LINE 564 C LESS L3782  246.617 200.380 10.086 1.00 0.00  L  
LINE 565 C1 LESS L3782  247.525 201.163 9.136 1.00 0.00  L  
LINE 566 C2 LESS L3782  247.265 202.663 9.269 1.00 0.00  L  
LINE 567 C3 LESS L3782  249.012 200.776 9.298 1.00 0.00  L  
LINE 568 C4 LESS L3782  249.659 201.089 10.654 1.00 0.00  L  
LINE 569 C5 LESS L3782  251.029 200.429 10.766 1.00 0.00  L  
LINE 570 O LESS L3782  249.832 202.495 10.789 1.00 0.00  L  
LINE 571 H LESS L3782  246.797 199.303 9.997 1.00 0.00  L  
LINE 572 H1 LESS L3782  246.772 200.668 11.130 1.00 0.00  L   
LINE 592 C LESS L3818  134.617 208.380 10.086 1.00 0.00  L  
LINE 593 C1 LESS L3818  135.525 209.163 9.136 1.00 0.00  L  
LINE 594 C2 LESS L3818  135.265 210.663 9.269 1.00 0.00  L  
LINE 595 C3 LESS L3818  137.012 208.776 9.298 1.00 0.00  L  
LINE 596 C4 LESS L3818  137.659 209.089 10.654 1.00 0.00  L  
LINE 597 C5 LESS L3818  139.029 208.429 10.766 1.00 0.00  L  
LINE 598 O LESS L3818  137.832 210.495 10.789 1.00 0.00  L  
LINE 599 H LESS L3818  134.797 207.303 9.997 1.00 0.00  L  
LINE 600 H1 LESS L3818  134.772 208.668 11.130 1.00 0.00  L  
LINE 601 H2 LESS L3818  133.564 208.562 9.845 1.00 0.00  L  
LINE 602 H3 LESS L3818  135.242 208.879 8.114 1.00 0.00  L  
LINE 603 H4 LESS L3818  135.381 211.008 10.301 1.00 0.00  L  
LINE 604 H5 LESS L3818  134.241 210.901 8.961 1.00 0.00  L  
LINE 605 H6 LESS L3818  135.946 211.237 8.632 1.00 0.00  L  
LINE 606 H7 LESS L3818  137.579 209.288 8.508 1.00 0.00  L  
LINE 607 H8 LESS L3818  137.099 207.700 9.100 1.00 0.00  L  
LINE 608 H9 LESS L3818  137.027 208.740 11.477 1.00 0.00  L  
LINE 609 H10 LESS L3818  138.225 210.662 11.662 1.00 0.00  L  
LINE 610 H11 LESS L3818  139.496 208.674 11.726 1.00 0.00  L  
LINE 611 H12 LESS L3818  138.955 207.340 10.685 1.00 0.00  L  
LINE 612 H13 LESS L3818  139.705 208.795 9.985 1.00 0.00  L   
LINE 618 C5 LESS L3832  251.029 208.429 10.766 1.00 0.00  L  
LINE 619 O LESS L3832  249.832 210.495 10.789 1.00 0.00  L  
LINE 620 H LESS L3832  246.797 207.303 9.997 1.00 0.00  L  
LINE 621 H1 LESS L3832  246.772 208.668 11.130 1.00 0.00  L  
LINE 622 H2 LESS L3832  245.564 208.562 9.845 1.00 0.00  L  
LINE 626 H6 LESS L3832  247.946 211.237 8.632 1.00 0.00  L  
LINE 627 H7 LESS L3832  249.579 209.288 8.508 1.00 0.00  L  
LINE 628 H8 LESS L3832  249.099 207.700 9.100 1.00 0.00  L  
LINE 629 H9 LESS L3832  249.027 208.740 11.477 1.00 0.00  L  
LINE 630 H10 LESS L3832  250.225 210.662 11.662 1.00 0.00  L  
LINE 631 H11 LESS L3832  251.496 208.674 11.726 1.00 0.00  L  
LINE 632 H12 LESS L3832  250.955 207.340 10.685 1.00 0.00  L  
LINE 633 H13 LESS L3832  251.705 208.795 9.985 1.00 0.00  L  
LINE 638 C LESS L3868  134.617 216.380 10.086 1.00 0.00  L  
LINE 639 C1 LESS L3868  135.525 217.163 9.136 1.00 0.00  L  
LINE 640 C2 LESS L3868  135.265 218.663 9.269 1.00 0.00  L  
LINE 641 C3 LESS L3868  137.012 216.776 9.298 1.00 0.00  L  
LINE 642 C4 LESS L3868  137.659 217.089 10.654 1.00 0.00  L  
LINE 643 C5 LESS L3868  139.029 216.429 10.766 1.00 0.00  L  
LINE 644 O LESS L3868  137.832 218.495 10.789 1.00 0.00  L  
LINE 645 H LESS L3868  134.797 215.303 9.997 1.00 0.00  L  
LINE 646 H1 LESS L3868  134.772 216.668 11.130 1.00 0.00  L  
LINE 647 H2 LESS L3868  133.564 216.562 9.845 1.00 0.00  L  
LINE 648 H3 LESS L3868  135.242 216.879 8.114 1.00 0.00  L  
LINE 649 H4 LESS L3868  135.381 219.008 10.301 1.00 0.00  L  
LINE 650 H5 LESS L3868  134.241 218.901 8.961 1.00 0.00  L  
LINE 651 H6 LESS L3868  135.946 219.237 8.632 1.00 0.00  L  
LINE 652 H7 LESS L3868  137.579 217.288 8.508 1.00 0.00  L  
LINE 653 H8 LESS L3868  137.099 215.700 9.100 1.00 0.00  L  
LINE 654 H9 LESS L3868  137.027 216.740 11.477 1.00 0.00  L  
LINE 655 H10 LESS L3868  138.225 218.662 11.662 1.00 0.00  L  
LINE 656 H11 LESS L3868  139.496 216.674 11.726 1.00 0.00  L  
LINE 657 H12 LESS L3868  138.955 215.340 10.685 1.00 0.00  L  
LINE 658 H13 LESS L3868  139.705 216.795 9.985 1.00 0.00  L  
LINE 677 O LESS L3882  249.832 218.495 10.789 1.00 0.00  L  
LINE 678 H LESS L3882  246.797 215.303 9.997 1.00 0.00  L  
LINE 679 H1 LESS L3882  246.772 216.668 11.130 1.00 0.00  L  
LINE 680 H2 LESS L3882  245.564 216.562 9.845 1.00 0.00  L  
......................................................................... 

so that the output looks like this:

LINE 592 C LESS L3818  134.617 208.380 10.086 1.00 0.00  L  
LINE 593 C1 LESS L3818  135.525 209.163 9.136 1.00 0.00  L  
LINE 594 C2 LESS L3818  135.265 210.663 9.269 1.00 0.00  L  
LINE 595 C3 LESS L3818  137.012 208.776 9.298 1.00 0.00  L  
LINE 596 C4 LESS L3818  137.659 209.089 10.654 1.00 0.00  L  
LINE 597 C5 LESS L3818  139.029 208.429 10.766 1.00 0.00  L  
LINE 598 O LESS L3818  137.832 210.495 10.789 1.00 0.00  L  
LINE 599 H LESS L3818  134.797 207.303 9.997 1.00 0.00  L  
LINE 600 H1 LESS L3818  134.772 208.668 11.130 1.00 0.00  L  
LINE 601 H2 LESS L3818  133.564 208.562 9.845 1.00 0.00  L  
LINE 602 H3 LESS L3818  135.242 208.879 8.114 1.00 0.00  L  
LINE 603 H4 LESS L3818  135.381 211.008 10.301 1.00 0.00  L  
LINE 604 H5 LESS L3818  134.241 210.901 8.961 1.00 0.00  L  
LINE 605 H6 LESS L3818  135.946 211.237 8.632 1.00 0.00  L  
LINE 606 H7 LESS L3818  137.579 209.288 8.508 1.00 0.00  L  
LINE 607 H8 LESS L3818  137.099 207.700 9.100 1.00 0.00  L  
LINE 608 H9 LESS L3818  137.027 208.740 11.477 1.00 0.00  L  
LINE 609 H10 LESS L3818  138.225 210.662 11.662 1.00 0.00  L  
LINE 610 H11 LESS L3818  139.496 208.674 11.726 1.00 0.00  L  
LINE 611 H12 LESS L3818  138.955 207.340 10.685 1.00 0.00  L  
LINE 612 H13 LESS L3818  139.705 208.795 9.985 1.00 0.00  L   
LINE 638 C LESS L3868  134.617 216.380 10.086 1.00 0.00  L  
LINE 639 C1 LESS L3868  135.525 217.163 9.136 1.00 0.00  L  
LINE 640 C2 LESS L3868  135.265 218.663 9.269 1.00 0.00  L  
LINE 641 C3 LESS L3868  137.012 216.776 9.298 1.00 0.00  L  
LINE 642 C4 LESS L3868  137.659 217.089 10.654 1.00 0.00  L  
LINE 643 C5 LESS L3868  139.029 216.429 10.766 1.00 0.00  L  
LINE 644 O LESS L3868  137.832 218.495 10.789 1.00 0.00  L  
LINE 645 H LESS L3868  134.797 215.303 9.997 1.00 0.00  L  
LINE 646 H1 LESS L3868  134.772 216.668 11.130 1.00 0.00  L  
LINE 647 H2 LESS L3868  133.564 216.562 9.845 1.00 0.00  L  
LINE 648 H3 LESS L3868  135.242 216.879 8.114 1.00 0.00  L  
LINE 649 H4 LESS L3868  135.381 219.008 10.301 1.00 0.00  L  
LINE 650 H5 LESS L3868  134.241 218.901 8.961 1.00 0.00  L  
LINE 651 H6 LESS L3868  135.946 219.237 8.632 1.00 0.00  L  
LINE 652 H7 LESS L3868  137.579 217.288 8.508 1.00 0.00  L  
LINE 653 H8 LESS L3868  137.099 215.700 9.100 1.00 0.00  L  
LINE 654 H9 LESS L3868  137.027 216.740 11.477 1.00 0.00  L  
LINE 655 H10 LESS L3868  138.225 218.662 11.662 1.00 0.00  L  
LINE 656 H11 LESS L3868  139.496 216.674 11.726 1.00 0.00  L  
LINE 657 H12 LESS L3868  138.955 215.340 10.685 1.00 0.00  L  
LINE 658 H13 LESS L3868  139.705 216.795 9.985 1.00 0.00  L 

Thanks, Alin


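Do you really think you need to post 100 lines (or however many it is) of input to demonstrate your problem? If you posted 10 or fewer input lines, more people would be interested in helping you.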

Answers


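Yes. Like Perl, AWK is a data extraction and reporting tool. You can use an array to check that each value in column 3 occurs only once in the current run of lines, and a variable to remember the value of column 5 so you can tell when a new group starts. The variable n sets how many distinct names make up a complete group; a run is printed only when it reaches that size.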

awk -v n=21 '$5 != s || ($3 in a) { delete a; r = ""; c = 0 } { a[$3]; s = $5; r = r $0 ORS; c++ } c == n { printf "%s", r; delete a; r = ""; c = 0; s = "" }' file
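The one-liner is dense, so here is a sketch of the same buffering idea spread out with comments and wrapped in a shell function. The name `extract_complete_groups` is hypothetical, and it assumes the whitespace-separated layout from the question (atom name in field 3, group ID in field 5). It uses `split("", arr)` to empty arrays, which is the older portable idiom for `delete arr`.

```shell
# Hypothetical helper: print runs of exactly N consecutive lines that share
# field 5 and contain no repeated value in field 3.
# Usage: extract_complete_groups N [FILE]   (reads stdin if FILE is omitted)
extract_complete_groups() {
  awk -v n="$1" '
    # A new group ID, or a name already seen in this run, starts a new run.
    $5 != key || ($3 in seen) {
      split("", seen)   # portable way to empty an array
      buf = ""
      cnt = 0
    }
    {
      seen[$3]          # remember this name
      key = $5          # remember the current group ID
      buf = buf $0 ORS  # buffer the line until the run is complete
      cnt++
    }
    # Once the run reaches n distinct names, print it and start over.
    cnt == n {
      printf "%s", buf
      split("", seen)
      buf = ""
      cnt = 0
      key = ""
    }
  ' "${2:--}"
}
```

With the data from the question saved as `file`, `extract_complete_groups 21 file` should print only the L3818 and L3868 blocks, since the other groups never reach 21 lines.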

The currently accepted answer by Steve is a very long-winded way of writing:

awk '{if (a[$3,$5]++ == 0) print}' 
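The test `a[$3,$5]++ == 0` is true only the first time a given (column 3, column 5) pair appears, because the post-increment returns the old count: 0 on the first occurrence, then 1, 2, and so on. The usual shorthand is `!a[$3,$5]++`, relying on the fact that a pattern with no action prints the line. A minimal sketch, with a hypothetical function name:

```shell
# Hypothetical helper: print each line whose (field 3, field 5) pair has not
# appeared anywhere earlier in the input.
first_occurrences() {
  # !a[k]++ is true only when a[k] was 0 (unset) before the increment;
  # a pattern with no action prints the current line.
  awk '!a[$3, $5]++' "${1:--}"
}
```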

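Strictly speaking, this does not care about contiguity: if some new entries for L3818 appear further down the file, it will still remember the ones near the top and skip the repeats. If that is a problem, you can use: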

awk '{if ($5 != old_5) {delete a; old_5 = $5} if (a[$3,$5]++ == 0) print}'
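To see the effect of clearing the array whenever column 5 changes, here is a sketch of that variant as a function (the name `first_in_run` is hypothetical). Because the array is emptied at every change of group ID, column 3 alone is enough as the key, and a group ID that reappears later in the file starts from scratch.

```shell
# Hypothetical helper: like the global version, but forgets everything
# whenever field 5 changes, so only consecutive lines are compared.
first_in_run() {
  awk '
    $5 != old_5 { split("", a); old_5 = $5 }  # new group: clear the array
    !a[$3]++                                  # print first occurrence of each name in this run
  ' "${1:--}"
}
```

For input where group K appears, then L, then K again, the second run of K is printed again, whereas the global `!a[$3,$5]++` version would suppress it.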