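Merging lines in Linux

If I have an input file like the one below, is there a command or some other way in Linux to convert it into the desired file shown further down?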

Input file:

Column_1  Column_2 
scaffold_A SNP_marker1 
scaffold_A SNP_marker2 
scaffold_A SNP_marker3 
scaffold_A SNP_marker4 
scaffold_B SNP_marker5 
scaffold_B SNP_marker6 
scaffold_B SNP_marker7 
scaffold_C SNP_marker8 
scaffold_A SNP_marker9 
scaffold_A SNP_marker10

Desired output file:

Column_1  Column_2 
scaffold_A SNP_marker1;SNP_marker2;SNP_marker3;SNP_marker4 
scaffold_B SNP_marker5;SNP_marker6;SNP_marker7 
scaffold_C SNP_marker8 
scaffold_A SNP_marker9;SNP_marker10

I thought of using grep, uniq, etc., but I still could not figure out how to get this working.

2013-07-24 amine

Is perl an option? – urzeit

Wait, scaffold_A appears twice in your output. What decides whether a given marker should go into the first or the second entry? –

@SF. It seems the OP wants the output grouped by Column_1, but only within existing runs of consecutive lines. –

A Perl solution, usable from a bash script:

perl -lane 'sub output {
                print "$last\t", join ";", @buff;
            }
            $last //= $F[0];
            if ($F[0] ne $last) {
                output();
                undef @buff;
                $last = $F[0];
            }
            push @buff, $F[1];
            }{ output();'

2013-07-24 11:37:42 choroba

An awk solution:

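#!/bin/bash

awk '
BEGIN {
    str = ""
}
{
    if (str != $1) {
        # first column changed: finish the previous line and start a new group
        if (NR != 1) {
            printf("\n")
        }
        str = $1
        printf("%s\t%s", $1, $2)
    } else {
        # same first column: append the marker to the current line
        printf(";%s", $2)
    }
}
END {
    printf("\n")
}' your_file.txt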

2013-07-24 13:19:28

A Python solution (assuming the file names are passed on the command line):

from __future__ import print_function  # not needed with Python 3
import sys

# file names come from the command line: input first, output second
with open(sys.argv[1]) as infile, open(sys.argv[2], 'w') as outfile:
    outfile.write(infile.readline())  # transfer the header
    col_one, col_two = infile.readline().split()
    col_two = [col_two]  # make it a list
    for line in infile:
        data = line.split()
        if col_one != data[0]:
            # first column changed: write out the finished group
            print("{}\t{}".format(col_one, ';'.join(col_two)), file=outfile)
            col_one = data[0]
            col_two = [data[1]]
        else:
            col_two.append(data[1])
    # write out the last group
    print("{}\t{}".format(col_one, ';'.join(col_two)), file=outfile)

2013-07-24 13:50:59

Works, cool!!!!! But there is a small bug. The output generated by the script is slightly different: Column_1 Column_2 scaffold_A SNP_marker1; Scaffold_A SNP_marker2; SNP_marker3; SNP_marker4 scaffold_B SNP_marker5; SNP_marker6; SNP_marker7 scaffold_C SNP_marker8 scaffold_A SNP_marker9; SNP_marker10 – amine

You can also try the following solution in bash:

cat input.txt | while read L; do y=`echo $L | cut -f1 -d' '`; { test "$x" = "$y" && echo -n ";`echo $L | cut -f2 -d' '`"; } || { x="$y";echo -en "\n$L"; }; done

Or, in a more human-readable form for review:

cat input.txt | while read L; 
do 
    y=`echo $L | cut -f1 -d' '`; 
    { 
    test "$x" = "$y" && echo -n ";`echo $L | cut -f2 -d' '`"; 
    } || 
    { 
    x="$y";echo -en "\n$L"; 
    }; 
done

Note that the nicely formatted output of this script relies on bash's built-in echo command and its -n and -e options.

2013-07-31 11:31:36 rook

There is a [similar solution to a similar question](http://stackoverflow.com/questions/17897255/how-to-merge-similar-lines-in-linux/18018828#18018828), just to keep similar things close together – rook

If you don't mind using Python, it has itertools.groupby, which serves exactly this purpose:

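# file: combine.py
from __future__ import print_function  # not needed with Python 3
import itertools

with open('data.txt') as f:
    data = [row.split() for row in f]

# groupby only merges consecutive rows that share the same first column
for column1, rows_group in itertools.groupby(data, key=lambda row: row[0]):
    print(column1, ';'.join(row[1] for row in rows_group))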

Save this script as combine.py. Assuming your input file is data.txt, run it to get your desired output:

python combine.py

Discussion

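The result of the with open(...) block is data, a list of rows, where each row is itself a list of columns.
The itertools.groupby function takes an iterable, in this case a list. You tell it how to group the lines together by giving it a key, which here is the first column (column1).
rows_group is an iterator over the consecutive rows that share the same column1.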
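
As a rough illustration of why scaffold_A can show up twice in the output, the sketch below runs itertools.groupby on a few rows. The sample rows are a shortened, illustrative subset of the question's input, and merge_runs is a hypothetical helper name, not part of the answer above. groupby starts a new group whenever the key changes, so only consecutive rows are merged; sorting by the key first would collapse all scaffold_A rows into one group, which is not what the question asks for.

```python
import itertools

# Illustrative sample rows (a shortened subset of the question's input, no header).
rows = [
    ("scaffold_A", "SNP_marker1"),
    ("scaffold_A", "SNP_marker2"),
    ("scaffold_B", "SNP_marker5"),
    ("scaffold_A", "SNP_marker9"),
]


def merge_runs(rows):
    """Join the second column of each run of consecutive rows sharing a first column."""
    return [
        (key, ";".join(marker for _, marker in group))
        for key, group in itertools.groupby(rows, key=lambda row: row[0])
    ]


# Grouping the rows in file order keeps the two scaffold_A runs separate.
consecutive = merge_runs(rows)

# Sorting by the key first merges every scaffold_A row into a single group.
merged_all = merge_runs(sorted(rows, key=lambda row: row[0]))

for key, markers in consecutive:
    print(key, markers)
```

Here consecutive holds three entries (scaffold_A, scaffold_B, scaffold_A again), while merged_all holds only two.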

2013-08-02 16:03:52
