Convert a 3-column file into a matrix

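I have a file whose information is split into 3 columns. The first column holds the categories that will fill the first row of the matrix, and the second column holds the categories that will go in the first column of the matrix. The third column holds the values that will fill the bulk of the matrix. Columns 1 and 2 of the original file could be swapped; it makes no difference.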

The file looks like this:

Category1 type1 + 
Category1 type2 - 
Category1 type3 + 
Category2 type1 + 
Category2 type2 + 
Category2 type3 + 
Category3 type1 + 
Category3 type2 - 
Category3 type3 -

I would like to turn it into something that looks like this:

Category1 Category2 Category3 
type1 + + + 
type2 - + - 
type3 + + -

I think awk can probably do this; I just don't know how to get awk to do it.


2017-05-15 Jacob

Regarding the input data: are the columns separated by tabs or by spaces? And how should the output be delimited? – Scheff

@Scheff Everything is tab-delimited – Jacob

Aha. I will post a solution shortly. (It currently works for space-separated input and produces tab-separated output.) – Scheff

awk to the rescue!

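awk 'BEGIN {FS=OFS="\t"}
     # remember every column key ($1), row key ($2) and the cell value ($3)
     {col[$1]; row[$2]; val[$2,$1]=$3}
     END {# header line: empty corner cell, then the column keys
          for(c in col) printf "%s", OFS c; print "";
          # one line per row key, cells in the same traversal as the header
          for(r in row)
            {printf "%s", r;
             for(c in col) printf "%s", OFS val[r,c]
             print ""}}' file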

     Category1  Category2  Category3 
type1 +  +  + 
type2 -  +  - 
type3 +  +  -
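
One caveat: awk does not specify the order in which `for (c in col)` visits the keys, so rows and columns are not guaranteed to come out in input order. The following is a minimal sketch of a variant that keeps first-seen order, assuming tab-separated input as in the question. The function name `to_matrix` and the helper arrays (`cols`, `rows`, `colseen`, `rowseen`) are made up for illustration.

```shell
# Hypothetical helper (the name to_matrix is an assumption); reads tab-separated
# "column-key<TAB>row-key<TAB>value" lines from the given files or from stdin.
to_matrix() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         # remember each key the first time it appears, in input order
         if (!($1 in colseen)) { colseen[$1]; cols[++nc] = $1 }
         if (!($2 in rowseen)) { rowseen[$2]; rows[++nr] = $2 }
         val[$2, $1] = $3
       }
       END {
         # header line: empty corner cell, then the column keys
         for (i = 1; i <= nc; i++) printf "%s%s", OFS, cols[i]
         print ""
         # one line per row key; cells that never occurred stay empty
         for (j = 1; j <= nr; j++) {
           printf "%s", rows[j]
           for (i = 1; i <= nc; i++) printf "%s%s", OFS, val[rows[j], cols[i]]
           print ""
         }
       }' "$@"
}
```

It would be called as `to_matrix file`, or at the end of a pipeline. Indexing with a counter instead of `for (... in ...)` is what fixes the order; the rest follows the one-liner above.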


2017-05-15 16:54:44 karakfa

I think this is the point where it should go into a script rather than a "one-liner". – 123

I don't disagree... – karakfa

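This is a solution based on GNU awk. I emphasize this because true multidimensional arrays (arrays of arrays, which make for a convenient solution here) are a feature specific to GNU awk.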

My script table2matrix.awk:

# collect values 
{ 
    # category=$1 ; type=$2 ; value=$3 
    if (!($1 in categories)) { categories[$1] } 
    types[$2][$1] = $3 
} 
# output of values 
END { 
    # print col. header 
    for (category in categories) { printf("\t%s", category); } 
    print "" 
    # print rows 
    for (type in types) { 
        printf("%s", type);
        for (category in categories) {
            printf("\t%s", types[type][category]);
        }
        print ""
    } 
}

Sample session:

$ cat >table.txt <<EOF 
> Category1 type1 + 
> Category1 type2 - 
> Category1 type3 + 
> Category2 type1 + 
> Category2 type2 + 
> Category2 type3 + 
> Category3 type1 + 
> Category3 type2 - 
> Category3 type3 - 
> EOF 

$ awk -f table2matrix.awk table.txt 
     Category1  Category2  Category3 
type1 +  +  + 
type2 -  +  - 
type3 +  +  - 

$ cat table.txt | sed $'s/ /\t/g' >table-tabs.txt 

$ awk -f table2matrix.awk table-tabs.txt 
     Category1  Category2  Category3 
type1 +  +  + 
type2 -  +  - 
type3 +  +  - 

$ cat >table-sorted.txt <<EOF 
> Category1 type1 + 
> Category1 type3 + 
> Category2 type1 + 
> Category2 type2 + 
> Category2 type3 + 
> Category3 type1 + 
> Category1 type2 - 
> Category3 type2 - 
> Category3 type3 - 
> EOF 

$ awk -f table2matrix.awk table-sorted.txt 
     Category1  Category2  Category3 
type1 +  +  + 
type2 -  +  - 
type3 +  +  - 

$ tac table.txt >table-reverse.txt 

$ awk -f table2matrix.awk table-reverse.txt 
     Category1  Category2  Category3 
type1 +  +  + 
type2 -  +  - 
type3 +  +  - 

$ grep '+' table.txt >table-incompl.txt 

$ awk -f table2matrix.awk table-incompl.txt 
     Category1  Category2  Category3 
type1 +  +  + 
type2   + 
type3 +  + 

$

table.txt is space-separated (copied and pasted from the web browser); table-tabs.txt is table.txt with the spaces replaced by tabs.

As can be seen from the script (but not from the code samples in the web browser), the output is tab-separated.
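
Since a browser renders tabs and spaces alike, one way to check the delimiter is to make the tabs visible. A minimal sketch, assuming a POSIX sed is available; the function name `show_tabs` is made up for illustration.

```shell
# Hypothetical helper: print the input with each tab shown as \t
# and the end of every line marked with $.
show_tabs() {
  sed -n 'l' "$@"
}
```

Piping the script output through it, as in `awk -f table2matrix.awk table.txt | show_tabs`, should show `\t` between the cells if the output really is tab-separated.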

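After testing some variations of the original sample input, I fixed my awk script. It became a little shorter and more similar to the other solution by karakfa...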
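
In the table-incompl.txt run above, combinations that are missing from the input come out as empty cells. If a visible placeholder is preferred, a small post-processing filter can fill them in. This is only a sketch under assumptions: the function name `fill_empty` and the default placeholder `NA` are not part of the original script, and the filter expects the tab-separated matrix on standard input.

```shell
# Hypothetical filter: replace empty cells of a tab-separated matrix
# with a placeholder (first argument, "NA" if omitted).
fill_empty() {
  awk -v fill="${1:-NA}" 'BEGIN { FS = OFS = "\t" }
       # the header line fixes the number of columns; its empty
       # corner cell is left untouched
       NR == 1 { ncol = NF; print; next }
       {
         # field 1 is the row label, so start at field 2
         for (i = 2; i <= ncol; i++) if ($i == "") $i = fill
         print
       }'
}
```

It would be used as `awk -f table2matrix.awk table-incompl.txt | fill_empty`, or with another marker, for example `fill_empty '?'`.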


2017-05-15 16:57:38 Scheff
