2014-09-03 90 views
-3

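Count the occurrences of a string in a matrix

From the table below, I want to count how many times each miRNA in column 1 has a positive value and how many times it has a negative value (column 3), and then show the result as a bar chart.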

I wrote this command, but it sums the values instead of counting the occurrences:

awk '{x[$1 " " $2]+=$3} END{for (r in x)print r,x[r]}' 

For example:

miRNA   target   value 

mmu-miR-423-3p NM_198167  0.7999 
mmu-miR-744-5p NM_001166476 0.79927 
mmu-miR-423-5p NM_146188  -0.79503 
mmu-miR-423-3p NM_172262  -0.79463 
mmu-miR-3968 NM_001185020 0.79367 
mmu-miR-298-5p NM_175127  0.79357 
mmu-miR-423-5p NM_009320  -0.7934 
mmu-miR-423-5p NM_015732  0.7928 
.... 

output: 

miRNA   positive   negative 
mmu-miR-423-3p 1     1 
mmu-miR-423-5p 1     2 
+0

Using a hash is the idiomatic way to solve this (in Perl). – TLP 2014-09-03 16:17:56

+0

Edited to add the awk command I tried. – user3741035 2014-09-03 16:44:08

Answers

2
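$ awk '
# Tally each miRNA (column 1) by the sign of its value (column 3).
{ $3<0 ? neg[$1]++ : pos[$1]++ }
END {
    fmt = "%-16s%-10s%s\n"
    printf fmt, "miRNA", "positive", "negative"
    # Print only the miRNAs that have both positive and negative values.
    for (rna in pos)
        if (rna in neg)
            printf fmt, rna, pos[rna], neg[rna]
}
' file
miRNA           positive  negative
mmu-miR-423-3p  1         1
mmu-miR-423-5p  1         2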
+1

+1 for a concise solution. I would reduce it to a single ternary expression: `$3<0 ? neg[$1]++ : pos[$1]++`. Just for the record, a `perl` solution isn't really fair here. `;)` – 2014-09-03 21:52:27

+1

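I don't think I have ever used a ternary expression anywhere other than on the right-hand side of an assignment or in a print statement, so it never really occurred to me to use it this way. I think I like it; I just needed to convince myself that it isn't obfuscated code. I have decided that I do like it and that the resulting code is clear, so I updated the answer. Thanks! – 2014-09-03 22:31:45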
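The awk answer above prints only the miRNAs that have both a positive and a negative value, which matches the output requested in the question. If miRNAs with only one sign should also appear, with a zero in the other column, something along these lines might work. This is a minimal sketch and not the answerer's code: the file name `mirna_sample.txt`, the helper name `count_signs`, and the `seen` array are assumptions made for illustration.

```shell
# Sample rows from the question, saved under a hypothetical file name.
cat > mirna_sample.txt <<'EOF'
miRNA target value
mmu-miR-423-3p NM_198167 0.7999
mmu-miR-744-5p NM_001166476 0.79927
mmu-miR-423-5p NM_146188 -0.79503
mmu-miR-423-3p NM_172262 -0.79463
mmu-miR-3968 NM_001185020 0.79367
mmu-miR-298-5p NM_175127 0.79357
mmu-miR-423-5p NM_009320 -0.7934
mmu-miR-423-5p NM_015732 0.7928
EOF

# count_signs is a hypothetical helper name, not part of the original answer.
count_signs() {
    awk '
    NR == 1 { next }                  # skip the header row
    NF < 3  { next }                  # skip blank or incomplete lines
    {
        seen[$1] = 1                  # remember every miRNA, whatever its sign
        if ($3 < 0) neg[$1]++; else pos[$1]++
    }
    END {
        # a count that was never incremented is printed as 0 by %d
        for (rna in seen)
            printf "%-16s%-10d%d\n", rna, pos[rna], neg[rna]
    }
    ' "$1" | sort
}

printf "%-16s%-10s%s\n" "miRNA" "positive" "negative"
count_signs mirna_sample.txt
```

Tracking names in a separate `seen` array keeps the `END` loop independent of which sign was encountered first, and piping through `sort` gives a stable order, since awk does not define the order of `for (rna in seen)`.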

2

Try it in R:

ddf$sign = ifelse(ddf$value<0,"neg","pos") 
with(ddf, table(miRNA, sign)) 
       sign 
miRNA   neg pos 
    mmu-miR-298-5p 0 1 
    mmu-miR-3968  0 1 
    mmu-miR-423-3p 1 1 
    mmu-miR-423-5p 2 1 
    mmu-miR-744-5p 0 1 
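The question also asks for a bar chart, which the table above stops short of drawing. As a rough, text-only stand-in that stays with awk rather than R, the sketch below renders each positive count as a run of `+` characters and each negative count as a run of `-` characters. The file name `mirna_sample.txt` and the names `text_bars` and `bar` are hypothetical, and a real plot would still need a plotting tool.

```shell
# Sample rows from the question, saved under a hypothetical file name.
cat > mirna_sample.txt <<'EOF'
miRNA target value
mmu-miR-423-3p NM_198167 0.7999
mmu-miR-744-5p NM_001166476 0.79927
mmu-miR-423-5p NM_146188 -0.79503
mmu-miR-423-3p NM_172262 -0.79463
mmu-miR-3968 NM_001185020 0.79367
mmu-miR-298-5p NM_175127 0.79357
mmu-miR-423-5p NM_009320 -0.7934
mmu-miR-423-5p NM_015732 0.7928
EOF

# text_bars is a hypothetical helper name used only for this illustration.
text_bars() {
    awk '
    # bar(n, ch) returns the character ch repeated n times
    function bar(n, ch,    s, i) {
        s = ""
        for (i = 0; i < n; i++) s = s ch
        return s
    }
    NR == 1 || NF < 3 { next }        # skip the header and incomplete lines
    {
        seen[$1] = 1
        if ($3 < 0) neg[$1]++; else pos[$1]++
    }
    END {
        for (rna in seen)
            printf "%-16s %s%s\n", rna, bar(pos[rna], "+"), bar(neg[rna], "-")
    }
    ' "$1" | sort
}

text_bars mirna_sample.txt
```

With the sample rows, the line for `mmu-miR-423-5p` ends in `+--`, that is, one positive and two negative values.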
1

A Perl solution:

use strict; 
use warnings; 

my %dataCounter;    # miRNA name => { pos => count, neg => count }
while (my $line = <DATA>) {
    chomp($line);
    next if ($line =~ /^miRNA/);    # skip the header row
    my @data = split /\s+/, $line;
    if ($data[2] < 0) {
        $dataCounter{$data[0]}->{'neg'}++;
    }
    else {
        $dataCounter{$data[0]}->{'pos'}++;
    }
}
print "miRNA\tpositive\tnegative\n";
# A missing count means zero; the // (defined-or) operator needs Perl 5.10 or later.
foreach my $key (sort keys %dataCounter) {
    print "$key\t" . ($dataCounter{$key}->{'pos'} // 0) . "\t" . ($dataCounter{$key}->{'neg'} // 0) . "\n";
}

__DATA__ 
miRNA   target   value 
mmu-miR-423-3p NM_198167  0.7999 
mmu-miR-744-5p NM_001166476 0.79927 
mmu-miR-423-5p NM_146188  -0.79503 
mmu-miR-423-3p NM_172262  -0.79463 
mmu-miR-3968 NM_001185020 0.79367 
mmu-miR-298-5p NM_175127  0.79357 
mmu-miR-423-5p NM_009320  -0.7934 
mmu-miR-423-5p NM_015732  0.7928 