2013-03-20 28 views
1

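Performing a calculation on each pair of fields

For a genetic analysis, I am trying to convert a 2-probability file (10 GB) into a 3-probability file. Basically, I have to insert a third column after every two existing values, and this third column can be computed as 1 - (first value + second value). How would you do this?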

Source:

0.800 0.200 0.000 0.200 0.800 0.200 
0.000 0.900 0.000 0.900 0.000 0.900 
0.900 0.010 0.900 0.010 0.770 0.010 

(The file contains many columns and rows.)

Desired output:

0.800 0.200 0.000 0.000 0.200 0.800 0.800 0.200 0.000 
0.000 0.900 0.100 0.000 0.900 0.100 0.000 0.900 0.100 
0.900 0.010 0.090 0.900 0.010 0.090 0.770 0.010 0.220 

Answers

2

awk

awk '{for(i=1;i<=NF;i+=2)$(i+1)=$(i+1)OFS sprintf("%.3f",1-$(i+1)-$i)}1' OFS='\t' file

Output (tab-separated):
0.800 0.200 0.000 0.000 0.200 0.800 0.800 0.200 0.000 
0.000 0.900 0.100 0.000 0.900 0.100 0.000 0.900 0.100 
0.900 0.010 0.090 0.900 0.010 0.090 0.770 0.010 0.220 
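
For readers who find the one-liner dense, the same idea can be spelled out step by step. This is only a sketch under a few assumptions: the function name `add_third_prob` is made up for illustration, it rejects lines with an odd number of fields (the one-liner does not), and it clamps tiny negative rounding residue so that `-0.000` is never printed. It builds each output line explicitly instead of assigning to fields.

```shell
# Hypothetical helper: reads 2-probability rows, writes tab-separated 3-probability rows.
add_third_prob() {
    awk -v OFS='\t' '
    NF % 2 {
        # Assumption: an odd field count is treated as a malformed line.
        printf "line %d: odd number of fields (%d)\n", NR, NF > "/dev/stderr"
        exit 1
    }
    {
        out = ""
        # Walk the line pair by pair: fields i and i+1.
        for (i = 1; i < NF; i += 2) {
            third = 1 - ($i + $(i + 1))
            # Floating-point rounding can leave a tiny negative value; print it as zero.
            if (third < 0 && third > -0.0005) third = 0
            out = out (i > 1 ? OFS : "") $i OFS $(i + 1) OFS sprintf("%.3f", third)
        }
        print out
    }' "$@"
}
```

It could be used as `add_third_prob < input.txt > output.txt` (both file names are placeholders). Because awk processes one line at a time, memory use stays small even for a 10 GB input.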
1
#! /usr/bin/env perl 

use strict; 
use warnings; 

*ARGV = *DATA; # for demo only 

while (<>) { 
    chomp; 

    my @fields = split; 
    my @output; 
    while (@fields >= 2) {
        # Take the next pair of probabilities off the front of the list.
        my ($x, $y) = splice @fields, 0, 2;

        push @output, $x, $y, sprintf "%.3f", 1.0 - ($x + $y);
    }

    print join(" " x 3, @output, @fields), "\n"; 
} 

__DATA__ 
0.800 0.200 0.000 0.200 0.800 0.200 
0.000 0.900 0.000 0.900 0.000 0.900 
0.900 0.010 0.900 0.010 0.770 0.010 

Output:

0.800 0.200 0.000 0.000 0.200 0.800 0.800 0.200 0.000 
0.000 0.900 0.100 0.000 0.900 0.100 0.000 0.900 0.100 
0.900 0.010 0.090 0.900 0.010 0.090 0.770 0.010 0.220
1
#!/usr/bin/perl 
use strict; use warnings; 

my $template = join "\t", ("%.3f")x3; 

while (<>) { 
    my @fields = split; 
    @fields % 2 == 0 or die "Uneven number of fields"; 
    while (my ($x, $y) = splice @fields, 0, 2) {
        printf $template, $x, $y, 1 - ($x + $y);
        # Separate triples with a tab; end the line after the last one.
        print @fields ? "\t" : "\n";
    }
} 

Usage: perl script.pl <input >output-file