Perl: change the values in the first row based on the values in the second row

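I have a file with more than 480,000 rows and 1,380 columns. I need a pipeline that adds F_ or M_ to each value in the first row, depending on whether the value in the second row is Sex: Female or Sex: Male.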

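The first row of my file is basically the individual IDs, each followed by the cell type, -N or -G. The second row states whether that individual is female or male. The remaining rows have the probe_Ids in the first column, and the other columns hold the corresponding beta_value for each individual. I will add the following lines if that makes more sense.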

My input file looks like this (tab-delimited), shown without the first column:

1740-N 1546-N 1546-G 1740-G 1228-G 5121-N 5121-G 
Sex: Female Sex: Female Sex: Female Sex: Female Sex: Male Sex: Female Sex: Female

My output should look like this (tab-delimited), again without the first column:

F_1740-N F_1546-N F_1546-G F_1740-G M_1228-G F_5121-N F_5121-G

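Note that the sex line is not output.

Can anyone help? If I had only a few columns, I would do it by hand.

This can be done in any program; I don't insist on Perl.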

Source

2013-11-15 user2997397

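Are these the first and second lines of each file, or paired lines throughout the file? –

They are all in one file. The second row determines the first row, but I don't need both; I only need one as the header. This saves me a lot of time. – user2997397

I meant: is it just the first and second lines of the file, rather than pairs at multiple places in the file? But it sounds like it is. –

Keep a one-line buffer.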

my $last_line = <>;
if (defined $last_line) {
    while (my $this_line = <>) {
        if ($this_line =~ /^Sex:/) {
            # Prefix the buffered ID line in place.
            adjust_for_sex($last_line, $this_line);
            next;  # Don't display the Sex row.
        }

        print($last_line);
        $last_line = $this_line;
    }

    print($last_line);
}

Here is the code that makes the actual change:

sub adjust_for_sex {
    my ($last_line, $this_line) = @_;

    chomp($last_line);
    my @last_fields = split /\t/, $last_line;

    chomp($this_line);
    my @this_fields = split /\t/, $this_line;

    for my $i (0..$#last_fields) {
        # Capture the first letter after "Sex: " (F or M).
        my ($sex) = $this_fields[$i] =~ /^Sex: (.)/
            or die "Field " . ($i + 1) . " of the sex line is not 'Sex: ...'\n";

        $last_fields[$i] = $sex . "_" . $last_fields[$i];
    }

    # Changes the first argument in the caller.
    $_[0] = join("\t", @last_fields) . "\n";
}

Source

2013-11-15 19:18:22 ikegami

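Fixed a few problems. – ikegami

It doesn't actually change the names, or I'm not doing it correctly. – user2997397

It got stuck in an infinite loop because I used `redo` where I should have used `next`. Fixed. In any case, based on the new information (there is only one sex line, and it is the second line of the file), this solution is overkill. – ikegami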
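
For the clarified case (one ID line, one sex line, then data rows), the one-line buffer is not needed. Since the question says any program is fine, here is a minimal Python sketch rather than a tested production tool: it reads the first two lines, merges them, and streams the remaining rows unchanged, so the 480,000 rows are never held in memory at once. The names `merge_header` and `relabel` are made up for illustration, and the beta values in the demo are placeholders. It assumes that a field on line 2 that does not look like `Sex: ...` (for example, one above a leading probe-ID column) should leave its ID unprefixed.

```python
import io
import re

# Matches "Sex: Female" / "Sex: Male" and captures the initial.
SEX_RE = re.compile(r"\s*Sex:\s*([FM])")


def merge_header(id_line, sex_line):
    """Return the ID line with F_/M_ prefixes taken from the sex line."""
    ids = id_line.rstrip("\n").split("\t")
    sexes = sex_line.rstrip("\n").split("\t")
    if len(ids) != len(sexes):
        raise ValueError(f"{len(ids)} ID fields but {len(sexes)} sex fields")
    merged = []
    for ident, sex in zip(ids, sexes):
        m = SEX_RE.match(sex)
        # Assumption: fields without a sex value pass through unchanged.
        merged.append(f"{m.group(1)}_{ident}" if m else ident)
    return "\t".join(merged) + "\n"


def relabel(src, dst):
    """Write the merged header, then copy every remaining row as-is."""
    id_line = src.readline()
    sex_line = src.readline()
    if not id_line or not sex_line:
        raise ValueError("expected at least two header lines")
    dst.write(merge_header(id_line, sex_line))
    for line in src:  # streamed one row at a time
        dst.write(line)


# Demo: IDs from the question, one data row of placeholder values.
sample = (
    "1740-N\t1546-N\t1228-G\n"
    "Sex: Female\tSex: Female\tSex: Male\n"
    "0.91\t0.15\t0.42\n"
)
out = io.StringIO()
relabel(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

To use it as a filter, one could `import sys`, call `relabel(sys.stdin, sys.stdout)`, and redirect the input and output files on the command line.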

Something like this should work in awk. It does need some memory, though, because it stores all the data from the first line.

# Set OFS as well, so the rebuilt header line stays tab-delimited.
BEGIN { FS = OFS = "\t" }

NR == 1 {
    for (i = 1; i <= NF; i++) {
        f[i] = $i
    }
    next
}

NR == 2 {
    for (i = 1; i <= NF; i++) {
        $i = gensub(/Sex: ([FM]).*/, "\\1", "g", $i)
        $i = $i "_" f[i]
    }
    print
    next
}

{ print }

If such pairs of lines repeat throughout the file, something like the following might do the job:

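BEGIN { FS = "\t" }

# A sex line that follows a buffered ID line: merge the two.
line != "" && /^Sex:/ {
    split(line, f)
    line = ""

    for (i = 1; i <= NF; i++) {
        # substr() positions start at 1; character 6 of "Sex: F..." is the initial.
        sex = substr($i, 6, 1)
        printf "%s%s_%s", (i == 1 ? "" : FS), sex, f[i]
    }
    print ""
    next
}

# Print the buffered line, then buffer the current one.
line != "" { print line }

{ line = $0 }

# Flush the last buffered line, which would otherwise be lost.
END { if (line != "") print line }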

Source

2013-11-15 19:27:51

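I was under the impression that there would be ID and sex lines later in the file. Is that the case, @user2997397? – ikegami

I don't know. I tried asking but didn't get a good answer. If there are no other lines in the file, just drop the NR patterns (and the default print). If there are, the Perl answer below is more what is needed. –

It is probably worth mentioning `awk`, since that seems to be the language in use. –

This was written on the assumption that the input file contains repeating pairs of lines to parse. It could easily be modified to stop after parsing the first two lines, but I am leaving it as it is, even though it no longer answers the OP's question after his/her clarification. Maybe it will be useful to someone else.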

#!perl

use strict;
use warnings;

# Lexical filehandles and three-argument open.
open(my $in,  '<', 'in.txt')  or die "Cannot open in.txt: $!";
open(my $out, '>', 'out.txt') or die "Cannot open out.txt: $!";
my $secondLine;
while (my $firstLine = <$in>) {
    chomp $firstLine;
    $secondLine = <$in> // "";
    chomp $secondLine;
    # Break out if there are no more lines with data (actually, this just detects 1-2 blank lines in a row, not necessarily at the end of the file yet)
    if ($firstLine eq "" && $secondLine eq "") { last }
    my @firstLine = split(/\s+/, $firstLine);
    my @secondLine = split(/\s*Sex:\s*/, $secondLine);
    # The first element in @secondLine will always be the "null" before the first "Sex: ".
    # Throw it away.
    shift @secondLine;
    if (scalar(@firstLine) != scalar(@secondLine)) { die "Uneven # of fields in these 2 lines:\n$firstLine\n$secondLine\n" }

    # OK, output time: tab-separated, with the sex initial prefixed to each ID.
    print {$out} join("\t", map { substr($secondLine[$_], 0, 1) . "_$firstLine[$_]" } 0 .. $#firstLine), "\n";
}
close($in);
close($out) or die "Cannot close out.txt: $!";

if (! $secondLine) {
    warn "The file does not appear to have an even number of lines.\n";
}

Source

2013-11-15 19:35:55 jimtut

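I get the impression that the file contains more than just ID and sex lines. Is that the case, @user2997397? (If only the OP would follow up on the requests to show more lines!) – ikegami

@ikegami I updated my post. – user2997397

@ikegami I really don't know what you mean by an ID and sex line! There is one line of IDs and one line of sexes. Rows 3 to 479973 are the actual data. – user2997397

How about: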

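#!/usr/bin/perl
use strict;
use warnings;

while (my $ids = <>) {
    chomp $ids;
    my @N = split ' ', $ids;        # IDs from the first line of the pair

    my $sexes = <>;
    last unless defined $sexes;     # odd trailing line: nothing to pair with
    chomp $sexes;
    # Reduce "Sex: Female" / "Sex: Male" to "F" / "M", separated by spaces.
    $sexes =~ s/\s*Sex:\s*//g;
    $sexes =~ s/emale/ /g;
    $sexes =~ s/ale/ /g;
    my @S = split ' ', $sexes;

    print join("\t", map { $S[$_] . '_' . $N[$_] } 0 .. $#N), "\n";
}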

Source

2013-11-15 19:39:14 user2997631

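I get the impression that the file contains more than just ID and sex lines. Is that the case, @user2997397? (If only the OP would follow up on the requests to show more lines!) – ikegami

It doesn't work. Could you explain what @N, @S, $k, $N, $i and $g are? I get errors about these symbols. – user2997397

This might work for you (GNU sed):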

sed -ri '1{N;:a;s/(\b[0-9]{4}-[GN].*\n)\s*Sex:\s*(.)\S+/\2_\1/;ta;s/\n//}' file

This joins lines 1 and 2, then runs a substitution loop until no further columns can be matched.

Source

2013-11-15 21:30:48 potong

It didn't work for me; I keep getting errors such as (sed: illegal option -- r). – user2997397
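
The `-r` option is a GNU sed extension (the answer is labelled GNU sed), and the error quoted above is what a sed without that option reports. Where GNU sed is not available, the same in-place edit can be sketched with Python's standard `fileinput` module: with `inplace=True`, `print` output is redirected back into the file, roughly like `sed -i`. This is only a sketch, assuming that lines 1 and 2 alone form the header. `relabel_in_place` and `prefix` are illustrative names, the demo row holds placeholder values, and a `.bak` copy is kept because an error part-way through would otherwise leave a truncated file.

```python
import fileinput
import os
import re
import tempfile

SEX_RE = re.compile(r"Sex:\s*([FM])")


def prefix(ident, sex_field):
    """Return ident with F_ or M_ prepended, or unchanged if no sex is given."""
    m = SEX_RE.search(sex_field)
    return f"{m.group(1)}_{ident}" if m else ident


def relabel_in_place(path, backup=".bak"):
    """Merge lines 1 and 2 of path into one header line, editing in place."""
    ids = []
    with fileinput.input(files=[path], inplace=True, backup=backup) as stream:
        for line in stream:
            if stream.lineno() == 1:
                # Hold line 1 back until the sex line has been read.
                ids = line.rstrip("\n").split("\t")
            elif stream.lineno() == 2:
                sexes = line.rstrip("\n").split("\t")
                if len(ids) != len(sexes):
                    raise ValueError("ID and sex lines differ in field count")
                # While inplace is active, print() writes into the file.
                print("\t".join(prefix(i, s) for i, s in zip(ids, sexes)))
            else:
                print(line, end="")


# Demo on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.tsv")
    with open(demo, "w") as fh:
        fh.write("1740-N\t1228-G\nSex: Female\tSex: Male\n0.91\t0.42\n")
    relabel_in_place(demo)
    with open(demo) as fh:
        print(fh.read(), end="")
```

The original file is left next to the edited one with the `.bak` suffix, so the edit can be undone by renaming it back.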

$ awk -F'\t' ' 
NR%2 { split($0,a); next } 
{ 
    for (i=1;i<=NF;i++) 
     printf "%s%s_%s", (i==1?"":FS), ($i~/Female/?"F":"M"), a[i] 
    print "" 
} 
' file 
F_1740-N  F_1546-N  F_1546-G  F_1740-G  M_1228-G  F_5121-N F_5121-G

Source

2013-11-16 15:38:49

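I tried your code and it only produces an empty file. – user2997397

Then you either copied/pasted the script incorrectly or gave it an empty or corrupt input file; otherwise that is not possible. –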
