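Using awk to store values from multiple files

I'm using Cygwin on Windows 7. I have a directory full of text files that I want to loop over, and for each file save the second-column values from the first three lines, i.e. cells (1,2), (2,2) and (3,2).

So the code would look something like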

x1[0]=awk 'FNR == 1{print $2}'$file1
x1[1]=awk 'FNR == 2{print $2}'$file1
x1[2]=awk 'FNR == 3{print $2}'$file1

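Then I want to divide each value in $x1 by 100, add 1, and use the result as a line number to pull data from another file, storing the results. That is: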

let x1[0]=$x1[0]/100 + 1
let x1[1]=$(x1[1]/100)+1
let x1[2]=$(x1[2]/100)+1
read1=$(awk 'FNR == '$x1[0]' {print $1}' $file2)
read2=$(awk 'FNR == '$x1[1]' {print $1}' $file2)
read3=$(awk 'FNR == '$x1[2]' {print $1}' $file2)
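For reference, a minimal sketch of how the divide-by-100, add-1, look-up-the-line step might be written in bash. The sample index contents and the hard-coded x1 values are made up for illustration; only the arithmetic and the awk -v hand-off are the point.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for file2 (the index of names).
tmpdir=$(mktemp -d)
file2="$tmpdir/index.txt"
printf "'name_A'\n'name_B'\n'name_C'\n'name_D'\n" > "$file2"

# Hypothetical stand-in for the three second-column values from file1.
x1=(000 100 300)

reads=()
for v in "${x1[@]}"; do
    # Integer division by 100, plus 1, gives a 1-based line number.
    # The 10# prefix forces base 10 so a value such as 000 is not read as octal.
    n=$(( 10#$v / 100 + 1 ))
    # -v passes the shell value into awk without splicing it into the program text.
    reads+=( "$(awk -v n="$n" 'FNR == n {print $1; exit}' "$file2")" )
done

printf '%s\n' "${reads[@]}"
rm -rf "$tmpdir"
```

Passing the number with -v avoids the quote juggling of 'FNR == '$x1[0]' ...', and ${x1[0]} (with braces) is the form bash needs to expand an array element.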

Do the same thing with another file, except that $x1 is not needed this time.

read4=$(awk 'FNR == 1{print $3,$4,$5,$6}' $file3)

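Finally, just output these values, i.e. read1 through read4.

This needs to be done in a loop over all the files in the folder, and I'm not quite sure how to go about that. The tricky part is that the $file3 file name depends on the $file1 file name,

so if $file1 = abc123def.fna.map.txt

$file3 would be abc123def.fna

$file2 is hard-coded and stays the same across all iterations.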
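A rough sketch of the loop and the file-name derivation, assuming every map file ends in .fna.map.txt and sits next to its FASTA file. The directory and file names below are created only so the sketch runs on its own.

```shell
#!/usr/bin/env bash
# Hypothetical directory layout for illustration.
dir=$(mktemp -d)
touch "$dir/abc123def.fna" "$dir/abc123def.fna.map.txt"
touch "$dir/xyz.fna" "$dir/xyz.fna.map.txt"

pairs=()
for file1 in "$dir"/*.fna.map.txt; do
    [ -e "$file1" ] || continue      # the glob matched nothing
    file3=${file1%.map.txt}          # strip the trailing .map.txt
    if [ ! -f "$file3" ]; then
        echo "skipping $file1: no matching FASTA file" >&2
        continue
    fi
    # The per-file awk calls would go here; this sketch only records the pairing.
    pairs+=( "$(basename "$file1") -> $(basename "$file3")" )
done

printf '%s\n' "${pairs[@]}"
rm -rf "$dir"
```

The ${file1%.map.txt} expansion removes the shortest matching suffix, so abc123def.fna.map.txt becomes abc123def.fna without calling any external tool.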

file1 is a .txt file, and part of it looks like:

99 58900
16 59000
14 73000

file2 contains 600 lines of strings.

'Actinobacillus_pleuropneumoniae_L20'
'Actinobacillus_pleuropneumoniae_serovar_3_JL03'
'Actinobacillus_succinogenes_130Z'

file3 is a FASTA file, and its first two lines look like this:

>gi|94986445|ref|NC_008011.1| Lawsonia intracellularis PHE/MN1-00, complete genome
ATGAAGATCTTTTTATAGAGATAGTAATAAAAAAATGTCAGATAGATATACATTATAGTATAGTAGAGAA

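The output can simply write all 4 reads to some file or, if possible, compare read1, read2 and read3 against read4, i.e. the main name should match. In my example:

None of read1-3 matches Lawsonia intracellularis, which is part of read4. So it could just print success or failure to a file.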
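The comparison could be sketched roughly like this. It assumes "main name" means the first two words (genus and species) once quotes are stripped and underscores become spaces; that reading is a guess, and the values are copied from the examples in this question.

```shell
#!/usr/bin/env bash
# Values as they might look after the earlier steps (taken from the samples above).
read1="'Actinobacillus_pleuropneumoniae_L20'"
read2="'Actinobacillus_pleuropneumoniae_serovar_3_JL03'"
read3="'Actinobacillus_succinogenes_130Z'"
header='>gi|94986445|ref|NC_008011.1| Lawsonia intracellularis PHE/MN1-00, complete genome'

result=Failure
for name in "$read1" "$read2" "$read3"; do
    name=${name//\'/}     # drop the surrounding single quotes
    name=${name//_/ }     # underscores to spaces
    # Keep only the first two words: an assumption about what must match.
    read -r genus species _ <<< "$name"
    main="$genus $species"
    case $header in
        *"$main"*) result=Success; break ;;
    esac
done

echo "$result"
```

With the sample values this reports Failure, since none of the three Actinobacillus names occurs in the Lawsonia header.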

Sample output

Actinobacillus_pleuropneumoniae_L20 Actinobacillus_pleuropneumoniae_serovar_3_JL03 Actinobacillus_succinogenes_130Z Lawsonia intracellularis Failure

Sorry, I was wrong about needing 6 reads; only 4 are actually needed. Thanks again for your help.

Source

2012-03-16 dawnoflife

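In this line you are missing the ' character; it should be x1[0]='awk 'FNR == 1{print $2}' $file1'. Better to post your complete code so that I can comment on it. – Raghuram 2012-03-16 03:40:19

Even better, show us 3 lines from each of the 3 files, the correct value for $1, and then the expected output. Unless you're hiding a lot of other stuff, it's almost certain this can be done in 1 awk program. Good luck. – shellter 2012-03-16 03:48:49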
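To illustrate the single-awk-program idea, here is a hedged sketch covering only the file1-to-file2 lookup. The sample files are invented, and the FNR == NR test is the usual two-file idiom; it assumes file1 is not empty.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for file1 and file2.
tmpdir=$(mktemp -d)
file1="$tmpdir/sample.fna.map.txt"
file2="$tmpdir/index.txt"
printf '99 0\n16 100\n14 300\n' > "$file1"
printf "'name_A'\n'name_B'\n'name_C'\n'name_D'\n" > "$file2"

names=$(awk '
    # First file: turn column 2 of the first three rows into wanted line numbers.
    FNR == NR { if (FNR <= 3) want[int($2 / 100) + 1] = 1; next }
    # Second file: print the first field of every wanted line.
    (FNR in want) { print $1 }
' "$file1" "$file2")

echo "$names"
rm -rf "$tmpdir"
```

One difference from the three separate awk calls: the names come out in file2 order rather than in the order the numbers appear in file1, and duplicate line numbers are printed once.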

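AND ;-) ... are you actually getting usable data from your first 3 lines of code, 'x1[0]=awk ...'? You use cmd-substitution later on; don't you want 'x1[0]=$(awk ...)' for those first lines too? And the let x1[0] line is different from the two lines that follow it. Good luck. – shellter 2012-03-16 03:51:47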
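A small sketch of the command-substitution form suggested here, assuming the file1 sample from the question is three two-column rows. The temporary file is created only to make the sketch self-contained.

```shell
#!/usr/bin/env bash
# Sample rows from the question, written to a temporary stand-in for file1.
tmpdir=$(mktemp -d)
file1="$tmpdir/sample.fna.map.txt"
printf '99 58900\n16 59000\n14 73000\n' > "$file1"

# $( ... ) runs awk and substitutes its output into the assignment.
# Without it, the output of awk is never captured in the array.
x1=()
for i in 1 2 3; do
    x1+=( "$(awk -v n="$i" 'FNR == n {print $2; exit}' "$file1")" )
done

printf '%s\n' "${x1[@]}"
rm -rf "$tmpdir"
```

Quoting "$file1" and leaving a space before it also keeps the file name from being glued onto the awk program text.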

This problem can be solved with TXR: http://www.nongnu.org/txr

OK, I have these sample files (not your inputs, unfortunately):

$ ls -l 
total 16 
-rwxr-xr-x 1 kaz kaz 1537 2012-03-18 20:07 bac.txr   # the program 
-rw-r--r-- 1 kaz kaz 153 2012-03-18 19:16 foo.fna   # file3: genome info 
-rw-r--r-- 1 kaz kaz 24 2012-03-18 19:51 foo.fna.map.txt # file1 
-rw-r--r-- 1 kaz kaz 160 2012-03-18 19:56 index.txt   # file2: names of bacteria 

$ cat index.txt 
'Actinobacillus_pleuropneumoniae_L20' 
'Actinobacillus_pleuropneumoniae_serovar_3_JL03' 
'Lawsonia_intracellularis_PHE/MN1-00' 
'Actinobacillus_succinogenes_130Z' 

$ cat foo.fna.map.txt # note leading spaces: typo or real? 
13 000 
19 100 
7 200 

$ cat foo.fna 
gi|94986445|ref|NC_008011.1| Lawsonia intracellularis PHE/MN1-00, complete genome 
ATGAAGATCTTTTTATAGAGATAGTAATAAAAAAATGTCAGATAGATATACATTATAGTATAGTAGAGAA

As you can see, I cooked the data so that there will be a match for Lawsonia.

Run:

$ ./bac.txr foo.fna.map.txt 
Lawsonia intracellularis PHE/MN1-00 ATGAAGATCTTTTTATAGAGATAGTAATAAAAAAATGTCAGATAGATATACATTATAGTATAGTAGAGAA

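The code follows. This is just a prototype; obviously it has to be developed and tested against the real data. I've made some guesses, such as how the Lawsonia entry would appear in the index, with the code attached.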

#!/usr/local/bin/txr -f 
@;;; collect the contents of the index file 
@;;; into the list called index. 
@;;; single quotes around lines are removed 
@(block) 
@ (next "index.txt") 
@ (collect) 
'@index' 
@ (end) 
@(end) 
@;;; filter underscores to spaces in the index 
@(set index @(mapcar (op regsub #/_/ " ") index)) 
@;;; process files on the command line 
@(next :args) 
@(collect) 
@;;; each command line argument has to match two patterns 
@;;; @file1 takes the whole thing 
@;;; @file3 matches the part before .map.txt 
@ (all) 
@file1 
@ (and) 
@file3.map.txt 
@ (end) 
@;;; go into file 1 and collect second column material 
@;;; over three lines into lineno list. 
@ (next file1) 
@ (collect :times 3) 
@junk @lineno 
@ (end) 
@;;; filter lineno list through a function which 
@;;; converts to integer, divides by 100 and adds 1. 
@ (set lineno @(mapcar (op + 1 (trunc (int-str @1) 100)) 
         lineno)) 
@;;; map the three line numbers to names through the 
@;;; index, and bind these three names to variables 
@ (bind (name1 name2 name3) @(mapcar index lineno)) 
@;;; now go into file 3, and extract the name of the 
@;;; bacterium there, and the genome from the 2nd line 
@ (next file3) 
@a|@b|@c|@d| @name, complete genome 
@genome 
@;;; if the name matches one of the three names 
@;;; then output the name and genome, otherwise 
@;;; output failed 
@ (cases) 
@ (bind name (name1 name2 name3)) 
@ (output) 
@name @genome 
@ (end) 
@ (or) 
@ (output) 
failed 
@ (end) 
@ (end) 
@(end)

Source

2012-03-19 03:31:59 Kaz
