2012-08-22 46 views
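Extract segments from a file with awk

I have a file from which I need to extract segments, based on character ranges given in another file. I would like to do this with an awk command.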

The first file looks like this (a single line):

AATTGTGAAGGTAGATGGCTCGCTCCGCGGCGGGGCGCGCGCGCGCGCGCGGGCTCGCTATATAGAGATATATGCGCGCGGCGCGCGGCGCGCGCGGCGCGCGCGTATATATATAGGCGCGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCCCCCCCCCCC 

The second file looks like this:

5 10 
13 20 
22 24 

and the output should be:

GTGAAG 
AGATGGCT 
GCT 

Answer

This one-liner will solve your problem:

awk 'BEGIN{getline sequence < "first_file"} {print substr(sequence, $1, $2 - $1 + 1) }' second_file

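Explanation: the script uses the getline function to read the string sequence from the file named first_file (replace this with the actual file name). Then, for each line of the second file (which contains the ranges to process), it uses the substr function to extract the required substring. substr takes three arguments: the string (sequence), the start position ($1) and the length ($2 - $1 + 1). The "+ 1" is needed because both ends of each range are inclusive: for the range 5 10 the length is 10 - 5 + 1 = 6, which gives GTGAAG.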
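If you need this more than once, the same one-liner can be wrapped in a small shell function that passes the sequence file name to awk with -v instead of hard-coding it. This is only a sketch: the function name extract_ranges is hypothetical, and it assumes, as in the question, that the whole sequence is on the first line of its file.

```shell
# Hypothetical helper: extract_ranges SEQ_FILE RANGE_FILE
# For each "start end" line of RANGE_FILE, prints the 1-based, inclusive
# substring of the sequence, i.e. substr(sequence, start, end - start + 1).
extract_ranges() {
    awk -v seqfile="$1" '
        BEGIN {
            # Read the first (and only) line of the sequence file into "sequence".
            if ((getline sequence < seqfile) <= 0) {
                print "cannot read " seqfile > "/dev/stderr"
                exit 1
            }
            close(seqfile)
        }
        # Only lines with at least two fields are treated as ranges.
        NF >= 2 { print substr(sequence, $1, $2 - $1 + 1) }
    ' "$2"
}
```

Usage: extract_ranges first_file second_file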

Thanks. The explanation is much appreciated! – user1308144

I'm glad I could help! :) –

Nya gave you an awk solution; here is one based on coreutils. Given the sequence in a file named string:

AATTGTGAAGGTAGATGGCTCGCTCCGCGGCGGGGCGCGCGCGCGCGCGCGGGCTCGCTATATAGAGATATATGCGCGCGGCGCGCGGCGCGCGCGGCGCGCGCGTATATATATAGGCGCGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAAAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCCCCCCCCCCC 

and the ranges in a file named offlen:

5 10 
13 20 
22 24 

you can get the output you want with:

while read -r start end; do cut -c"${start}-${end}" string; done < offlen 

Output:

GTGAAG 
AGATGGCT 
GCT
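The loop above starts one cut process per range. If you are using bash, a possible alternative is to read the sequence once and slice it with parameter expansion. This is a bash-only sketch (the ${var:offset:length} form is not POSIX sh), and the function name extract_ranges_bash is hypothetical. Note that bash offsets are 0-based, so the 1-based start position has to be shifted by one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract_ranges_bash SEQ_FILE RANGE_FILE (bash only).
extract_ranges_bash() {
    local seq="" start end
    # Read the first line of the sequence file once; accept a missing final newline.
    IFS= read -r seq < "$1" || [ -n "$seq" ] || return 1
    # "|| [ -n "$start" ]" also processes a last range line without a newline.
    while read -r start end || [ -n "$start" ]; do
        # Skip blank or incomplete lines.
        [ -n "$end" ] || continue
        # Ranges are 1-based and inclusive; bash offsets are 0-based.
        printf '%s\n' "${seq:start-1:end-start+1}"
    done < "$2"
}
```

Usage: extract_ranges_bash string offlen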