Repeatedly extract text between two strings? (awk? sed?)

I have a file named 'plainlinks' that looks like this:

13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz 
13081. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94094-2012.gz 
13082. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94096-2012.gz 
13083. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94097-2012.gz 
13084. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94098-2012.gz 
13085. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94644-2012.gz 
13086. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94645-2012.gz 
13087. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94995-2012.gz 
13088. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94996-2012.gz 
13089. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-96404-2012.gz

I need to produce output that looks like this:

999999-94092 
999999-94094 
999999-94096 
999999-94097 
999999-94098 
999999-94644 
999999-94645 
999999-94995 
999999-94996 
999999-96404

Source

2012-11-14 Mike Furlender

Using sed:

sed -E 's/.*\/(.*)-.*/\1/' plainlinks

Output:

999999-94092 
999999-94094 
999999-94096 
999999-94097 
999999-94098 
999999-94644 
999999-94645 
999999-94995 
999999-94996 
999999-96404

To save the changes back to the file, use the -i option (GNU sed syntax):

sed -Ei 's/.*\/(.*)-.*/\1/' plainlinks

Or, to save to a new file, redirect the output:

sed -E 's/.*\/(.*)-.*/\1/' plainlinks > newfile.txt

Explanation:

s/ # substitution
.* # match anything
\/ # up to the last forward slash (escaped so sed does not treat it as the delimiter)
(.*) # anything after the last forward slash (captured in parentheses)
- # up to the last hyphen
.* # anything else left on the line
/ # end match; start replacement
\1 # the value captured by the first (and only) set of parentheses
/ # end
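The breakdown above can be checked against a single line. This is only a minimal sketch, assuming a sed that supports -E (GNU and BSD sed both do). The variable names `sample` and `result` are made up for illustration, and `#` is used as the delimiter so the slash needs no escaping:

```shell
# One line copied from the question's sample data.
sample='13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz'

# Same substitution as in the answer, but with '#' as the s-command
# delimiter, so the forward slash can be written unescaped.
result=$(printf '%s\n' "$sample" | sed -E 's#.*/(.*)-.*#\1#')

# Both .* parts are greedy: the first runs to the last slash,
# the captured group runs to the last hyphen.
printf '%s\n' "$result"
```

Because both `.*` parts are greedy, the capture stops at the last hyphen on the line, which is why the year and extension are dropped.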

Source

2012-11-14 19:44:41

Thanks a ton, that did it.

Assuming the format stays consistent, as you described, you can use awk:

awk 'BEGIN{FS="[/-]"; OFS="-"} {print $7, $8}' plainlinks > output_file

Output:

999999-94092 
999999-94094 
999999-94096 
999999-94097 
999999-94098 
999999-94644 
999999-94645 
999999-94995 
999999-94996 
999999-96404

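Explanation:

awk reads the input file one line at a time and splits each line into "fields".
'BEGIN{FS="[/-]"; OFS="-"} specifies that the input field separator is either / or -, and that output fields are joined with -.
{print $7, $8}' tells awk to print the 7th and 8th fields of each line, in this case 999999 and 9xxxx.
plainlinks is where the name of the input file goes.
> output_file redirects the output to a file named output_file.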
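To see why the fields of interest are numbers 7 and 8, it can help to print every field next to its index. This is a rough sketch rather than part of the original answer, and the variable names `sample` and `fields` are illustrative only:

```shell
# One line copied from the question's sample data.
sample='13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz'

# Split on '/' or '-' exactly as in the answer, then list each field
# with its number. The "//" after "ftp:" produces an empty field 2.
fields=$(printf '%s\n' "$sample" |
    awk 'BEGIN { FS = "[/-]" } { for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')

printf '%s\n' "$fields"
```

The listing makes the dependence on position visible: any change in the number of slashes or hyphens before the station ID shifts the field numbers.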

Source

2012-11-14 19:46:26

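Just the shell's parameter expansion:

while IFS= read -r line; do
    tmp=${line##*noaa/}              # strip everything up to and including "noaa/"
    printf '%s\n' "${tmp%-????.gz}"  # strip the trailing "-YYYY.gz"
done < plainlinks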
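The two expansions used in the loop can be tried step by step on one line. A small sketch, assuming a POSIX-compatible shell. The variable name `id` is introduced here for illustration:

```shell
# One line copied from the question's sample data.
line='13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz'

# '##' removes the longest prefix matching the glob '*noaa/'.
# "noaa.gov" does not match because it is not followed by a slash.
tmp=${line##*noaa/}

# '%' removes the shortest suffix matching '-????.gz',
# where each '?' stands for exactly one character.
id=${tmp%-????.gz}

printf '%s\n' "$tmp" "$id"
```

No external process is started here, which is the main attraction of this approach for small inputs.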

Source

2012-11-14 19:54:26

Just for fun.

awk -F/ '{print substr($7,1,12)}' plainlinks

Or grep:

grep -Eo '[0-9]{6}-[0-9]{5}' plainlinks
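For readers unfamiliar with the two grep flags, here is a minimal sketch on one line. It assumes a grep that supports -o (GNU and BSD grep do), and the variable names `sample` and `match` are illustrative:

```shell
# One line copied from the question's sample data.
sample='13080. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/999999-94092-2012.gz'

# -E: extended regular expressions, so {6} and {5} are repetition counts.
# -o: print only the matched part instead of the whole line.
match=$(printf '%s\n' "$sample" | grep -Eo '[0-9]{6}-[0-9]{5}')

printf '%s\n' "$match"
```

This matches on the shape of the ID (six digits, a hyphen, five digits) rather than on its position in the URL, so it relies on every ID having exactly that shape.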

Source

2012-11-14 20:02:13 matchew

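+1 for the simpler grep solution.

@sudo_o Thank you very much, and +1 for your solution, for being first. – matchew

Agreed, elegant grep solution, +1.

If the format stays the same, there is no need for sed or awk: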

cut -d "/" -f 7- your_file | cut -d "-" -f 1,2

Source

2012-11-15 01:36:47 jfg956

If the format is not consistent, then the sed and awk solutions would break just like this one. :) – Kaz
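To make the point about format assumptions concrete, here is a rough sketch using a hypothetical line with one extra directory level. The `2012/` path component is invented for illustration and does not appear in the question's data, and the variable names are illustrative as well:

```shell
# Hypothetical line: same file name, but one directory deeper.
sample='13090. ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/2012/999999-94092-2012.gz'

# Counting fields from the left picks up the extra directory.
by_position=$(printf '%s\n' "$sample" | cut -d "/" -f 7- | cut -d "-" -f 1,2)

# Working from the right (last slash, then last hyphen) does not depend
# on the directory depth, though it still assumes "<id>-<year>.gz".
base=${sample##*/}
by_basename=${base%-*}

printf '%s\n' "$by_position" "$by_basename"
```

Each approach in this thread encodes some assumption about the lines. Which one is safest depends on which part of the format is expected to stay fixed.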
