2012-10-05

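On the Linux command line, I'm trying to search some HTML and print only the part of it that changes. For example, I want to search a text file and print one word from this line of code: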

<p><span class="RightSideLinks">Tel: 090 97543</span></p> 

I only want to print 97543, not 090. The next time I search, the code may have changed to

<p><span class="RightSideLinks">Tel: 081 82827</span></p> 

in the file, and then I only want 82827. The rest of the code stays the same; only the phone number changes.

Can I do this with grep? Thanks.

Edit:

Is it possible to do the same with this code?

<tr class="patFuncEntry"><td align="left" class="patFuncMark"><input type="checkbox" name="renew0" id="renew0" value="i1061700" /></td><td align="left" class="patFuncTitle"><label for="renew0"><a href="/record=p1234567~S0"> I just want to print this part. </a></label> 

What changes is the record number (p1234567~S0) and the text I want to print. I'm using GNU grep.

Answer

One way:

grep -oP '(?<=Tel: .{3} )[^<]+' file.txt

Example contents of file.txt:

<p><span class="RightSideLinks">Tel: 090 97543</span></p> 
<p><span class="RightSideLinks">Tel: 081 82827</span></p> 

Result:

97543 
82827 
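If `grep -P` isn't available (it is a GNU grep option, and not every grep build supports it), here is a minimal sketch of the same extraction with plain `sed`. It assumes, as in the examples above, that the number always follows `Tel: `, then a three-digit prefix, then a space, and ends at the next `<`. The function name `extract_tel` is hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: extract_tel. Reads HTML lines on stdin and prints the
# text after "Tel: NNN " (three digits and a space) up to the next '<'.
extract_tel() {
  # -n: print nothing by default; the trailing p prints only lines where the
  # substitution matched. \(...\) captures the wanted number as \1.
  sed -n 's/.*Tel: [0-9]\{3\} \([^<]*\).*/\1/p'
}

# Usage on a real file: extract_tel < file.txt
printf '%s\n' \
  '<p><span class="RightSideLinks">Tel: 090 97543</span></p>' \
  '<p><span class="RightSideLinks">Tel: 081 82827</span></p>' | extract_tel
```

This prints 97543 and 82827 on separate lines. Using `[0-9]\{3\}` rather than `.\{3\}` follows the stricter digits-only form suggested in the explanation of the grep version.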

Edit:

(?<=Tel: .{3} )  ## This is a positive lookbehind assertion. To be interpreted,
                 ## it must be used with grep's Perl-regexp flag, '-P'.

'Tel: .{3} '     ## This is what we're actually checking for: the phrase 'Tel: '
                 ## followed by exactly three of any character, followed by a
                 ## space. Since we're only looking for digits, you could write
                 ## 'Tel: [0-9]{3} ' instead.

[^<]+            ## Grep's '-o' flag makes it print only the matched part rather
                 ## than the whole line, so this expression returns one or more
                 ## characters other than '<'.

Putting it all together, we're asking grep to return one or more characters
other than '<', provided 'Tel: .{3} ' immediately precedes them. HTH.
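The answer above doesn't cover the edited question. A minimal sketch for it, assuming GNU grep with PCRE support and that the link always appears on one line in the form `<a href="/record=...">text</a>`. Because the record number may vary in length, this uses `\K` instead of a lookbehind (PCRE lookbehinds have traditionally been restricted to a fixed length); `\K` drops everything matched before it from what `-o` prints. The function name `extract_link_text` is hypothetical.

```shell
# Hypothetical helper: extract_link_text. Requires GNU grep with -P (PCRE).
extract_link_text() {
  # <a href="/record=[^"]*">  the anchor tag, whatever the record number is
  # \s*                       skip leading whitespace inside the link text
  # \K                        discard everything matched so far from the output
  # [^<]*[^<\s]               the link text, up to its last non-space character
  grep -oP '<a href="/record=[^"]*">\s*\K[^<]*[^<\s]'
}

# Usage on a real file: extract_link_text < file.txt
printf '%s\n' '<tr class="patFuncEntry"><td align="left" class="patFuncMark"><input type="checkbox" name="renew0" id="renew0" value="i1061700" /></td><td align="left" class="patFuncTitle"><label for="renew0"><a href="/record=p1234567~S0"> I just want to print this part. </a></label>' | extract_link_text
```

This prints `I just want to print this part.` without the surrounding spaces. If you also want the record number, `grep -oP '<a href="/record=\K[^"]+'` would print `p1234567~S0` under the same assumptions.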

Thanks, that works great. –

Could you explain what's going on in the grep command? –

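@ManExa: I hope I've explained it well enough here, but if you want to learn more about lookaround assertions, see this page (http://www.regular-expressions.info/lookaround.html). Anyway, see how you get on, and if you have any questions, just ask. Also, if you're happy with this answer, why not accept it? Cheers. – Steve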

相關問題