2012-10-17 72 views
0

Regex to extract lines containing terms

I have been using this regular expression

/(?:[^ .,;:]+[ .,;:]+){3}(?:term1|term2)(?:[ .,;:]+[^ .,;:]+){3}/gi 

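to extract the selected terms together with the 3 words before and after each one. I would like to change the regex so that it extracts the whole line containing a selected term instead. Lines are delimited by \n, and I would also like to trim leading and trailing whitespace.
How can I change the regex to do this?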

Example input:

This line, containing term2, I'd like to extract. 
     This line contains term13 and I'd like to ignore it 
    This line, on the other hand, contains term1, so let's keep it. 

The output would be

This line, containing term2, I'd like to extract. 
This line, on the other hand, contains term1, so let's keep it. 

The code that needs to change is below.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml"> 
<head> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<title>Untitled Document</title> 
</head> 

<body> 
<script> 
var Input = " This line, containing term2, I'd like to extract.";
Input += "  This line contains term13 and I'd like to ignore it.";
Input += " This line, on the other hand, contains term1, so let's keep it.";

// Current regex: each term plus the 3 words before and after it.
// match() returns null when nothing matches, so fall back to an empty array.
var matches = Input.match(/(?:[^ .,;:]+[ .,;:]+){3}(?:term1|term2)(?:[ .,;:]+[^ .,;:]+){3}/gi) || [];
var myMatches = "";
for (var i = 0; i < matches.length; i++) {
    myMatches += ("..." + matches[i] + "...\n"); // assign to variable
}
alert(myMatches);
</script> 


</body> 
</html> 
+0

A suggestion: you could use word boundaries, if your terms are unlikely to contain special characters. –

+0

So... what you need is the whole line that contains `"term1"` or `"term2"`? – Passerby

Answers

2

As Asad pointed out, you can use \b as a word boundary so that term1 does not match inside term13.
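A minimal sketch of the word-boundary effect, using two lines from the question's example input. The variable names are made up for illustration.

```javascript
// Same alternation, with and without \b around it.
var withoutBoundary = /(?:term1|term2)/i;
var withBoundary = /\b(?:term1|term2)\b/i;

// Two lines taken from the question's example input.
var ignored = "This line contains term13 and I'd like to ignore it";
var kept = "This line, containing term2, I'd like to extract.";

var results = {
  // "term1" is found inside "term13" when no boundary is required
  ignoredWithout: withoutBoundary.test(ignored),
  // \b fails between "1" and "3" because both are word characters
  ignoredWith: withBoundary.test(ignored),
  // "term2" is followed by ",", so the boundary holds and the line matches
  keptWith: withBoundary.test(kept)
};

console.log(results);
```

Note that \b only behaves this way when the terms start and end with word characters (letters, digits, underscore), which is the caveat raised in the comment above.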

The regex:

^ *(.*\b(?:term1|term2)\b.*) *$ 

should do what you're after. Your matches will be in the first (and only) capture group. Simply loop over them and you're done.

See it on rubular.
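For JavaScript, looping over the capture group could look roughly like the sketch below. It is a variation on the regex above, not the exact pattern: it assumes the `m` flag so that `^` and `$` match at each \n, and it makes both `.*` lazy, because a greedy `.*` inside the group also swallows the trailing spaces before the final ` *` gets a chance to trim them. The function name `extractLines` is hypothetical.

```javascript
// Sample input from the question, with lines separated by \n.
var input =
  "This line, containing term2, I'd like to extract. \n" +
  "     This line contains term13 and I'd like to ignore it \n" +
  "    This line, on the other hand, contains term1, so let's keep it. ";

// Hypothetical helper: returns the trimmed lines that contain term1 or term2.
function extractLines(text) {
  // g: find every match; i: ignore case; m: ^ and $ match at line breaks.
  // The lazy .*? lets the trailing " *" consume the spaces before $.
  var lineRegex = /^ *(.*?\b(?:term1|term2)\b.*?) *$/gim;
  var lines = [];
  var match;
  while ((match = lineRegex.exec(text)) !== null) {
    lines.push(match[1]); // first (and only) capture group
  }
  return lines;
}

var extracted = extractLines(input);
console.log(extracted.join("\n"));
```

This version only trims plain spaces, as in the original pattern. Trimming tabs as well would need a character class such as `[ \t]*` in place of ` *`.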

+0

So close I could taste it [:(](http://rubular.com/r/8ODOh4qFo5), but I see yours avoids some mistakes that mine made. –