Using regular expressions

Getting the content between td tags using a regular expression. The following is the sample data:

<table class="sparql" border="1"> 
<tr> <th>abstract</th></tr> 
<tr> 
    <td> 
    Cologne is Germany&#39;s fourth-largest city, and is the 
    largest city both in 
    the German Federal State of North Rhine-Westphalia and within the 
    Rhine-Ruhr Metropolitan Area, one of the major European metropolitan 
    areas with more than ten million inhabitants."@en 
</td> 
</tr> 
</table>

I am trying to get the content between the <td> tags using a regular expression. I tried something like

<td>.*</td>

but how do I discard the tags themselves?

Source

2012-03-20 user160820

Use a group, '(.*)', and then take the first one. – 2012-03-20 15:59:36

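Regular expressions should (generally) not be used to parse HTML. A better approach is to use TagSoup to parse the HTML into a valid XML document and then use CF's XML functions to extract the data you need. Ben Nadel recently wrote a post about doing this on CF10, but I see no reason it could not be used on older versions; you would just need to get the TagSoup library yourself, since it does not come pre-installed there the way it does with CF10. His blog post is here: http://www.bennadel.com/blog/2341-ColdFusion-10-Parsing-Dirty-HTML-Into-Valid-XML-Documents.htm – 2012-03-20 17:48:23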

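As @MisterJack pointed out, you need to use a subexpression to reference the match. If you are using REReplace(), you can use \1 (or \2, etc.) as a backreference to the match. If you are using REFind(), you need to call it with returnsubexpressions=true, and it will return a struct containing len and pos arrays that describe the match. I would do it like this: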

<!--- I use "?" below because we want to be lazy rather than greedy --->
<!--- REFind() takes the pattern first, then the string to search --->
<cfset the_match = REFind("<td>(.*?)</td>", the_content, 1, true) />

<cfdump var="#the_match#" />

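You should see a struct with len and pos arrays. With one parenthesized subexpression, each array should hold two elements: the first describes the whole match (tags included) and the second describes the captured group. If nothing matched, the first pos value is 0. To get the matched content without the tags, you can do this:

<!--- Element 2 is the (.*?) group; element 1 would include the <td> tags --->
<cfset match_content = mid(the_content, the_match.pos[2], the_match.len[2]) />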
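
The REReplace() route mentioned above might look something like the sketch below. It assumes the_content holds the HTML from the question, and it relies on the pattern consuming the entire string so that only the \1 backreference remains; td_content is just an illustrative variable name.

<!--- Sketch only: the pattern spans the whole string, so replacing it with \1 keeps just the captured group --->
<cfset td_content = REReplace(the_content, ".*?<td>(.*?)</td>.*", "\1") />
<!--- If the pattern does not match, REReplace() returns the_content unchanged --->
<cfoutput>#trim(td_content)#</cfoutput>

This only pulls out the first <td>; for several cells, calling REFind() repeatedly with an advancing start position is probably the simpler option.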

Hope this helps.

Source

2012-03-20 16:14:38
