Classic ASP regex: small change
I have some regular expression code that grabs the data between the title tags on a page:
<%
Function UrlExists(sURL)
Dim objXMLHTTP
Dim thePage
Dim strPTitle
Dim blnReturnVal
Dim objRegExp
Dim strTitleResponse
'Create object
Set objXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
on error resume next
'Get the head
objXMLHTTP.Open "HEAD", sURL, false
objXMLHTTP.setRequestHeader "User-Agent", Request.ServerVariables("HTTP_HOST")
objXMLHTTP.Send ""
'404?
If Err.Number <> 0 or objXMLHTTP.status <> 200 then blnReturnVal = "0|404 Error" Else blnReturnVal = "1|"
objXMLHTTP.close
'If not 404
if left(blnReturnVal,1) = "1" then
'Get the physical page
objXMLHTTP.Open "GET", sURL, false
objXMLHTTP.Send ""
thePage = objXMLHTTP.responseText
thePage = replace(thePage, vbCrlf, "")
objXMLHTTP.close
'Find title
Set objRegExp = New Regexp
objRegExp.IgnoreCase = true
objregexp.Multiline = true
objRegExp.Global = false
objRegExp.Pattern = "<title[^>]*?>(.*)</title>"
set strPTitle = objRegExp.Execute(thePage)
strTitleResponse = strPTitle.Item(0).Value
strTitleResponse = replace(strTitleResponse, vbCrlf, "")
strTitleResponse = trim(strTitleResponse)
if len(strTitleResponse) <1 OR strTitleResponse = "" then strTitleResponse = "(No Title)"
set objRegExp = nothing
strTitleResponse = replace(strTitleResponse,"</title>","")
strTitleResponse = replace(strTitleResponse,"<title>","")
strTitleResponse = replace(strTitleResponse,"'","' ")
blnReturnVal = blnReturnVal & strTitleResponse
end if
Set objXMLHTTP = nothing
UrlExists = blnReturnVal
End Function
%>
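This works fine and has done for many months, but when I wrote it I made the (stupid?) assumption that every page has either one title tag or none. It recently started throwing strange errors on the John Lewis page, because its HTML contains two title elements: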
<title>John Lewis - Shop online at Britain's Favourite Retailer</title>
... bunch of html
<title>
</title>
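To see why the original pattern misbehaves here, the sketch below runs it against an abbreviated, one-line stand-in for the sample above (UrlExists strips vbCrLf before matching, so the page is effectively one line). Global = false already limits Execute to a single match; the trouble is that the greedy (.*) makes that single match run from the first <title> to the last </title>. The sketch assumes VBScript 5.5 or later (for lazy quantifiers and SubMatches) and uses WScript.Echo, so it is meant to be run as a standalone .vbs file under cscript rather than inside an ASP page. FirstTitleMatch and the sample string are illustrative names, not part of the original code.

```vbscript
' Illustrative helper (not from the original code): returns the first
' captured group of sPattern in sHtml, or "" if there is no match.
Function FirstTitleMatch(sHtml, sPattern)
    Dim re, matches
    Set re = New RegExp
    re.IgnoreCase = True
    re.Global = False          ' Execute returns at most one match
    re.Pattern = sPattern
    Set matches = re.Execute(sHtml)
    If matches.Count > 0 Then
        FirstTitleMatch = matches.Item(0).SubMatches(0)
    Else
        FirstTitleMatch = ""
    End If
End Function

' Abbreviated one-line stand-in for the John Lewis markup (assumption)
Dim sample
sample = "<title>John Lewis</title><p>bunch of html</p><title></title>"

' Greedy (.*) runs on to the LAST </title>:
'   John Lewis</title><p>bunch of html</p><title>
WScript.Echo FirstTitleMatch(sample, "<title[^>]*?>(.*)</title>")

' Lazy (.*?) stops at the FIRST </title>:
'   John Lewis
WScript.Echo FirstTitleMatch(sample, "<title[^>]*?>(.*?)</title>")
```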
How can I modify the regex so that it matches only the first pair and doesn't get confused by the HTML above?
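One possible fix, sketched under the same VBScript 5.5+ assumption: make the group lazy so it stops at the first closing tag, and read the captured group through SubMatches(0) instead of the whole match in Item(0).Value, so the tags never end up in the result. GetFirstTitle is a hypothetical helper name. It also checks matches.Count instead of relying on On Error Resume Next when a page has no title, and uses [\s\S]*? rather than .*? so that a title split across lines still matches (in VBScript, "." does not match a line feed).

```vbscript
' Hypothetical helper: returns the trimmed text of the first
' <title>...</title> pair, or "(No Title)" if there is none or it is blank.
Function GetFirstTitle(sHtml)
    Dim re, matches, sTitle
    Set re = New RegExp
    re.IgnoreCase = True                      ' also matches <TITLE>
    re.Global = False                         ' stop after the first match
    ' [\s\S]*? is lazy (stops at the first </title>) and crosses line breaks
    re.Pattern = "<title[^>]*>([\s\S]*?)</title>"
    Set matches = re.Execute(sHtml)

    sTitle = ""
    If matches.Count > 0 Then
        ' SubMatches(0) holds only the captured group, without the tags
        sTitle = matches.Item(0).SubMatches(0)
        sTitle = Replace(sTitle, vbCr, " ")
        sTitle = Replace(sTitle, vbLf, " ")
        sTitle = Trim(sTitle)
    End If

    If sTitle = "" Then sTitle = "(No Title)"
    GetFirstTitle = sTitle
End Function
```

Inside UrlExists, the lines from Set objRegExp = New Regexp down to the two replace calls that strip <title> and </title> could then shrink to strTitleResponse = GetFirstTitle(thePage), keeping the apostrophe replacement and the append to blnReturnVal as they are. Because the tags are no longer part of the matched text, the "(No Title)" fallback also fires for an empty <title></title>; in the original, that check ran on Item(0).Value, which still contained the tags and so was never empty.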
Great, thanks! – 2010-10-26 09:56:18
You're welcome :) – jensgram 2010-10-26 10:50:17