2011-09-30
1

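T-SQL substring or string manipulation

I have an nvarchar(max) column, and I need to extract everything between the opening href tag and the closing href tag. For example, if my column contained the following: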

Here you can visit <a href="http://www.thisite.com">this link</a> or this 
<a href="http://www.newsite.com">new link</a>. this is just a test to find the right answer. 

Then the result of my query should be:

"<a href="http://www.thisite.com">this link</a>" 
"<a href="http://www.newsite.com">new link</a>" 

Any help would be greatly appreciated!

Answers

1

You have to use a CLR user-defined function (supported in SQL Server 2005+):

Regular Expressions Make Pattern Matching And Data Extraction Easier


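I agree with using CLR, but regular expressions cannot be used to *reliably* parse HTML. For a good read, have a look at [You can't parse HTML with regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags)... *cannot hold*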


*HTML is a language of sufficient complexity that it cannot be parsed by regular expressions. Even Jon Skeet cannot parse HTML using regular expressions.*


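I agree, but the author does not need to **parse HTML**. Since the problem is simply to "extract everything between the opening href tag and the closing href tag", the KISS principle should work just fine here.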

1

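Agreed, the CLR solution should be faster. Moreover, I don't think SQL Server should be doing this task at all. You could write a client application (VB.NET, C#, etc.) or a PowerShell script to do it.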

If you want a T-SQL-only solution (please read the paragraph above again), then take a look at this query (SQL Server 2005 at minimum):

CREATE TABLE dbo.TestData 
(
    ID INT IDENTITY(1,1) PRIMARY KEY 
    ,SomeText NVARCHAR(MAX) NOT NULL 
); 
INSERT dbo.TestData 
SELECT 'Here you can visit <a href="http://www.thisite.com">this link</a> or this <a href="http://www.newsite.com">new link</a>' 
UNION ALL 
SELECT '<div class="tagged"> 
<a href="https://stackoverflow.com/questions/tagged/string" class="post-tag">string</a>&nbsp; 
    <span class="item-multiplier">&times;&nbsp;16364</span><br> 
<a href="https://stackoverflow.com/questions/tagged/tsql" class="post-tag">tsql</a>&nbsp; 
    <span class="item-multiplier">&times;&nbsp;10304</span><br> 
<a href="https://stackoverflow.com/questions/tagged/substring" class="post-tag">substring</a><acronym title="as soon as possible">ASAP</acronym>'; 

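-- Recursive CTE: the anchor member extracts the first <a ...>...</a> of each row;
-- the recursive member resumes the search just after the previous closing tag.
-- Note: a row with more than about 100 links exceeds the default MAXRECURSION limit.
WITH ParseAnchorTags 
AS 
(
SELECT a.ID 
     ,SUBSTRING(a.SomeText, CHARINDEX('<a ',a.SomeText), CHARINDEX('</a>',a.SomeText)-CHARINDEX('<a ',a.SomeText)+4) AS Txt 
     ,CHARINDEX('</a>',a.SomeText)+3 AS LastIndex -- position of the last character of '</a>'
FROM dbo.TestData a 
WHERE CHARINDEX('<a ',a.SomeText) > 0 -- skip rows that contain no anchor tag
UNION ALL 
SELECT a.ID 
     ,SUBSTRING(a.SomeText, CHARINDEX('<a ',a.SomeText,prev.LastIndex+1), CHARINDEX('</a>',a.SomeText,prev.LastIndex+1)-CHARINDEX('<a ',a.SomeText,prev.LastIndex+1)+4) AS Txt 
     ,CHARINDEX('</a>',a.SomeText,prev.LastIndex+1)+3 AS LastIndex 
FROM dbo.TestData a 
INNER JOIN ParseAnchorTags prev ON a.ID=prev.ID 
AND  CHARINDEX('<a ',a.SomeText,prev.LastIndex+1) > 0 
) 
SELECT cte.ID, cte.Txt -- LastIndex is used only for ordering
FROM ParseAnchorTags cte 
ORDER BY cte.ID, cte.LastIndex; 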

DROP TABLE dbo.TestData; 

Result:

ID          Txt 
----------- -------------------------------------------------------------------- 
1           <a href="http://www.thisite.com">this link</a> 
1           <a href="http://www.newsite.com">new link</a> 
2           <a href="https://stackoverflow.com/questions/tagged/string" class="post-tag">string</a> 
2           <a href="https://stackoverflow.com/questions/tagged/tsql" class="post-tag">tsql</a> 
2           <a href="https://stackoverflow.com/questions/tagged/substring" class="post-tag">substring</a> 
0
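-- Initializing a variable inside DECLARE requires SQL Server 2008 or later.
declare @a varchar(max) = 'Here you can visit <a href="http://www.thisite.com">this link</a> or this <a href="http://www.newsite.com">new link</a>. this is just a test to find the right answer. ' 

-- f = start of the current '<a href=' tag, t = start of its closing '</a>'.
-- The seed row (1, 1) only starts the search and is filtered out by "t > 1".
;with cte as 
(
select cast(1 as bigint) f, cast(1 as bigint) t 
union all 
select charindex('<a href=', @a, t), charindex('</a>', @a, charindex('<a href=', @a, t)) 
from cte where charindex('<a href=', @a, t) > 0 
) 
select substring(@a, f, t-f)+'</a>' from cte 
where t > 1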
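
The query above works on a single variable, while the question asks about a column. One possible way to bridge that gap, as a rough sketch only, is to wrap the same recursive search in an inline table-valued function and call it per row with CROSS APPLY. The function name dbo.ExtractAnchors, the StartPos column, and the table variable @t are made up for illustration and appear nowhere in the original answers. The extra check on '</a>' is an addition so that an unclosed tag ends the search instead of restarting it from the beginning of the string.

```sql
-- Hypothetical helper: returns one row per <a href=...>...</a> found in @a.
CREATE FUNCTION dbo.ExtractAnchors (@a NVARCHAR(MAX))
RETURNS TABLE
AS
RETURN
(
    WITH cte AS
    (
        -- Seed row: nothing found yet, start searching at position 1.
        SELECT CAST(1 AS BIGINT) AS f, CAST(1 AS BIGINT) AS t
        UNION ALL
        -- f = start of the next '<a href=' tag, t = start of its closing '</a>'.
        SELECT CHARINDEX('<a href=', @a, t),
               CHARINDEX('</a>', @a, CHARINDEX('<a href=', @a, t))
        FROM cte
        WHERE CHARINDEX('<a href=', @a, t) > 0
          -- Added guard: stop when the tag found has no closing '</a>'.
          AND CHARINDEX('</a>', @a, CHARINDEX('<a href=', @a, t)) > 0
    )
    SELECT SUBSTRING(@a, f, t - f) + '</a>' AS Txt, f AS StartPos
    FROM cte
    WHERE t > 1   -- drop the seed row
);
GO

-- Usage against a column; @t stands in for the real table.
DECLARE @t TABLE (ID INT, SomeText NVARCHAR(MAX));
INSERT @t (ID, SomeText)
VALUES (1, N'Here you can visit <a href="http://www.thisite.com">this link</a> or this <a href="http://www.newsite.com">new link</a>.');

SELECT d.ID, x.Txt
FROM @t d
CROSS APPLY dbo.ExtractAnchors(d.SomeText) x
ORDER BY d.ID, x.StartPos;
```

As with the other T-SQL answers, this only handles well-formed, non-nested anchor tags, and a single value with more than about 100 links would hit the default recursion limit unless the calling query adds OPTION (MAXRECURSION ...).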