2014-03-26 13 views

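I have a data column in my table, and each row of this column can contain zero, one, or more URLs mixed in with other text. I want to extract these URLs into a new dataset that contains only the URLs. How do I extract one or more URLs from a text (NVARCHAR(MAX)) column?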

Why? Because I want to add some of these URLs to a block list in my database to prevent spam.

For example, the text data column contains URLs like these:

httx://portugal-forex.com/ 
httx://phen375treatment.com/ 
httx://priligy2000.org/ 
And so on. 

I really don't know where to start with text like this:

hmaruqbtufcvdlfu, <a href="httx://portugal-forex.com/">Day forex signal strategy trading</a>, KzxiIIO, [url=httx://portugal-forex.com/]Forex Broker[/url], mtNZQDi, httx://portugal-forex.com/ The best forex broker, IBWlBzg, <a href="httx://phen375treatment.com/">Avantage inconveniant phen 375</a>, ApEuXTp, [url=httx://phen375treatment.com/]Phen375[/url], QDVLpSn, httx://phen375treatment.com/ Where to buy phen 375, Fnwpugj, <a href="httx://priligy2000.org/">Priligy t</a>, zwRZhIC, [url=httx://priligy2000.org/]Order priligy[/url], FBgSaWs, httx://priligy2000.org/ Priligy buy online, FsemWnW, <a href="httx://ossorio.org/">Online Casino</a>, aOBtTaK, [url=httx://ossorio.org/]Online Casino[/url], oMMMacf, httx://ossorio.org/ Free online casino bounuses, occFLyZ, <a href="httx://paroxetine247.com/">Paroxetine adema</a>, xvrIdnq, [url=httx://paroxetine247.com/]Paroxetine depression[/url], MLSRAXX, httx://paroxetine247.com/ Paroxetine dark skin, GLYTcZY, <a href="httx://resolvedisputes.org/">Fioricet prescription online</a>, PmEMaMA, [url=httx://resolvedisputes.org/]Fioricet wcodiene for headache[/url], vPlKLhq, httx://resolvedisputes.org/ Online pharmacy fioricet, fxfhRcV. 

I then want to get all the URLs in that text using SQL.

Do you need only the main domain, like httx://portugal-forex.com/, or could it also be a full URL such as httx://portugal-forex.com/xxx?Page=2? – Darka

The main domain is enough. –

Answer


Here is an example. I search the string from "httx://" up to the first "/" that follows it.
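To make the index arithmetic concrete, here is a minimal worked example on a single short string. The variable names and the sample text are made up for illustration; only CHARINDEX and SUBSTRING come from the approach described above.

```sql
-- @Sample, @Start and @End are illustrative names, not part of the answer's function.
DECLARE @Sample NVARCHAR(MAX) = N'abc, httx://example.org/ some text';

-- Position where the scheme 'httx://' begins
DECLARE @Start INT = CHARINDEX(N'httx://', @Sample);

-- First '/' after the scheme; 7 is the length of 'httx://'
DECLARE @End INT = CHARINDEX(N'/', @Sample, @Start + 7);

SELECT @Start AS StartPos,
       @End   AS EndPos,
       SUBSTRING(@Sample, @Start, @End - @Start + 1) AS TheLink;
```

With this sample, StartPos is 6, EndPos is 24, and TheLink is httx://example.org/ (19 characters, which is 24 - 6 + 1).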

In any case, you will need to go through the rows one by one.

Put the code into a function:

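-- Assumes a schema named Temporary already exists (CREATE SCHEMA Temporary).
CREATE FUNCTION Temporary.getLinksFromText (@Tekstas NVARCHAR(MAX))
RETURNS @Data TABLE (TheLink NVARCHAR(500))
AS
BEGIN

    DECLARE @FirstIndexOfChar INT,
            @LastIndexOfChar INT,
            @LengthOfStringBetweenChars INT,
            @String NVARCHAR(500); -- same type as TheLink, so Unicode characters are preserved

    -- Position of the first 'httx://' (0 if there is none)
    SET @FirstIndexOfChar = CHARINDEX(N'httx://', @Tekstas);

    WHILE @FirstIndexOfChar > 0
    BEGIN

        -- First '/' after the 7-character scheme 'httx://'
        SET @LastIndexOfChar = CHARINDEX(N'/', @Tekstas, @FirstIndexOfChar + 7);

        -- No closing '/': stop instead of passing a negative length to SUBSTRING
        IF @LastIndexOfChar = 0 BREAK;

        SET @LengthOfStringBetweenChars = @LastIndexOfChar - @FirstIndexOfChar + 1;

        SET @String = SUBSTRING(@Tekstas, @FirstIndexOfChar, @LengthOfStringBetweenChars);
        INSERT INTO @Data (TheLink) VALUES (@String);

        -- Continue searching in the text that follows the link just found
        SET @Tekstas = SUBSTRING(@Tekstas, @LastIndexOfChar, LEN(@Tekstas));
        SET @FirstIndexOfChar = CHARINDEX(N'httx://', @Tekstas);

    END

    RETURN
END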

Create some test data:

CREATE TABLE #Data(weLink NVARCHAR(MAX)); 
INSERT INTO #Data VALUES 
('hmaruqbtufcvdlfu, <a href="httx://portugal-forex.com/">Day forex signal strategy trading</a>, KzxiIIO, [url=httx://portugal-forex.com/]Forex Broker[/url], mtNZQDi, httx://portugal-forex.com/ The best forex broker, IBWlBzg, <a href="httx://phen375treatment.com/">Avantage inconveniant phen 375</a>, ApEuXTp, [url=httx://phen375treatment.com/]Phen375[/url], QDVLpSn, httx://phen375treatment.com/ Where to buy phen 375, Fnwpugj, <a href="httx://priligy2000.org/">Priligy t</a>, zwRZhIC, [url=httx://priligy2000.org/]Order priligy[/url], FBgSaWs, httx://priligy2000.org/ Priligy buy online, FsemWnW, <a href="httx://ossorio.org/">Online Casino</a>, aOBtTaK, [url=httx://ossorio.org/]Online Casino[/url], oMMMacf, httx://ossorio.org/ Free online casino bounuses, occFLyZ, <a href="httx://paroxetine247.com/">Paroxetine adema</a>, xvrIdnq, [url=httx://paroxetine247.com/]Paroxetine depression[/url], MLSRAXX, httx://paroxetine247.com/ Paroxetine dark skin, GLYTcZY, <a href="httx://resolvedisputes.org/">Fioricet prescription online</a>, PmEMaMA, [url=httx://resolvedisputes.org/]Fioricet wcodiene for headache[/url], vPlKLhq, httx://resolvedisputes.org/ Online pharmacy fioricet, fxfhRcV.'), 
('hmaruqbtufcvdlfu, <a href="httx://portugal-forex.com/">Day forex signal strategy trading</a>, KzxiIIO, [url=httx://portugal-forex.com/]Forex Broker[/url], mtNZQDi, httx://portugal-forex.com/ The best forex broker, IBWlBzg, <a href="httx://phen375treatment.com/">Avantage inconveniant phen 375</a>, ApEuXTp, [url=httx://phen375treatment.com/]Phen375[/url], QDVLpSn, httx://phen375treatment.com/ Where to buy phen 375, Fnwpugj, <a href="httx://priligy2000.org/">Priligy t</a>, zwRZhIC, [url=httx://priligy2000.org/]Order priligy[/url], FBgSaWs, httx://priligy2000.org/ Priligy buy online, FsemWnW, <a href="httx://ossorio.org/">Online Casino</a>, aOBtTaK, [url=httx://ossorio.org/]Online Casino[/url], oMMMacf, httx://ossorio.org/ Free online casino bounuses, occFLyZ, <a href="httx://paroxetine247.com/">Paroxetine adema</a>, xvrIdnq, [url=httx://paroxetine247.com/]Paroxetine depression[/url], MLSRAXX, httx://paroxetine247.com/ Paroxetine dark skin, GLYTcZY, <a href="httx://resolvedisputes.org/">Fioricet prescription online</a>, PmEMaMA, [url=httx://resolvedisputes.org/]Fioricet wcodiene for headache[/url], vPlKLhq, httx://resolvedisputes.org/ Online pharmacy fioricet, fxfhRcV.') 

You can run it like this (without a cursor):

SELECT allLinks.* 
FROM #Data AS D 
OUTER APPLY Temporary.getLinksFromText (D.weLink) AS allLinks 
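
Since the goal is a block list, each domain is probably only needed once. The following is a sketch, assuming the #Data table and the Temporary.getLinksFromText function shown above already exist in the session. CROSS APPLY skips rows that contain no link (OUTER APPLY would return a NULL for them), and DISTINCT collapses repeats.

```sql
-- Assumes #Data and Temporary.getLinksFromText were created as shown above.
SELECT DISTINCT allLinks.TheLink
FROM #Data AS D
CROSS APPLY Temporary.getLinksFromText(D.weLink) AS allLinks
ORDER BY allLinks.TheLink;
```

For the two identical test rows above, this should return each of the six domains once.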
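
This solution works, but I have to create a cursor to process all the rows, which is not optimal. Can it be done without a cursor, set-based instead? =) I want to process this statement: "SELECT Data FROM Paste WHERE paste.CaptchaOK = 0". –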

Do you mean paste.CaptchaOK = 0? – Darka

I think you will need to go through them one by one anyway. Check the updated answer. – Darka
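
The query from the comments can probably be handled the same way, without a cursor in the calling code. This is only a sketch: the Paste table with its Data and CaptchaOK columns is taken from the comment above, while #BlockList and its Url column are hypothetical names introduced here. The function still loops inside each row's text, but the rows themselves are processed as a set.

```sql
-- Hypothetical target table; the name and column are assumptions.
CREATE TABLE #BlockList (Url NVARCHAR(500));

-- Paste, Data and CaptchaOK are the names mentioned in the comment above.
INSERT INTO #BlockList (Url)
SELECT DISTINCT L.TheLink
FROM Paste AS P
CROSS APPLY Temporary.getLinksFromText(P.Data) AS L
WHERE P.CaptchaOK = 0;
```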
