2012-08-17

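I have scraped a web page, but I want its links to carry valid hrefs so that clicking one goes to the linked page. How can I screen scrape the page and rewrite the hrefs into valid links?

Example of the scraped data: Day 1 - Go to my Page - Status

I want "Go to my Page" to jump to whatever link is in its href.

For example, the actual HTML I get: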

<td><a href="javascript:jsFormAuth('summary.php?meetingid=40456&plusday=0');">Go to my Page</a></td> 

I need it to look like this:

<td><a href="http://somewebsite.com/tab/form/summary.php?meetingid=40456&plusday=0">Go to my Page</a></td>

Here is my scraping code:

public string ScreenScrape() 
     { 
      string url = "http://somewebsite.com/tab/form/index.php"; 
      string strResult = ""; 

      WebResponse objResponse; 
      WebRequest objRequest = System.Net.HttpWebRequest.Create(url); 

      objResponse = objRequest.GetResponse(); 

      using (StreamReader sr = new StreamReader(objResponse.GetResponseStream())) 
      { 
       strResult = sr.ReadToEnd(); 
       // Close and clean up the StreamReader 
       sr.Close(); 
      } 
      var webGet = new HtmlAgilityPack.HtmlWeb(); 
      var doc = webGet.Load(url); 

      foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]")) 
      { 
       HtmlAttribute att = link.Attributes["href"]; 
       att.Value = "http://somewebsite.com/tab/form/"+att.Value; 
      } 


      return strResult; 
     } 

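Here is where I try to change the links and strip the javascript string, but I can't figure out how to get at the right part of the value. Also, once I can change them, how do I replace each href in strResult (above) with the new href?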

foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]")) 
    { 
     HtmlAttribute att = link.Attributes["href"]; 
     att.Value = "http://somewebsite.com/tab/form/" + .... 
    } 

Can anyone help me? Thanks.

Answer

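Never mind, I figured it out, though I know that picking the URL apart with string replacement is not the best approach (if you have suggestions for parsing it better, please post them). For now the only goal is to change the href links so that they work.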

// Requires: using HtmlAgilityPack;
public string ScreenScrape()
{
    string url = "http://somewebsite.com/tab/form/index.php";
    const string baseUrl = "http://somewebsite.com/tab/form/";

    // The wrapper around the relative URL inside each href.
    const string removeString = "javascript:jsFormAuth('";
    const string removeEnd = "');";

    // HtmlWeb.Load downloads and parses the page, so the separate
    // WebRequest/StreamReader download is not needed.
    var webGet = new HtmlAgilityPack.HtmlWeb();
    var doc = webGet.Load(url);

    // SelectNodes returns null when no node matches.
    var links = doc.DocumentNode.SelectNodes("//a[@href]");
    if (links != null)
    {
        foreach (HtmlNode link in links)
        {
            HtmlAttribute att = link.Attributes["href"];
            string sub1 = att.Value.Replace(removeString, "");
            string sub2 = sub1.Replace(removeEnd, "");
            att.Value = baseUrl + sub2;
        }
    }

    // Return the modified document, not the originally downloaded string.
    return doc.DocumentNode.InnerHtml;
}
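
On the request for a better way to parse the href: one option is to capture the argument of jsFormAuth with a regular expression and let System.Uri resolve it against the page URL, instead of chaining Replace calls and prepending a fixed prefix. The following is only a minimal sketch under assumptions. The class name HrefRewriter and the method ToAbsoluteHref are hypothetical names made up for illustration, and the pattern assumes every such link has exactly the form javascript:jsFormAuth('...'); shown in the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of HtmlAgilityPack or the original code.
public static class HrefRewriter
{
    // Matches javascript:jsFormAuth('...'); and captures the quoted argument.
    private static readonly Regex JsFormAuth = new Regex(
        @"^\s*javascript:\s*jsFormAuth\('(?<target>[^']*)'\)\s*;?\s*$",
        RegexOptions.IgnoreCase);

    // Returns an absolute URL for href, resolved against baseUri.
    // Hrefs that cannot be resolved are returned unchanged.
    public static string ToAbsoluteHref(string href, Uri baseUri)
    {
        Match m = JsFormAuth.Match(href);
        string target = m.Success ? m.Groups["target"].Value : href;

        // Leave any other javascript: link untouched.
        if (target.StartsWith("javascript:", StringComparison.OrdinalIgnoreCase))
            return href;

        Uri resolved;
        return Uri.TryCreate(baseUri, target, out resolved)
            ? resolved.AbsoluteUri
            : href;
    }
}
```

Inside the foreach loop this would be used as att.Value = HrefRewriter.ToAbsoluteHref(att.Value, new Uri(url)); where url is the page address already defined in ScreenScrape. Because the page URL is the base, links that are already absolute are kept as they are rather than getting the prefix glued on in front.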