Getting links from a Google search in C#

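I want to run a simple Google search programmatically from C#: execute a query of my choice and retrieve the first 50 links. After searching thoroughly for a similar tool or a suitable API, I realized that most of them are outdated. My first attempt was to create a simple HttpWebRequest and scan the received WebResponse for "href=", but the results were not rewarding at all (lots of redundancy) and very frustrating. I do have a Google API, but I am not sure how to use it for this purpose, although I know there is a limit of 1000 per day.

Gil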

2011-03-03 gilibi

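I have a project that sends requests to Google and parses the responses. To keep up with Google's markup changes, we have to rewrite the parsing module several times a year. It sucks, although fixing the parsing code usually takes only a few hours. – Snowbear 2011-03-03 11:16:24

@Snowbear, do you use the HtmlAgility pack for parsing? – 2011-03-03 11:21:20

@Shiv, no, it is a legacy part that still uses regular expressions. Thanks for mentioning it; we will look into it the next time we revisit that nightmare. – Snowbear 2011-03-03 11:48:17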

If you go this route, you should use the HtmlAgility pack to do the parsing for you. A better approach, though, is to use Google's API. See this post: i need to know which of my url is indexed on google
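To make the API route a little more concrete, here is a minimal sketch. It assumes Google's Custom Search JSON API (which may not be the API the question refers to), an API key and search engine ID that you supply yourself, and System.Text.Json, which needs a newer .NET than the code in this thread was written for. The class and method names are hypothetical. The API returns at most 10 results per request, so 50 links take five requests, each of which counts against the daily quota.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper, not part of any Google client library.
public static class CustomSearchSketch
{
    // Builds the URL for one page of results. "start" is the 1-based index
    // of the first result on that page.
    public static string BuildUrl(string apiKey, string engineId, string query, int start)
    {
        return "https://www.googleapis.com/customsearch/v1"
            + "?key=" + Uri.EscapeDataString(apiKey)
            + "&cx=" + Uri.EscapeDataString(engineId)
            + "&q=" + Uri.EscapeDataString(query)
            + "&num=10&start=" + start;
    }

    // Collects the "link" field of every entry in the response's "items" array.
    public static List<string> ExtractLinks(string json)
    {
        var links = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            // "items" is missing when a page has no results.
            if (doc.RootElement.TryGetProperty("items", out JsonElement items))
            {
                foreach (JsonElement item in items.EnumerateArray())
                {
                    if (item.TryGetProperty("link", out JsonElement link))
                    {
                        links.Add(link.GetString());
                    }
                }
            }
        }
        return links;
    }

    // Requests pages starting at results 1, 11, 21, 31 and 41 (up to 50 links).
    public static async Task<List<string>> FetchFirst50Async(
        HttpClient http, string apiKey, string engineId, string query)
    {
        var all = new List<string>();
        for (int start = 1; start <= 41; start += 10)
        {
            string json = await http.GetStringAsync(BuildUrl(apiKey, engineId, query, start));
            all.AddRange(ExtractLinks(json));
        }
        return all;
    }
}
```

Keeping BuildUrl and ExtractLinks free of network calls means they can be checked against a saved response without spending any quota.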

As for some code that uses the HtmlAgility pack, I have a post on my blog: Finding links on a Web page

2011-03-03 11:13:29

Here is working code. Obviously, you have to add the appropriate form and some simple controls...

using HtmlAgilityPack;
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Windows.Forms;

namespace Search 
{ 
    public partial class Form1 : Form 
    { 
     // load snippet 
     HtmlAgilityPack.HtmlDocument htmlSnippet = new HtmlAgilityPack.HtmlDocument(); 

     public Form1() 
     { 
      InitializeComponent(); 
     } 

     private void btn1_Click(object sender, EventArgs e) 
     { 
      listBox1.Items.Clear(); 
      // URL-encode the query so spaces and special characters survive the request.
      string SearchResults = "http://google.com/search?q=" + Uri.EscapeDataString(txtKeyWords.Text.Trim());
      HttpWebRequest request = (HttpWebRequest)WebRequest.Create(SearchResults);

      // Read the whole body as text. The using blocks dispose the response and reader.
      // UTF-8 is an assumption here; the original decoded as ASCII, which mangles non-ASCII text.
      string sbb;
      using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
      using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
      {
       sbb = reader.ReadToEnd();
      }

      HtmlAgilityPack.HtmlDocument html = new HtmlAgilityPack.HtmlDocument(); 
      html.OptionOutputAsXml = true; 
      html.LoadHtml(sbb); 
      HtmlNode doc = html.DocumentNode; 

      // SelectNodes returns null rather than an empty collection when nothing matches.
      HtmlNodeCollection links = doc.SelectNodes("//a[@href]");
      if (links == null)
      {
       return;
      }

      foreach (HtmlNode link in links)
      {
       string hrefValue = link.GetAttributeValue("href", string.Empty);
       string upper = hrefValue.ToUpper();
       // Keep result redirects ("/url?q=<target>&..."), skip Google's own links,
       // and accept both http and https targets.
       if (!upper.Contains("GOOGLE") && hrefValue.Contains("/url?q=") && (upper.Contains("HTTP://") || upper.Contains("HTTPS://")))
       {
        int index = hrefValue.IndexOf("&");
        if (index > 0)
        {
         hrefValue = hrefValue.Substring(0, index);
         listBox1.Items.Add(hrefValue.Replace("/url?q=", ""));
        }
       }
      }
     } 
    } 
}
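
The href handling in the loop above can also be pulled out into a small function that is easy to test without a form or a network call. The following is only a sketch under the same assumption the code above makes, namely that result links look like "/url?q=<target>&...". The names ResultLinkSketch and ExtractTarget are made up for illustration. Unlike the code above, it also percent-decodes the target.

```csharp
using System;

// Hypothetical helper; the "/url?q=" shape is taken from the code above
// and may change whenever Google changes its markup.
public static class ResultLinkSketch
{
    // Returns the decoded target of an href shaped like "/url?q=<target>&<other>",
    // or null when the href does not have that shape or is not an http(s) link.
    public static string ExtractTarget(string href)
    {
        const string prefix = "/url?q=";
        if (href == null || !href.StartsWith(prefix, StringComparison.Ordinal))
        {
            return null;
        }

        string rest = href.Substring(prefix.Length);

        // The separator may show up as "&" or "&amp;" depending on how the
        // attribute was read; cutting at the first '&' covers both.
        int amp = rest.IndexOf('&');
        if (amp >= 0)
        {
            rest = rest.Substring(0, amp);
        }

        // Turn percent-escapes such as %3F back into the original characters.
        string target = Uri.UnescapeDataString(rest);

        bool isWeb = target.StartsWith("http://", StringComparison.OrdinalIgnoreCase)
                  || target.StartsWith("https://", StringComparison.OrdinalIgnoreCase);
        return isWeb ? target : null;
    }
}
```

Inside the loop this would be used as: string target = ResultLinkSketch.ExtractTarget(hrefValue); if (target != null) listBox1.Items.Add(target);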

2015-01-04 20:19:06 Boduzapho
