How do I extract links from a Google HTML results page?

I am reading a text file that contains the HTML of a Google search results page. I parse it and then try to extract the links with this code:

FileReader in = new FileReader("A.txt");
BufferedReader p = new BufferedReader(in);
while (p.readLine() != null)
{
    String html = p.readLine();
    Document doc = Jsoup.parse(html);
    Elements Link = doc.select("a[href");
    for (Element element : Link)
    {
        if (element != null)
        {
            System.out.println(element);
        }
    }
}

But I get a lot of strings that are not links. How can I print only the links and nothing else?


2014-01-06 user3132730

According to this [question][1], what you are asking for is against Google's TOS. [1]: http://stackoverflow.com/questions/3727662/how-can-you-search-google-programmatically-java-api – farrellmr

Can you post the HTML you are trying to parse? The Google search results page does not contain your results as direct HTML anchors. – PopoFibo

Exactly, that is the problem I have: the Google search results HTML is different. – user3132730
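One way the "not direct anchors" point can show up: in some versions of the results page, a result's href is a redirect of the form /url?q=<target>&sa=... rather than the target itself. That format is an assumption here and may not match the saved file. Under that assumption, a rough sketch for recovering the target using only the standard library (the class GoogleRedirect and the helper targetOf are made-up names):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class GoogleRedirect {

    // Hypothetical helper: returns the decoded q parameter of a "/url?q=..." href,
    // or null when the href does not have that (assumed) shape.
    static String targetOf(String href) {
        String prefix = "/url?q=";
        if (href == null || !href.startsWith(prefix)) {
            return null;
        }
        String rest = href.substring(prefix.length());
        int amp = rest.indexOf('&');
        // Keep only the q value; anything after the first '&' is another parameter.
        String encoded = amp >= 0 ? rest.substring(0, amp) : rest;
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(targetOf("/url?q=http://example.com/page%3Fid%3D1&sa=U"));
        System.out.println(targetOf("/search?q=jsoup"));
    }
}
```

Hrefs that return null here (navigation, cached copies, and so on) can simply be skipped, which should also cut down on the non-result links.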

Please retry with a complete selector, not just "a[href":

Elements links = doc.select("a[href]"); // a with href

See the Selector documentation for the full supported syntax, especially the examples on the right-hand side.
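A minimal sketch of how the corrected selector might fit into the code from the question, assuming jsoup is on the classpath and A.txt is UTF-8 encoded. It reads the whole file at once (the loop in the question calls readLine() twice per iteration, so every other line is dropped) and prints the href attribute instead of the whole element. The class LinkExtractor and the helper extractLinks are made-up names for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkExtractor {

    // Hypothetical helper: returns the href value of every <a> that has one.
    static List<String> extractLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String> links = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            links.add(a.attr("href")); // attribute value only, not the whole tag
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        // Read the complete file so the parser sees one whole document.
        String html = new String(Files.readAllBytes(Paths.get("A.txt")), StandardCharsets.UTF_8);
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

Printing element, as in the question, writes out the full tag with its markup; attr("href") returns just the URL string.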


2014-01-06 10:57:49

I only want the links, but I get so much junk :\ – user3132730

If the links are inside an iframe, you need to select that first. doc.select("iframe") will help you. –
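For the iframe case, note that Jsoup.parse only parses the HTML it is given and does not load the framed document, so selecting the iframe gives you its src, which would then have to be fetched and parsed separately. A small sketch, assuming jsoup is available; the class IframeSources, the helper iframeSources, and the example.com base URI are placeholders:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class IframeSources {

    // Hypothetical helper: lists the src of every iframe, resolved against baseUri.
    static List<String> iframeSources(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> sources = new ArrayList<>();
        for (Element frame : doc.select("iframe[src]")) {
            // The "abs:" prefix asks jsoup to resolve the value against the base URI.
            sources.add(frame.attr("abs:src"));
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<iframe src=\"/frame.html\"></iframe>";
        for (String src : iframeSources(html, "http://example.com/")) {
            System.out.println(src);
        }
    }
}
```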
