2014-10-04 116 views

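I want to pull all the website links out of a Google search. When I search for something, I want to get every website link that Google shows in the results. How can I extract the search results (links) from a Google search in Java?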

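First I want to read the complete HTML content. After that I want to filter out all the relevant URLs. For example: if I type "buy shoes" into Google, I want to get links like "www.amazon.in/Shoes".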

When I run my program, I get only a few URLs, and all of them are Google's own pages, such as "google.de/intl/de/options/".

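PS: I checked the page source for the same query ("buy+shoes") and noticed that Chrome returns more content than Firefox. My guess is that I only get a few results because Java connects the way Firefox does. Is that right? How can I get all the links that Google displays?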

import java.io.BufferedReader; 
import java.io.BufferedWriter; 
import java.io.File; 
import java.io.FileWriter; 
import java.io.IOException; 
import java.io.InputStreamReader; 
import java.net.MalformedURLException; 
import java.net.URL; 
import java.net.URLConnection; 
import java.nio.charset.Charset; 
import java.util.Scanner; 
import java.util.regex.Matcher; 
import java.util.regex.Pattern; 
public class findEveryUrl { 
public static void main(String[] args) throws IOException 
{ 

    String gInput = "https://www.google.de/#q="; 
    // setKeyWord asks you to enter the keyword into the console 
    String fullUrl = gInput + setKeyWord(); 
    //fullUrl is used for the InputStream and "www." is the string, which is used for splitting 
    findAllSubs(fullUrl, "www."); 
    //System.out.println("given url: " + fullUrl); 
} 



/** 
* Reads the page at urlString line by line, splits each line, and prints the pieces. 
* @param urlString the full URL to fetch. 
* @param splitphrase the string used for splitting each line. 
*/ 
static void findAllSubs(String urlString, String splitphrase) 
{ 
    try 
    { 
     URL  url  = new URL(urlString); 
     URLConnection yc = url.openConnection(); 
     BufferedReader in = new BufferedReader(new InputStreamReader(
       yc.getInputStream())); 
     String inputLine ; 
     String array[]; 

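     // Read exactly one line per iteration; a second readLine() here would 
     // merge lines and append the text "null" at the end of the stream. 
     while ((inputLine = in.readLine()) != null){ 
      // Pattern.quote treats splitphrase literally, so the "." in "www." is not a regex wildcard. 
      array = inputLine.split(Pattern.quote(splitphrase)); 
      arrayToConsol(array); 
     } 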
    }catch (IOException e) { 
     e.printStackTrace(); 
    } 

} 



/** 
* setKeyWord() asks for the search keyword for the Google query. 
* @return the keyword that was typed into the console 
*/ 
public static String setKeyWord(){ 
    BufferedReader console = new BufferedReader(new InputStreamReader(System.in)); 
    System.out.print("Enter a KeyWord: "); 
    //googles search engine url 

    String keyWord = null; 
    try { 
     keyWord = console.readLine(); 
    } catch (IOException e) { 
     // should not happen 
     e.printStackTrace(); 
    } 

    return keyWord; 
} 

public static void arrayToConsol(String[] array){ 
    for (String item : array) { 
     System.out.println(item); 
    } 
} 

public static void searchQueryToConsole(String url) throws IOException{ 
    URL googleSearch = new URL(url); 
    URLConnection yc = googleSearch.openConnection(); 
    BufferedReader in = new BufferedReader(new InputStreamReader(
      yc.getInputStream())); 
    String inputLine; 
    while ((inputLine = in.readLine()) != null) 
     System.out.println(inputLine); 
    in.close(); 
}} 

Answers


Here is a simple and easy solution.

http://www.programcreek.com/2012/05/call-google-search-api-in-java-program/
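
A likely reason the question's program only sees Google's own links: everything after `#` in a URL is a fragment, which the client never sends to the server, so `https://www.google.de/#q=...` requests only the home page. Below is a minimal sketch of building the request with a real query string instead. The `/search?q=` path, the browser-like User-Agent value, and the names `SearchUrlSketch`, `buildSearchUrl`, and `open` are assumptions for illustration, and Google may still block or alter responses to automated requests.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;

// Hypothetical helper, not part of the original program.
class SearchUrlSketch {

    // Assumed endpoint: the keyword travels in the query string, not in a "#" fragment.
    static final String SEARCH_BASE = "https://www.google.de/search?q=";

    // URL-encodes the keyword so spaces and characters like '&' are safe in the query.
    static String buildSearchUrl(String keyword) {
        try {
            return SEARCH_BASE + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    // Prepares (but does not yet send) a request with a browser-like User-Agent.
    // The header value is an assumption; the default Java agent may be rejected.
    static URLConnection open(String keyword) throws IOException {
        URLConnection conn = new URL(buildSearchUrl(keyword)).openConnection();
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        return conn;
    }

    public static void main(String[] args) {
        // prints https://www.google.de/search?q=buy+shoes
        System.out.println(buildSearchUrl("buy shoes"));
    }
}
```

The stream returned by `open(keyword).getInputStream()` could then be read the same way the question's `findAllSubs` reads its connection.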

But if you want to parse other pages and use CSS selectors to find elements, jsoup is a great library.

Document doc = Jsoup.connect("http://en.wikipedia.org/").get(); 
Elements newsHeadlines = doc.select("#mp-itn b a"); 
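
The same pick-out-the-anchors idea can be applied to the question's goal of keeping only non-Google links. The sketch below uses just the standard library so it runs without jsoup; a regular expression is only a rough stand-in for a real HTML parser, the host check is deliberately crude, and the class name `ExternalLinkSketch`, its methods, and the sample HTML are all made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class ExternalLinkSketch {

    // Matches absolute http(s) links inside double-quoted href attributes.
    private static final Pattern HREF =
            Pattern.compile("href=\"(https?://[^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Returns the distinct links, in document order, whose host is not a Google host.
    static Set<String> externalLinks(String html) {
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String link = m.group(1);
            String host = hostOf(link);
            // Crude filter (assumption): drop any host containing "google.".
            if (host != null && !host.contains("google.")) {
                links.add(link);
            }
        }
        return links;
    }

    // Extracts the host part of a link, or null if the link cannot be parsed.
    static String hostOf(String link) {
        try {
            return new URI(link).getHost();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.amazon.in/Shoes\">Shoes</a>"
                + "<a href=\"https://www.google.de/intl/de/options/\">More</a>";
        // prints [http://www.amazon.in/Shoes]
        System.out.println(externalLinks(html));
    }
}
```

With jsoup on the classpath, a selector such as `a[href]` would be the more robust way to collect the anchors before applying the same host filter.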

Thanks, Daredesm, for your quick reply =) – 2014-10-05 20:10:08