Getting Google search results with JSoup

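I am trying to get a list of Google search results with JSoup. The approach I am currently using works very well for the first page (n), but it does not work for page n + 1. This is what I use for the first page: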

doc = Jsoup.connect(search).userAgent("Chrome").get(); 
links = doc.getElementsByClass("r");

For the first page, the search string contains something like https://www.google.com/search?q=apple. For the n + 1 pages, my code looks like this:

for(int i = 1; i <= pages; i++){ 
    search = "https://www.google.com/#q=" + keyword + "&start=" + (i*10); 
    doc = Jsoup.connect(search).userAgent("Mozilla").get(); 
    links.addAll(doc.getElementsByClass("r")); 
}

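The search URL for an n + 1 page looks like this: https://www.google.com/#q=apple&start=10. The main problem is that doc.getElementsByClass("r") contains no elements for the n + 1 searches. In other words, the class r does not exist in what JSoup returns. I verified this by searching through doc.toString(). Does anyone have any suggestions?

Thanks!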

2017-05-15 Tommy

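I suggest that you use an API instead of parsing the search results. – Kayaman

@Kayaman Which one? –

@Kayaman Also, for my research purposes I would prefer not to use an API. Any suggestions on how to parse the HTML? – Tommy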

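import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openqa.selenium.By;
import org.openqa.selenium.Keys;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class GoogleResults {

    // Upper bound on the number of result pages to visit (illustrative value).
    private static final int MAX_PAGES = 5;

    public static void main(String[] args) {
        System.setProperty("webdriver.gecko.driver", "C:\\geckodriver.exe");
        WebDriver driver = new FirefoxDriver();
        driver.manage().window().maximize();
        // The implicit wait applies to every later findElement call, so set it once.
        driver.manage().timeouts().implicitlyWait(60, TimeUnit.SECONDS);
        driver.get("https://www.google.com/");
        driver.findElement(By.id("lst-ib")).sendKeys("search" + Keys.ENTER);
        try {
            printResults(driver, MAX_PAGES);
        } finally {
            driver.quit();
        }
    }

    // Prints the text of every element with class "r", then follows the "next" link, for at most maxPages pages.
    public static void printResults(WebDriver driver, int maxPages) {
        for (int page = 0; page < maxPages; page++) {
            List<WebElement> searchResults = driver.findElements(By.className("r"));
            for (WebElement searchResult : searchResults) {
                System.out.println(searchResult.getText());
            }
            if (page + 1 < maxPages) {
                driver.findElement(By.id("pnnext")).click();
            }
        }
    }
}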
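
If you would rather stay with JSoup, one likely cause of the empty n + 1 pages is the URL itself. Everything after # is a URL fragment, and HTTP clients do not send the fragment to the server, so a request for https://www.google.com/#q=apple&start=10 is really a request for https://www.google.com/ and the results would only appear after JavaScript runs in a browser. Below is a minimal sketch, using only the JDK, that builds the paged URLs in the same /search?q=... form that already works for the first page. The class name SearchUrls and its methods are hypothetical helpers, and the 10-results-per-page step is taken from the question's own loop.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of JSoup or any Google API.
final class SearchUrls {

    // Builds the URL for a zero-based result page, assuming 10 results per page.
    static String pageUrl(String keyword, int page) {
        try {
            String q = URLEncoder.encode(keyword, "UTF-8");
            return "https://www.google.com/search?q=" + q + "&start=" + (page * 10);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM.
            throw new IllegalStateException(e);
        }
    }

    // Builds the URLs for the first `pages` result pages.
    static List<String> pageUrls(String keyword, int pages) {
        List<String> urls = new ArrayList<>();
        for (int page = 0; page < pages; page++) {
            urls.add(pageUrl(keyword, page));
        }
        return urls;
    }

    public static void main(String[] args) {
        // The URL from the question: the parameters sit in the fragment, not the query.
        URI fragmentUrl = URI.create("https://www.google.com/#q=apple&start=10");
        System.out.println(fragmentUrl.getQuery());    // null
        System.out.println(fragmentUrl.getFragment()); // q=apple&start=10

        // The same parameters as a real query string, which does reach the server.
        for (String url : pageUrls("apple", 3)) {
            System.out.println(url);
        }
    }
}
```

Each URL can then be passed to Jsoup.connect(url).userAgent(...).get() exactly as in the question. Whether the returned HTML still contains elements with class r depends on the markup Google serves for that user agent, so that part remains an assumption.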

2017-05-19 09:01:11
