2017-05-02

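Jsoup ID selection not working

I want to get a specific tag from a web page (http://steamcommunity.com/id/Winning117/games/?tab=all), but I keep getting null. What I expect is the "hours" value for a specific game, in this case "Cluckles' Adventure". Any help is appreciated, thanks :)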

import org.jsoup.Jsoup; 
import org.jsoup.nodes.Document; 
import org.jsoup.nodes.Element; 

public class TestScrape { 
    public static void main(String[] args) throws Exception { 
     String url = "http://steamcommunity.com/id/Winning117/games/?tab=all"; 
     Document document = Jsoup.connect(url).get(); 

     Element playTime = document.select("div#game_605250").first(); 
     System.out.println(playTime); 
    } 
} 

Edit: How can I tell whether a web page uses JavaScript and therefore can't be parsed with Jsoup?


Try this: playTime.select(".ellipsis hours_played") – thanga


@thanga Which line of my code should that replace? Sorry, I only started using Jsoup yesterday, so I'm not familiar with it. –


The data you are looking for is loaded dynamically by JavaScript, so you can't scrape it with Jsoup. You need a headless browser such as PhantomJS. – TDG
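One way to confirm that the data is loaded by JavaScript is to compare what the browser shows with the raw HTML the server sends (View Page Source in the browser, or `document.outerHtml()` on the Jsoup `Document` from the question). Jsoup does not run scripts, so an element that appears in the browser's inspector but not in the raw markup was added by JavaScript. The sketch below only illustrates that heuristic on a string; the class and method names are hypothetical, and the plain substring search assumes the id is written as `id="..."` or `id='...'`.

```java
public class JsRenderCheck {

    // Heuristic: returns true if the element id appears in the raw, unexecuted HTML.
    // If this is false but the browser's inspector shows the element, the element
    // was created by JavaScript and Jsoup will not see it.
    public static boolean isInRawHtml(String rawHtml, String elementId) {
        return rawHtml.contains("id=\"" + elementId + "\"")
                || rawHtml.contains("id='" + elementId + "'");
    }

    public static void main(String[] args) {
        // Hypothetical raw HTML: the games are only present inside a script variable
        String raw = "<div></div><script>var rgGames = [];</script>";
        System.out.println(isInRawHtml(raw, "game_605250")); // prints false
    }
}
```

With Jsoup itself, the equivalent check is simply whether `document.getElementById("game_605250")` returns null on the fetched document.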

Answers


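To execute the page's JavaScript from Java code, use Selenium:

Selenium WebDriver drives a browser directly, using each browser's native support for automation.

To use it with Maven, include this dependency: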

<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-server</artifactId>
    <version>3.4.0</version>
</dependency>

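Below is a simple JUnit test that creates a WebDriver instance, navigates to the given URL, and executes a small script to get rgGames. You have to download the chromedriver executable from https://sites.google.com/a/chromium.org/chromedriver/downloads.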

package SeleniumProject.selenium; 

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Map;

import org.junit.After;
import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.JUnit4;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriverService;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.remote.RemoteWebDriver;

@RunWith(JUnit4.class)
public class ChromeTest {

    private static ChromeDriverService service;
    private WebDriver driver;

    @BeforeClass
    public static void createAndStartService() throws IOException {
        // Point this at the chromedriver executable you downloaded
        service = new ChromeDriverService.Builder()
                .usingDriverExecutable(new File("D:\\Downloads\\chromedriver_win32\\chromedriver.exe"))
                .withVerbose(false)
                .usingAnyFreePort()
                .build();
        // Let a start failure fail the test class instead of continuing without a driver service
        service.start();
    }

    @AfterClass
    public static void stopService() {
        service.stop();
    }

    @Before
    public void createDriver() {
        ChromeOptions chromeOptions = new ChromeOptions();
        DesiredCapabilities capabilities = DesiredCapabilities.chrome();
        capabilities.setCapability(ChromeOptions.CAPABILITY, chromeOptions);
        driver = new RemoteWebDriver(service.getUrl(), capabilities);
    }

    @After
    public void quitDriver() {
        driver.quit();
    }

    @Test
    @SuppressWarnings({"unchecked", "rawtypes"})
    public void testJS() {
        JavascriptExecutor js = (JavascriptExecutor) driver;

        // Load the page in the current browser window; Chrome runs the page's JavaScript
        driver.get("http://steamcommunity.com/id/Winning117/games/?tab=all");

        // Execute JavaScript in the context of the currently selected frame or window;
        // the rgGames array comes back as a list of maps
        ArrayList<Map> list = (ArrayList<Map>) js.executeScript("return rgGames;");
        // Each Map holds the properties of one game
        for (Map map : list) {
            // Print all properties of the game named Cluckles' Adventure
            if ("Cluckles' Adventure".equals(map.get("name"))) {
                map.forEach((key, value) -> System.out.println(key + " : " + value));
            }
        }
    }
}

As you can see in this line,

driver.get("http://steamcommunity.com/id/Winning117/games/?tab=all"); 

Selenium loads the page, which holds all of Winning117's game data in the rgGames variable, and this line returns that variable:

ArrayList<Map> list = (ArrayList<Map>) js.executeScript("return rgGames;"); 
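The lookup by game name can also be pulled out into a small helper. Selenium's `executeScript` documentation says a JavaScript array is returned as a `List` and a JavaScript object as a `Map`, so the sketch below works on that shape. It is hypothetical helper code: it assumes each game object has `name` and `hours_forever` keys (as in the `rgGames` snippet quoted in another answer on this page) and uses a hand-built list in place of a live browser.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class RgGamesLookup {

    // Returns the "hours_forever" value of the game whose "name" equals gameName.
    // Non-Map items are skipped; a missing game or missing key gives Optional.empty().
    public static Optional<Object> hoursForever(List<?> games, String gameName) {
        for (Object item : games) {
            if (!(item instanceof Map)) {
                continue;
            }
            Map<?, ?> game = (Map<?, ?>) item;
            if (gameName.equals(game.get("name"))) {
                return Optional.ofNullable(game.get("hours_forever"));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for the result of js.executeScript("return rgGames;")
        Map<String, Object> game = new HashMap<>();
        game.put("appid", 224260L);
        game.put("name", "No More Room in Hell");
        game.put("hours_forever", "515");
        List<Object> games = new ArrayList<>();
        games.add(game);
        System.out.println(hoursForever(games, "No More Room in Hell").orElse("not found")); // prints 515
    }
}
```

With the driver from the test it would be called as `hoursForever((List<?>) js.executeScript("return rgGames;"), "Cluckles' Adventure")`. Returning an `Optional` makes a missing game, or a game without an `hours_forever` entry, explicit instead of a null.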

Try this:

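import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class TestScrape {
    public static void main(String[] args) throws Exception {
        String url = "http://steamcommunity.com/id/Winning117/games/?tab=all";
        Document document = Jsoup.connect(url).get();

        // select() returns Elements (possibly empty), never null
        Elements playTime = document.select("div#game_605250");
        Elements val = playTime.select(".hours_played");
        // Prints an empty line if the element is not in the HTML the server sent
        System.out.println(val.text());
    }
}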

It throws "Exception in thread "main" java.lang.NullPointerException at TestScrape.main(TestScrape.java:12)", where line 12 is "Elements val =". –


I've modified the answer, please check it. It should be text(). – thanga


I don't have an environment to test this. I changed the class name; check it now. – thanga


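The page you want to scrape is loaded by JS, and the HTML that Jsoup gets contains no #game_605250 element. All the data is written into the page by JS.

However, when I printed the document to a file, I saw data like this: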

<script language="javascript"> 
     var rgGames = [{"appid":224260,"name":"No More Room in Hell","logo":"http:\/\/cdn.steamstatic.com.8686c.com\/steamcommunity\/public\/images\/apps\/224260\/670e9aba35dc53a6eb2bc686d302d357a4939489.jpg","friendlyURL":224260,"availStatLinks":{"achievements":true,"global_achievements":true,"stats":false,"leaderboards":false,"global_leaderboards":false},"hours_forever":"515","last_played":1492042097},{"appid":241540,"name":"State of Decay","logo":"http:\/\/.... 

You can then extract "rgGames" with some string tools and turn it into a JSON object.

It's not a clever method, but it works.
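A minimal sketch of that string-based extraction, using only the JDK's `java.util.regex` instead of a JSON library. It assumes the array literal ends at the first `];` and that each game object starts with `"appid":`, as in the snippet above. The class and method names are hypothetical, and the sample HTML is a trimmed, made-up stand-in (the Cluckles' Adventure appid is inferred from the `game_605250` id and its hours value is invented). For anything beyond a quick script, parsing the extracted text with a real JSON library is more robust.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RgGamesExtractor {

    // Matches the inline "var rgGames = [ ... ];" assignment; assumes the array
    // literal ends at the first "];" sequence
    private static final Pattern RG_GAMES =
            Pattern.compile("var\\s+rgGames\\s*=\\s*(\\[.*?\\]);", Pattern.DOTALL);

    private static final Pattern HOURS =
            Pattern.compile("\"hours_forever\":\"([^\"]*)\"");

    // Returns the raw JSON text of the rgGames array, if the page contains it
    public static Optional<String> extractRgGames(String html) {
        Matcher m = RG_GAMES.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Looks up "hours_forever" for one game by searching only the text between
    // that game's "name" field and the next game's "appid" field
    public static Optional<String> hoursForever(String rgGamesJson, String gameName) {
        int start = rgGamesJson.indexOf("\"name\":\"" + gameName + "\"");
        if (start < 0) {
            return Optional.empty();
        }
        int end = rgGamesJson.indexOf("\"appid\":", start);
        String entry = end < 0 ? rgGamesJson.substring(start) : rgGamesJson.substring(start, end);
        Matcher m = HOURS.matcher(entry);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up, trimmed sample of the page source
        String html = "<script language=\"javascript\">\n"
                + "  var rgGames = [{\"appid\":224260,\"name\":\"No More Room in Hell\","
                + "\"availStatLinks\":{\"stats\":false},\"hours_forever\":\"515\"},"
                + "{\"appid\":605250,\"name\":\"Cluckles' Adventure\",\"hours_forever\":\"1.5\"}];\n"
                + "</script>";
        String json = extractRgGames(html).orElseThrow(IllegalStateException::new);
        System.out.println(hoursForever(json, "Cluckles' Adventure").orElse("not found")); // prints 1.5
    }
}
```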