HtmlUnit: Cannot read property "push" from undefined

I'm trying to scrape a website using HtmlUnit. Every time I run it, though, it only outputs the following error:

Caused by: net.sourceforge.htmlunit.corejs.javascript.EcmaError: TypeError: Cannot read property "push" from undefined (https://www.kinoheld.de/dist/prod/0.4.7/widget.js#1)

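I don't know much about JS, but I read that push is some kind of array operation. That seems pretty standard to me, so I don't understand why HtmlUnit wouldn't support it.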
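The wording of the message matters here: it says the property is read from undefined, so push itself is probably not what is missing. More likely the object it is called on (presumably an array that some earlier script was expected to create) was never initialized during HtmlUnit's run. As a rough Java analogy, calling add on a null list fails even though List.add obviously exists. The class PushOnUndefined and the method tryPush are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class PushOnUndefined {
    // Stands in for a JS global such as `var queue;` that another script was
    // supposed to initialize but never did; it stays null (JS: undefined).
    static List<String> queue;

    // Returns "ok" if the add succeeded, "NPE" if the receiver was missing.
    static String tryPush(List<String> target, String item) {
        try {
            target.add(item); // the method exists; the object it is called on may not
            return "ok";
        } catch (NullPointerException e) {
            return "NPE";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryPush(queue, "event"));             // prints "NPE"
        System.out.println(tryPush(new ArrayList<>(), "event")); // prints "ok"
    }
}
```

The fix in such a case is not a "push implementation" but whatever was supposed to create the object first, which in HtmlUnit's case may be an earlier script that failed or a browser feature it does not emulate.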

Here is the code I'm using so far:

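import java.io.IOException;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Scraper {
    public static void main(String[] args) throws IOException {
        WebClient web = new WebClient(BrowserVersion.FIREFOX_45);
        web.getOptions().setUseInsecureSSL(true);
        String url = "https://www.kinoheld.de/kino-muenchen/royal-filmpalast/vorstellung/280823/?mode=widget&showID=280828#panel-seats";
        web.getOptions().setThrowExceptionOnFailingStatusCode(false);
        HtmlPage response = web.getPage(url);
        // Waiting only makes sense once a page (and its scripts) has been loaded
        web.waitForBackgroundJavaScript(9000);

        System.out.println(response.getTitleText());
        web.close();
    }
}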

What am I missing? Is there a way to fix this? Thanks in advance!

2016-11-17 Maverick283

If it's not supported, I think you should file a feature request with the developers.

When does the error occur? After the 'web.getPage(url)' call or after 'response.getTitleText()'? – Jack

@Jack The error occurs after 'web.getPage(url)': I can comment out 'response.getTitleText()' and it is still thrown, even when 'web.getOptions().setThrowExceptionOnScriptError(false);' (see the answer below) is added. – Maverick283

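I have run into a similar problem before. It stems from HtmlUnit being designed as a testing framework rather than a web-scraping tool. Are you running the latest version of HtmlUnit?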

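I was able to get it working by adding both the setThrowExceptionOnScriptError(false) line (as mentioned in the other answer) and, at the top of the method, java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF); to stop the log dump when the code runs. This produced the output: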

Royal Filmpalast München München | kinoheld.de

Here is the complete code:

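import java.io.IOException;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Scraper {
    public static void main(String[] args) throws IOException {

        // Turn off HtmlUnit's log output (all loggers under com.gargoylesoftware)
        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);

        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45);
        String url = "https://www.kinoheld.de/kino-muenchen/royal-filmpalast/vorstellung/280823/?mode=widget&showID=280828#panel-seats";

        webClient.getOptions().setUseInsecureSSL(true);
        // Don't abort with an exception when one of the page's scripts fails
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        HtmlPage response = webClient.getPage(url);
        // Give background JavaScript up to 9 s to finish; only useful after the page is loaded
        webClient.waitForBackgroundJavaScript(9000);

        System.out.println(response.getTitleText());
        webClient.close();
    }
}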

This was run from the command line on Red Hat with HtmlUnit 2.2.1. Hope this helps.
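Why the single Logger line silences everything: java.util.logging loggers form a hierarchy by dotted name, and a logger with no explicit level inherits its nearest ancestor's level. The "com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify" lines from the comments appear to be java.util.logging output, so switching "com.gargoylesoftware" off should cover them, assuming HtmlUnit's logging is routed to java.util.logging in your setup. The sketch below uses only the JDK; the class name and the org.example logger are hypothetical. It also keeps a static reference to the parent logger, because the Logger documentation warns that a logger returned by getLogger may be garbage collected (losing its level) if nothing holds a strong reference to it.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class SilenceHtmlUnitLogging {
    // Strong reference so the configured level cannot be lost to garbage collection
    static final Logger HTMLUNIT_ROOT = Logger.getLogger("com.gargoylesoftware");

    public static void main(String[] args) {
        // Turning the parent off silences every logger below it in the name hierarchy
        HTMLUNIT_ROOT.setLevel(Level.OFF);

        // A child logger without its own level inherits the parent's effective level
        Logger child = Logger.getLogger("com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl");
        System.out.println(child.isLoggable(Level.SEVERE)); // prints "false"

        // Loggers outside that subtree still follow the root logger (INFO by default)
        Logger other = Logger.getLogger("org.example.MyScraper");
        System.out.println(other.isLoggable(Level.SEVERE)); // prints "true"
    }
}
```

Holding the logger in a static field, rather than calling getLogger(...).setLevel(...) inline, is the design choice that keeps the setting in effect for the whole run.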

2016-11-23 15:09:49 Jack

Try adding

web.getOptions().setThrowExceptionOnScriptError(false);

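before you try to get the page. This makes HtmlUnit ignore script errors. However, it may not work 100% of the time, for example if the JavaScript that raises the error is essential for producing the data you want to scrape (hopefully it isn't). If this doesn't work, try Selenium with ChromeDriver or GhostDriver.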

2016-11-22 21:27:22

Adding that line doesn't work; it still throws the same error and doesn't get me anywhere... I'll try Selenium later when I have more time ;) – Maverick283

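However, with the line you suggested, the stack trace no longer starts with the original exception; it now says 'com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify' and then prints the rest of the stack trace. – Maverick283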

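I really wish I could split the 50-point bounty: @Jack's answer did solve the problem, but your suggestion may be more helpful to me in the long run... – Maverick283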
