2011-09-28 66 views
1

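I am using the HtmlUnit library to scrape the Yellowpages.com website. I want to enter a search term and click the find button. But after that, I get 2 pages:

http://www.yellowpages.com/ny/sport?g=NY&q=Sport
https://dealoftheday.yellowpages.com/join?ic=deal_pop-under_signup-v-

The first is the one I want; the second is a popup. How do I get the correct page? I have this code: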

public void getPage() throws FailingHttpStatusCodeException, MalformedURLException, IOException {
    WebClient webClient = new WebClient();
    page = webClient.getPage("http://www.yellowpages.com");
    HtmlTextInput searchInput = (HtmlTextInput) page.getElementById("search-terms");
    searchInput.setText("Law");

    HtmlSubmitInput button = (HtmlSubmitInput) page.getElementById("search-submit");
    page = button.click();
    System.out.println(page.getTitleText());
}

This code prints:

Deal of the Day on YP.com - Join

But I want to print the title of the first page, which is:

NY Sport | Sport in NY - YP.com

How do I get the first page?

Edit: After adding the line webClient.setPopupBlockerEnabled(true), I get a lot of warnings, and after that, an exception. Here is part of the console output:

Exception in thread "main" ======= EXCEPTION START ========
EcmaError: lineNumber=[56] column=[0] lineSource=[null] name=[TypeError] sourceName=[http://i2.ypcdn.com/webyp/javascripts/home_packaged.js?13455] message=[TypeError: Cannot call method "blur" of null (http://i2.ypcdn.com/webyp/javascripts/home_packaged.js?13455#56)]
com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot call method "blur" of null (http://i2.ypcdn.com/webyp/javascripts/home_packaged.js?13455#56)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:601)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:537)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:538)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:531)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptFunctionIfPossible(HtmlPage.java:906)
    at com.gargoylesoftware.htmlunit.javascript.host.EventListenersContainer.executeEventListeners(EventListenersContainer.java:164)
    at com.gargoylesoftware.htmlunit.javascript.host.EventListenersContainer.executeBubblingListeners(EventListenersContainer.java:223)
    at com.gargoylesoftware.htmlunit.javascript.host.Node.fireEvent(Node.java:686)
    at com.gargoylesoftware.htmlunit.html.HtmlElement$2.run(HtmlElement.java:885)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:537)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:538)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.fireEvent(HtmlElement.java:890)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.fireEvent(HtmlElement.java:865)
    at com.gargoylesoftware.htmlunit.html.HtmlForm.submit(HtmlForm.java:108)
    at com.gargoylesoftware.htmlunit.html.HtmlSubmitInput.doClickAction(HtmlSubmitInput.java:77)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1263)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1214)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1177)
    at YellowPages.getPage(YellowPages.java:39)
    at YellowPages.main(YellowPages.java:22)
Caused by: net.sourceforge.htmlunit.corejs.javascript.EcmaError: TypeError: Cannot call method "blur" of null (http://i2.ypcdn.com/webyp/javascripts/home_packaged.js?13455#56)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3772)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3750)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError(ScriptRuntime.java:3778)

Answers

2

Sounds like a JS error. Disable JS:

webClient.setJavaScriptEnabled(false); 

Or what about:

webClient.setThrowExceptionOnScriptError(false); 

if you are using HtmlUnit 2.11+

+0

I have already found the solution, but thanks anyway. You are right, webClient.setThrowExceptionOnScriptError(false) is the solution.

1

Have you tried

webClient.setPopupBlockerEnabled(true) 

Then you should get only one page.

+0

I haven't added webClient.getOptions(); I will try it.

+0

I have edited the question.

1

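Not tested, but I think you could iterate over the web client's top-level windows (using WebClient.getTopLevelWindows()), call getEnclosedPage() on each one, and test whether the page's title text is the one you are looking for.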
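
The selection step described above might look roughly like the sketch below. It is deliberately independent of HtmlUnit so that it runs on its own: PageSelector and findByTitle are hypothetical names, not part of the library, and the windows are stood in for by plain title strings taken from the question.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper, not part of HtmlUnit: picks the first window whose
// page title contains the wanted text.
final class PageSelector {

    static <W> Optional<W> findByTitle(List<W> windows,
                                       Function<W, String> titleOf,
                                       String wantedText) {
        for (W window : windows) {
            String title = titleOf.apply(window);
            // A window may not hold an HTML page (title null), so guard first.
            if (title != null && title.contains(wantedText)) {
                return Optional.of(window);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in data: the two page titles quoted in the question.
        List<String> titles = List.of(
                "Deal of the Day on YP.com - Join",
                "NY Sport | Sport in NY - YP.com");
        Optional<String> match = findByTitle(titles, t -> t, "Sport in NY");
        System.out.println(match.orElse("no matching window"));
    }
}
```

With HtmlUnit, the list would presumably come from webClient.getTopLevelWindows(), and titleOf would return ((HtmlPage) window.getEnclosedPage()).getTitleText() when the enclosed page is an HtmlPage and null otherwise. Like the answer itself, this wiring is untested.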