2011-11-08 85 views
4

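HtmlUnit cannot retrieve a page after downloading a file

I'm running into a strange problem when using HtmlUnit in Java. I use it to download some data from a website, and the process goes like this: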

1 - Log in

2 - For each element (car)

----- 3 - Search for the car

----- 4 - Download the zip file from a link

The code:

Creating the web client:

webClient = new WebClient(BrowserVersion.FIREFOX_3_6); 
webClient.setJavaScriptEnabled(true); 
webClient.setThrowExceptionOnScriptError(false); 
DefaultCredentialsProvider provider = new DefaultCredentialsProvider(); 
provider.addCredentials(USERNAME, PASSWORD); 
webClient.setCredentialsProvider(provider); 
webClient.setRefreshHandler(new ImmediateRefreshHandler()); 

Logging in:

public void login() throws IOException 
    { 
    page = (HtmlPage) webClient.getPage(URL); 
    HtmlForm form = page.getFormByName("formLogin"); 

    String user = USERNAME; 
    String password = PASSWORD; 

    // Enter login and password 
    form.getInputByName("LoginSteps$UserName").setValueAttribute(user); 
    form.getInputByName("LoginSteps$Password").setValueAttribute(password); 

    // Click Login Button 
    page = (HtmlPage) form.getInputByName("LoginSteps$LoginButton").click(); 

    webClient.waitForBackgroundJavaScript(3000); 

    // Click on Campa area 
    HtmlAnchor link = (HtmlAnchor) page.getElementById("ctl00_linkCampaNoiH"); 
    page = (HtmlPage) link.click(); 

    webClient.waitForBackgroundJavaScript(3000); 
    System.out.println(page.asText()); 
    } 

Searching for a car on the website:

private void searchCar(String _regNumber) throws IOException 
{ 
// Open search window 
page = page.getElementById("search_gridCampaNoi").click(); 

webClient.waitForBackgroundJavaScript(3000); 

// Write plate number 
HtmlInput element = (HtmlInput) page.getElementById("jqg1"); 
element.setValueAttribute(_regNumber); 

webClient.waitForBackgroundJavaScript(3000); 

// Click on search 
HtmlAnchor anchor = (HtmlAnchor) page.getByXPath("//*[@id=\"fbox_gridCampaNoi_search\"]").get(0); 
page = anchor.click(); 

webClient.waitForBackgroundJavaScript(3000); 
System.out.println(page.asText()); 
} 

Downloading the PDFs:

try 
    { 
     InputStream is = _link.click().getWebResponse().getContentAsStream(); 
     File path = new File(new File(DOWNLOAD_PATH), _regNumber); 
     if (!path.exists()) 
     { 
     path.mkdir(); 
     } 
     writeToFile(is, new File(path, _regNumber + "_pdfs.zip")); 
    } 
    catch (Exception e) 
    { 
     e.printStackTrace(); 
    } 
    } 
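The `writeToFile` helper called above is not shown in the post. A minimal sketch of what it might look like, assuming it simply copies the response stream to the target file (the class name `DownloadSketch` is hypothetical, and this version also creates missing parent directories, which a single `mkdir()` call does not do):

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical container class; only the method name writeToFile comes from the post.
final class DownloadSketch {

    // Copies everything from the stream into the target file, replacing an existing file.
    static void writeToFile(InputStream in, File target) throws IOException {
        File parent = target.getParentFile();
        if (parent != null) {
            // Unlike File.mkdir(), this also creates missing ancestors
            // and does not fail if the directory already exists.
            Files.createDirectories(parent.toPath());
        }
        // try-with-resources closes the response stream even if the copy fails.
        try (InputStream source = in) {
            Files.copy(source, target.toPath(), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

With a helper like this, the explicit `path.exists()` / `path.mkdir()` check in the download code would no longer be needed.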

The problem:

The first car works fine and the PDFs are downloaded, but as soon as I search for a new car and reach this line:

page = page.getElementById("search_gridCampaNoi").click(); 

I get this exception:

Exception in thread "main" java.lang.ClassCastException: com.gargoylesoftware.htmlunit.UnexpectedPage cannot be cast to com.gargoylesoftware.htmlunit.html.HtmlPage 

After debugging, I realized that from the moment I execute this statement:

InputStream is = _link.click().getWebResponse().getContentAsStream(); 

the return type of page.getElementById("search_gridCampaNoi").click() changes from HtmlPage to WebResponse, so I no longer receive a new page; instead I receive the already-downloaded file again.
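The failure mode can be made explicit by checking the type of whatever `click()` returns before casting it. The sketch below is only a model of that guard: `Page`, `HtmlPage` and `UnexpectedPage` here are hypothetical stand-ins declared locally so the example compiles without HtmlUnit on the classpath (in HtmlUnit, `HtmlPage` and `UnexpectedPage` both implement `Page`), and `requireHtmlPage` is an illustrative name, not a library method.

```java
// Model of a type guard around the result of click(); not HtmlUnit code.
final class PageTypeSketch {

    // Hypothetical stand-ins for HtmlUnit's Page, HtmlPage and UnexpectedPage.
    interface Page {}
    static final class HtmlPage implements Page {}
    static final class UnexpectedPage implements Page {}

    // Returns the result as an HtmlPage, or fails with a descriptive message
    // instead of the bare ClassCastException seen in the question.
    static HtmlPage requireHtmlPage(Page result) {
        if (result instanceof HtmlPage) {
            return (HtmlPage) result;
        }
        String actual = (result == null) ? "null" : result.getClass().getSimpleName();
        throw new IllegalStateException("Expected an HTML page but got " + actual);
    }
}
```

Applied to the real code, the same check on the value returned by `click()` would show right away that the window now holds the downloaded file rather than the search page.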

A couple of debugger screenshots show this:

First call, return type is OK:

[debugger screenshot]

Second call, the return type has changed and I no longer receive an HtmlPage:

[debugger screenshot]

Thanks in advance!

Answer

8

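In case anyone runs into the same problem, I found a workaround. Changing the line:

InputStream is = _link.click().getWebResponse().getContentAsStream(); 

to

InputStream is = _link.openLinkInNewWindow().getWebResponse().getContentAsStream(); 

seems to do the trick. I'm now seeing problems when I run several iterations (sometimes it works, sometimes it doesn't), but at least I have something working now.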