2012-10-21 99 views
10

I am trying to download an XLS file from a website using HtmlUnit. When I click the download link, I get a JavaScript confirm box, which I handle as follows:

ConfirmHandler okHandler = new ConfirmHandler(){ 
      public boolean handleConfirm(Page page, String message) { 
       return true; 
      } 
     }; 
    webClient.setConfirmHandler(okHandler); 

The page has a link that downloads the file:

<a href="./my_file.php?mode=xls&amp;w=d2hlcmUgc2VsbElkPSd3b3JsZGNvbScgYW5kIHN0YXR1cz0nV0FJVERFTEknIGFuZCBkYXRlIDw9IC0xMzQ4MTUzMjAwICBhbmQgZGF0ZSA%2BPSAtMTM1MDgzMTU5OSA%3D" target="actionFrame" onclick="return confirm('Do you want do download XLS file?')"><u>Download</u></a> 

I click the link with:

HtmlPage x = webClient.getPage("http://working.com/download"); 
HtmlAnchor anchor = (HtmlAnchor) x.getFirstByXPath("//a[@target='actionFrame']"); 
anchor.click(); 

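The handleConfirm() method is executed when I click the link, but I don't know how to save the file stream from the server. I tried to inspect the stream with the following code: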

anchor.click().getWebResponse().getContentAsString(); 

However, the result is the same as page x. Does anyone know how to capture the stream from the server? Thanks.

+0

'anchor.click()' will return a Page. That should contain your XLS file. – Lee

+0

See my answer to a similar question at http://stackoverflow.com/a/28471835/612123 – culmat

Answers

7

I found a way to get the InputStream using a WebWindowListener. Inside webWindowContentChanged(WebWindowEvent event), I put the following code:

InputStream xls = event.getWebWindow().getEnclosedPage().getWebResponse().getContentAsStream(); 

Once I have the xls stream, I can save the file to my hard disk.
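
The saving step could look roughly like the sketch below. It uses only the standard library. The class name StreamSaver, the method saveTo, and the temporary file name are hypothetical, and a ByteArrayInputStream stands in for the stream returned by getContentAsStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class StreamSaver {

    // Copies the whole stream into target, replacing any existing file,
    // and returns the number of bytes written. The stream is closed afterwards.
    static long saveTo(InputStream in, Path target) throws IOException {
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the response stream obtained inside webWindowContentChanged(...)
        InputStream xls = new ByteArrayInputStream(new byte[] {1, 2, 3, 4});
        Path target = Files.createTempFile("download", ".xls");
        long written = saveTo(xls, target);
        System.out.println("Saved " + written + " bytes to " + target);
    }
}
```

Inside the listener, the same helper would be called with the real response stream and whatever destination path suits your case.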

+0

I am downloading a CSV file. Could you please explain what event is, and when you call click on the anchor? I don't get a confirm box for my download. – Naveen

8

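I created this based on your post. Note: you can change the content-type condition to download only specific types of files (for example application/octet-stream, application/pdf, etc.).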

package net.s4bdigital.export.main; 

import java.io.File; 
import java.io.FileOutputStream; 
import java.io.IOException; 
import java.io.InputStream; 
import java.io.OutputStream; 
import java.util.List; 

import org.junit.Before; 
import org.junit.Test; 
import org.openqa.selenium.By; 
import org.openqa.selenium.WebDriver; 
import org.openqa.selenium.htmlunit.HtmlUnitDriver; 

import com.gargoylesoftware.htmlunit.ConfirmHandler; 
import com.gargoylesoftware.htmlunit.Page; 
import com.gargoylesoftware.htmlunit.WebClient; 
import com.gargoylesoftware.htmlunit.WebResponse; 
import com.gargoylesoftware.htmlunit.WebWindowEvent; 
import com.gargoylesoftware.htmlunit.WebWindowListener; 
import com.gargoylesoftware.htmlunit.util.NameValuePair; 

public class HtmlUnitDownloadFile { 

    protected String baseUrl; 
    protected static WebDriver driver; 

    @Before 
    public void openBrowser() { 
     baseUrl = "http://localhost/teste.html"; 
     driver = new CustomHtmlUnitDriver(); 
     ((HtmlUnitDriver) driver).setJavascriptEnabled(true); 

    } 


    @Test 
    public void downloadAFile() throws Exception { 

     driver.get(baseUrl); 
     driver.findElement(By.linkText("click to Downloadfile")).click(); 

    } 

    public class CustomHtmlUnitDriver extends HtmlUnitDriver { 

      // This is the magic. HtmlUnitDriver passes its WebClient to this hook,
      // so the handler and listener can be registered on it here.
      @Override
      protected WebClient modifyWebClient(WebClient client) { 


      ConfirmHandler okHandler = new ConfirmHandler(){ 
        public boolean handleConfirm(Page page, String message) { 
         return true; 
        } 
      }; 
      client.setConfirmHandler(okHandler); 

      client.addWebWindowListener(new WebWindowListener() { 

       public void webWindowOpened(WebWindowEvent event) { 
        // TODO Auto-generated method stub 

       } 

       public void webWindowContentChanged(WebWindowEvent event) { 

        WebResponse response = event.getWebWindow().getEnclosedPage().getWebResponse(); 
        System.out.println(response.getLoadTime()); 
        System.out.println(response.getStatusCode()); 
        System.out.println(response.getContentType()); 

        List<NameValuePair> headers = response.getResponseHeaders(); 
        for(NameValuePair header: headers){ 
         System.out.println(header.getName() + " : " + header.getValue()); 
        } 

        // Change or add conditions for the content types that you would like
        // to receive as a file. 
        if(response.getContentType().equals("text/plain")){ 
         getFileResponse(response, "target/testDownload.war"); 
        } 



       } 

       public void webWindowClosed(WebWindowEvent event) { 



       } 
      });   

      return client; 
      } 


    } 

    public static void getFileResponse(WebResponse response, String fileName){ 

     // try-with-resources closes both streams, even if the copy fails 
     try (InputStream inputStream = response.getContentAsStream(); 
          OutputStream outputStream = new FileOutputStream(new File(fileName))) { 

      int read; 
      byte[] bytes = new byte[1024]; 

      // copy the response body to the file in 1 KB chunks 
      while ((read = inputStream.read(bytes)) != -1) { 
       outputStream.write(bytes, 0, read); 
      } 

      System.out.println("Done!"); 

     } catch (IOException e) { 
      e.printStackTrace(); 
     } 

    } 

} 
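
The content-type condition in the listener could be factored into a small helper, sketched below. It assumes the value may carry parameters such as "; charset=UTF-8" and may differ in letter case. The class name DownloadTypes, the method isDownload, and the list of MIME types are hypothetical choices for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class DownloadTypes {

    // Hypothetical whitelist; adjust it to the files you expect from your server.
    private static final Set<String> DOWNLOADABLE = new HashSet<>(Arrays.asList(
            "application/octet-stream",
            "application/pdf",
            "application/vnd.ms-excel",
            "text/csv"));

    // Returns true when the MIME type (ignoring parameters and case) is whitelisted.
    static boolean isDownload(String contentType) {
        if (contentType == null) {
            return false;
        }
        int semicolon = contentType.indexOf(';');
        String mime = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return DOWNLOADABLE.contains(mime.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isDownload("application/pdf"));
        System.out.println(isDownload("text/html"));
    }
}
```

In webWindowContentChanged, the equals("text/plain") check would then become a call such as isDownload(response.getContentType()).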
+1

I'm sorry, but I don't get it: where or how do you keep a reference to the 'webclient' in the 'modifywebclient' method? Thanks. –

+1

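https://selenium.googlecode.com/svn/trunk/docs/api/java/org/openqa/selenium/htmlunit/HtmlUnitDriver.html#modifyWebClient(com.gargoylesoftware.htmlunit.WebClient) @Anudeep Samaiya it is a method of the superclass. We can override it to add a handler that confirms the download dialog, but you need to change the content type to the one expected in your case. –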

+0

Really, it does work like magic... it runs smoothly. – viralpatel

2

If you don't want to wrap HtmlUnit with Selenium, there is a simpler way. Just register your own WebWindowListener implementation on HtmlUnit's WebClient.

You can also use Apache commons-io for simple stream copying.

WebClient webClient = new WebClient(); 
webClient.addWebWindowListener(new WebWindowListener() { 
    public void webWindowOpened(WebWindowEvent event) { } 

    public void webWindowContentChanged(WebWindowEvent event) { 
     // The response of the page that was just loaded into the window 
     WebResponse response = event.getWebWindow().getEnclosedPage().getWebResponse(); 
     // Change or add conditions for the content types that you would 
     // like to receive as a file. 
     if (response.getContentType().equals("text/plain")) { 
      // try-with-resources closes the output file after the copy 
      try (OutputStream out = new FileOutputStream("downloaded_file")) { 
       IOUtils.copy(response.getContentAsStream(), out); 
      } catch (IOException e) { 
       e.printStackTrace(); 
      } 
     } 

    } 

    public void webWindowClosed(WebWindowEvent event) {} 
}); 
1
final WebClient webClient = new WebClient(BrowserVersion.CHROME); 
     webClient.getOptions().setTimeout(2000); 
     webClient.getOptions().setThrowExceptionOnScriptError(false); 
     webClient.getOptions().setThrowExceptionOnFailingStatusCode(false); 
     webClient.waitForBackgroundJavaScript(2000); 

     //get General page 
     final HtmlPage page = webClient.getPage("http://your"); 

     //get Frame 
     final HtmlPage frame = ((HtmlPage) 
     page.getFrameByName("Frame").getEnclosedPage()); 

     webClient.setConfirmHandler(new ConfirmHandler() { 
      public boolean handleConfirm(Page page, String message) { 
       return true; 
      } 
     }); 

     //get element file 
     final DomElement file = frame.getElementByName("File"); 

     final InputStream xls = file.click().getWebResponse().getContentAsStream(); 

     assertNotNull(xls); 
-1

Figure out the download URL and scrape it into a list. From the download URL, we can get the entire file using this code.

try{ 
     String path = "your destination path"; 
     List<HtmlElement> downloadfiles = (List<HtmlElement>) page.getByXPath("the tag you want to scrape"); 
     if (downloadfiles.isEmpty()) { 
      System.out.println("No items found !"); 
     } else { 
      for (HtmlElement htmlItem : downloadfiles) { 
       // getHrefAttribute() is only defined on HtmlAnchor; getAttribute works on any element 
       String DownloadURL = htmlItem.getAttribute("href"); 

       Page invoicePdf = client.getPage(DownloadURL); 
       if (invoicePdf.getWebResponse().getContentType().equals("application/pdf")) { 
        System.out.println("creating PDF:"); 
        IOUtils.copy(invoicePdf.getWebResponse().getContentAsStream(), 
          new FileOutputStream(path + "file name")); 
       } 
      } 
     } 
    } catch (Exception e) { 
     e.printStackTrace(); 
    }
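
A scraped href is often relative, like the ./my_file.php link in the question, so it may need to be resolved against the page URL before it can be requested. A minimal sketch using only java.net.URI follows. The class name HrefResolver and the shortened query string are hypothetical.

```java
import java.net.URI;

class HrefResolver {

    // Resolves an href taken from a page against that page's URL.
    // An href that is already absolute is returned unchanged.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String pageUrl = "http://working.com/download";
        // Shortened stand-in for the relative link shown in the question
        String href = "./my_file.php?mode=xls";
        System.out.println(resolve(pageUrl, href));
    }
}
```

The resolved string can then be passed to getPage in place of the raw attribute value.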