2016-03-28 112 views
0

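Unable to take a screenshot using HtmlUnitDriver [Selenium WebDriver java]

I want to take a screenshot of a page using HtmlUnitDriver. I came across this link, where someone has written a custom HtmlUnitDriver that can take screenshots. Unfortunately, when I use it, I get an exception:

Exception in thread "main" java.lang.ClassCastException: [B cannot be cast to java.io.File at Test.main(Test.java:39)

My code is as follows: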

import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.WebDriver;
import com.gargoylesoftware.htmlunit.BrowserVersion;

public class Test extends ScreenCaptureHtmlUnitDriver {

    public static void main(String[] args) throws InterruptedException, IOException {

        WebDriver driver = new ScreenCaptureHtmlUnitDriver(BrowserVersion.FIREFOX_38);
        driver.get("https://www.google.com/?gws_rd=ssl");
        try {
            // This is the line that throws the ClassCastException.
            File scrFile = ((ScreenCaptureHtmlUnitDriver) driver).getScreenshotAs(OutputType.FILE);
            FileUtils.copyFile(scrFile, new File("D:\\TEMP.PNG"));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The custom HtmlUnit driver I am using (the one from the link) is this:

import java.io.ByteArrayOutputStream; 
import java.io.IOException; 
import java.net.URL; 
import java.util.Collections; 
import java.util.HashMap; 
import java.util.Iterator; 
import java.util.LinkedList; 
import java.util.List; 
import java.util.Map; 
import java.util.regex.Matcher; 
import java.util.regex.Pattern; 
import java.util.zip.ZipEntry; 
import java.util.zip.ZipOutputStream; 
import org.apache.commons.io.FilenameUtils; 
import org.apache.commons.io.IOUtils; 
import org.openqa.selenium.Capabilities; 
import org.openqa.selenium.OutputType; 
import org.openqa.selenium.TakesScreenshot; 
import org.openqa.selenium.WebDriverException; 
import org.openqa.selenium.htmlunit.HtmlUnitDriver; 
import org.openqa.selenium.internal.Base64Encoder; 
import org.openqa.selenium.remote.CapabilityType; 
import org.openqa.selenium.remote.DesiredCapabilities; 
import com.gargoylesoftware.htmlunit.BrowserVersion; 
import com.gargoylesoftware.htmlunit.WebClient; 
import com.gargoylesoftware.htmlunit.WebRequest; 
import com.gargoylesoftware.htmlunit.WebWindow; 
import com.gargoylesoftware.htmlunit.html.HtmlElement; 
import com.gargoylesoftware.htmlunit.html.HtmlPage; 

public class ScreenCaptureHtmlUnitDriver extends HtmlUnitDriver implements TakesScreenshot { 

private static Map<String, byte[]> imagesCache = Collections.synchronizedMap(new HashMap<String, byte[]>()); 

private static Map<String, String> cssjsCache = Collections.synchronizedMap(new HashMap<String, String>()); 

// http://stackoverflow.com/questions/4652777/java-regex-to-get-the-urls-from-css 
private final static Pattern cssUrlPattern = Pattern.compile("background(-image)?[\\s]*:[^url]*url[\\s]*\\([\\s]*([^\\)]*)[\\s]*\\)[\\s]*");// ?<url> 

public ScreenCaptureHtmlUnitDriver() { 
    super(); 
} 

public ScreenCaptureHtmlUnitDriver(boolean enableJavascript) { 
    super(enableJavascript); 
} 

public ScreenCaptureHtmlUnitDriver(Capabilities capabilities) { 
    super(capabilities); 
} 

public ScreenCaptureHtmlUnitDriver(BrowserVersion version) { 
    super(version); 
    DesiredCapabilities var = ((DesiredCapabilities) getCapabilities()); 
    var.setCapability(CapabilityType.TAKES_SCREENSHOT, true); 
} 

//@Override 
@SuppressWarnings("unchecked") 
public <X> X getScreenshotAs(OutputType<X> target) throws WebDriverException { 
    byte[] archive = new byte[0]; 
    try { 
     archive = downloadCssAndImages(getWebClient(), (HtmlPage) getCurrentWindow().getEnclosedPage()); 
    } catch (Exception e) { 
    } 
    if(target.equals(OutputType.BASE64)){ 
     return target.convertFromBase64Png(new Base64Encoder().encode(archive)); 
    } 
    if(target.equals(OutputType.BYTES)){ 
     return (X) archive; 
    } 
    return (X) archive; 
} 

// http://stackoverflow.com/questions/2244272/how-can-i-tell-htmlunits-webclient-to-download-images-and-css 
protected byte[] downloadCssAndImages(WebClient webClient, HtmlPage page) throws Exception { 
    WebWindow currentWindow = webClient.getCurrentWindow(); 
    Map<String, String> urlMapping = new HashMap<String, String>(); 
    Map<String, byte[]> files = new HashMap<String, byte[]>(); 
    WebWindow window = null; 
    try { 
     window = webClient.getWebWindowByName(page.getUrl().toString()+"_screenshot"); 
     webClient.getPage(window, new WebRequest(page.getUrl())); 
    } catch (Exception e) { 
     window = webClient.openWindow(page.getUrl(), page.getUrl().toString()+"_screenshot"); 
    } 

    String xPathExpression = "//*[name() = 'img' or name() = 'link' and (@type = 'text/css' or @type = 'image/x-icon') or @type = 'text/javascript']"; 
    List<?> resultList = page.getByXPath(xPathExpression); 

    Iterator<?> i = resultList.iterator(); 
    while (i.hasNext()) { 
     try { 
      HtmlElement el = (HtmlElement) i.next(); 
      String resourceSourcePath = el.getAttribute("src").equals("") ? el.getAttribute("href") : el 
        .getAttribute("src"); 
      if (resourceSourcePath == null || resourceSourcePath.equals("")) 
       continue; 
      URL resourceRemoteLink = page.getFullyQualifiedUrl(resourceSourcePath); 
      String resourceLocalPath = mapLocalUrl(page, resourceRemoteLink, resourceSourcePath, urlMapping); 
      urlMapping.put(resourceSourcePath, resourceLocalPath); 
      if (!resourceRemoteLink.toString().endsWith(".css")) { 
       byte[] image = downloadImage(webClient, window, resourceRemoteLink); 
       files.put(resourceLocalPath, image); 
      } else { 
       String css = downloadCss(webClient, window, resourceRemoteLink); 
       for (String cssImagePath : getLinksFromCss(css)) { 
        URL cssImagelink = page.getFullyQualifiedUrl(cssImagePath.replace("\"", "").replace("\'", "") 
          .replace(" ", "")); 
        String cssImageLocalPath = mapLocalUrl(page, cssImagelink, cssImagePath, urlMapping); 
        files.put(cssImageLocalPath, downloadImage(webClient, window, cssImagelink)); 
       } 
       files.put(resourceLocalPath, replaceRemoteUrlsWithLocal(css, urlMapping) 
         .replace("resources/", "./").getBytes()); 
      } 
     } catch (Exception e) { 
     } 
    } 
    String pagesrc = replaceRemoteUrlsWithLocal(page.getWebResponse().getContentAsString(), urlMapping); 
    files.put("page.html", pagesrc.getBytes()); 
    webClient.setCurrentWindow(currentWindow); 
    return createZip(files); 
} 

String downloadCss(WebClient webClient, WebWindow window, URL resourceUrl) throws Exception { 
    if (cssjsCache.get(resourceUrl.toString()) == null) { 
     cssjsCache.put(resourceUrl.toString(), webClient.getPage(window, new WebRequest(resourceUrl)) 
       .getWebResponse().getContentAsString()); 

    } 
    return cssjsCache.get(resourceUrl.toString()); 
} 

byte[] downloadImage(WebClient webClient, WebWindow window, URL resourceUrl) throws Exception { 
    if (imagesCache.get(resourceUrl.toString()) == null) { 
     imagesCache.put(
       resourceUrl.toString(), 
       IOUtils.toByteArray(webClient.getPage(window, new WebRequest(resourceUrl)).getWebResponse() 
         .getContentAsStream())); 
    } 
    return imagesCache.get(resourceUrl.toString()); 
} 

public static byte[] createZip(Map<String, byte[]> files) throws IOException  { 
    ByteArrayOutputStream bos = new ByteArrayOutputStream(); 
    ZipOutputStream zipfile = new ZipOutputStream(bos); 
    Iterator<String> i = files.keySet().iterator(); 
    String fileName = null; 
    ZipEntry zipentry = null; 
    while (i.hasNext()) { 
     fileName = i.next(); 
     zipentry = new ZipEntry(fileName); 
     zipfile.putNextEntry(zipentry); 
     zipfile.write(files.get(fileName)); 
    } 
    zipfile.close(); 
    return bos.toByteArray(); 
} 

List<String> getLinksFromCss(String css) { 
    List<String> result = new LinkedList<String>(); 
    Matcher m = cssUrlPattern.matcher(css); 
    while (m.find()) { // find next match 
     result.add(m.group(2)); 
    } 
    return result; 
} 

String replaceRemoteUrlsWithLocal(String source, Map<String, String> replacement) { 
    for (String object : replacement.keySet()) { 
     // background:url(http://org.com/images/image.gif) 
     source = source.replace(object, replacement.get(object)); 
    } 
    return source; 
} 

String mapLocalUrl(HtmlPage page, URL link, String path, Map<String, String> replacementToAdd) throws Exception { 
    String resultingFileName = "resources/" + FilenameUtils.getName(link.getFile()); 
    replacementToAdd.put(path, resultingFileName); 
    return resultingFileName; 
} 

} 

UPDATE

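The code provided by Andrew works, but I would like to know whether there is a way to download only selected resources. For example, on this website I only want to download the captcha image, whose XPath is "//*[@id='cimage']", because downloading all resources takes a long time. Is there a way to download only specific resources? With the existing code below, every resource is downloaded.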

byte[] zipFileBytes = ((ScreenCaptureHtmlUnitDriver) driver).getScreenshotAs(OutputType.BYTES); 
FileUtils.writeByteArrayToFile(new File("D:\\TEMP.PNG"), zipFileBytes); 
+0

Can you add the full exception stack trace and tell us what the type "B" is? –

+0

Hi Florent, I edited the code and added a try/catch with printStackTrace, but I still get "java.lang.ClassCastException: [B cannot be cast to java.io.File at Test.main(Test.java:19)" as the stack trace – Ajay

+0

Hi, is it necessary to use HtmlUnitDriver? If not, please go with PhantomJS; it is the better choice when it comes to screenshots. –

Answers

1

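The error says that the code is trying to cast a byte[] to a File ("[B" is the JVM's name for byte[]). It is easy to see why once you strip the unused paths out of getScreenshotAs: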

public <X> X getScreenshotAs(OutputType<X> target) throws WebDriverException { 
    byte[] archive = new byte[0]; 
    try { 
     archive = downloadCssAndImages(getWebClient(), (HtmlPage) getCurrentWindow().getEnclosedPage()); 
    } catch (Exception e) { 
    } 
    return (X) archive; 
} 
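
To make the failure concrete, here is a minimal sketch that reproduces it without Selenium. The class `CastDemo` and the method `fakeScreenshot` are hypothetical stand-ins for the driver's `getScreenshotAs`, not part of any library. It illustrates two things: `[B` is how the JVM names `byte[]`, and the unchecked `(X)` cast is erased inside the method, so the exception only surfaces at the caller's assignment.

```java
import java.io.File;

public class CastDemo {

    // Hypothetical stand-in for getScreenshotAs: whatever type the caller
    // asks for, the method always hands back a byte[].
    @SuppressWarnings("unchecked")
    static <X> X fakeScreenshot() {
        byte[] archive = new byte[0];
        return (X) archive; // unchecked cast: erased at runtime, never fails here
    }

    // Returns true if asking for a File triggers a ClassCastException.
    static boolean fileRequestFails() {
        try {
            // The compiler inserts the real cast to File at this assignment.
            File f = CastDemo.<File>fakeScreenshot();
            return f == null; // not reached
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(byte[].class.getName());      // prints "[B"
        byte[] ok = CastDemo.<byte[]>fakeScreenshot();   // asking for byte[] is fine
        System.out.println(ok.length);                   // prints "0"
        System.out.println(fileRequestFails());          // prints "true"
    }
}
```

This is also why the stack trace points at `Test.main` and not at the driver class: the cast that fails is the one in the calling code.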

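There is no way to get a File out of this. OutputType.FILE is not supported, so you have to handle the file output yourself. Fortunately, that is easy. You can change your code to: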

byte[] zipFileBytes = ((ScreenCaptureHtmlUnitDriver) driver).getScreenshotAs(OutputType.BYTES); 
FileUtils.writeByteArrayToFile(new File("D:\\TEMP.PNG"), zipFileBytes); 

See FileUtils.writeByteArrayToFile() for more information.
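
Note that the bytes this driver returns are the ZIP archive built by `createZip` (a `page.html` plus entries under `resources/`), not PNG data, so a `.zip` file name may describe the output more accurately than `TEMP.PNG`. As a rough sketch using only `java.util.zip` and `java.nio.file`, the archive could be unpacked into a directory like this. The class name `ZipUnpacker` and its `unpack` method are illustrative and not part of Selenium or HtmlUnit.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipUnpacker {

    // Writes every entry of an in-memory ZIP archive below targetDir
    // and returns the number of files written.
    public static int unpack(byte[] zipBytes, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        int count = 0;
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would escape targetDir.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry outside target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                    continue;
                }
                Files.createDirectories(out.getParent());
                // Copies only the current entry; the stream reports EOF at its end.
                Files.copy(zin, out, StandardCopyOption.REPLACE_EXISTING);
                count++;
            }
        }
        return count;
    }
}
```

Assuming a target directory of your choice, a call such as `ZipUnpacker.unpack(zipFileBytes, Paths.get("D:\\TEMP"))` would leave `page.html` and the `resources` folder there, ready to open in a browser.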

+1

Thank you very much, Andrew, this works well :) I have added some more detail to the question; could you please take a look? Thanks a bunch. – Ajay

-3

Check this out; it may help you:

File scrFile = ((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE); 
FileUtils.copyFile(scrFile, new File("C:/Users/home/Desktop/screenshot.png"));// copy it somewhere 
+2

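This does not explain or address the actual error message, misunderstands the question and what the OP is trying to achieve, and will replace one error with another. –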

+0

This is the simplest way to capture the current web page – monil

+0

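That is not the question. The OP wants to know how to get past *"ClassCastException: [B cannot be cast to java.io.File"* in a custom version of HtmlUnitDriver, which normally cannot take screenshots at all. –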
