Characters are not decoded correctly with Jsoup and PhantomJS

I'm using PhantomJS and Selenium in Python to render pages. This is the code:

import sys
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

path_to_chromedriver = 'C:\\..\\chromedriver'

# Command-line arguments: section name, output path prefix, file with links.
section = sys.argv[1]
path = sys.argv[2]
links = sys.argv[3]

# Read one URL per line, dropping trailing newlines and blank lines.
with open(links, 'r') as links_file:
    listOfLinks = [line.strip() for line in links_file if line.strip()]

dr = webdriver.Chrome(executable_path=path_to_chromedriver)

cont = 0
for link in listOfLinks:
    try:
        dr.get(link)

        # Wait (up to 20 s) until the target element is present.
        element = WebDriverWait(dr, 20).until(
            EC.presence_of_element_located((By.CLASS_NAME, "_img-zoom"))
        )

        time.sleep(1)

        htmlPath = path + section + "_" + str(cont) + ".html"

        # Write the rendered HTML (note: no explicit encoding is given).
        file = open(htmlPath, 'w')
        file.write(dr.page_source)
        file.close()

        cont = cont + 1
    except Exception as e:
        # Report the cause instead of silently swallowing it.
        print("Exception: %s" % e)

dr.quit()

This code creates one HTML file for each link received as an argument.

The files are then parsed with Jsoup in Java:

Document document = Jsoup.parse(file, "UTF-8");

However, special characters such as '€', 'á', 'é', 'í', etc. are not decoded correctly and are replaced by '?'. How can I solve this?

Source: cuoka, 2016-04-06

Try Document document = Jsoup.parse(file, "ISO-8859-1"); – Eritrean

@Uzochi Yes, this works! – cuoka

Solution found by Uzochi:

Try Document document = Jsoup.parse(file, "ISO-8859-1");
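A likely reason this works, assuming the script ran under Python 3 on Windows: open(htmlPath, 'w') without an encoding argument uses the locale's default encoding, which is commonly cp1252 rather than UTF-8. The bytes on disk are then single-byte encoded, and reading them as UTF-8 fails. The sketch below reproduces that mismatch; the cp1252 code page is an assumption, since the question does not say which encoding was used.

```python
# Assumption: the HTML was written with a single-byte Windows code page (cp1252).
text = u"\u00e1\u00e9\u00ed"  # the accented letters 'á', 'é', 'í'

# Bytes as they would land on disk under that assumption.
raw = text.encode("cp1252")

# Decoding as UTF-8: the bytes are invalid and get replaced with U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding as ISO-8859-1: the accented letters are recovered.
as_latin1 = raw.decode("iso-8859-1")

print(repr(as_utf8))
print(repr(as_latin1))
```

One caveat: '€' is byte 0x80 in cp1252, but 0x80 is a control character in strict ISO-8859-1, so if the euro sign still looks wrong, "windows-1252" may be the charset name to try.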

Source: Stephan, 2016-04-08 08:10:40
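An alternative that is not from the thread, offered only as a sketch: fix the encoding on the Python side by writing the page source as UTF-8 explicitly, so that Jsoup.parse(file, "UTF-8") matches what is on disk. The helper name write_html_utf8, the sample string, and the temporary output path are all hypothetical; in the script above the string would come from dr.page_source.

```python
import io
import os
import tempfile


def write_html_utf8(html_path, page_source):
    """Hypothetical helper: write text as UTF-8 regardless of the OS locale."""
    with io.open(html_path, "w", encoding="utf-8") as f:
        f.write(page_source)


# Stand-in for dr.page_source (assumption for illustration only).
sample = u"<p>\u20ac \u00e1 \u00e9 \u00ed</p>"

out_path = os.path.join(tempfile.gettempdir(), "section_0.html")
write_html_utf8(out_path, sample)

# Read the raw bytes back to check what actually reached the disk.
with io.open(out_path, "rb") as f:
    raw = f.read()

print(raw.decode("utf-8") == sample)
```

With this approach the Java side can keep "UTF-8" as the charset, and the result no longer depends on the machine's locale.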
