UnicodeEncodeError in Python 3 and BeautifulSoup4

When I run my code, I get this UnicodeEncodeError in Python 3 with BeautifulSoup4:

UnicodeEncodeError: 'ascii' codec can't encode character '\u0303' in position 71: ordinal not in range(128)

Here is my full code:

from urllib.request import urlopen as uReq
from urllib.request import urlretrieve as uRet
from bs4 import BeautifulSoup as soup
import urllib

for x in range(143, 608):
    myUrl = "example.com/" + str(x)
    try:
        uClient = uReq(myUrl)
        page_html = uClient.read()
        uClient.close()
        page_soup = soup(page_html, "html.parser")

        container = page_soup.findAll("div", {"id": "videoPostContent"})

        img_container = container[0].findAll("img")
        images = img_container[0].findAll("img")

        imgCounter = 0

        if len(images) == "":
            for image in images:
                print('Downloading image from ' + image['src'] + '...')
                imgCounter += 1
                uRet(image['src'], 'pictures/' + str(x) + '.jpg')
        else:
            for image in img_container:
                print('Downloading image from ' + image['src'] + '...')
                imgCounter += 1
                uRet(image['src'], 'pictures/' + str(x) + '_' + str(imgCounter) + '.jpg')
    except urllib.error.HTTPError:
        continue

Attempted solutions:

I tried adding .encode/decode('utf-8') and .text.encode/decode('utf-8') to page_soup, but that gives this error:

AttributeError: 'str'/'bytes' object has no attribute 'findAll' or


2017-10-15 Axis

Wrap your urlretrieve calls in a try/except for the error being thrown and skip them? –

Converting page_soup to a string means it is no longer a BeautifulSoup object, so you can't use 'findAll' on it. Which line throws the error? – TheF1rstPancake

In uRet() [which is urlretrieve] – Axis

At least one of the image src URLs contains non-ASCII characters, and urlretrieve cannot handle them.

>>> from urllib.request import urlretrieve
>>> url = 'http://example.com/' + '\u0303'
>>> urlretrieve(url)
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    ... 
UnicodeEncodeError: 'ascii' codec can't encode character '\u0303' in position 5: ordinal not in range(128)
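
To find out which src values would trigger this before downloading anything, a small check along the following lines may help. This is only a sketch: the helper name first_non_ascii is made up for illustration and is not part of urllib or BeautifulSoup.

```python
def first_non_ascii(url):
    """Return (index, character) for the first non-ASCII character, or None."""
    for index, char in enumerate(url):
        # Code points above 127 are the ones the 'ascii' codec cannot encode.
        if ord(char) > 127:
            return index, char
    return None


# Hypothetical URLs, used only to show the helper's behaviour.
print(first_non_ascii('http://example.com/plain.jpg'))
print(first_non_ascii('http://example.com/' + '\u0303'))
```

Calling it on each image['src'] inside the loop would show which pages contain the problematic URLs.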

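You can try one of the following approaches to work around this.

Assume the URLs are valid, and retrieve them with a library that has better Unicode handling, such as requests.
Assume the URLs are valid but contain Unicode characters that must be escaped before they are passed to urlretrieve. This means splitting the URL into scheme, domain, path and so on, quoting the path and any query parameters, and then unsplitting; all the tools for this are in the urllib.parse package (but this is probably what requests does anyway, so just use requests).
Assume the URLs are broken, and skip them by wrapping the urlretrieve calls in try/except UnicodeEncodeError.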
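
The second and third options can be sketched with the standard library alone. This is a rough illustration, not a definitive implementation: escape_url and retrieve_or_skip are hypothetical names, the choice of safe characters is an assumption, and a non-ASCII host name is left untouched here.

```python
from urllib.parse import quote, urlsplit, urlunsplit
from urllib.request import urlretrieve


def escape_url(url):
    """Percent-encode non-ASCII characters in the path, query and fragment."""
    parts = urlsplit(url)
    # '%' stays in `safe` so sequences that are already escaped are not
    # escaped a second time. The host (netloc) is passed through unchanged.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


def retrieve_or_skip(url, filename, retrieve=urlretrieve):
    """Treat URLs that cannot be encoded as broken: skip them, return False."""
    try:
        retrieve(url, filename)
    except UnicodeEncodeError:
        # ascii() keeps the message printable on an ASCII-only console.
        print("Skipping un-encodable URL: " + ascii(url))
        return False
    return True
```

In the question's loop, either uRet(escape_url(image['src']), ...) or retrieve_or_skip(image['src'], ...) could replace the direct uRet call, depending on whether the URLs are assumed to be valid or broken.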


2017-10-15 08:34:03 snakecharmerb
