2016-01-05 23 views
4

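Posting a Unicode string to a web service with the Python requests library

I'm trying to post a text snippet containing fancy Unicode characters to a web service using the requests library. I'm using Python 3.5.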

import requests  # pip install requests

text = "Två dagar kvar"
r = requests.post("http://json-tagger.herokuapp.com/tag", data=text)
print(r.json())

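I get a UnicodeEncodeError, but I can't figure out what I'm doing wrong on my side. From what I've seen, the requests documentation only discusses Unicode in the context of GET requests.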

UnicodeEncodeError      Traceback (most recent call last) 
<ipython-input-125-3ebcae3d7918> in <module>() 
    19   print("cleaned : " + line) 
    20 
---> 21   r = requests.post("http://json-tagger.herokuapp.com/tag", data=line) 
    22   sentences = r.json()['sentences'] 
    23   for sentence in sentences: 

//anaconda/lib/python3.4/site-packages/requests/api.py in post(url, data, json, **kwargs) 
    105  """ 
    106 
--> 107  return request('post', url, data=data, json=json, **kwargs) 
    108 
    109 

//anaconda/lib/python3.4/site-packages/requests/api.py in request(method, url, **kwargs) 
    51  # cases, and look like a memory leak in others. 
    52  with sessions.Session() as session: 
---> 53   return session.request(method=method, url=url, **kwargs) 
    54 
    55 

//anaconda/lib/python3.4/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth,  timeout, allow_redirects, proxies, hooks, stream, verify, cert, json) 
    466   } 
    467   send_kwargs.update(settings) 
--> 468   resp = self.send(prep, **send_kwargs) 
    469 
    470   return resp 

//anaconda/lib/python3.4/site-packages/requests/sessions.py in send(self, request, **kwargs) 
    574 
    575   # Send the request 
--> 576   r = adapter.send(request, **kwargs) 
    577 
    578   # Total elapsed time of the request (approximately) 

//anaconda/lib/python3.4/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies) 
    374      decode_content=False, 
    375      retries=self.max_retries, 
--> 376      timeout=timeout 
    377    ) 
    378 

//anaconda/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries,  redirect, assert_same_host, timeout, pool_timeout, release_conn, **response_kw) 
    557    httplib_response = self._make_request(conn, method, url, 
    558             timeout=timeout_obj, 
--> 559             body=body, headers=headers) 
    560 
    561    # If we're going to release the connection in ``finally:``, then 

//anaconda/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout,  **httplib_request_kw) 
    351   # conn.request() calls httplib.*.request, not the method in 
    352   # urllib3.request. It also calls makefile (recv) on the socket. 
--> 353   conn.request(method, url, **httplib_request_kw) 
    354 
    355   # Reset the timeout for the recv() on the socket 

//anaconda/lib/python3.4/http/client.py in request(self, method, url, body, headers) 
    1086  def request(self, method, url, body=None, headers={}): 
    1087   """Send a complete request to the server.""" 
-> 1088   self._send_request(method, url, body, headers) 
    1089 
    1090  def _set_content_length(self, body): 

//anaconda/lib/python3.4/http/client.py in _send_request(self, method, url, body, headers) 
    1123    # RFC 2616 Section 3.7.1 says that text default has a 
    1124    # default charset of iso-8859-1. 
-> 1125    body = body.encode('iso-8859-1') 
    1126   self.endheaders(body) 
    1127 

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 14-15: ordinal not in range(256) 
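
The last frame of the traceback shows where this comes from: when the body is a str, http.client encodes it with iso-8859-1, which only covers code points up to U+00FF. The sketch below reproduces that step without any network access. The sample string (with an emoji appended) and the helper name `encode_body` are assumptions made for illustration; the failing input itself is not shown in the question.

```python
def encode_body(text, charset="utf-8"):
    """Encode a str body explicitly so it is sent as bytes, not as a str."""
    return text.encode(charset)


# Hypothetical input: the question's sample plus one emoji (an assumption).
sample = "Två dagar kvar \U0001F600"

try:
    # This mirrors what http.client does with a str body.
    sample.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError as exc:
    latin1_ok = False
    print("iso-8859-1 failed:", exc.reason)

# UTF-8 can represent every Unicode code point, so this succeeds.
body = encode_body(sample)
print(type(body).__name__)
```

Note that "å" (U+00E5) is itself encodable in iso-8859-1, so a string such as "Två dagar kvar" alone would not trigger the error; only characters above U+00FF do.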

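Workaround: I remove all Unicode characters in the "Emoticons" block, U+1F600 - U+1F64F, and the "Symbols and Pictographs" block, U+1F300 - U+1F5FF, from the text with the following code (based on this answer), since I don't need emoticons and pictures for the analysis: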

text = re.sub(r'[^\u1F600-\u1F64F ]|[^\u1F300-\u1F5FF ]',"",text) 
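
A caveat about the pattern above: in Python's re module, `\u` takes exactly four hex digits, so `\u1F600` is read as U+1F60 followed by a literal `0`, and the leading `^` negates each character class. The expression therefore does not target the two blocks it names. A minimal sketch of a pattern that does match only those two blocks, using eight-digit `\U` escapes, might look like this (the names `EMOJI_PATTERN` and `strip_emoji` are hypothetical):

```python
import re

# Emoticons (U+1F600-U+1F64F) and Symbols and Pictographs (U+1F300-U+1F5FF),
# the two ranges quoted in the question. \U needs exactly eight hex digits.
EMOJI_PATTERN = re.compile("[\U0001F600-\U0001F64F\U0001F300-\U0001F5FF]")


def strip_emoji(text):
    """Remove characters in the two blocks above; leave everything else."""
    return EMOJI_PATTERN.sub("", text)


cleaned = strip_emoji("Två dagar kvar \U0001F600\U0001F31F")
print(repr(cleaned))
```

This only covers the two blocks mentioned; any other character above U+00FF would still fail to encode as iso-8859-1.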

UPDATE: The creator of the web service has since fixed this and updated the documentation. All you need to do in Python 3 is send an encoded string:

"Två dagar kvar".encode("utf-8")

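You could check whether the requests library can send requests using an encoding other than iso-8859-1. (I'd guess it can; if it couldn't, that would be a silly limitation these days.) For your workaround, you would need to remove all characters above U+00FF, which would leave you with only a small Latin character set. – roeland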

Answers

4

It's unclear what content type json-tagger.herokuapp.com expects (the examples are contradictory). You could try posting the data as plain text:

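#!/usr/bin/env python
import requests  # pip install requests

url = "http://json-tagger.herokuapp.com/tag"  # endpoint from the question
text = "Två dagar kvar"

# Send the text as explicitly UTF-8-encoded bytes and declare the charset
r = requests.post(url,
                  data=text.encode('utf-8'),
                  headers={'Content-Type': 'text/plain; charset=utf-8'})
print(r.json())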

Or you could try sending it as application/x-www-form-urlencoded:

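#!/usr/bin/env python
import requests  # pip install requests

url = "http://json-tagger.herokuapp.com/tag"  # endpoint from the question
text = "Två dagar kvar"

# Passing a dict makes requests form-encode the body
r = requests.post(url, data=dict(data=text))
print(r.json())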

The server may reject both, accept both, accept one but not the other, or expect some other format (e.g., application/json).
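
To make the difference between these options concrete, here is a rough sketch that builds the request body for each content type using only the standard library, so nothing is sent over the network. The helper name `candidate_bodies` is hypothetical, and the `"data"` key in the JSON variant is a guess; the service may expect a different field name or none of these formats.

```python
import json
from urllib.parse import urlencode

text = "Två dagar kvar"


def candidate_bodies(text):
    """Return {content_type: body_bytes} for three common ways to post text."""
    return {
        # Raw UTF-8 bytes of the text itself.
        "text/plain; charset=utf-8": text.encode("utf-8"),
        # Form encoding: non-ASCII bytes are percent-escaped, spaces become "+".
        "application/x-www-form-urlencoded":
            urlencode({"data": text}).encode("ascii"),
        # JSON object; the "data" key is an assumption, not from the service docs.
        "application/json":
            json.dumps({"data": text}, ensure_ascii=False).encode("utf-8"),
    }


bodies = candidate_bodies(text)
for content_type, body in bodies.items():
    print(content_type, body)
```

Any of these byte strings could be passed as `data=` to `requests.post` together with the matching `Content-Type` header; which one the server accepts has to be found by trying.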


data=dict(data=text) did the trick! – mattiasostmar
