2013-01-03
3

Python script won't work in AutoKey

I'm trying to get an HTML entity encoder/decoder working in Python that behaves like PHP's htmlentities and html_entity_decode. It works fine as a standalone script:

My input:

Lorem ÁÉÍÓÚÇÃOÁáéíóúção @#$%*()[]<>

python decode.py

Output:

Lorem ÁÉÍÓÚÇÃOÁáéíóúção @#$%*()[]<>

Now, if I run it as an AutoKey script, I get this error:

Script name: 'html_entity_decode' 
Traceback (most recent call last): 
    File "/usr/local/lib/python2.7/dist-packages/autokey/service.py", line 454, in execute 
    exec script.code in scope 
    File "<string>", line 40, in <module> 
    File "/usr/local/lib/python2.7/dist-packages/autokey/scripting.py", line 42, in send_keys 
    self.mediator.send_string(keyString.decode("utf-8")) 
    File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode 
    return codecs.utf_8_decode(input, errors, True) 
UnicodeEncodeError: 'ascii' codec can't encode characters in position 6-12: ordinal not in range(128) 

What am I doing wrong? Here's the script:

import htmlentitydefs 
import re 

entity_re = re.compile(r'&(%s|#(\d{1,5}|[xX]([\da-fA-F]{1,4})));' % '|'.join(
    htmlentitydefs.name2codepoint.keys())) 

def html_entity_decode(s, encoding='utf-8'):

    if not isinstance(s, basestring):
        raise TypeError('argument 1: expected string, %s found' \
                        % s.__class__.__name__)

    def entity_2_unichr(matchobj):
        g1, g2, g3 = matchobj.groups()
        if g3 is not None:
            codepoint = int(g3, 16)
        elif g2 is not None:
            codepoint = int(g2)
        else:
            codepoint = htmlentitydefs.name2codepoint[g1]
        return unichr(codepoint)

    if isinstance(s, unicode):
        entity_2_chr = entity_2_unichr
    else:
        entity_2_chr = lambda o: entity_2_unichr(o).encode(encoding,
                                                           'xmlcharrefreplace')

    def silent_entity_replace(matchobj):
        try:
            return entity_2_chr(matchobj)
        except ValueError:
            return matchobj.group(0)

    return entity_re.sub(silent_entity_replace, s)

text = clipboard.get_selection() 
text = html_entity_decode(text) 
keyboard.send_keys("%s" % text) 

I found it in a Gist at https://gist.github.com/607454; I'm not the author.

Answers

3

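Judging from the traceback, the likely problem is that you're passing a unicode string to keyboard.send_keys, which expects a UTF-8 encoded byte string. autokey then calls decode("utf-8") on your string. In Python 2, calling decode on a unicode object first encodes it implicitly with the ASCII codec, and that step fails on the accented characters, which is why the traceback shows a UnicodeEncodeError coming from a decode call. This looks like a bug in autokey: it shouldn't try to decode strings unless they really are plain (byte) strings.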
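The failure mode described above can be reproduced outside AutoKey. The following is a minimal sketch, written for Python 3, that emulates what Python 2 does when decode is called on a unicode object; `py2_style_decode` is a hypothetical helper made up for illustration, not part of AutoKey or the standard library, and the sample string assumes the entities have already been decoded to accented characters.

```python
# -*- coding: utf-8 -*-
# Python 3 emulation of Python 2's behaviour when .decode() is called on a
# unicode object: the text is first encoded with the default ASCII codec.

def py2_style_decode(text, encoding="utf-8"):
    """Hypothetical helper mimicking Python 2's unicode.decode()."""
    as_bytes = text.encode("ascii")   # implicit step in Python 2; fails on non-ASCII
    return as_bytes.decode(encoding)

# Assumed sample: the question's text after its entities were decoded.
sample = u"Lorem ÁÉÍÓÚÇÃOÁáéíóúção"

try:
    py2_style_decode(sample)
    error = None
except UnicodeEncodeError as exc:
    error = exc  # same exception type as in the AutoKey traceback

# Pure ASCII input survives the round trip unchanged.
ascii_result = py2_style_decode(u"Lorem")
```

The first run of non-ASCII characters in the sample starts right after "Lorem ", which is consistent with the "position 6-12" reported in the traceback.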

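If this guess is correct, you should be able to fix it by making sure you pass a UTF-8 encoded byte string (a str instance, not unicode) to send_keys. Try something like this: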

text = clipboard.get_selection() 
if isinstance(text, unicode): 
    text = text.encode('utf-8') 
text = html_entity_decode(text) 
assert isinstance(text, str) 
keyboard.send_keys(text) 

The assertion isn't required, but it's a handy sanity check to make sure html_entity_decode is doing the right thing.
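The isinstance check above can be factored into a small helper so the conversion happens in one place. This is only a sketch: `to_utf8_bytes` is a hypothetical name, and the code is written so that it runs on both Python 2 (where `bytes` is an alias for `str`) and Python 3.

```python
# -*- coding: utf-8 -*-

def to_utf8_bytes(value, encoding="utf-8"):
    """Hypothetical helper: return `value` as an encoded byte string."""
    if isinstance(value, bytes):    # already a byte string: leave it untouched
        return value
    return value.encode(encoding)   # text (Python 2 `unicode`): encode it

encoded = to_utf8_bytes(u"ção")      # text input gets encoded
unchanged = to_utf8_bytes(b"plain")  # byte input passes through
```

In an AutoKey script this would presumably be applied to the result of clipboard.get_selection() before calling html_entity_decode; the AutoKey objects are not available outside AutoKey, so they are left out of the sketch.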

+0

I get the same error. – braindamage

+0

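I had the decode/encode direction the wrong way round. I've updated my answer to match what send_keys expects: it needs a UTF-8 encoded string, which you aren't currently giving it, because clipboard.get_selection returns a unicode string (when running under autokey). –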

2

The problem is that the output of:

clipboard.get_selection() 

is a Unicode string.

To solve the problem, replace:

text = clipboard.get_selection() 

with:

text = clipboard.get_selection().encode("utf8")