我有一長串需要生成報告的域名。該列表包含了一些IDN域名,雖然我知道如何將它們轉換在python在命令行:在python中將域名轉換爲idn
>>> domain = u"pfarmerü.com"
>>> domain
u'pfarmer\xfc.com'
>>> domain.encode("idna")
'xn--pfarmer-t2a.com'
>>>
我掙扎得到它從文本文件中的小腳本讀取數據的工作。
#!/usr/bin/python
import sys
infile = open(sys.argv[1])
for line in infile:
print line,
domain = unicode(line.strip())
print type(domain)
print "IDN:", domain.encode("idna")
print
我得到以下輸出:
$ ./idn.py ./test
pfarmer.com
<type 'unicode'>
IDN: pfarmer.com
pfarmerü.com
Traceback (most recent call last):
File "./idn.py", line 9, in <module>
domain = unicode(line.strip())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 7: ordinal not in range(128)
我也有嘗試:
#!/usr/bin/python
import sys
import codecs
infile = codecs.open(sys.argv[1], "r", "utf8")
for line in infile:
print line,
domain = line.strip()
print type(domain)
print "IDN:", domain.encode("idna")
print
這給了我:
$ ./idn.py ./test
Traceback (most recent call last):
File "./idn.py", line 8, in <module>
for line in infile:
File "/usr/lib/python2.6/codecs.py", line 679, in next
return self.reader.next()
File "/usr/lib/python2.6/codecs.py", line 610, in next
line = self.readline()
File "/usr/lib/python2.6/codecs.py", line 525, in readline
data = self.read(readsize, firstline=True)
File "/usr/lib/python2.6/codecs.py", line 472, in read
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-5: unsupported Unicode code range
這裏是我的測試數據文件:
pfarmer.com
pfarmerü.com
我很清楚我現在需要了解unicode。
感謝,
彼得