2016-08-14 160 views

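Getting rid of a unicode error

I have the following code that tries to print the edge list of a graph. It looks like the edges are being looped over, but my intent is to test whether all edges are included as they pass through the function for further processing.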

def mapper_network(self, _, info):
    info[0] = info[0].encode('utf-8')
    for i in range(len(info[1])):
        info[1][i] = str(info[1][i])
    l_lst = len(info[1])
    packed = [(info[0], l) for l in info[1]] # each pair of nodes (edge)
    weight = [1 / float(l_lst)] # each edge weight
    G = nx.Graph()
    for i in range(len(packed)):
        edge_from = packed[i][0]
        edge_to = packed[i][1]
        #edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore')
        edge_to = edge_to.encode("utf-8")
        weight = weight
        G.add_edge(edge_from, edge_to, weight=weight)
    #print G.size() # yes, this works :)
    G_edgelist = []
    G_edgelist = G_edgelist.append(nx.generate_edgelist(G).next())
    print G_edgelist

With this code, I get the following error:

Traceback (most recent call last):
  File "MRQ7_trevor_2.py", line 160, in <module>
    MRMostUsedWord2.run()
  File "/tmp/MRQ7_trevor_2.vagrant.20160814.201259.655269/job_local_dir/1/mapper/27/mrjob.tar.gz/mrjob/job.py", line 433, in run
    mr_job.execute()
  File "/tmp/MRQ7_trevor_2.vagrant.20160814.201259.655269/job_local_dir/1/mapper/27/mrjob.tar.gz/mrjob/job.py", line 442, in execute
    self.run_mapper(self.options.step_num)
  File "/tmp/MRQ7_trevor_2.vagrant.20160814.201259.655269/job_local_dir/1/mapper/27/mrjob.tar.gz/mrjob/job.py", line 507, in run_mapper
    for out_key, out_value in mapper(key, value) or():
  File "MRQ7_trevor_2.py", line 91, in mapper_network
    G_edgelist = G_edgelist.append(nx.generate_edgelist(G).next())
  File "/home/vagrant/anaconda/lib/python2.7/site-packages/networkx/readwrite/edgelist.py", line 114, in generate_edgelist
    yield delimiter.join(map(make_str,e))
  File "/home/vagrant/anaconda/lib/python2.7/site-packages/networkx/utils/misc.py", line 82, in make_str
    return unicode(str(x), 'unicode-escape')
UnicodeDecodeError: 'unicodeescape' codec can't decode byte 0x5c in position 0: \ at end of string

With the following modification:

edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore')

I get:

edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore') 
TypeError: must be unicode, not str 

How can I get rid of the unicode error? It looks quite troublesome, and I would really appreciate your help. Thank you!

Can you print the value of 'edge_to'?

Jean-Francois Fabre, thank you. I can. – achimneyswallow

What I meant was: can you show us the value?

Answer

I strongly recommend reading this article on unicode. It gives a good explanation of unicode versus byte strings in Python 2.

For your problem specifically: when you call unicodedata.normalize("NFKD", edge_to), edge_to must be a unicode string. It is not unicode, however, because you convert it to str in this line: info[1][i] = str(info[1][i]). Here is a simple test:

import unicodedata

edge_to = u'edge' # this is unicode
edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore')
print edge_to # prints 'edge' as expected

edge_to = 'edge' # this is not unicode (a byte str in Python 2)
# the next line raises TypeError: must be unicode, not str
edge_to = unicodedata.normalize("NFKD", edge_to).encode('utf-8', 'ignore')
print edge_to # never reached

You can get rid of this problem by converting edge_to to unicode before normalizing it.
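As a minimal sketch of that conversion: the helper name `to_text` is hypothetical (it is not part of `unicodedata` or networkx), and it assumes any byte string is UTF-8 encoded. It is written so that it should run on both Python 2 and Python 3.

```python
import unicodedata

TEXT_TYPE = type(u"")  # `unicode` on Python 2, `str` on Python 3


def to_text(value, encoding="utf-8"):
    """Return `value` as a text (unicode) string.

    Hypothetical helper: assumes byte strings are encoded as `encoding`.
    """
    if isinstance(value, TEXT_TYPE):
        return value  # already text, nothing to do
    if isinstance(value, bytes):
        return value.decode(encoding)  # bytes -> text
    # Anything else (ints, floats, ...) is converted via its text form.
    return TEXT_TYPE(value)


edge_to = b"caf\xc3\xa9"  # UTF-8 bytes, like a plain `str` in Python 2
edge_to = unicodedata.normalize("NFKD", to_text(edge_to))  # no TypeError now
encoded = edge_to.encode("utf-8", "ignore")  # back to bytes only if needed
```

Decoding explicitly with a known encoding is safer than a bare `unicode(edge_to)` call, which in Python 2 uses the ASCII codec and fails on non-ASCII bytes.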

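By the way, the encoding and decoding throughout the code block looks a bit confused. Think about whether you want your strings to be unicode or bytes. You probably don't need to do this much encoding, decoding, and normalizing.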
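One way to follow that advice is to decode once on input, keep every label as unicode text inside the function, and encode once on output. The sketch below illustrates the idea without networkx. The names `decode_label`, `build_edges`, and `edge_lines` are hypothetical, and it assumes incoming byte strings are UTF-8 and that `info` has the `[source, [targets...]]` shape used in the question.

```python
TEXT_TYPE = type(u"")  # `unicode` on Python 2, `str` on Python 3


def decode_label(label, encoding="utf-8"):
    """Decode a byte-string label to text; convert anything else to text."""
    if isinstance(label, bytes):
        return label.decode(encoding)
    return TEXT_TYPE(label)


def build_edges(info):
    """Turn [source, [targets...]] into (source, target, weight) triples."""
    source = decode_label(info[0])
    targets = [decode_label(t) for t in info[1]]
    if not targets:
        return []
    weight = 1.0 / len(targets)  # 1 / number of targets, as in the question
    return [(source, target, weight) for target in targets]


def edge_lines(edges):
    """Format each edge as one line of text (still unicode, not bytes)."""
    return [u"%s %s %s" % edge for edge in edges]


edges = build_edges([b"caf\xc3\xa9", [u"b\\", 3]])
lines = edge_lines(edges)
output = u"\n".join(lines).encode("utf-8")  # encode once, at the output boundary
```

Building the list with a comprehension also avoids assigning the result of `list.append`, which returns `None`.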