KeyError (unicode column name) when merging

   u'가'  u'나'
0
1
...

   A  B
0
1
...

I have two data frames like the ones above, called 'left' and 'right'. I tried to merge them with the code below.

result = pandas.merge(left, right, how='left', left_on=[u'가'], right_on=['A'])

Unfortunately, it raised an error. It looks like the left_on/right_on key option of pandas merge cannot recognize unicode column names.

  File "?.py", line ?, in merger
    pandas.merge(left, right, how='left', left_on=[u'가'], right_on=['A'])
  File "C:\Anaconda\lib\site-packages\pandas\tools\merge.py", line 37, in merge
    copy=copy)
  File "C:\Anaconda\lib\site-packages\pandas\tools\merge.py", line 183, in __init__
    self.join_names) = self._get_merge_keys()
  File "C:\Anaconda\lib\site-packages\pandas\tools\merge.py", line 352, in _get_merge_keys
    left_keys.append(left[lk].values)
  File "C:\Anaconda\lib\site-packages\pandas\core\frame.py", line 1797, in __getitem__
    return self._getitem_column(key)
  File "C:\Anaconda\lib\site-packages\pandas\core\frame.py", line 1804, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Anaconda\lib\site-packages\pandas\core\generic.py", line 1084, in _get_item_cache
    values = self._data.get(item)
  File "C:\Anaconda\lib\site-packages\pandas\core\internals.py", line 2851, in get
    loc = self.items.get_loc(item)
  File "C:\Anaconda\lib\site-packages\pandas\core\index.py", line 1572, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas\index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas\index.c:3824)
  File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:3704)
  File "pandas\hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12280)
  File "pandas\hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12231)
KeyError: u'\uac00'

Has anyone run into this kind of error before? If so, please let me know and give me some tips.


2015-07-21 su79eu7k

Just out of curiosity, does the following work: result = pandas.merge(left, right, how='left', left_on=left.columns[0], right_on=right.columns[0])? – EdChum

You're right. Sorry for confusing everyone. As it turns out, it was not a unicode problem. It was simply because I tried to merge right after a groupby. http://stackoverflow.com/a/24980809/3054161 – su79eu7k

Please post an answer and accept it, so this question doesn't remain unanswered, thanks – EdChum

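Sorry for confusing everyone. As it turns out, it was not a unicode problem. It was just because I tried to merge after a groupby, as explained in this linked answer: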

By default, the groupby output has the grouping columns as the index rather than as columns, which is why the merge is failing.

There are a few different ways to handle it; probably the easiest is to use the as_index parameter when you define the groupby object.

po_grouped_df = poagg_df.groupby(['EID','PCODE'], as_index=False)

Then the merge should work as expected.
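A minimal sketch of the as_index=False fix, using small made-up frames (the values and the name `raw` are hypothetical; only the column names mirror the question). It shows that a default groupby moves the key into the index, while as_index=False keeps it as an ordinary column that the merge can use:

```python
import pandas as pd

# Hypothetical sample data; column names mirror the question.
raw = pd.DataFrame({u'가': ['x', 'x', 'y'], u'나': [1, 2, 3]})
right = pd.DataFrame({'A': ['x', 'y'], 'B': [10, 20]})

# Default groupby: the grouping key becomes the index, not a column.
by_index = raw.groupby(u'가').sum()
print(u'가' in by_index.columns)  # False
print(by_index.index.name)

# as_index=False keeps the grouping key as a regular column.
left = raw.groupby(u'가', as_index=False).sum()
print(u'가' in left.columns)  # True

# The merge call from the question now finds u'가' among the columns.
result = pd.merge(left, right, how='left', left_on=[u'가'], right_on=['A'])
print(result)
```

Because the key column names differ on the two sides (u'가' and 'A'), both key columns are kept in the merged result.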

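Anyway, back to the example in my question: in dataframe 'left', u'가' was an index, not a column, because I had done a groupby on 'left' without as_index=False just before the merge.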
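If the frame has already been grouped without as_index=False, another common fix is to call reset_index() before merging, which turns the index back into a column. A sketch with made-up data (the frames are hypothetical stand-ins for 'left' and 'right'). Note that pandas 0.23 and later also let left_on/right_on name index levels as well as columns, so on a recent version the original call may not raise this KeyError at all; the sketch uses reset_index() so it does not depend on that behaviour:

```python
import pandas as pd

# Hypothetical frames standing in for the question's 'left' and 'right'.
raw = pd.DataFrame({u'가': ['x', 'x', 'y'], u'나': [1, 2, 3]})
right = pd.DataFrame({'A': ['x', 'y'], 'B': [10, 20]})

# groupby without as_index=False: u'가' becomes the index name, not a column.
left = raw.groupby(u'가').sum()
print(list(left.columns), list(left.index.names))

# Move the index back into an ordinary column, then merge as in the question.
left = left.reset_index()
result = pd.merge(left, right, how='left', left_on=[u'가'], right_on=['A'])
print(result)
```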


2015-07-21 13:12:18 su79eu7k

@EdChum Stack Overflow makes me wait 2 days before I can accept my own answer, so I'll definitely accept it after that. – su79eu7k

OK, I didn't know that, but it's good to do since it helps with filtering – EdChum

I haven't run into this problem myself, but one possible workaround is:

left_no_unicode=left.copy() 
left_no_unicode.columns=[c if c!=u'가' else 'A' for c in left_no_unicode.columns] 
result = pandas.merge(left_no_unicode, right, how='left', on=['A'])
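The same renaming can be written with DataFrame.rename, which returns a renamed copy, so no explicit copy() is needed. A sketch with made-up data; note that, per the accepted answer above, renaming only helps when u'가' really is a column, and in the question it had become the index after the groupby:

```python
import pandas as pd

# Hypothetical frames; u'가' is assumed to be an actual column here.
left = pd.DataFrame({u'가': ['x', 'y'], u'나': [1, 2]})
right = pd.DataFrame({'A': ['x', 'y'], 'B': [10, 20]})

# rename() leaves 'left' untouched and returns a copy with u'가' renamed to 'A'.
left_no_unicode = left.rename(columns={u'가': 'A'})
result = pd.merge(left_no_unicode, right, how='left', on=['A'])
print(result)
```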


2015-07-21 12:03:38

I'm guessing you built the data frames from a file such as .csv or .excel. In that case, you need to set the encoding option:

left=pd.read_csv('kor.csv', encoding='utf-8') 
#or 
left=pd.read_excel('kor.xlsx', encoding='utf-8')

It will solve the problem.
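One quick way to check whether encoding is involved is to confirm that the decoded header contains the expected labels. This sketch reads an in-memory UTF-8 CSV (a hypothetical stand-in for kor.csv) with read_csv's encoding option. If the label is present and the merge still raises KeyError, the cause lies elsewhere, which turned out to be the case in this question (the key had moved into the index after a groupby):

```python
import io

import pandas as pd

# A hypothetical UTF-8 CSV held in memory instead of a real 'kor.csv' file.
data = u'가,나\nx,1\ny,2\n'.encode('utf-8')

left = pd.read_csv(io.BytesIO(data), encoding='utf-8')

# Check that the header decoded to the expected unicode column labels.
print(list(left.columns))
print(u'가' in left.columns)  # True
```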


2015-07-21 12:16:23 Jihun
