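pandas join: join column not recognized
I don't know what is going on here; the title is only a first-order approximation. I am trying to join two data frames: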
>>> df_sum.head()
TUCASEID t070101 t070102 t070103 t070104 t070105 t070199 \
0 20030100013280 0 0 0 0 0 0
1 20030100013344 0 0 0 0 0 0
2 20030100013352 60 0 0 0 0 0
3 20030100013848 0 0 0 0 0 0
4 20030100014165 0 0 0 0 0 0
t070201 t070299 shopping year
0 0 0 0 2003
1 0 0 0 2003
2 0 0 60 2003
3 0 0 0 2003
4 0 0 0 2003
>>> emp.head()
TUCASEID status
0 20030100013280 emp
1 20030100013344 emp
2 20030100013352 emp
4 20030100014165 emp
5 20030100014169 emp
Those are the data frames. I want to join them on the common column TUCASEID, where they do overlap:
>>> np.intersect1d(emp.TUCASEID, df_sum.TUCASEID)
array([20030100013280, 20030100013344, 20030100013352, ..., 20131212132462,
20131212132469, 20131212132475])
Now...
>>> df_sum.join(emp, on='TUCASEID', how='inner')
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 3829, in join
rsuffix=rsuffix, sort=sort)
File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 3843, in _join_compat
suffixes=(lsuffix, rsuffix), sort=sort)
File "/usr/local/lib/python2.7/site-packages/pandas/tools/merge.py", line 39, in merge
return op.get_result()
File "/usr/local/lib/python2.7/site-packages/pandas/tools/merge.py", line 193, in get_result
rdata.items, rsuf)
File "/usr/local/lib/python2.7/site-packages/pandas/core/internals.py", line 3873, in items_overlap_with_suffix
to_rename)
ValueError: columns overlap but no suffix specified: Index([u'TUCASEID'], dtype='object')
Hmm, that's odd, since the only column that appears in both data frames is the one being joined on. But fine, let's go along with it [1]:
>>> df_sum.join(emp, on='TUCASEID', how='inner', rsuffix='r')
Empty DataFrame
Columns: [TUCASEID, t070101, t070102, t070103, t070104, t070105, t070199, t070201, t070299, shopping, year, TUCASEIDr, status]
Index: []
The result is empty even though the intersection is huge. What is going on here?
>>> pd.__version__
'0.15.0'
[1]: I actually enforced an integer dtype on the join column, because the error message says 'object' there, and it made no difference:
>>> emp.dtypes
TUCASEID int64
status object
dtype: object
>>> df_sum.dtypes
TUCASEID int64
(...)
shopping int64
year int64
dtype: object
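Your index values don't match. Also, `join` called this way joins on the index, which is why the result is empty. Why not just merge them with `df_sum.merge(emp, on='TUCASEID', how='outer')`? Or are you only interested in adding the `status` column for each `TUCASEID` row? In that case do `df_sum['status'] = df_sum['TUCASEID'].map(emp.set_index('TUCASEID')['status'])` – EdChum 2015-01-31 22:24:13
@EdChum OK, I'll take a look at the alternatives. The index values don't match? I specified an `on=` column instead. – FooBar 2015-01-31 22:25:39
Not sure. `join` joins on the index; strangely, I can reproduce the behaviour, but I'd suggest using the other methods instead – EdChum 2015-01-31 22:27:04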
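To make the comments above concrete, here is a minimal sketch of what appears to be happening, written against a recent pandas rather than the 0.15.0 from the question. The two small frames are hypothetical stand-ins that reuse a few TUCASEID values from the output above, and the names `empty`, `merged`, `joined` and `with_status` are chosen only for illustration. The key point is that `join(..., on='TUCASEID')` looks up the caller's TUCASEID column in the other frame's index (0, 1, 2, ...), not in its TUCASEID column.

```python
import pandas as pd

# Hypothetical miniature versions of the two frames from the question.
df_sum = pd.DataFrame({
    "TUCASEID": [20030100013280, 20030100013344, 20030100013352],
    "shopping": [0, 0, 60],
})
emp = pd.DataFrame({
    "TUCASEID": [20030100013280, 20030100013344, 20030100013352],
    "status": ["emp", "emp", "emp"],
})

# join(on=...) compares df_sum["TUCASEID"] with emp's index (0, 1, 2),
# so no key matches and the inner join comes back empty.
empty = df_sum.join(emp, on="TUCASEID", how="inner", rsuffix="r")

# Alternative 1: merge compares the two TUCASEID columns directly.
merged = df_sum.merge(emp, on="TUCASEID", how="inner")

# Alternative 2: keep join, but make TUCASEID the index of the right frame.
joined = df_sum.join(emp.set_index("TUCASEID"), on="TUCASEID", how="inner")

# Alternative 3: only add the status column via a lookup Series.
# Assumes TUCASEID is unique in emp.
status_by_id = emp.set_index("TUCASEID")["status"]
with_status = df_sum.assign(status=df_sum["TUCASEID"].map(status_by_id))

print(len(empty), len(merged), len(joined))
```

The `merge` call is probably the most direct fit when both frames carry the key as an ordinary column. The `set_index` variant is closer to the original attempt, and the `map` variant keeps every row of `df_sum`, leaving a missing value wherever `emp` has no matching TUCASEID.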