How to get NaN for keys not found in a MultiIndex Series?

I have two DataFrames, DF1 and DF2, with many columns:

DF1 - [2756003 rows × 44 columns]

DF2 - [22035 rows × 11 columns]

I need to add new columns to DF2:

t1 = df1.groupby(['category', 'manufacturer']) 
t2=t1[c1].mean() 
str1='_'.join(col) 
df2[c1+'_'+str1+'_mean']=t2[df2[['category','manufacturer']].as_matrix()].values

Each new column should hold the mean of a DF1 target column, grouped by columns that exist in both DF1 and DF2. This code returns:

IndexError: arrays used as indices must be of integer (or boolean) type

t2 stores a MultiIndex Series such as:

category  manufacturer
1         2               0.000000
          4               8.796840
          10              2.312407
          19              1.135094
          24              4.355000

If I use an existing index, I get the expected result:

In [302]: t2[1, 2] 
Out[302]: 0.0

However, if I call t2[410, 332], where 332 is the id of a manufacturer that appears in df2 but not in df1, I get:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I would like to get NaN instead, the way we do from

df2['manufacturer'].map(t2)

when there is only one key column.
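One way to get the same NaN-for-missing-key behaviour with two key columns is Series.reindex with a MultiIndex built from df2's key columns. This is offered as a sketch, not as the asker's or answerer's method. The small frames and the column names col and col_mean are hypothetical stand-ins for DF1/DF2. The approach assumes t2's index has no duplicate keys, which holds for a groupby mean.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for DF1 and DF2 (the real frames are much larger).
df1 = pd.DataFrame({'category': [1, 1, 1, 2],
                    'manufacturer': [2, 2, 4, 4],
                    'col': [0.0, 2.0, 5.0, 7.0]})
df2 = pd.DataFrame({'category': [1, 410, 2],
                    'manufacturer': [4, 332, 4]})

# MultiIndex Series of group means, like t2 in the question.
t2 = df1.groupby(['category', 'manufacturer'])['col'].mean()

# Single-key case: map() returns NaN for keys missing from the mapper.
single = df2['manufacturer'].map(df1.groupby('manufacturer')['col'].mean())

# Two-key case: reindex t2 on df2's (category, manufacturer) pairs.
# Pairs absent from t2, such as (410, 332), come back as NaN.
keys = pd.MultiIndex.from_frame(df2[['category', 'manufacturer']])
# to_numpy() drops the MultiIndex so values line up with df2's rows by position.
df2['col_mean'] = t2.reindex(keys).to_numpy()

print(single)
print(df2)
```

Unlike t2[410, 332], the reindex call never raises for an unknown pair; it simply fills that position with NaN.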


2017-08-31 Roman

Use pd.merge to merge df2 and t2:

df2 = pd.merge(df2, t2.reset_index(), on=['category','manufacturer'], how='left')

Because pd.merge joins on all shared columns by default, if 'category' and 'manufacturer' are the only columns that df2 and t2.reset_index() have in common, the line above can be simplified to

df2 = pd.merge(df2, t2.reset_index(), how='left')
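A quick check of that simplification, on hypothetical small frames where the only shared columns are the two keys: passing on=[...] explicitly and omitting it should give the same result.

```python
import pandas as pd

# Hypothetical small frames; only 'category' and 'manufacturer' are shared.
df1 = pd.DataFrame({'category': [1, 1, 2],
                    'manufacturer': [2, 2, 4],
                    'col': [1.0, 3.0, 5.0]})
df2 = pd.DataFrame({'category': [1, 2, 3],
                    'manufacturer': [2, 4, 4],
                    'col2': [10, 20, 30]})

t2 = df1.groupby(['category', 'manufacturer'])['col'].mean()
t2.name = 'col_mean'
right = t2.reset_index()  # columns: category, manufacturer, col_mean

explicit = pd.merge(df2, right, on=['category', 'manufacturer'], how='left')
implicit = pd.merge(df2, right, how='left')  # joins on all shared columns

print(explicit)
print(explicit.equals(implicit))  # True
```

If df2 ever gains another column whose name also appears in t2.reset_index(), the implicit form would join on that column too, so spelling out on=[...] is the safer choice in that case.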

For example,

import numpy as np 
import pandas as pd 
np.random.seed(2017) 

df1 = pd.DataFrame(np.random.randint(4, size=(100,3)), columns=['category', 'manufacturer', 'col']) 

df2 = pd.DataFrame(np.random.randint(1, 5, size=(100,3)), columns=['category', 'manufacturer', 'col2']) 

t1 = df1.groupby(['category', 'manufacturer']) 
c1 = 'col' 
t2 = t1[c1].mean() 
col = ['foo', 'bar'] 
str1='_'.join(col) 
t2.name = c1+'_'+str1+'_mean' 
df2 = pd.merge(df2, t2.reset_index(), on=['category','manufacturer'], how='left') 
print(df2.head())

prints

   category  manufacturer  col2  col_foo_bar_mean
0         1             1     2          1.333333
1         3             4     3               NaN
2         4             4     2               NaN
3         3             3     1          1.000000
4         3             2     1          1.777778

Because this is a left join, rows of df2 that have no corresponding row in t2 are assigned NaN in the new column.
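To see which df2 rows found no match (and therefore received NaN), pd.merge accepts indicator=True, which adds a _merge column whose value is 'left_only' for unmatched rows. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data: (3, 4) and (4, 4) exist in df2 but not in df1.
df1 = pd.DataFrame({'category': [1, 3],
                    'manufacturer': [1, 3],
                    'col': [4.0, 1.0]})
df2 = pd.DataFrame({'category': [1, 3, 4],
                    'manufacturer': [1, 4, 4]})

t2 = df1.groupby(['category', 'manufacturer'])['col'].mean()
t2.name = 'col_mean'

merged = pd.merge(df2, t2.reset_index(),
                  on=['category', 'manufacturer'],
                  how='left', indicator=True)

# Rows with _merge == 'left_only' had no matching group in df1,
# so their col_mean is NaN.
unmatched = merged[merged['_merge'] == 'left_only']
print(merged)
print(unmatched[['category', 'manufacturer']])
```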


2017-08-31 18:03:52 unutbu

After running df2 = pd.merge(df2, t2.to_frame(), left_on=['category', 'manufacturer'], right_index=True, how='left') I got AttributeError: 'CategoricalIndex' object has no attribute 'is_dtype_equal', so I changed that line to df2 = pd.merge(df2, t2.reset_index(), left_on=['category', 'manufacturer'], right_on=['category', 'manufacturer'], how='left'). It works! Thanks – Roman

Great, thanks for the correction. Since left_on and right_on name the same columns, you can simply use on=['category', 'manufacturer']. If those are the only columns the two DataFrames share, you can even omit it entirely. – unutbu
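A small sketch, on hypothetical frames, comparing the left_on/right_on form from the comment with the on= form. When the key names are identical on both sides, pandas keeps a single copy of each key column, so the two calls should produce the same frame.

```python
import pandas as pd

# Hypothetical frames; 'right' plays the role of t2.reset_index().
df2 = pd.DataFrame({'category': [1, 2],
                    'manufacturer': [2, 9],
                    'col2': [5, 6]})
right = pd.DataFrame({'category': [1],
                      'manufacturer': [2],
                      'col_mean': [1.5]})

keys = ['category', 'manufacturer']
a = pd.merge(df2, right, left_on=keys, right_on=keys, how='left')
b = pd.merge(df2, right, on=keys, how='left')

# With identical key names the key columns appear only once,
# so both calls produce the same frame; (2, 9) has no match and gets NaN.
print(list(a.columns))
print(a.equals(b))
```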
