不同的指數組合dataframes我已經生成概率的數據幀從一個scikit學習分類是這樣的:與熊貓
def preprocess_category_series(series, key):
if series.dtype != 'category':
return series
if series.cat.ordered:
s = pd.Series(series.cat.codes, name=key)
mode = s.mode()[0]
s[s<0] = mode
return s
else:
return pd.get_dummies(series, drop_first=True, prefix=key)
data = df[df.year == 2012]
factors = pd.concat([preprocess_category_series(data[k], k) for k in factor_keys], axis=1)
predictions = pd.DataFrame([dict(zip(clf.classes_, l)) for l in clf.predict_proba(factors)])
我現在想這些概率追加回到我原來的數據幀。但是,上面生成的predictions
數據幀在保留data
中的項目順序的同時,已經丟失了data
的索引。我認爲我能夠做到
pd.concat([data, predictions], axis=1, ignore_index=True)
但是這會產生錯誤:
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
我已經看到了這有時會出現,如果列名是重複的,但在這種情況下,沒有一個是。那是什麼錯誤?將這些數據框拼接在一起的最佳方式是什麼?
:
year serial hwtfinl region statefip \
cpsid
20121000000100 2012 1 3796.85 East South Central Division Alabama
20121000000100 2012 1 3796.85 East South Central Division Alabama
20121000000100 2012 1 3796.85 East South Central Division Alabama
20120800000500 2012 6 2814.24 East South Central Division Alabama
20120800000600 2012 7 2828.42 East South Central Division Alabama
county month pernum cpsidp wtsupp ... \
cpsid ...
20121000000100 0 11 1 20121000000101 3208.1213 ...
20121000000100 0 11 2 20121000000102 3796.8506 ...
20121000000100 0 11 3 20121000000103 3386.4305 ...
20120800000500 0 11 1 20120800000501 2814.2417 ...
20120800000600 1097 11 1 20120800000601 2828.4193 ...
race hispan educ votereg \
cpsid
20121000000100 White Not Hispanic 111 Voted
20121000000100 White Not Hispanic 111 Did not register
20121000000100 White Not Hispanic 111 Voted
20120800000500 White Not Hispanic 92 Voted
20120800000600 White Not Hispanic 73 Did not register
educ_parsed age4 educ4 \
cpsid
20121000000100 Bachelor's degree 65+ College grad
20121000000100 Bachelor's degree 65+ College grad
20121000000100 Bachelor's degree Under 30 College grad
20120800000500 Associate's degree, academic program 45-64 College grad
20120800000600 High school diploma or equivalent 65+ HS or less
race4 region4 gender
cpsid
20121000000100 White South Male
20121000000100 White South Female
20121000000100 White South Female
20120800000500 White South Female
20120800000600 White South Female
predictions.head()
:
a b c d e f
0 0.119534 0.336761 0.188023 0.136651 0.095342 0.123689
1 0.148409 0.346429 0.134852 0.169661 0.087556 0.113093
2 0.389586 0.195802 0.101738 0.085705 0.114612 0.112557
3 0.277783 0.262079 0.180037 0.102030 0.071171 0.106900
4 0.158404 0.396487 0.088064 0.079058 0.171540 0.106447
只是爲了好玩,我專門只用頭列試過這樣:
pd.concat([data_2012.iloc[0:5], predictions.iloc[0:5]], axis=1, ignore_index=True)
同樣的錯誤出現。
這對我來說非常合適。你的熊貓的版本是什麼? – Ali
我在版本0.18.0 – futuraprime
可以請打印predictions.head()和data.head()? – Shovalt