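Combining Series in pandas

I need to combine multiple pandas Series that contain string values. The Series are messages produced by several validation steps. I am trying to merge these messages into one Series so that I can attach it to the DataFrame. The problem is that the result is empty.
Here is an example: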
import pandas as pd
df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})
index1 = df[df['a'] == 'b'].index
index2 = df[df['a'] == 'a'].index
series = df.iloc[index1].apply(lambda x: x['b'] + '-bbb', axis=1)
series += df.iloc[index2].apply(lambda x: x['a'] + '-aaa', axis=1)
print(series)
# >>> series
# 0 NaN
# 1 NaN
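
The NaN values are most likely a result of index alignment: `+=` matches the two Series by index label, and a label that exists in only one operand yields NaN. A minimal sketch of one workaround, assuming a missing message should count as an empty string (the names `s1`, `s2`, `all_labels` and `combined` are illustrative only):

```python
import pandas as pd

# Two message Series with non-overlapping index labels, as in the example above.
s1 = pd.Series(['bb-bbb'], index=[1])
s2 = pd.Series(['a-aaa'], index=[0])

# Plain addition aligns on labels; a label missing on either side gives NaN.
plain = s1 + s2

# Reindex both Series to the union of labels and treat missing messages
# as empty strings (an assumption) before concatenating the strings.
all_labels = s1.index.union(s2.index)
combined = s1.reindex(all_labels).fillna('') + s2.reindex(all_labels).fillna('')

print(plain)
print(combined)
```

This keeps the original row labels, so the result can still be aligned with the DataFrame.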
Update
import pandas as pd
df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})
index1 = df[df['a'] == 'b'].index
index2 = df[df['a'] == 'a'].index
series1 = df.iloc[index1].apply(lambda x: x['b'] + '-bbb', axis=1)
series2 = df.iloc[index2].apply(lambda x: x['a'] + '-aaa', axis=1)
series3 = df.iloc[index2].apply(lambda x: x['a'] + '-ccc', axis=1)
# series3 causes a ValueError: cannot reindex from a duplicate axis
series = pd.concat([series1, series2, series3])
df['series'] = series
print(df)
Update 2
In this example, the index seems to get mixed up.
import pandas as pd
df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})
index1 = df[df['a'] == 'a'].index
index2 = df[df['a'] == 'b'].index
index3 = df[df['a'] == 'c'].index
series1 = df.iloc[index1].apply(lambda x: x['a'] + '-aaa', axis=1)
series2 = df.iloc[index2].apply(lambda x: x['a'] + '-bbb', axis=1)
series3 = df.iloc[index3].apply(lambda x: x['a'] + '-ccc', axis=1)
print(series1)
print()
print(series2)
print()
print(series3)
print()
df['series'] = pd.concat([series1, series2, series3], ignore_index=True)
print(df)
print()
df['series'] = pd.concat([series2, series1, series3], ignore_index=True)
print(df)
print()
df['series'] = pd.concat([series3, series2, series1], ignore_index=True)
print(df)
print()
This produces the following output:
0    a-aaa
dtype: object

1    b-bbb
dtype: object

2    c-ccc
dtype: object

   a   b series
0  a  aa  a-aaa
1  b  bb  b-bbb
2  c  cc  c-ccc
3  d  dd    NaN

   a   b series
0  a  aa  b-bbb
1  b  bb  a-aaa
2  c  cc  c-ccc
3  d  dd    NaN

   a   b series
0  a  aa  c-ccc
1  b  bb  b-bbb
2  c  cc  a-aaa
3  d  dd    NaN
I would expect only the a message in row 0, only the b message in row 1 and only the c message in row 2, but that is not the case...
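
The shuffling appears to come from `ignore_index=True`, which discards the original row labels and renumbers the concatenated values 0, 1, 2 in the order they were passed. A small sketch, assuming the labels are unique across the Series being combined, that leaves the labels in place so the column assignment aligns on them:

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c', 'd'], 'b': ['aa', 'bb', 'cc', 'dd']})

# Build each message Series from a boolean mask; the row labels are preserved.
series1 = df.loc[df['a'] == 'a', 'a'] + '-aaa'
series2 = df.loc[df['a'] == 'b', 'a'] + '-bbb'
series3 = df.loc[df['a'] == 'c', 'a'] + '-ccc'

# Without ignore_index the labels 2, 1, 0 survive the concat, so the
# assignment matches on label regardless of the order in the list.
df['series'] = pd.concat([series3, series2, series1])
print(df)
```

With the labels kept, the order of the list passed to `pd.concat` should no longer affect which row each message lands in.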
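Update 3
Here is a better example that should demonstrate the expected behaviour. As I said, the use case is that, for a given DataFrame, a function evaluates each row and may return error messages for some rows as a Series (some index labels are included, some are not; if no errors are returned, the error Series is empty).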
In [12]:
s1 = pd.Series(['b', 'd'], index=[1, 3])
s2 = pd.Series(['a', 'b'], index=[0, 1])
s3 = pd.Series(['c', 'e'], index=[2, 4])
s4 = pd.Series([], index=[])
pd.concat([s1, s2, s3, s4]).sort_index()
# I'd like to get:
#
# 0 a
# 1 b b
# 2 c
# 3 d
# 4 e
Out[12]:
0 a
1 b
1 b
2 c
3 d
4 e
dtype: object
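
One possible way to get the desired result, sketched under the assumption that messages sharing an index label should be joined with a single space: concatenate the Series, then group by the index and join each group. Skipping empty Series before the concat is an extra precaution added here, not part of the original example.

```python
import pandas as pd

s1 = pd.Series(['b', 'd'], index=[1, 3])
s2 = pd.Series(['a', 'b'], index=[0, 1])
s3 = pd.Series(['c', 'e'], index=[2, 4])
s4 = pd.Series([], index=[], dtype=object)  # no errors from this step

# Drop empty Series so they do not affect the dtype of the combined index.
non_empty = [s for s in (s1, s2, s3, s4) if not s.empty]

# Group by index label and join all messages that share a label.
combined = pd.concat(non_empty).groupby(level=0).agg(' '.join)
print(combined)
```

Because the result has one entry per label, it can be assigned to a DataFrame column without running into duplicate index labels.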
Sorry, I have to take that back. I am getting a ValueError (see the updated example). – orange 2014-09-22 12:24:33