Compare 2 pandas DataFrames, row by row, cell by cell

I have 2 DataFrames, df1 and df2, and want to do the following, storing the results in df3:
for each row in df1:
    for each row in df2:
        create a new row in df3 (called "df1-1, df2-1" or whatever) to store results
        for each cell (column) in df1:
            for the cell in df2 whose column name is the same as for the cell in df1:
                compare the cells (using some comparison function func(a, b)) and,
                depending on the result of the comparison, write the result into the
                appropriate column of the "df1-1, df2-1" row of df3
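A minimal sketch of those nested loops in Python, assuming `func` takes two scalar cells and returns the value to store. `compare_all_rows` and the `"df1-i, df2-j"` row labels are hypothetical names chosen for illustration, and plain equality stands in for `func` in the usage lines. Note that `iterrows()` yields `(index, row)` pairs, so both parts have to be unpacked.

```python
import pandas as pd


def compare_all_rows(df1, df2, func):
    # Only columns present in both frames can be compared cell by cell.
    common_cols = [c for c in df1.columns if c in df2.columns]
    rows, labels = [], []
    for i1, row1 in df1.iterrows():  # iterrows() yields (index, row Series) pairs
        for i2, row2 in df2.iterrows():
            # One result row per (df1 row, df2 row) pair.
            labels.append(f"df1-{i1}, df2-{i2}")
            rows.append({c: func(row1[c], row2[c]) for c in common_cols})
    # Each dict becomes one row; `columns` fixes the column order.
    return pd.DataFrame(rows, index=labels, columns=common_cols)


df1 = pd.DataFrame({"A": ["foo", "gee"], "B": ["bar", "whiz"],
                    "C": ["foobar", "herp"], "D": [7, 10]})
df2 = pd.DataFrame({"A": ["zoo"], "B": ["car"], "C": ["foobar"], "D": [8]})

# Hypothetical func: plain equality of the two cells.
df3 = compare_all_rows(df1, df2, lambda a, b: a == b)
print(df3)
```

Because this calls `func` once per cell for every pair of rows, it slows down quickly as the frames grow; it is mostly useful as a reference to check a vectorized version against.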
For example:
df1
A    B     C       D
foo  bar   foobar  7
gee  whiz  herp    10

df2
A    B    C       D
zoo  car  foobar  8

df3
df1-df2  A              B               C                    D
foo-zoo  func(foo,zoo)  func(bar,car)   func(foobar,foobar)  func(7,8)
gee-zoo  func(gee,zoo)  func(whiz,car)  func(herp,foobar)    func(10,8)
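One possible vectorized way to produce exactly this df3 is to pair up all rows first with a cross merge (`how="cross"`, available in pandas 1.2 and later) and then apply `func` to whole columns at once. This sketch assumes `func` accepts two aligned Series; here it is a hypothetical stand-in that tests equality, and the `_1`/`_2` suffixes are arbitrary choices.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["foo", "gee"], "B": ["bar", "whiz"],
                    "C": ["foobar", "herp"], "D": [7, 10]})
df2 = pd.DataFrame({"A": ["zoo"], "B": ["car"], "C": ["foobar"], "D": [8]})


def func(a, b):
    # Hypothetical comparison: element-wise equality of two aligned Series.
    return a == b


# Cross join: every row of df1 paired with every row of df2.
# Overlapping column names get the suffixes, e.g. A_1 (from df1), A_2 (from df2).
pairs = df1.merge(df2, how="cross", suffixes=("_1", "_2"))

common_cols = [c for c in df1.columns if c in df2.columns]
df3 = pd.DataFrame({c: func(pairs[f"{c}_1"], pairs[f"{c}_2"]) for c in common_cols})

# Label each result row like "foo-zoo", as in the example table.
df3.index = pairs["A_1"] + "-" + pairs["A_2"]
df3.index.name = "df1-df2"
print(df3)
```

If the real `func` only works on scalars, the column expression can call it element by element instead, e.g. `[func(a, b) for a, b in zip(pairs[f"{c}_1"], pairs[f"{c}_2"])]`, at some cost in speed.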
I've started with this:
for i1, r1 in df1.iterrows():  # iterrows() yields (index, row) pairs
    for i2, r2 in df2.iterrows():
        for c1 in r1:
            for c2 in r2:
                pass  # stuck here
but I don't know how to continue and would appreciate some help.
Since you apply func to columns with the same name, couldn't you just iterate over the columns and use vectorization, e.g. df3['A'] = func(df1['A'], df2['A']), and so on? – StarFox
@StarFox Interesting, so I could do something like: for column in df3: df3[column] = func(df1[column], df2[column])? – Zubo
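A sketch of that column loop, using a hypothetical vectorized `func` (equality again). Two caveats: the loop has to run over the columns of df1 and df2, since a freshly created df3 has no columns yet; and `df1[column]` and `df2[column]` are matched by index label, so this compares row i of df1 with row i of df2 rather than every pairing of rows. Comparison operators such as `==` also require both Series to carry identical index labels.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["foo", "gee"], "D": [7, 10]})
df2 = pd.DataFrame({"A": ["zoo", "gee"], "D": [8, 10]})


def func(a, b):
    # Hypothetical comparison; must accept whole Series (vectorized).
    return a == b


df3 = pd.DataFrame(index=df1.index)
# Loop over the shared columns, not df3's (df3 starts with no columns).
for column in df1.columns.intersection(df2.columns):
    # Row i of df1 is compared with row i of df2 (matched by index label).
    df3[column] = func(df1[column], df2[column])
print(df3)
```

When df2 has a single row, as in the question, `func(df1[column], df2[column].iloc[0])` compares every row of df1 against that one row, because the scalar is broadcast over the Series.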
Sure! That's the power of pandas/NumPy (and of vectorization in general). I'll give some examples below and we can go from there. – StarFox