Multiple results for each row (one-to-many) with Pandas

If I have a DataFrame where each row is an individual and each column is a separate attribute, how can I get a new DataFrame that maps each individual to multiple results?

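I tried doing this with DataFrame.apply(), which seemed the most intuitive approach, but it raises exceptions, as the example below shows. Adding broadcast=False or reduce=False did not help.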

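The example below is deliberately simple, but consider any scenario where each row maps to multiple rows. What is the best way to handle this? In practice, each row can map to a different number of results. This is basically computing a one-to-many relationship.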

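Example: I have a DataFrame, data, with the following structure, and for each person I want to get their 3 upcoming birthdays (a simple example, I know). So, from: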

+---+-------+------------+ 
| | name | birthdate | 
+---+-------+------------+ 
| 1 | John | 1990-01-01 | 
| 2 | Jane | 1957-04-03 | 
| 3 | Max | 1987-02-03 | 
| 4 | David | 1964-02-12 | 
+---+-------+------------+

to something like:

+-------+------------+ 
| name | birthday | 
+-------+------------+ 
| John | 2016-01-01 | 
| John | 2017-01-01 | 
| John | 2018-01-01 | 
| Jane | 2016-04-03 | 
| Jane | 2017-04-03 | 
| Jane | 2018-04-03 | 
| Max | 2016-02-03 | 
| Max | 2017-02-03 | 
| Max | 2018-02-03 | 
| David | 2016-02-12 | 
| David | 2017-02-12 | 
| David | 2018-02-12 | 
+-------+------------+
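One way to get from the first table to the second, sketched here under the assumption of pandas 0.25 or later (where DataFrame.explode was added), is to store a list of results in each row and then explode that list into separate rows. Because explode works on lists of any length, rows that map to different numbers of results are handled the same way. The helper name upcoming_birthdays is hypothetical, and the sample values are copied from the table above.

```python
import pandas as pd

# Sample data copied from the question's input table.
data = pd.DataFrame({
    "name": ["John", "Jane", "Max", "David"],
    "birthdate": pd.to_datetime(["1990-01-01", "1957-04-03", "1987-02-03", "1964-02-12"]),
})

def upcoming_birthdays(birthdate, years=range(2016, 2019)):
    """Hypothetical helper: one Timestamp per year (list length may vary per row)."""
    return [birthdate.replace(year=year) for year in years]

# Step 1: put a list of results in each row (the "many" side of the relation).
# Step 2: explode() gives each list element its own row, repeating the other columns.
result = (
    data.assign(birthday=data["birthdate"].map(upcoming_birthdays))
        .explode("birthday")
        .drop(columns="birthdate")
        .reset_index(drop=True)
)
# explode() leaves an object column; convert it back to datetime64.
result["birthday"] = pd.to_datetime(result["birthday"])
print(result)
```

The rows come out in the original order (John, Jane, Max, David), matching the desired table.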

Intuitively, I would try something like this:

import pandas as pd

def get_birthdays(person):
    birthdays = []
    for year in range(2016, 2019):
        birthdays.append({
            # person.name would be the row's index label, not the 'name' column
            'name': person['name'],
            'birthday': person['birthdate'].replace(year=year)
        })

    return pd.DataFrame(birthdays)

# with data as my original DataFrame
data.apply(get_birthdays, axis=1)

However, this raises:

ValueError: could not broadcast input array from shape (3,2) into shape (3) 

During handling of the above exception, another exception occurred: 

[...] 

ValueError: cannot copy sequence with size 2 to array axis with dimension 3
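A workaround that keeps the per-row function, sketched here as an assumption rather than the accepted answer, is to skip DataFrame.apply entirely: build one small DataFrame per row and stack them with pd.concat, which accepts frames of different lengths. The sample data again mirrors the question's table.

```python
import pandas as pd

data = pd.DataFrame({
    "name": ["John", "Jane", "Max", "David"],
    "birthdate": pd.to_datetime(["1990-01-01", "1957-04-03", "1987-02-03", "1964-02-12"]),
})

def get_birthdays(person):
    # Same idea as the question's function; the scalar name is
    # broadcast to the length of the birthday list.
    return pd.DataFrame({
        "name": person["name"],
        "birthday": [person["birthdate"].replace(year=year) for year in range(2016, 2019)],
    })

# One DataFrame per input row, concatenated into a single result
# with a fresh 0..n-1 index.
result = pd.concat(
    [get_birthdays(row) for _, row in data.iterrows()],
    ignore_index=True,
)
print(result)
```

Since each call returns its own frame, the number of output rows per person can differ without any extra handling.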


2015-09-09 vicvicvic

The groupby version of apply supports a DataFrame as the return value, the way you intended:

import pandas as pd
from datetime import datetime

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Max', 'David'],
    'birthdate': [datetime(1990,1,1), datetime(1957,4,3), datetime(1987,2,3), datetime(1964,2,12)],
})

def get_birthdays(df_x):
    # df_x holds the rows of one group; take the values from its first row
    d = {'name': [], 'birthday': []}
    name = df_x.iloc[0]['name']
    original = df_x.iloc[0]['birthdate']
    for year in range(2016, 2019):
        d['name'].append(name)
        d['birthday'].append(original.replace(year=year))
    return pd.DataFrame(d)

# group_keys=False keeps the group labels out of the result's index
print(df.groupby('name', group_keys=False).apply(get_birthdays).reset_index(drop=True))

Output:

 birthday name 
0 2016-02-12 David 
1 2017-02-12 David 
2 2018-02-12 David 
3 2016-04-03 Jane 
4 2017-04-03 Jane 
5 2018-04-03 Jane 
6 2016-01-01 John 
7 2017-01-01 John 
8 2018-01-01 John 
9 2016-02-03 Max 
10 2017-02-03 Max 
11 2018-02-03 Max
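Grouping by 'name' assumes names are unique: two people called John would land in the same group, and only the first one's birthdate would be used. A variant sketched here (an assumption, not part of the answer above) groups by the index instead with groupby(level=0), so each original row is its own group. The duplicate "John" in the sample data is hypothetical and added only to show the difference.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Max", "John"],  # hypothetical duplicate name
    "birthdate": pd.to_datetime(["1990-01-01", "1957-04-03", "1987-02-03", "1964-02-12"]),
})

def get_birthdays(df_x):
    # Each group is exactly one original row when grouping by the index.
    row = df_x.iloc[0]
    return pd.DataFrame({
        "name": row["name"],
        "birthday": [row["birthdate"].replace(year=year) for year in range(2016, 2019)],
    })

# level=0 groups on the (unique) row index rather than on a column value.
result = (
    df.groupby(level=0, group_keys=False)
      .apply(get_birthdays)
      .reset_index(drop=True)
)
print(result)
```

Both Johns keep their own birthdays here, giving 12 rows instead of 9.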


2015-09-09 15:41:20

Great, thanks! Is there any reason you pass a dict of lists to 'pd.DataFrame' rather than a list of dicts? – vicvicvic

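In this case it's just personal preference. A dict of lists is more efficient when the lists are long; here, however, the lists are short, so in terms of memory overhead your list-of-dicts approach may be better. –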
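To make the comparison concrete, here is a small sketch (with made-up values in the shape of the example) showing that both inputs produce the same DataFrame; the choice only affects how the intermediate Python objects are built, one list per column versus one dict per row.

```python
import pandas as pd

# Dict of lists: one list per column (the answer's style).
dict_of_lists = {"name": ["John", "John"], "birthday": ["2016-01-01", "2017-01-01"]}

# List of dicts: one dict per row (the question's style).
list_of_dicts = [
    {"name": "John", "birthday": "2016-01-01"},
    {"name": "John", "birthday": "2017-01-01"},
]

a = pd.DataFrame(dict_of_lists)
b = pd.DataFrame(list_of_dicts)

# Raises AssertionError if the two frames differ in values, dtypes, or labels.
pd.testing.assert_frame_equal(a, b)
print(a)
```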
