Source DF:
In [21]: data
Out[21]:
              Product Category  Product Cost                    Products
0     [Music, Journals, Paper]            55     Rock On Leather Journal
1  [Headphones, Music, Clocks]           163  Beats Earbuds In Ear Timer
2            [Watches, Clocks]           200      Garmin 25mm Wristwatch
First, let's convert (flatten) it into the following DF:
In [22]: lst_col = 'Product Category'
    ...:
    ...: x = pd.DataFrame({
    ...:     col: np.repeat(data[col].values, data[lst_col].str.len())
    ...:     for col in data.columns.difference([lst_col])
    ...: }).assign(**{lst_col: np.concatenate(data[lst_col].values)})[data.columns.tolist()]
    ...:
In [23]: x
Out[23]:
   Product Category  Product Cost                    Products
0             Music            55     Rock On Leather Journal
1          Journals            55     Rock On Leather Journal
2             Paper            55     Rock On Leather Journal
3        Headphones           163  Beats Earbuds In Ear Timer
4             Music           163  Beats Earbuds In Ear Timer
5            Clocks           163  Beats Earbuds In Ear Timer
6           Watches           200      Garmin 25mm Wristwatch
7            Clocks           200      Garmin 25mm Wristwatch
Now we can easily get the "count of how many Products within each Category, and to average the costs for each category":
In [25]: x.groupby('Product Category')['Product Cost'].agg(['size', 'mean']).reset_index()
Out[25]:
   Product Category  size   mean
0            Clocks     2  181.5
1        Headphones     1  163.0
2          Journals     1   55.0
3             Music     2  109.0
4             Paper     1   55.0
5           Watches     1  200.0
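For anyone who wants to reproduce the result above, here is a minimal self-contained sketch. It rebuilds the sample frame from In [21] by hand and, as a cross-check, flattens it with `DataFrame.explode` instead of the `np.repeat` / `np.concatenate` approach used in this answer. That swap is an assumption on my part: it requires pandas 0.25 or newer, and the names `flat` and `summary` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt by hand from the In [21] output
data = pd.DataFrame({
    'Product Category': [['Music', 'Journals', 'Paper'],
                         ['Headphones', 'Music', 'Clocks'],
                         ['Watches', 'Clocks']],
    'Product Cost': [55, 163, 200],
    'Products': ['Rock On Leather Journal',
                 'Beats Earbuds In Ear Timer',
                 'Garmin 25mm Wristwatch'],
})
lst_col = 'Product Category'

# Assumption: pandas >= 0.25, where DataFrame.explode is available.
# explode() emits one row per list element and repeats the other columns.
flat = data.explode(lst_col).reset_index(drop=True)

# Same aggregation as In [25]: number of products and mean cost per category
summary = (flat.groupby(lst_col)['Product Cost']
               .agg(['size', 'mean'])
               .reset_index())
print(summary)
```

If both approaches are available to you, `flat` should contain the same rows as `x` from In [23], so either can feed the `groupby`.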
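Some explanations:
First, count the number of elements in the list in each row:
In [7]: data[lst_col].str.len()
Out[7]:
0    3
1    3
2    2
Name: Product Category, dtype: int64
Using this information, we can now duplicate all the non-list columns as follows: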
In [3]: x = pd.DataFrame({
   ...:     col: np.repeat(data[col].values, data[lst_col].str.len())
   ...:     for col in data.columns.difference([lst_col])
   ...: })
In [4]: x
Out[4]:
   Product Cost                    Products
0            55     Rock On Leather Journal
1            55     Rock On Leather Journal
2            55     Rock On Leather Journal
3           163  Beats Earbuds In Ear Timer
4           163  Beats Earbuds In Ear Timer
5           163  Beats Earbuds In Ear Timer
6           200      Garmin 25mm Wristwatch
7           200      Garmin 25mm Wristwatch
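Since the `for col in ...` part of In [3] is the piece that tends to confuse people, here is the same step unrolled into an ordinary loop. This is only a sketch of what the dict comprehension does; the names `lengths` and `repeated` are mine, and the sample frame is rebuilt by hand so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the In [21] output
data = pd.DataFrame({
    'Product Category': [['Music', 'Journals', 'Paper'],
                         ['Headphones', 'Music', 'Clocks'],
                         ['Watches', 'Clocks']],
    'Product Cost': [55, 163, 200],
    'Products': ['Rock On Leather Journal',
                 'Beats Earbuds In Ear Timer',
                 'Garmin 25mm Wristwatch'],
})
lst_col = 'Product Category'

# How many times each original row has to be repeated: 3, 3, 2
lengths = data[lst_col].str.len()

# The loop runs once per non-list COLUMN (not once per category).
# np.repeat(values, counts) repeats values[i] exactly counts[i] times.
repeated = {}
for col in data.columns.difference([lst_col]):
    repeated[col] = np.repeat(data[col].values, lengths)

x = pd.DataFrame(repeated)
print(x)
```

So the loop never touches the categories themselves. It only stretches every other column to the new length (8 rows here), and the flattened category values are attached afterwards.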
Now we can add the flattened list column:
In [8]: np.concatenate(data[lst_col].values)
Out[8]:
array(['Music', 'Journals', 'Paper', 'Headphones', 'Music', 'Clocks', 'Watches', 'Clocks'],
dtype='<U10')
In [5]: x.assign(**{lst_col:np.concatenate(data[lst_col].values)})
Out[5]:
   Product Cost                    Products  Product Category
0            55     Rock On Leather Journal             Music
1            55     Rock On Leather Journal          Journals
2            55     Rock On Leather Journal             Paper
3           163  Beats Earbuds In Ear Timer        Headphones
4           163  Beats Earbuds In Ear Timer             Music
5           163  Beats Earbuds In Ear Timer            Clocks
6           200      Garmin 25mm Wristwatch           Watches
7           200      Garmin 25mm Wristwatch            Clocks
Finally, we simply select the columns in their original order:
In [6]: x.assign(**{lst_col:np.concatenate(data[lst_col].values)})[data.columns.tolist()]
Out[6]:
   Product Category  Product Cost                    Products
0             Music            55     Rock On Leather Journal
1          Journals            55     Rock On Leather Journal
2             Paper            55     Rock On Leather Journal
3        Headphones           163  Beats Earbuds In Ear Timer
4             Music           163  Beats Earbuds In Ear Timer
5            Clocks           163  Beats Earbuds In Ear Timer
6           Watches           200      Garmin 25mm Wristwatch
7            Clocks           200      Garmin 25mm Wristwatch
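If you need this more than once, the three steps (repeat, concatenate, reorder) can be wrapped in a small function. The sketch below is one possible way to do that; `flatten_list_column` is a hypothetical name, not a pandas function, and it assumes every cell of the list column holds a non-empty list.

```python
import numpy as np
import pandas as pd

def flatten_list_column(df, lst_col):
    """Hypothetical helper: return one row per element of df[lst_col].

    Assumes every cell in df[lst_col] is a non-empty list.
    """
    # Step 1: number of elements in each row's list
    lengths = df[lst_col].str.len()
    # Step 2: repeat every non-list column that many times
    out = pd.DataFrame({
        col: np.repeat(df[col].values, lengths)
        for col in df.columns.difference([lst_col])
    })
    # Step 3: attach the flattened list values, then restore column order
    out[lst_col] = np.concatenate(df[lst_col].values)
    return out[df.columns.tolist()]

# Tiny made-up frame, just to show the call
df = pd.DataFrame({'tags': [['a', 'b'], ['c']], 'n': [1, 2]})
flat = flatten_list_column(df, 'tags')
print(flat)
```

The made-up frame has two rows with lists of length 2 and 1, so the result has three rows, with `n` repeated as 1, 1, 2.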
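This works! Still, I'm trying to fully understand what happens in the "for col" part that feeds into .assign(). It looks like, for each category, you are copying the row data into a new row so that every row has a single category, and then using .assign() to add all the other columns. But maybe I'm wrong. This is far more complicated than anything I've seen (though it's great), and I hope you'll explain it for others who come across this post. – Adestin
@Adestin, I've added some explanations - please check – MaxU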