Python pandas groupby: difference from another row, filtered by column

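I am struggling with groupby in Python pandas. How should I accomplish the following task? For each fruit, I want to find the difference between each row's value and that fruit's "Step 0" value.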

import pandas as pd

df = pd.DataFrame({'Fruit' : ['Apple', 'Apple', 'Apple', 'Banana', 'Banana', 'Banana'],
                   'Step' : [0, 1, 2, 0, 1, 2],
                   'Value' : [100, 102, 105, 200, 210, 195] })

    Fruit  Step  Value       to-be
0   Apple     0    100  -->      0
1   Apple     1    102  -->      2
2   Apple     2    105  -->      5
3  Banana     0    200  -->      0
4  Banana     1    210  -->     10
5  Banana     2    195  -->     -5

Thanks!

Source

2016-05-13 IPV3001

This should do it:

df.groupby('Fruit').apply(lambda g: g.Value - g[g.Step == 0].Value.values[0])

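First, we group by the column you care about (Fruit). Then we apply a function to each group, using a lambda so the function can be written inline. Within each group, we select the rows where g.Step == 0, take their Value entries, and use values[0] to pick the first one (in case more than one row has g.Step == 0). Finally, we subtract that value from every row in the group and return the result.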
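To make those steps concrete, here is a minimal sketch that unrolls the lambda for a single group. It assumes the df from the question and picks the Apple group by hand; the names g, step0_rows, base, and diff are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Apple', 'Apple', 'Banana', 'Banana', 'Banana'],
    'Step': [0, 1, 2, 0, 1, 2],
    'Value': [100, 102, 105, 200, 210, 195],
})

# One group, roughly as groupby('Fruit') would hand it to the lambda
g = df[df.Fruit == 'Apple']

# Rows of the group where Step == 0
step0_rows = g[g.Step == 0]

# First matching Value (raises IndexError if the group has no Step 0 row)
base = step0_rows.Value.values[0]

# Subtract the baseline from every row in the group
diff = g.Value - base
print(diff.tolist())
```

Note that a group with no Step 0 row would make values[0] fail, so the one-liner relies on every fruit having such a row.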

If you want to add it as a column of the DataFrame, you can drop the index:

res = df.groupby('Fruit').apply(lambda g: g.Value - g[g.Step == 0].Value.values[0]) 
df['Result'] = res.reset_index(drop=True)
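As far as I can tell, reset_index(drop=True) lines the result up with df by position, which only matches if the rows of df are already ordered by Fruit with a default 0..n-1 index. A possible alternative, sketched here under that caveat, looks up each fruit's Step 0 value and subtracts it while keeping the original index. The name base is illustrative, and drop_duplicates is used to mirror the "take the first Step 0 row" behaviour of values[0].

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Apple', 'Apple', 'Banana', 'Banana', 'Banana'],
    'Step': [0, 1, 2, 0, 1, 2],
    'Value': [100, 102, 105, 200, 210, 195],
})

# Step 0 value per fruit; drop_duplicates keeps the first Step 0 row
# if a fruit happens to have several
base = (df[df.Step == 0]
        .drop_duplicates('Fruit')
        .set_index('Fruit')['Value'])

# Map each row's fruit to its baseline and subtract; the index of df is untouched
df['Result'] = df['Value'] - df['Fruit'].map(base)
print(df)
```

A fruit without any Step 0 row would get NaN here instead of raising an error.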

Source

2016-05-13 14:27:07 ASGM

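The line `df['Result'] = res.reset_index(drop=True)` does not work for me; it raises `ValueError: Wrong number of items passed 1, indices imply 3`. – SparkAndShine

@sparkandshine That's strange. Which version of Python are you using? It runs fine for me on 2.7.3. – ASGM

My Python version is 2.7.6. – SparkAndShine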

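I think this does the trick. It simply iterates over the rows and picks up a new "first" value each time the step equals 0, then computes the difference from that first value.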

rows = range(df.shape[0])
df['count'] = 0  # new column, at position 3
for r in rows:
    step = df.iloc[r, 1]   # 'Step' column
    value = df.iloc[r, 2]  # 'Value' column
    if step == 0:
        first = value      # new baseline whenever Step is 0
    df.iloc[r, 3] = value - first

Source

2016-05-13 14:36:46 EllieFev

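This definitely works, but on larger DataFrames it will be slower than grouping. In general, iterating over individual rows does not take full advantage of what pandas offers. (I mention this not to attack your answer, but because someone gave me the same advice when I started learning pandas, and it was really helpful.) – ASGM

I completely agree, and your answer is definitely the stronger one here. Still, I sometimes think (especially since the asker seems to be a beginner) that a simple approach can be fine for simpler cases and is easier to pick up. When I started out, I found `lambda` hard to get my head around! – EllieFev

Very true! It is always useful to see that there are several ways to do things (and I'm sure there is a better approach than the one I posted). – ASGM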
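For comparison, the row-by-row loop discussed in this thread can be sketched without explicit iteration. This is only one possible vectorised form, and it carries the same assumption as the loop: rows are ordered so that each fruit's Step 0 row comes before its other rows, and the very first row has Step 0.

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Apple', 'Apple', 'Banana', 'Banana', 'Banana'],
    'Step': [0, 1, 2, 0, 1, 2],
    'Value': [100, 102, 105, 200, 210, 195],
})

# Keep Value only on Step 0 rows (NaN elsewhere), then carry it forward
first = df['Value'].where(df['Step'] == 0).ffill()

# where() introduces NaN and therefore floats, so cast back to int
df['count'] = (df['Value'] - first).astype(int)
print(df)
```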

I am new to pandas, but at least the code below works. The result is:

    Fruit  Step  Value  to-be
0   Apple     0    100      0
1   Apple     1    102      2
2   Apple     2    105      5
3  Banana     0    200      0
4  Banana     1    210     10
5  Banana     2    195     -5

[6 rows x 4 columns]

The source code is as follows.

import pandas as pd 

df = pd.DataFrame({'Fruit' : ['Apple', 'Apple', 'Apple', 'Banana', 'Banana', 'Banana'], 
        'Step' : [0, 1, 2, 0, 1, 2], 
        'Value' : [100, 102, 105, 200, 210, 195] }) 

list_groups = list() 

# loop over dataframe groupby `Fruit` 
for name, group in df.groupby('Fruit'): 
    group = group.sort_values('Step', ascending=True) # sorted by `Step`; the result must be assigned back

    row_iterator = group.iterrows() 

    # get the base value 
    idx, first_row = next(row_iterator) # built-in next() works on both Python 2 and 3
    base_value = first_row['Value'] 

    to_be = [0] # store the values of the column `to-be` 
    for idx, row in row_iterator: 
        to_be.append(row['Value'] - base_value) 

    # add a column to group 
    group['to-be'] = pd.Series(to_be, index=group.index) 

    list_groups.append(group) 


# Concatenate dataframes 
result = pd.concat(list_groups) 

print(result)
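The same idea (sort by Step, take the first Value in each fruit as the base, subtract it) can probably be written more compactly with groupby and transform. This is a sketch rather than a drop-in replacement; the name base is illustrative, and like the loop above it treats the lowest Step in each group as the baseline rather than checking that it equals 0.

```python
import pandas as pd

df = pd.DataFrame({
    'Fruit': ['Apple', 'Apple', 'Apple', 'Banana', 'Banana', 'Banana'],
    'Step': [0, 1, 2, 0, 1, 2],
    'Value': [100, 102, 105, 200, 210, 195],
})

# For every row, the Value of the lowest-Step row within the same fruit
base = df.sort_values('Step').groupby('Fruit')['Value'].transform('first')

# Subtraction aligns on the index, so the sort order of base does not matter
df['to-be'] = df['Value'] - base
print(df)
```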

@ASGM, I ran your code,

res = df.groupby('Fruit').apply(lambda g: g.Value - g[g.Step == 0].Value.values[0]) 
df['Result'] = res.reset_index(drop=True)

but ran into a problem:

Traceback (most recent call last): 
    File "***.py", line 9, in <module> 
    df['Result'] = res.reset_index(drop=True) 
    File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1887, in __setitem__ 
    self._set_item(key, value) 
    File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1968, in _set_item 
    NDFrame._set_item(self, key, value) 
    File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1068, in _set_item 
    self._data.set(key, value) 
    File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3024, in set 
    self.insert(len(self.items), item, value) 
    File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3039, in insert 
    self._add_new_block(item, value, loc=loc) 
    File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3162, in _add_new_block 
    self.items, fastpath=True) 
    File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 1993, in make_block 
    placement=placement) 
    File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 64, in __init__ 
    '%d' % (len(items), len(values))) 
ValueError: Wrong number of items passed 1, indices imply 3 
[Finished in 0.4s with exit code 1]

Source

2016-05-13 15:12:06 SparkAndShine
