2016-03-31
1

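Converting a set to a list in a Pandas groupby agg function causes 'ValueError: Function does not reduce'

Sometimes it seems that the more I use Python (and Pandas), the less I understand. So I apologise if I'm just not seeing the wood for the trees here, but I've been going round in circles and can't see what I'm doing wrong. Basically, I have an example script (which I'd like to apply to a much larger dataframe), but I can't get it to work to my satisfaction.

The dataframe consists of columns of various data types. I'd like to group the dataframe on 2 columns and then produce a new dataframe that contains, for each group, a list of all the unique values of each variable. (Ultimately, I want to concatenate the list items into a single string, but that is a different question.)

The initial script I used was: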

import numpy as np 
import pandas as pd 

def tempFuncAgg(tempVar): 
    tempList = set(tempVar.dropna()) # Drop NaNs and create set of unique values 
    print(tempList) 
    return tempList 

# Define dataframe 
tempDF = pd.DataFrame({ 'id': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20], 
         'date': ["02/04/2015 02:34","06/04/2015 12:34","09/04/2015 23:03","12/04/2015 01:00","15/04/2015 07:12","21/04/2015 12:59","29/04/2015 17:33","04/05/2015 10:44","06/05/2015 11:12","10/05/2015 08:52","12/05/2015 14:19","19/05/2015 19:22","27/05/2015 22:31","01/06/2015 11:09","04/06/2015 12:57","10/06/2015 04:00","15/06/2015 03:23","19/06/2015 05:37","23/06/2015 13:41","27/06/2015 15:43"], 
         'gender': ["male","female","female","male","male","female","female",np.nan,"male","male","female","male","female","female","male","female","male","female",np.nan,"male"], 
         'age': ["young","old","old","old","old","old",np.nan,"old","old","young","young","old","young","young","old",np.nan,"old","young",np.nan,np.nan]}) 

# Groupby based on 2 categorical variables 
tempGroupby = tempDF.groupby(['gender','age']) 

# Aggregate for each variable in each group using function defined above 
dfAgg = tempGroupby.agg(lambda x: tempFuncAgg(x)) 
print(dfAgg) 

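The output from this script is as expected: a series of printed sets containing the values, followed by a dataframe whose cells contain the returned sets: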

{'09/04/2015 23:03', '21/04/2015 12:59', '06/04/2015 12:34'} 
{'01/06/2015 11:09', '12/05/2015 14:19', '27/05/2015 22:31', '19/06/2015 05:37'} 
{'15/04/2015 07:12', '19/05/2015 19:22', '06/05/2015 11:12', '04/06/2015 12:57', '15/06/2015 03:23', '12/04/2015 01:00'} 
{'02/04/2015 02:34', '10/05/2015 08:52'} 
{2, 3, 6} 
{18, 11, 13, 14} 
{4, 5, 9, 12, 15, 17} 
{1, 10} 
                  date \ 
gender age               
female old set([09/04/2015 23:03, 21/04/2015 12:59, 06/04... 
     young set([01/06/2015 11:09, 12/05/2015 14:19, 27/05... 
male old set([15/04/2015 07:12, 19/05/2015 19:22, 06/05... 
     young   set([02/04/2015 02:34, 10/05/2015 08:52]) 

             id 
gender age         
female old    set([2, 3, 6]) 
     young  set([18, 11, 13, 14]) 
male old set([4, 5, 9, 12, 15, 17]) 
     young    set([1, 10]) 

The problem arises when I try to convert the sets to lists. Bizarrely, the function prints the same list twice and then fails with a 'ValueError: Function does not reduce' error.

def tempFuncAgg(tempVar): 
    tempList = list(set(tempVar.dropna())) # This is the only difference 
    print(tempList) 
    return tempList 


tempDF = pd.DataFrame({ 'id': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20], 
         'date': ["02/04/2015 02:34","06/04/2015 12:34","09/04/2015 23:03","12/04/2015 01:00","15/04/2015 07:12","21/04/2015 12:59","29/04/2015 17:33","04/05/2015 10:44","06/05/2015 11:12","10/05/2015 08:52","12/05/2015 14:19","19/05/2015 19:22","27/05/2015 22:31","01/06/2015 11:09","04/06/2015 12:57","10/06/2015 04:00","15/06/2015 03:23","19/06/2015 05:37","23/06/2015 13:41","27/06/2015 15:43"], 
         'gender': ["male","female","female","male","male","female","female",np.nan,"male","male","female","male","female","female","male","female","male","female",np.nan,"male"], 
         'age': ["young","old","old","old","old","old",np.nan,"old","old","young","young","old","young","young","old",np.nan,"old","young",np.nan,np.nan]}) 

tempGroupby = tempDF.groupby(['gender','age']) 

dfAgg = tempGroupby.agg(lambda x: tempFuncAgg(x)) 
print(dfAgg) 

But now the output is:

['09/04/2015 23:03', '21/04/2015 12:59', '06/04/2015 12:34'] 
['09/04/2015 23:03', '21/04/2015 12:59', '06/04/2015 12:34'] 
--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
... 
ValueError: Function does not reduce 

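Any help resolving this would be much appreciated, and I apologise in advance if it's something obvious that I'm just not seeing.

EDIT: Incidentally, converting the set to a tuple rather than a list works without a problem.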
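A minimal sketch of the tuple variant mentioned in the edit, run on a cut-down frame. The data and the helper name `unique_as_tuple` are illustrative assumptions, not part of the original script, and the values are sorted only to make the order predictable:

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened version of the dataframe in the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "gender": ["male", "female", "female", "male", np.nan, "female"],
    "age": ["young", "old", "old", "old", "old", np.nan],
})

def unique_as_tuple(values):
    # Drop NaNs, de-duplicate, sort for a stable order, return a tuple
    return tuple(sorted(set(values.dropna())))

# Rows with NaN in either grouping column are dropped by groupby's default
tuple_agg = df.groupby(["gender", "age"]).agg(unique_as_tuple)
print(tuple_agg)
```

Each cell of `tuple_agg` then holds one tuple per group, which can be turned into a list afterwards if a list is really needed.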

Answers

1

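Lists can sometimes cause odd problems in pandas. You can either:

  1. Use tuples (as you have already noticed)

  2. If you really need lists, convert to them in a second operation, like this: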

    dfAgg.applymap(lambda x: list(x))

Full example:

import numpy as np 
import pandas as pd 

def tempFuncAgg(tempVar): 
    tempList = set(tempVar.dropna()) # Drop NaNs and create set of unique values 
    print(tempList) 
    return tempList 

# Define dataframe (at module level, not inside the function after its return)
tempDF = pd.DataFrame({ 'id': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
         'date': ["02/04/2015 02:34","06/04/2015 12:34","09/04/2015 23:03","12/04/2015 01:00","15/04/2015 07:12","21/04/2015 12:59","29/04/2015 17:33","04/05/2015 10:44","06/05/2015 11:12","10/05/2015 08:52","12/05/2015 14:19","19/05/2015 19:22","27/05/2015 22:31","01/06/2015 11:09","04/06/2015 12:57","10/06/2015 04:00","15/06/2015 03:23","19/06/2015 05:37","23/06/2015 13:41","27/06/2015 15:43"],
         'gender': ["male","female","female","male","male","female","female",np.nan,"male","male","female","male","female","female","male","female","male","female",np.nan,"male"],
         'age': ["young","old","old","old","old","old",np.nan,"old","old","young","young","old","young","young","old",np.nan,"old","young",np.nan,np.nan]})

# Groupby based on 2 categorical variables 
tempGroupby = tempDF.groupby(['gender','age']) 

# Aggregate for each variable in each group using function defined above 
dfAgg = tempGroupby.agg(lambda x: tempFuncAgg(x)) 

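# Convert each set to a list in a second step; applymap returns a new
# dataframe, so assign the result (newer pandas renames applymap to DataFrame.map)
dfAgg = dfAgg.applymap(lambda x: list(x))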

print(dfAgg) 

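There is a lot of bizarre behaviour like this in pandas; it is usually better to go with a workaround like this one than to hunt for a perfect solution.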
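For the end goal mentioned in the question (joining each group's unique values into one string), the aggregation function can probably return the string directly, so no list is ever stored in a cell. A minimal sketch, assuming a cut-down frame; the helper `join_unique` and its `sep` parameter are hypothetical names introduced here for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened version of the dataframe in the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "gender": ["male", "female", "female", "male", np.nan, "female"],
    "age": ["young", "old", "old", "old", "old", np.nan],
})

def join_unique(values, sep=", "):
    # Unique non-NaN values, sorted, converted to text and joined into one string
    return sep.join(str(v) for v in sorted(set(values.dropna())))

# Each group reduces to a single string per column
joined = df.groupby(["gender", "age"]).agg(join_unique)
print(joined)
```

Sorting is a design choice to make the output deterministic; drop the `sorted` call if the order does not matter.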

+0

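Many thanks for the reply, and for confirming that I'm not going completely mad. I guess this is one of those details you learn through experience. – user1718097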
