2017-09-10 73 views

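Counting multiple values in a groupby object

I want to count multiple values (stored as a list in each cell) for a groupby object.

I have the following DataFrame: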

|   | Record the respondent’s sex | 7. What do you use the phone for?           |
|---|-----------------------------|---------------------------------------------|
| 0 | Male                        | sending texts;calls;receiving sending texts |
| 1 | Female                      | sending texts;calls;WhatsApp;Facebook       |
| 2 | Male                        | sending texts;calls;receiving texts         |
| 3 | Female                      | sending texts;calls                         |

I want to count each value in the "7. What do you use the phone for?" column after grouping by "Record the respondent’s sex".

When each cell holds only a single value, I have no problem:

import pandas as pd

grouped = df.groupby(['Record the respondent’s sex'], sort=True)

question_counts = grouped['2. Are you a teacher, caregiver, or young adult ?'].value_counts(normalize=False, sort=True)

# Each key of question_counts is a (group, answer) pair.
question_data = [
    {'2. Are you a teacher, caregiver, or young adult ?': question, 'Record the respondent’s sex': group, 'count': count}
    for (group, question), count in question_counts.items()]

df_question = pd.DataFrame(question_data)

This gives me a table that looks exactly like this:

| 7. What do you use the phone for? | Record the respondent’s sex | count |
|-----------------------------------|-----------------------------|-------|
| sending texts                     | Male                        | 2     |
| calls                             | Male                        | 2     |
| receiving texts                   | Male                        | 2     |
| sending texts                     | Female                      | 2     |
| calls                             | Female                      | 2     |
| WhatsApp                          | Female                      | 1     |
| Facebook                          | Female                      | 1     |

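If only I could get this working with multiple values!

value_counts() does not work on cells that hold lists of multiple values; it raises TypeError: unhashable type: 'list'. The question "Counting occurrence of values in a Panda series?" shows several ways to handle this, but I can't seem to get any of them to work on a GroupBy object.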


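Duplicating (exploding) the multiple values into separate rows does seem to be the simplest and fastest way to go (see https://stackoverflow.com/questions/12680754/split-explode-pandas-dataframe-string-entry-to-separate-rows), although the accepted answer below shows that it can also be done without that step.

Answer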

import pandas as pd

# Initialize sample data.
df = pd.DataFrame({'Record the respondent’s sex': ['Male', 'Female'] * 2,
                   '7. What do you use the phone for?': [
                       "sending texts;calls;receiving sending texts",
                       "sending texts;calls;WhatsApp;Facebook",
                       "sending texts;calls;receiving texts",
                       "sending texts;calls"
                   ]})

# Split each cell on ';' and spread the pieces into separate columns,
# then melt those columns into a single 'value' column.
df2 = pd.melt(
    pd.concat([df['Record the respondent’s sex'],
               df.loc[:, "7. What do you use the phone for?"].apply(
                   lambda cell: cell.split(';')).apply(pd.Series)], axis=1),
    id_vars='Record the respondent’s sex')[['Record the respondent’s sex', 'value']]

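# Group on gender, count the values, and rename the columns.
# Naming the counts 'count' before reset_index() avoids a name clash with the 'value' index level.
result = df2.groupby('Record the respondent’s sex')['value'].value_counts().rename('count').reset_index()
result.columns = ['Record the respondent’s sex', '7. What do you use the phone for?', 'count']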

# Reorder columns.
>>> result[['7. What do you use the phone for?', 'Record the respondent’s sex', 'count']]
  7. What do you use the phone for? Record the respondent’s sex  count
0                             calls                      Female      2
1                     sending texts                      Female      2
2                          Facebook                      Female      1
3                          WhatsApp                      Female      1
4                             calls                        Male      2
5                     sending texts                        Male      2
6           receiving sending texts                        Male      1
7                   receiving texts                        Male      1
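To make the split-and-melt step easier to follow, here is a small sketch of the intermediate shapes on the same sample data. It swaps in `Series.str.split(..., expand=True)` for the answer's `.apply(lambda ...).apply(pd.Series)` chain, which should give an equivalent wide frame. The names `sex_col`, `use_col`, `wide`, and `long_df` are introduced here purely for illustration.

```python
import pandas as pd

sex_col = "Record the respondent’s sex"
use_col = "7. What do you use the phone for?"

df = pd.DataFrame({
    sex_col: ["Male", "Female"] * 2,
    use_col: [
        "sending texts;calls;receiving sending texts",
        "sending texts;calls;WhatsApp;Facebook",
        "sending texts;calls;receiving texts",
        "sending texts;calls",
    ],
})

# One column per position in the ';'-separated list.
# Rows with fewer items are padded with missing values.
wide = df[use_col].str.split(";", expand=True)

# Melt the positional columns into a single 'value' column,
# keeping the respondent's sex alongside each value.
long_df = pd.melt(pd.concat([df[sex_col], wide], axis=1), id_vars=sex_col)

print(wide.shape)
print(long_df.shape)
print(long_df["value"].isna().sum())
```

The padded missing values are harmless for the final count, because value_counts() drops them by default.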

I didn't know about pd.melt(); it looks great. Thanks!


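In hindsight, the route of creating an extra row for each of the multiple values (which is why MaxU flagged this as a duplicate) is both easier and faster.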
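The one-row-per-value route mentioned in the comments can be sketched as follows. This assumes pandas 0.25 or later, where `DataFrame.explode` is available (it did not exist when this thread was written); `sex_col`, `use_col`, `exploded`, and `counts` are illustrative names, not taken from the thread.

```python
import pandas as pd

sex_col = "Record the respondent’s sex"
use_col = "7. What do you use the phone for?"

df = pd.DataFrame({
    sex_col: ["Male", "Female"] * 2,
    use_col: [
        "sending texts;calls;receiving sending texts",
        "sending texts;calls;WhatsApp;Facebook",
        "sending texts;calls;receiving texts",
        "sending texts;calls",
    ],
})

# Turn each ';'-separated string into a list, then give every list item its own row.
exploded = df.copy()
exploded[use_col] = exploded[use_col].str.split(";")
exploded = exploded.explode(use_col)

# Count each value within each group; name the counts before flattening the index.
counts = (
    exploded.groupby(sex_col)[use_col]
    .value_counts()
    .rename("count")
    .reset_index()
)

print(counts)
```

After the explode step every cell holds a single scalar, so the original single-value groupby code applies unchanged.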