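Pandas: scalable Python normal distribution from a DataFrame

I have a pandas DataFrame (code below) that holds a mean and standard deviation for each quarter and day of the week. What I'd like to do is extract the mean and standard deviation for each quarter, create a random normal sample from those two values, and then plot it.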

import numpy as np
import pandas as pd

np.random.seed(42)
day_of_week = ['mon', 'tues', 'wed', 'thur', 'fri', 'sat', 'sun']
year = [2017]
qtr = [1, 2, 3, 4]
# One mean and one standard deviation per (day_of_week, qtr) pair
mean = np.random.uniform(5, 30, len(day_of_week) * len(qtr))
std = np.random.uniform(1, 10, len(day_of_week) * len(qtr))

dat = pd.DataFrame({'year': year * (len(day_of_week) * len(qtr)),
                    'qtr': qtr * len(day_of_week),
                    'day_of_week': day_of_week * len(qtr),
                    'mean': mean,
                    'std': std})
dowuq = dat.day_of_week.unique()

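I have a solution that works, but it doesn't scale well. If I wanted to add more columns (e.g. another year) or break it out by week, this would not be efficient. I'm fairly new to Python, so any help is appreciated.

Works, but not scalable

Code: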

import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('fivethirtyeight')
for w in dowuq:
    # The four quarterly rows for this day of the week
    datsand = dat[dat['day_of_week'] == w][0:4]
    mu = datsand.iloc[0]['mean']
    sigma = datsand.iloc[0]['std']
    mu2 = datsand.iloc[1]['mean']
    sigma2 = datsand.iloc[1]['std']
    mu3 = datsand.iloc[2]['mean']
    sigma3 = datsand.iloc[2]['std']
    mu4 = datsand.iloc[3]['mean']
    sigma4 = datsand.iloc[3]['std']
    # One random normal sample per quarter
    s1 = np.random.normal(mu, sigma, 2000)
    s2 = np.random.normal(mu2, sigma2, 2000)
    s3 = np.random.normal(mu3, sigma3, 2000)
    s4 = np.random.normal(mu4, sigma4, 2000)
    sns.kdeplot(s1, bw='scott', label='Q1')
    sns.kdeplot(s2, bw='scott', label='Q2')
    sns.kdeplot(s3, bw='scott', label='Q3')
    sns.kdeplot(s4, bw='scott', label='Q4')
    plt.title(str(w) + ' in 2017')
    plt.ylabel('Density')
    plt.xlabel('Random')
    plt.xticks(rotation=15)
    plt.show()

Source

2017-07-21 P.Cummings

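You should probably use groupby, which lets you split a DataFrame into groups. For now we group only on the day of the week, but you can extend this later if needed.

We can also switch to looping over all the rows of each group with iterrows: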

import numpy as np 
import pandas as pd 
import seaborn as sns 
import matplotlib.pyplot as plt 

np.random.seed(42) 
day_of_week = ['mon', 'tues', 'wed', 'thur', 'fri', 'sat', 'sun'] 
year = [2017] 
qtr = [1, 2, 3, 4] 
mean = np.random.uniform(5, 30, len(day_of_week) * len(qtr)) 
std = np.random.uniform(1, 10, len(day_of_week) * len(qtr)) 

dat = pd.DataFrame({'year': year * (len(day_of_week) * len(qtr)), 
        'qtr': qtr * len(day_of_week), 
        'day_of_week': day_of_week * len(qtr), 
        'mean': mean, 
        'std': std}) 

# Group by day of the week
for day, values in dat.groupby('day_of_week'):
    # Loop over rows for each day of the week
    for i, r in values.iterrows():
        cur_dist = np.random.normal(r['mean'], r['std'], 2000)
        sns.kdeplot(cur_dist, bw='scott', label='{}_Q{}'.format(day, r['qtr']))
    plt.title('{} in 2017'.format(day))
    plt.ylabel('Density')
    plt.xlabel('Random')
    plt.xticks(rotation=15)
    plt.show()
    plt.clf()
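
If you later add more grouping columns, such as another year, one option is to pass a list of keys to groupby. The following is only a rough sketch and not part of the code above: the second year (2018), the helper name sample_groups, and the switch to numpy's default_rng generator are all assumptions made for illustration. It builds the samples only; each array could then be handed to sns.kdeplot in the same way as before.

```python
import itertools

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
day_of_week = ['mon', 'tues', 'wed', 'thur', 'fri', 'sat', 'sun']
years = [2017, 2018]  # 2018 is a hypothetical extra year
qtr = [1, 2, 3, 4]

# One row per (year, day_of_week, qtr) combination
dat = pd.DataFrame(list(itertools.product(years, day_of_week, qtr)),
                   columns=['year', 'day_of_week', 'qtr'])
dat['mean'] = rng.uniform(5, 30, len(dat))
dat['std'] = rng.uniform(1, 10, len(dat))


def sample_groups(df, keys, n=2000):
    """Hypothetical helper: map each group key to {qtr: sample array}."""
    out = {}
    for key, rows in df.groupby(keys):
        # One normal sample of size n for every row (quarter) in the group
        out[key] = {
            int(q): rng.normal(m, s, n)
            for q, m, s in zip(rows['qtr'], rows['mean'], rows['std'])
        }
    return out


# Grouping on two columns gives tuple keys such as (2017, 'mon')
samples = sample_groups(dat, ['year', 'day_of_week'])
```

Adding a further level (a week number, for instance) would then only mean adding another column name to the keys list, rather than writing more mu/sigma variables by hand.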

Source

2017-07-21 13:55:52 asongtoruin

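Thanks. For my own clarification: the data is already at the day-of-week level, so why do you have to group by day of week?

Grouping by day of week effectively combines what you called 'dowuq' and 'datsand'. For each unique value in the 'day_of_week' column, 'groupby' yields a DataFrame containing only the rows that match that value. You could try printing 'values' inside the first 'for' loop to see this more clearly. – asongtoruin

@P.Cummings did this help with your question? – asongtoruin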
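
To make the comment above concrete, here is a small sketch that compares each group from groupby against the manual filter used in the question. The four-row frame and the dictionary name matches are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Small made-up frame using the same column names as the question
dat = pd.DataFrame({
    'day_of_week': ['mon', 'tues', 'mon', 'tues'],
    'qtr': [1, 1, 2, 2],
    'mean': [10.0, 12.0, 11.0, 13.0],
    'std': [2.0, 3.0, 2.5, 3.5],
})

matches = {}
for day, values in dat.groupby('day_of_week'):
    # Manual equivalent of the 'dowuq' loop plus the 'datsand' filter
    manual = dat[dat['day_of_week'] == day]
    # Record whether groupby handed back the same rows as the filter
    matches[day] = values.equals(manual)
    print(day)
    print(values)

print(matches)
```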
