2017-01-30 46 views

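Create a row from values in other rows

I have a dataframe of meteorological data in which each row is one day of data for one location. I want to compute 3-day averages and add them as columns. The natural way to do this (at least to me) is df.apply, but it is slow and uses a lot of memory (about 3 GB at the moment, and still rising). My function is below (merged is the dataframe, and it is indexed by row number):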

def three_day_stats(row):
    total_snowfall = 0
    total_sunshine = 0
    mean_wind = 0
    mean_temp = 0
    # Current row plus up to three preceding rows (clipped at the first row)
    days = range(max(0, row.name-3), row.name+1)
    for i in days:
        day = merged.loc[i]
        total_snowfall += day['Snowfall']
        total_sunshine += day['Sunshine duration']
        # Wind speed is the magnitude of the (U, V) vector, averaged over the window
        mean_wind += (1/len(days))*(day['10 metre U wind component']**2 + day['10 metre V wind component']**2)**0.5
        mean_temp += (1/len(days))*day['2 metre temperature']
    return pd.Series({'3 day snowfall': total_snowfall,
                      '3 day sunshine': total_sunshine,
                      '3 day wind': mean_wind,
                      '3 day temperature': mean_temp})

Is there a way to do this without using apply? Or at least a way to make it more efficient?

Edit: one row of the data

10 metre U wind component    2.13432 
10 metre V wind component    -0.932907 
2 metre temperature      3.88357 
Date       1996-11-01 00:00:00 
Latitude         46.3975 
Longitude         7.8515 
Snow density        269.103 
Snow depth       0.000514924 
Snowfall          0 
Sunshine duration      2.87365 
Temperature of snow layer    -0.677888 
winter         2015/16 
canton          VS 
community       Baltschieder 
elevation         3440 
aspect_string         E 
Avalanche          0 
Name: 0, dtype: object 

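@jezrael I have added a data sample to the question. The problem with what you suggested is that I would get a value only once every three days, whereas I would like one for every day. – Nico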
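The suggestion Nico refers to is not shown in the thread. Assuming it grouped the data into fixed 3-day blocks (for example with resample), the difference from a rolling window can be sketched as follows. The series below is made up purely for illustration; only the column name comes from the question.

```python
import numpy as np
import pandas as pd

# Ten days of made-up snowfall values, indexed by date (illustrative only).
dates = pd.date_range('2015-02-24', periods=10, freq='D')
snow = pd.Series(np.arange(10, dtype=float), index=dates, name='Snowfall')

# Fixed 3-day bins: one result per bin, so only every third day gets a value.
binned = snow.resample('3D').sum()

# Rolling 3-row window: one result per day, each covering that day and the
# two days before it (fewer at the start, because of min_periods=1).
rolling = snow.rolling(3, min_periods=1).sum()

print(len(binned), len(rolling))
```

The rolling version keeps the original daily index, which is what makes it possible to add the results back as columns.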

Answer


You can use rolling with agg to take the sums and means; the 3 day wind column is created first:

import numpy as np
import pandas as pd

# Small reproducible sample frame standing in for the real data
np.random.seed(100)
start = pd.to_datetime('2015-02-24')
rng = pd.date_range(start, periods=10)
cols = ['Snowfall','Sunshine duration','10 metre U wind component','10 metre V wind component','2 metre temperature']
merged = pd.DataFrame(np.random.randint(10,size=(10,5)), columns=cols, index=rng).reset_index()
print (merged)
     index Snowfall Sunshine duration 10 metre U wind component \ 
0 2015-02-24   8     8       3 
1 2015-02-25   0     4       2 
2 2015-02-26   2     2       1 
3 2015-02-27   4     0       9 
4 2015-02-28   4     1       5 
5 2015-03-01   4     3       7 
6 2015-03-02   7     7       0 
7 2015-03-03   9     3       2 
8 2015-03-04   1     0       7 
9 2015-03-05   0     8       2 

    10 metre V wind component 2 metre temperature 
0       7     7 
1       5     2 
2       0     8 
3       6     2 
4       3     4 
5       1     1 
6       2     9 
7       5     8 
8       6     2 
9       5     1 
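# Daily wind speed from the U and V components; the rolling mean below averages it
merged['3 day wind'] = (merged['10 metre U wind component']**2 +
                        merged['10 metre V wind component']**2)**0.5
# Window of 4 rows (current row plus up to 3 before it), matching the loop in the question
df = merged.rolling(4, min_periods=1).agg({'Snowfall': 'sum',
                                           'Sunshine duration': 'sum',
                                           '2 metre temperature': 'mean',
                                           '3 day wind': 'mean'})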
d = {"Snowfall":"3 day snowfall", 
    "Sunshine duration":"3 day sunshine", 
    "2 metre temperature":"2 metre temperature"} 
df = df.rename(columns = d) 
print (df) 
    3 day wind 3 day sunshine 3 day snowfall 2 metre temperature 
0 7.615773    8.0    8.0    7.000000 
1 6.500469   12.0    8.0    4.500000 
2 4.666979   14.0   10.0    5.666667 
3 6.204398   14.0   14.0    4.750000 
4 5.758193    7.0   10.0    4.000000 
5 6.179668    6.0   14.0    3.750000 
6 6.429668   11.0   19.0    4.000000 
7 5.071796   14.0   24.0    5.500000 
8 5.918944   13.0   21.0    5.000000 
9 5.497469   18.0   17.0    5.000000 
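To get the results back onto the original frame as new columns, as the question asks, something like the following might work. It is a sketch rather than part of the answer: it rebuilds the same random sample, uses the column names from the question's function (including '3 day temperature'), and checks the rolling result against the row-by-row version. The names speed, roll, stats, result and expected are illustrative.

```python
import numpy as np
import pandas as pd

# Same synthetic frame as in the answer (random integers, not real weather data).
np.random.seed(100)
cols = ['Snowfall', 'Sunshine duration', '10 metre U wind component',
        '10 metre V wind component', '2 metre temperature']
merged = pd.DataFrame(np.random.randint(10, size=(10, 5)), columns=cols)

# Daily wind speed from the U and V components.
speed = (merged['10 metre U wind component']**2 +
         merged['10 metre V wind component']**2)**0.5

# Current row plus up to three preceding rows, as in the question's loop.
roll = dict(window=4, min_periods=1)
stats = pd.DataFrame({
    '3 day snowfall': merged['Snowfall'].rolling(**roll).sum(),
    '3 day sunshine': merged['Sunshine duration'].rolling(**roll).sum(),
    '3 day wind': speed.rolling(**roll).mean(),
    '3 day temperature': merged['2 metre temperature'].rolling(**roll).mean(),
})

# Both frames share the same integer index, so join aligns them row by row.
result = merged.join(stats)

# Row-by-row function from the question, used here only as a cross-check.
def three_day_stats(row):
    days = range(max(0, row.name - 3), row.name + 1)
    window = merged.loc[list(days)]
    wind = (window['10 metre U wind component']**2 +
            window['10 metre V wind component']**2)**0.5
    return pd.Series({'3 day snowfall': window['Snowfall'].sum(),
                      '3 day sunshine': window['Sunshine duration'].sum(),
                      '3 day wind': wind.mean(),
                      '3 day temperature': window['2 metre temperature'].mean()})

expected = merged.apply(three_day_stats, axis=1)
assert np.allclose(stats[expected.columns], expected)
print(result.head())
```

If the real frame holds several locations, the window would run across location boundaries; grouping by location before rolling is probably needed in that case, but that depends on how the data is laid out.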

Nice, thanks! – Nico