2017-08-13 21 views
3

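Coding an exponential moving average in Python

I want to run calculations on three columns of a dataframe df. To do that, I load the prices of a list of assets (cryptocurrencies) into a three-column table so that I can calculate their exponential moving average once enough data is available.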

def calculateAllEMA(self,values_array): 
    df = pd.DataFrame(values_array, columns=['BTC', 'ETH', 'DASH']) 
    column_by_search = ["BTC", "ETH", "DASH"] 
    print(df) 
    for i,column in enumerate(column_by_search): 
     ema=[] 
     # over and over for each day that follows day 23 to get the full range of EMA 
     for j in range(0, len(column)-24): 
      # Add the closing prices for the first 22 days together and divide them by 22. 
      EMA_yesterday = column.iloc[1+j:22+j].mean() 
      k = float(2)/(22+1) 
      # getting the first EMA day by taking the following day’s (day 23) closing price multiplied by k, then multiply the previous day’s moving average by (1-k) and add the two. 
      ema.append(column.iloc[23 + j]*k+EMA_yesterday*(1-k)) 
     print("ema") 
     print(ema) 
     mean_exp[i] = ema[-1] 
    return mean_exp 

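However, when I print the value of len(column)-24 I get -21 (-24 + 3?), so the loop body never runs. How can I fix this error and get the exponential moving average of the assets?

I tried to apply this link from iexplain.com for the pseudocode of the exponential moving average.

If you have a simpler idea, I would be happy to hear it.

Here is the data I was using when the error occurred: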

 BTC  ETH DASH 
0 4044.59 294.40 196.97 
1 4045.25 294.31 196.97 
2 4044.59 294.40 196.97 
3 4045.25 294.31 196.97 
4 4044.59 294.40 196.97 
5 4045.25 294.31 196.97 
6 4044.59 294.40 196.97 
7 4045.25 294.31 196.97 
8 4045.25 294.31 196.97 
9 4044.59 294.40 196.97 
10 4045.25 294.31 196.97 
11 4044.59 294.40 196.97 
12 4045.25 294.31 196.97 
13 4045.25 294.32 197.07 
14 4045.25 294.31 196.97 
15 4045.41 294.46 197.07 
16 4045.25 294.41 197.07 
17 4045.41 294.41 197.07 
18 4045.41 294.47 197.07 
19 4045.25 294.41 197.07 
20 4045.25 294.32 197.07 
21 4045.43 294.35 197.07 
22 4045.41 294.46 197.07 
23 4045.25 294.41 197.07 
+0

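If you are not doing this purely as a learning exercise, you should know that pandas already has a built-in exponentially weighted moving average calculation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.ewm.html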

+0

There is more in the pandas documentation: https://pandas.pydata.org/pandas-docs/stable/computation.html
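
For reference, a minimal sketch of the built-in approach the comments point to. The sample prices are made up for illustration, and span=22 is assumed to match the 22-day window in the question. With adjust=False, ewm uses the recursive form EMA_t = k*price_t + (1-k)*EMA_(t-1), with k = 2/(span+1), seeded with the first observation.

```python
import numpy as np
import pandas as pd

# Made-up prices, for illustration only
rng = np.random.default_rng(0)
prices = pd.DataFrame(
    rng.uniform(90, 110, size=(50, 3)),
    columns=["BTC", "ETH", "DASH"],
)

span = 22  # assumed window, matching the 22 days used in the question

# adjust=False applies EMA_t = k*price_t + (1-k)*EMA_(t-1),
# starting from the first price, where k = 2/(span+1)
ema = prices.ewm(span=span, adjust=False).mean()

# Latest EMA value for each asset
latest_ema = ema.iloc[-1]
print(latest_ema)
```

Leaving adjust at its default (True) gives slightly different early values, because the weights are then normalised over the observations seen so far.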

Answers

0

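In your loop for i,column in enumerate(column_by_search): you iterate over the elements of the column_by_search list, so column takes the values "BTC", "ETH" and "DASH". As a result, len(column) gives you the length of the string "BTC", which is 3. That is why len(column)-24 evaluates to -21.

Try df[column] instead. It returns a pandas Series holding the values of the requested column, which supports len(), .iloc and iteration.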
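
A minimal sketch of how the loop might look with that change. The function name calculate_all_ema and the window parameter are hypothetical. It assumes the usual textbook definition, in which the EMA is seeded with the simple average of the first window prices and then updated once per later price; that is not necessarily what the linked pseudocode does.

```python
import pandas as pd

def calculate_all_ema(values_array, window=22):
    """Return the latest EMA of each column (hypothetical helper)."""
    df = pd.DataFrame(values_array, columns=["BTC", "ETH", "DASH"])
    k = 2.0 / (window + 1)  # smoothing factor
    latest = {}
    for name in df.columns:
        series = df[name]  # the column's data, not the string "BTC"
        if len(series) < window:
            # Not enough rows to compute the seed average
            latest[name] = None
            continue
        # Seed with the simple average of the first `window` prices
        ema = series.iloc[:window].mean()
        # Apply the EMA update once for every remaining price
        for price in series.iloc[window:]:
            ema = price * k + ema * (1 - k)
        latest[name] = ema
    return latest
```

Called with fewer than window rows, the sketch returns None for every asset instead of failing, so the caller can decide whether to wait for more data.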

+0

Thank you for your advice! However, it then tells me: UnboundLocalError: local variable 'column' referenced before assignment

3

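You can use pandas.stats.moments.ewma, as explained here. Note that this function has been removed from recent pandas releases; Series.ewm(span=...).mean() is the equivalent call.

Here is my suggestion for a possible solution:

This dataframe with random values should fit your description: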

# imports 
import pandas as pd 
import numpy as np 
rows = 50 
df = pd.DataFrame(np.random.randint(90,110,size=(rows, 3)), columns=['BTC', 'ETH', 'DASH']) 
# pd.datetime was removed from pandas; date_range accepts a date string directly 
df['dates'] = pd.date_range('2017-01-01', periods=rows) 
df = df.set_index('dates') 

print(df.tail(10)) 


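The following snippet makes a copy of your original dataframe and adds an ewma column for each existing column, using a defined window length: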

# Define the window length 
win = 22 

# Make a copy of the original dataframe and add ewma for all columns 
df_temp = df.copy() 

# Manage existing column names 
colNames = list(df_temp.columns.values).copy() 
removeNames = colNames.copy() 

for col in colNames: 

    # Make new names for ewmas 
    ewmaName = col + '_ewma_' + str(win) 

    # Add ewmas (ewm(...).mean() replaces pd.stats.moments.ewma, which was removed from pandas) 
    df_temp[ewmaName] = df[col].ewm(span=win).mean() 

# Here you have all original columns AND corresponding ewmas 
print(df_temp.tail()) 


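We do have to make one adjustment, though. The ewma calculation (pd.stats.moments.ewma, or ewm(...).mean() in newer pandas) will not raise an error if there are too few observations to fill your estimation window. You should therefore drop the first observations, as many as the window length.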

# .ix was removed from pandas; .iloc slices by position 
df_temp_subset = df_temp.iloc[win:] 

print(df_temp_subset) 
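
As an alternative to slicing, here is a sketch that uses the min_periods argument of ewm, which leaves the estimate as NaN until enough observations are available. The random data and the names df_demo, ewma_demo and ewma_valid are only illustrative.

```python
import numpy as np
import pandas as pd

win = 22
df_demo = pd.DataFrame(
    np.random.randint(90, 110, size=(50, 3)),
    columns=["BTC", "ETH", "DASH"],
)

# With min_periods=win, the first win-1 rows of the result are NaN
ewma_demo = df_demo.ewm(span=win, min_periods=win).mean()

# dropna() then keeps only rows backed by at least `win` observations
ewma_valid = ewma_demo.dropna()
print(ewma_valid.head())
```

This keeps one more row than slicing from position win, because the estimate at row win-1 already rests on win observations.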


Depending on whether you want to keep the source data in the new dataframe, you can drop it like this:

df_temp = df_temp.drop(columns=removeNames) 



Here is the whole process wrapped in a function:

def ewmas(df, win, keepSource): 
    """Add exponentially weighted moving averages for all columns in a dataframe. 

    Arguments: 
    df -- pandas dataframe 
    win -- length of ewma estimation window 
    keepSource -- True or False for keep or drop source data in output dataframe 

    """ 

    df_temp = df.copy() 
    # Manage existing column names 
    colNames = list(df_temp.columns.values).copy() 
    removeNames = colNames.copy() 

    for col in colNames: 

        # Make new names for ewmas 
        ewmaName = col + '_ewma_' + str(win) 

        # Add ewmas (ewm(...).mean() replaces the removed pd.stats.moments.ewma) 
        df_temp[ewmaName] = df[col].ewm(span=win).mean() 

    # Remove estimates with insufficient window length (.iloc replaces the removed .ix) 
    df_temp = df_temp.iloc[win:] 

    # Remove or keep source data 
    if not keepSource: 
        df_temp = df_temp.drop(columns=removeNames) 

    return df_temp 

And here is a test run:

# Test run 
df_new = ewmas(df = df, win = 22, keepSource = True) 
print(df_new.tail()) 
