Returning two values from pandas.rolling_apply

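I am using pandas.rolling_apply to fit data to a distribution and get a value from it, but I also need it to report a rolling goodness of fit (specifically, the p-value). Currently I am doing it like this: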

def func(sample): 
    fit = genextreme.fit(sample) 
    return genextreme.isf(0.9, *fit) 

def p_value(sample): 
    fit = genextreme.fit(sample) 
    return kstest(sample, 'genextreme', fit)[1] 

values = pd.rolling_apply(data, 30, func) 
p_values = pd.rolling_apply(data, 30, p_value) 
results = pd.DataFrame({'values': values, 'p_value': p_values})

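The problem is that I have a lot of data and the fit function is expensive, so I don't want to call it twice for every sample. What I would rather do is something like this: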

def func(sample): 
    fit = genextreme.fit(sample) 
    value = genextreme.isf(0.9, *fit) 
    p_value = kstest(sample, 'genextreme', fit)[1] 
    return {'value': value, 'p_value': p_value} 

results = pd.rolling_apply(data, 30, func)

where results is a DataFrame with two columns. If I try to run this, I get an exception: TypeError: a float is required. Is it possible to do this, and if so, how?

Source

2014-03-06 aquavitae

Does it work if you return a Series rather than a dict? –

@AndyHayden No, that gives 'TypeError: cannot convert the series to ' – aquavitae

See this question: http://stackoverflow.com/questions/19121854/using-rolling-apply-on-a-dataframe-object – Jeff

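I had a similar problem and solved it by using a member function of a separate helper class during the apply. That member function returns a single value, as required, but I store the other calculation results as members of the class and can use them afterwards.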

Simple example:

class CountCalls:
    def __init__(self):
        self.counter = 0

    def your_function(self, window):
        retval = f(window)  # f is your own function
        self.counter = self.counter + 1
        return retval  # rolling apply needs a single numeric return value


TestCounter = CountCalls()

your_seriesOrDataframeColumn.rolling(window=your_window_size).apply(TestCounter.your_function)

print(TestCounter.counter)

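Suppose your function f returns a tuple of two values, v1 and v2. You can then return v1 and assign the result to a column column_v1 of your DataFrame. The second value, v2, is simply accumulated in a Series series_val2 inside the helper class. Afterwards, you just add that Series as a new column of your DataFrame. JML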
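
To make the tuple idea above concrete, here is a minimal sketch. The function f, the class name SecondValueCollector and the column names are made up for illustration, and the padding step assumes the input has no NaNs and that min_periods is left at its default, so that apply runs exactly once per complete window.

```python
import numpy as np
import pandas as pd


def f(window):
    """Hypothetical stand-in for an expensive function returning two values."""
    return float(np.mean(window)), float(np.std(window))


class SecondValueCollector:
    """Hands v1 back to rolling().apply and keeps v2 on the instance."""

    def __init__(self):
        self.second_values = []

    def record(self, window):
        v1, v2 = f(window)  # f is called only once per window
        self.second_values.append(v2)
        return v1


series = pd.Series(np.arange(10, dtype=float))
window_size = 3
collector = SecondValueCollector()

df = pd.DataFrame(
    {"v1": series.rolling(window=window_size).apply(collector.record, raw=True)}
)

# Incomplete leading windows never reach record(), so pad them with NaN
# (assumption: no NaNs in the input and default min_periods).
padding = [np.nan] * (len(series) - len(collector.second_values))
df["v2"] = padding + collector.second_values
```

If the input can contain NaNs, some windows are skipped and the simple padding no longer lines up; in that case it is safer to also store a position or label alongside each v2.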

Source

2016-08-21 13:21:00 JML64

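I had the same problem. I solved it by creating a global DataFrame and filling it from within the rolling function. In the example script below, I generate some random input data and then compute the minimum, maximum and mean with a single rolling apply function.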

import pandas as pd 
import numpy as np 

global outputDF 
global index 

def myFunction(array): 

    global index 
    global outputDF 

    # Some random operation 
    outputDF.loc[index, 'min'] = np.nanmin(array)
    outputDF.loc[index, 'max'] = np.nanmax(array)
    outputDF.loc[index, 'mean'] = np.nanmean(array)

    index += 1 
    # Returning a useless variable 
    return 0 

if __name__ == "__main__": 

    global outputDF 
    global index 

    # A random window size 
    windowSize = 10 

    # Preparing some random input data 
    inputDF = pd.DataFrame({ 'randomValue': np.random.rand(500) })


    # Pre-Allocate memory 
    outputDF = pd.DataFrame({ 'min': [np.nan] * len(inputDF), 
           'max': [np.nan] * len(inputDF), 
           'mean': [np.nan] * len(inputDF) 
           }) 

    # Specify the starting index (due to the window size)
    d = (windowSize - 1)/2
    index = int(np.floor(d))  # np.int was removed in NumPy 1.24

    # Do the rolling apply here 
    inputDF['randomValue'].rolling(window=windowSize,center=True).apply(myFunction,args=()) 

    assert index + int(np.ceil(d)) == len(inputDF), 'Length mismatch'

    outputDF.index = inputDF.index

    # Optional : Clean the nulls 
    outputDF.dropna(inplace=True) 

    print(outputDF)
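
For comparison, the same three statistics can be collected in one pass without globals. This is only a sketch of a different technique, not the script above: it walks the windows with NumPy's sliding_window_view (NumPy 1.20 or later) instead of rolling().apply, and it labels each row by the last element of its window (the convention of a non-centered rolling), not by the centered position. The names input_df, output_df and rows are illustrative.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

window_size = 10
rng = np.random.default_rng(0)
input_df = pd.DataFrame({"randomValue": rng.random(500)})

# One row per complete window: shape (len - window_size + 1, window_size).
windows = sliding_window_view(input_df["randomValue"].to_numpy(), window_size)

# Each window is visited once and all three statistics are taken from it.
rows = [
    {"min": np.nanmin(w), "max": np.nanmax(w), "mean": np.nanmean(w)}
    for w in windows
]

# Label every row with the index of the last element in its window.
output_df = pd.DataFrame(rows, index=input_df.index[window_size - 1:])

print(output_df.head())
```

The list comprehension is the place where a single expensive call returning several values would go; the DataFrame is built once at the end, so nothing is written cell by cell.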

Source

2017-06-22 13:42:22

I had a similar problem before. Here is my solution:

from collections import deque 
class your_multi_output_function_class: 
    def __init__(self):
        self.deque_2 = deque()
        self.deque_3 = deque()

    def f1(self, window):
        # somefunction is your own function returning three values
        self.k = somefunction(window)
        self.deque_2.append(self.k[1])
        self.deque_3.append(self.k[2])
        return self.k[0]

    def f2(self, window):
        return self.deque_2.popleft()

    def f3(self, window):
        return self.deque_3.popleft()

func = your_multi_output_function_class() 

output = your_pandas_object.rolling(window=10).agg(
    {'a':func.f1,'b':func.f2,'c':func.f3} 
    )

Source

2017-10-29 04:34:02
