How to create lazy_evaluated DataFrame columns in pandas

A lot of times, I have a big DataFrame df that holds the basic data, and I need to create many more columns to hold derivative data calculated from the basic data columns.

I can do that in pandas like this:

df['derivative_col1'] = df['basic_col1'] + df['basic_col2']
df['derivative_col2'] = df['basic_col1'] * df['basic_col2']
# ...
df['derivative_coln'] = func(list_of_basic_cols)

Then pandas will calculate all of the derivative columns and allocate memory for them at once.

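What I want now is a lazy evaluation mechanism that postpones the calculation and memory allocation of the derivative columns until the moment they are actually needed. Something like defining lazy_eval_columns as: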

df['derivative_col1'] = pandas.lazy_eval(df['basic_col1'] + df['basic_col2']) 
df['derivative_col2'] = pandas.lazy_eval(df['basic_col1'] * df['basic_col2'])

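That would save time and memory the way a Python "yield" generator does, because only when I issue the df['derivative_col2'] command would the specific calculation and memory allocation be triggered.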
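One detail worth making explicit: written as pandas.lazy_eval(df['basic_col1'] + df['basic_col2']), the sum is already computed before any such function is called, so deferral needs an unevaluated form such as a lambda. A minimal sketch in plain Python, assuming a hypothetical Lazy helper (not a pandas API) that runs a zero-argument function on first access and caches the result:

```python
import pandas as pd


class Lazy:
    """Hypothetical helper (not part of pandas): a memoized thunk."""

    def __init__(self, func):
        self._func = func          # zero-argument function, not called yet
        self._value = None
        self.evaluated = False

    def get(self):
        # Run the computation only on the first call, then reuse the result.
        if not self.evaluated:
            self._value = self._func()
            self.evaluated = True
        return self._value


df = pd.DataFrame({'basic_col1': [1, 2, 3], 'basic_col2': [4, 5, 6]})

# The lambda delays the multiplication; nothing is computed or allocated yet.
derivative_col2 = Lazy(lambda: df['basic_col1'] * df['basic_col2'])
print(derivative_col2.evaluated)        # False
print(derivative_col2.get().tolist())   # [4, 10, 18]
print(derivative_col2.evaluated)        # True
```

Note that the cached Series is not recomputed if df changes after the first get().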

So, how can I do lazy_eval() in pandas? Any tips/thoughts/refs are welcome.


2013-10-26 bigbug

Good question. I don't know whether pandas has anything like this, though. The idea reminds me of computed columns in SQL views. –

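Starting in 0.13 (releasing very soon), you can do something like this. It uses a generator to evaluate dynamic formulas. In-line assignment via eval will be an additional feature in 0.13, see here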

In [19]: df = DataFrame(randn(5, 2), columns=['a', 'b']) 

In [20]: df
Out[20]:
          a         b
0 -1.949107 -0.763762
1 -0.382173 -0.970349
2  0.202116  0.094344
3 -1.225579 -0.447545
4  1.739508 -0.400829

In [21]: formulas = [ ('c','a+b'), ('d', 'a*c')]

Create a generator that evaluates each formula using eval, assigns the result, and then yields the DataFrame.

In [22]: def lazy(x, formulas):
    ....:     for col, f in formulas:
    ....:         x[col] = x.eval(f)
    ....:         yield x
    ....:

In action:

In [23]: gen = lazy(df, formulas)

In [24]: next(gen)
Out[24]:
          a         b         c
0 -1.949107 -0.763762 -2.712869
1 -0.382173 -0.970349 -1.352522
2  0.202116  0.094344  0.296459
3 -1.225579 -0.447545 -1.673123
4  1.739508 -0.400829  1.338679

In [25]: next(gen)
Out[25]:
          a         b         c         d
0 -1.949107 -0.763762 -2.712869  5.287670
1 -0.382173 -0.970349 -1.352522  0.516897
2  0.202116  0.094344  0.296459  0.059919
3 -1.225579 -0.447545 -1.673123  2.050545
4  1.739508 -0.400829  1.338679  2.328644

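So the user determines the order of evaluation (it is not on-demand). In theory numba is going to support this, so pandas could possibly support it as a backend for eval (which currently uses numexpr for immediate evaluation).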
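For contrast with the fixed ordering of the generator, here is a sketch of on-demand evaluation with the same string formulas and DataFrame.eval: a column is computed only when it is requested, and any formula it refers to is computed first. The ensure helper is hypothetical; dependencies are detected by a crude scan of identifiers in each expression, and the formulas are assumed to contain no cycles.

```python
import re

import pandas as pd

# Same 'column -> expression' idea as the formulas list above, keyed by name.
formulas = {'c': 'a + b', 'd': 'a * c'}


def ensure(df, col, formulas):
    """Hypothetical helper: compute `col` only if it is not already a column."""
    if col in df.columns:
        return df[col]
    expr = formulas[col]
    # Any identifier in the expression that is itself a pending formula
    # is treated as a dependency and computed first (no cycle detection).
    for name in re.findall(r'[A-Za-z_]\w*', expr):
        if name in formulas and name not in df.columns:
            ensure(df, name, formulas)
    df[col] = df.eval(expr)
    return df[col]


df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4.0, 5.0, 6.0]})
ensure(df, 'd', formulas)      # pulls in 'c' first, then computes 'd'
print(df.columns.tolist())     # ['a', 'b', 'c', 'd']
```

Requesting only 'c' would leave 'd' uncomputed, which is the on-demand behaviour the generator does not provide.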

My 2c.

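Lazy evaluation is nice, but it can easily be achieved using Python's own continuation/generator features, so building it into pandas, while possible, is quite tricky and would need a really nice use case to justify it.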


2013-10-26 20:39:47 Jeff

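Nice to have the "formulas" and eval features in the upcoming release. I'd like to know more about how to use the df['lazy_eval_col_x'] syntax to trigger on-demand calculation. – bigbug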
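One way to get the df['lazy_eval_col_x'] syntax, sketched with a hypothetical LazyColumns wrapper rather than anything built into pandas: __getitem__ checks whether the requested name is a registered but not yet computed column, computes it, stores it in the wrapped DataFrame, and returns it. Because each formula receives the wrapper itself, a formula that refers to another lazy column triggers that computation too.

```python
import pandas as pd


class LazyColumns:
    """Hypothetical wrapper: ['name'] access that computes registered derived
    columns on first use and stores them in the wrapped DataFrame."""

    def __init__(self, df):
        self.df = df
        self._formulas = {}

    def define(self, name, func):
        # Register a derived column; func receives this wrapper, so it can
        # use other lazy columns, which are then computed on demand as well.
        self._formulas[name] = func

    def __getitem__(self, name):
        if name not in self.df.columns and name in self._formulas:
            self.df[name] = self._formulas[name](self)
        return self.df[name]   # unknown names raise KeyError as usual


base = pd.DataFrame({'basic_col1': [1, 2, 3], 'basic_col2': [4, 5, 6]})
lazy = LazyColumns(base)
lazy.define('derivative_col1', lambda d: d['basic_col1'] + d['basic_col2'])
lazy.define('derivative_col2', lambda d: d['basic_col1'] * d['derivative_col1'])

print(base.columns.tolist())              # ['basic_col1', 'basic_col2']
print(lazy['derivative_col2'].tolist())   # [5, 14, 27]
print(base.columns.tolist())
# ['basic_col1', 'basic_col2', 'derivative_col1', 'derivative_col2']
```

A wrapper is used instead of overriding DataFrame.__getitem__ in a subclass to keep the sketch independent of how pandas calls __getitem__ internally. Once stored, a derived column is not recomputed if the basic columns change.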

You could subclass DataFrame and add the column as a property. For example,

import pandas as pd

class LazyFrame(pd.DataFrame):
    @property
    def derivative_col1(self):
        # Compute the derived column, store it in the frame, and return it
        self['derivative_col1'] = result = self['basic_col1'] + self['basic_col2']
        return result

x = LazyFrame({'basic_col1': [1, 2, 3],
               'basic_col2': [4, 5, 6]})
print(x)
#    basic_col1  basic_col2
# 0           1           4
# 1           2           5
# 2           3           6

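Accessing the property (via x.derivative_col1, below) calls the derivative_col1 function defined in LazyFrame. This function computes the result and adds the derived column to the LazyFrame instance: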

print(x.derivative_col1)
# 0    5
# 1    7
# 2    9

print(x)
#    basic_col1  basic_col2  derivative_col1
# 0           1           4                5
# 1           2           5                7
# 2           3           6                9

Note that if you modify a basic column:

x['basic_col1'] *= 10

the derived column is not automatically updated:

print(x['derivative_col1'])
# 0    5
# 1    7
# 2    9

But if you access the property, the values are recomputed:

print(x.derivative_col1)
# 0    14
# 1    25
# 2    36

print(x)
#    basic_col1  basic_col2  derivative_col1
# 0          10           4               14
# 1          20           5               25
# 2          30           6               36
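If staleness is the bigger concern, a variant of the same idea is to not store the result at all. A minimal sketch with a hypothetical FreshFrame class: the property recomputes on every access, so it always reflects the current basic columns, at the cost of repeating the calculation (and its temporary allocation) each time; the derived values are then reachable only as x.derivative_col1, not as a column.

```python
import pandas as pd


class FreshFrame(pd.DataFrame):
    """Hypothetical variant of LazyFrame: the derived values are computed on
    every access and never written back, so they cannot go stale."""

    @property
    def derivative_col1(self):
        # Returns a new Series each time; nothing is added to the frame.
        return self['basic_col1'] + self['basic_col2']


x = FreshFrame({'basic_col1': [1, 2, 3], 'basic_col2': [4, 5, 6]})
print(x.derivative_col1.tolist())   # [5, 7, 9]
x['basic_col1'] = x['basic_col1'] * 10
print(x.derivative_col1.tolist())   # [14, 25, 36]
print(x.columns.tolist())           # ['basic_col1', 'basic_col2']
```

Which version fits depends on whether repeated computation or silently stale data is the more expensive problem for a given DataFrame.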


2014-02-05 11:35:15 unutbu

