2012-12-18

Updating a pandas DataFrame by key

I have a DataFrame of historical stock trades. It has columns such as ['ticker', 'date', 'cusip', 'profit', 'security_type']. Initially:

trades['cusip'] = np.nan 
trades['security_type'] = np.nan 

I have historical config files that I can load into a frame with columns such as ['ticker', 'cusip', 'date', 'name', 'security_type', 'primary_exchange'].

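I would like to update the trades frame with the cusip and security_type from config, but only where the ticker and date match.

I thought I could do something like this: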

pd.merge(trades, config, on=['ticker', 'date'], how='left') 

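However, this does not update the columns. It just adds the config columns to trades (overlapping names such as cusip come back as suffixed pairs, cusip_x and cusip_y).

The following works, but I think there must be a better way. If not, I will probably do it outside of pandas.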
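To make the merge behaviour concrete, here is a minimal sketch using made-up toy frames (the values are placeholders, not data from the question). It shows the suffixed columns that `pd.merge` returns for overlapping names, and one possible way to collapse them back into a single column with `Series.fillna`. The coalescing step is an assumption about what is wanted here, namely keeping values already present in trades.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; cusip values are placeholders.
trades = pd.DataFrame({
    'ticker': ['IBM', 'MSFT'],
    'date': pd.to_datetime(['2000-01-01', '2000-01-02']),
    'cusip': [np.nan, 100.0],
})
config = pd.DataFrame({
    'ticker': ['IBM', 'MSFT'],
    'date': pd.to_datetime(['2000-01-01', '2000-01-02']),
    'cusip': [1.0, 2.0],
})

merged = pd.merge(trades, config, on=['ticker', 'date'], how='left')
print(list(merged.columns))
# ['ticker', 'date', 'cusip_x', 'cusip_y']

# Keep the value from trades where present, otherwise take the one from config.
merged['cusip'] = merged['cusip_x'].fillna(merged['cusip_y'])
merged = merged.drop(columns=['cusip_x', 'cusip_y'])
print(merged['cusip'].tolist())
# [1.0, 100.0]
```

This works, but every shared column has to be coalesced by hand, which is why an index-based update tends to be more convenient.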

for date in trades['date'].unique():
    config = get_config_file_as_df(date)  # every row has config['date'] == date
    for ticker in trades['ticker'][trades['date'] == date]:
        # Rows of trades to fill, and the matching config row for this ticker.
        mask = (trades['ticker'] == ticker) & (trades['date'] == date)
        match = config[config['ticker'] == ticker]
        # Use .loc rather than chained indexing so the assignment hits trades itself.
        trades.loc[mask, 'cusip'] = match['cusip'].values[0]
        trades.loc[mask, 'security_type'] = match['security_type'].values[0]

Answer

Suppose you have this setup:

import pandas as pd 
import numpy as np 

nan = np.nan 

trades = pd.DataFrame({'ticker' : ['IBM', 'MSFT', 'GOOG', 'AAPL'], 
         'date' : pd.date_range('1/1/2000', periods = 4), 
         'cusip' : [nan, nan, 100, nan] 
         }) 
trades = trades.set_index(['ticker', 'date']) 
print(trades) 
#                    cusip
# ticker date
# IBM    2000-01-01    NaN
# MSFT   2000-01-02    NaN
# GOOG   2000-01-03  100.0  # <-- We do not want to overwrite this
# AAPL   2000-01-04    NaN

config = pd.DataFrame({'ticker' : ['IBM', 'MSFT', 'GOOG', 'AAPL'], 
         'date' : pd.date_range('1/1/2000', periods = 4), 
         'cusip' : [1,2,3,nan]}) 
config = config.set_index(['ticker', 'date']) 

# Let's permute the index to show `DataFrame.update` correctly matches rows based on the index, not on the order of the rows. 
new_index = sorted(config.index) 
config = config.reindex(new_index)  
print(config) 
#                    cusip
# ticker date
# AAPL   2000-01-04    NaN
# GOOG   2000-01-03    3.0
# IBM    2000-01-01    1.0
# MSFT   2000-01-02    2.0

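Then you can fill the NaN values in trades with values from config using the DataFrame.update method. Note that DataFrame.update matches rows based on the index (which is why set_index was called above).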

trades.update(config, join = 'left', overwrite = False) 
print(trades) 

#                    cusip
# ticker date
# IBM    2000-01-01    1.0
# MSFT   2000-01-02    2.0
# GOOG   2000-01-03  100.0  # If overwrite = True, then 100 is overwritten by 3.
# AAPL   2000-01-04    NaN
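
Applied to the situation in the question, where ticker and date are ordinary columns and both cusip and security_type need filling, the same idea might look like the sketch below. The frames and their values are hypothetical placeholders, and each (ticker, date) pair is assumed to be unique in config. Columns that exist only in config (here primary_exchange) are ignored by update, and reset_index turns the keys back into columns afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values are placeholders, not real CUSIPs.
trades = pd.DataFrame({
    'ticker': ['IBM', 'MSFT', 'GOOG'],
    'date': pd.to_datetime(['2000-01-01', '2000-01-02', '2000-01-03']),
    'profit': [10.0, -5.0, 2.5],
    'cusip': [np.nan, np.nan, 100.0],
    'security_type': [None, None, 'ETF'],
})
config = pd.DataFrame({
    'ticker': ['GOOG', 'IBM', 'MSFT'],
    'date': pd.to_datetime(['2000-01-03', '2000-01-01', '2000-01-02']),
    'cusip': [3.0, 1.0, 2.0],
    'security_type': ['Stock', 'Stock', 'Stock'],
    'primary_exchange': ['NASDAQ', 'NYSE', 'NASDAQ'],
})

keys = ['ticker', 'date']

# Index both frames by the match keys so update can align rows on them.
trades = trades.set_index(keys)
trades.update(config.set_index(keys), overwrite=False)  # only fill missing values

# Restore ticker and date as regular columns.
trades = trades.reset_index()
print(trades)
```

With overwrite=False the pre-existing GOOG values (100.0 and 'ETF') are kept and only the missing entries are filled from config.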