2015-10-03 201 views

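Update a DataFrame using rows from another DataFrame

Thanks in advance to anyone who can help with this. I'm trying to update a DataFrame that is filled with zeros and has a datetime index (my trade DataFrame), using the rows of another DataFrame (indexed_orders) that fall on the same dates. My code is as follows: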

import pandas as pd 
import numpy as np 
import os 
import csv 


orders = pd.read_csv('./orders/orders.csv', parse_dates=['Date'], sep=',', dayfirst=True) #initiate orders data frame from csv data file, parsing the Date column as datetimes
indexed_orders = orders.set_index(['Date']) #set Date as index for orders
print(indexed_orders)

symbol_list = orders['Symbol'].tolist() #creates list of symbols 
symbols = list(set(symbol_list)) #gets rid of duplicates in list 


dates_list = orders['Date'].tolist() #creates list of order dates 
dates_orders = list(set(dates_list)) #gets rid of duplicates in list 


start_date = '2011-01-05' #establish date range 
end_date = '2011-01-20' 

dates = pd.date_range(start_date, end_date) #establish dates from start_date and end_date 

trade = pd.DataFrame(0, index = dates, columns = symbols) #establish trade data frame 
trade['Cash'] = 0 #add column for future calculations 
print(trade)

This prints the following for indexed_orders:

Date   Symbol Order Shares 
2011-01-10 AAPL BUY 1500 
2011-01-13 AAPL SELL 1500 
2011-01-13 IBM BUY 4000 
2011-01-26 GOOG BUY 1000 
2011-02-02 XOM SELL 4000 
2011-02-10 XOM BUY 4000 
2011-03-03 GOOG SELL 1000 
2011-03-03 IBM SELL 2200 
2011-06-03 IBM SELL 3300 
2011-05-03 IBM BUY 1500 
2011-06-10 AAPL BUY 1200 
2011-08-01 GOOG BUY  55 
2011-08-01 GOOG SELL  55 
2011-12-20 AAPL SELL 1200 

and the output for trade is:

  GOOG AAPL XOM IBM Cash 
2011-01-05  0  0 0 0  0 
2011-01-06  0  0 0 0  0 
2011-01-07  0  0 0 0  0 
2011-01-08  0  0 0 0  0 
2011-01-09  0  0 0 0  0 
2011-01-10  0  0 0 0  0 
2011-01-11  0  0 0 0  0 
2011-01-12  0  0 0 0  0 
2011-01-13  0  0 0 0  0 
2011-01-14  0  0 0 0  0 
2011-01-15  0  0 0 0  0 
2011-01-16  0  0 0 0  0 
2011-01-17  0  0 0 0  0 
2011-01-18  0  0 0 0  0 
2011-01-19  0  0 0 0  0 
2011-01-20  0  0 0 0  0 

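I want to update my trade DataFrame on the dates that appear in indexed_orders, inserting the 'Shares' amount into the column for the matching 'Symbol' (the AAPL, IBM, GOOG and XOM columns in trade). I also want the 'Shares' value to be negative when the 'Order' column in indexed_orders says 'SELL'. In other words, I want code that updates the trade DataFrame so that printing trade gives: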

  GOOG AAPL XOM IBM Cash 
2011-01-05  0  0 0 0  0 
2011-01-06  0  0 0 0  0 
2011-01-07  0  0 0 0  0 
2011-01-08  0  0 0 0  0 
2011-01-09  0  0 0 0  0 
2011-01-10  0 1500 0 0  0 
2011-01-11  0  0 0 0  0 
2011-01-12  0  0 0 0  0 
2011-01-13  0 -1500 0 4000  0 
2011-01-14  0  0 0 0  0 
2011-01-15  0  0 0 0  0 
2011-01-16  0  0 0 0  0 
2011-01-17  0  0 0 0  0 
2011-01-18  0  0 0 0  0 
2011-01-19  0  0 0 0  0 
2011-01-20  0  0 0 0  0 

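I assume some kind of iteration with nested boolean conditions is needed, but I'm having a hard time working it out. In particular, I can't figure out how to iterate through the rows and update based on the datetime index.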

Any help would be greatly appreciated.

Answer


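First, you can use the Order column to sign the share changes. Then you can group by Date and Symbol and aggregate the orders by summing. This gives you the orders for every unique day and the Symbols traded on those days. Finally, use unstack to convert the result into tabular format.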

import numpy as np 
import pandas as pd 

df = pd.read_csv('temp.txt', sep='\t')  # tab-separated copy of the orders shown below

print(df)

''' 
     Date Symbol Order Shares 
0 1/10/11 AAPL BUY 1500 
1 1/13/11 AAPL SELL 1500 
2 1/13/11 IBM BUY 4000 
3 1/26/11 GOOG BUY 1000 
4  2/2/11 XOM SELL 4000 
5 2/10/11 XOM BUY 4000 
6  3/3/11 GOOG SELL 1000 
7  3/3/11 IBM SELL 2200 
8  6/3/11 IBM SELL 3300 
9  5/3/11 IBM BUY 1500 
10 6/10/11 AAPL BUY 1200 
11 8/1/11 GOOG BUY  55 
12 8/1/11 GOOG SELL  55 
13 12/20/11 AAPL SELL 1200 
''' 

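# +1 for BUY, -1 for SELL, so SELL orders become negative share changes
df['SharesChange'] = df.Shares * df.Order.apply(lambda o: 1 if o == 'BUY' else -1)

# sum the signed changes per (Date, Symbol), then pivot Symbol into columns
df = df.groupby(['Date', 'Symbol']).agg({'SharesChange': 'sum'}).unstack().fillna(0)

print(df)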
''' 
     SharesChange 
Symbol   AAPL GOOG IBM XOM 
Date 
1/10/11   1500  0  0  0 
1/13/11   -1500  0 4000  0 
1/26/11    0 1000  0  0 
12/20/11  -1200  0  0  0 
2/10/11    0  0  0 4000 
2/2/11    0  0  0 -4000 
3/3/11    0 -1000 -2200  0 
5/3/11    0  0 1500  0 
6/10/11   1200  0  0  0 
6/3/11    0  0 -3300  0 
8/1/11    0  0  0  0 
''' 

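Thank you. This looks like a good way to reshape my indexed_orders so that it lines up with the corresponding columns in my trade DataFrame. However, I'm still stuck on how to update each of those rows in the trade DataFrame. Any ideas? – adisciu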

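@adisciu, the only difference I see between Yakym Priozhenko's df and your trade is the presence of rows for dates with no trading activity. If you want those empty rows back, you can reindex the DataFrame to the desired date range. – jgloves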
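
A minimal sketch of the reindexing suggested in this comment, not taken from the thread itself. It assumes the Date strings are month/day/year as in the answer's output, builds a small inline copy of the first four orders instead of reading the CSV, and reuses the question's date range and column order. The names `sign` and `changes` are illustrative only.

```python
import pandas as pd

# Inline stand-in for the first four orders in the question (assumption:
# dates are month/day/two-digit-year, as printed in the answer).
orders = pd.DataFrame({
    "Date": ["1/10/11", "1/13/11", "1/13/11", "1/26/11"],
    "Symbol": ["AAPL", "AAPL", "IBM", "GOOG"],
    "Order": ["BUY", "SELL", "BUY", "BUY"],
    "Shares": [1500, 1500, 4000, 1000],
})
orders["Date"] = pd.to_datetime(orders["Date"], format="%m/%d/%y")

# Sign the share counts: positive for BUY, negative for SELL.
sign = orders["Order"].map({"BUY": 1, "SELL": -1})
orders["SharesChange"] = orders["Shares"] * sign

# One row per order date, one column per traded symbol.
changes = (
    orders.groupby(["Date", "Symbol"])["SharesChange"]
    .sum()
    .unstack(fill_value=0)
)

# Reindex onto the full calendar range and the full symbol list;
# dates and symbols with no orders are filled with 0.
dates = pd.date_range("2011-01-05", "2011-01-20")
symbols = ["GOOG", "AAPL", "XOM", "IBM"]
trade = changes.reindex(index=dates, columns=symbols, fill_value=0)
trade["Cash"] = 0  # placeholder column, as in the question

print(trade)
```

Only the 2011-01-10 and 2011-01-13 rows end up non-zero here. The 2011-01-26 GOOG order lies outside the requested range, so `reindex` drops it.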

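@jgloves I want trade to contain only the dates within the specified trading date range. That range will change in my program, but the indexed_orders DataFrame will stay the same. How do I reindex a DataFrame like Priozhenko's df to my desired date range? Does that sort it by date? – adisciu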
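
On the ordering question, a short sketch under the same assumption about the date format. `groupby` sorts its keys, and string dates sort lexicographically, which is why 12/20/11 appears right after 1/26/11 in the answer's output. Once the index is parsed to datetimes, `reindex` returns rows in the order of whatever index it is given, so a `pd.date_range` yields chronological rows. The Series below is a made-up slice of three of the question's orders.

```python
import pandas as pd

# Signed share changes for three orders from the question, keyed by date string.
raw = pd.Series([-3300, 1500, 1200], index=["6/3/11", "5/3/11", "6/10/11"])

# String labels sort character by character, so "6/10/11" lands before "6/3/11".
print(raw.sort_index().index.tolist())  # ['5/3/11', '6/10/11', '6/3/11']

# Parsing the labels as datetimes gives true chronological ordering.
parsed = raw.copy()
parsed.index = pd.to_datetime(parsed.index, format="%m/%d/%y")
print(parsed.sort_index())

# reindex follows the order of the index passed in; changing the range
# only means building a different date_range.
dates = pd.date_range("2011-05-01", "2011-06-30")
full = parsed.reindex(dates, fill_value=0)
print(full[full != 0])
```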
