2015-06-19 48 views
2

Restructuring a DataFrame

I have a DataFrame with 262,800 rows and 3 columns. It currently looks like this:

 Currency Maturity  value 
0   GBP 0.08333333 4.709456 
1   GBP 0.08333333 4.713099 
2   GBP 0.08333333 4.707237 
3   GBP 0.08333333 4.705043 
4   GBP 0.08333333 4.697150 
5   GBP 0.08333333 4.710647 
6   GBP 0.08333333 4.701150 
7   GBP 0.08333333 4.694639 
8   GBP 0.08333333 4.686111 
9   GBP 0.08333333 4.714750 
...... 
262770  GBP   25 2.432869 

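I would like the DataFrame to be in the form shown below. I have taken several steps, including the melt in the code below, but for some reason that got rid of my Date column and produced the DataFrame above. I am not sure how to get the Date column back and obtain the following DataFrame: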

Maturity  Date   Currency Yield_pct 
0 0.08333333 2005-01-04  GBP  4.709456    
1 0.08333333 2005-01-05  GBP  4.713099    
2 0.08333333 2005-01-06  GBP  4.707237 
.... 
9 25   2005-01-04  GBP  2.432869 

My code is as follows:

from pandas.io.excel import read_excel 
import pandas as pd 
import numpy as np 

url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls' 

# check the sheet number, spot: 9/9, short end 7/9 
spot_curve = read_excel(url, sheetname=8) 
short_end_spot_curve = read_excel(url, sheetname=6) 

# do some cleaning, keep NaN for now, as forward fill NaN is not recommended for yield curve 
spot_curve.columns = spot_curve.loc['years:'] 
spot_curve.columns.name = 'Maturity' 
valid_index = spot_curve.index[4:] 
spot_curve = spot_curve.loc[valid_index] 
# remove all maturities within 5 years as those are duplicated in short-end file 
col_mask = spot_curve.columns.values > 5 
spot_curve = spot_curve.iloc[:, col_mask] 


short_end_spot_curve.columns = short_end_spot_curve.loc['years:'] 
short_end_spot_curve.columns.name = 'Maturity' 
valid_index = short_end_spot_curve.index[4:] 
short_end_spot_curve = short_end_spot_curve.loc[valid_index] 

# merge these two, time index are identical 
# ============================================== 
combined_data = pd.concat([short_end_spot_curve, spot_curve], axis=1, join='outer') 
# sort the maturity from short end to long end 
combined_data.sort_index(axis=1, inplace=True) 

def filter_func(group): 
    return group.isnull().sum(axis=1) <= 50 

combined_data = combined_data.groupby(level=0).filter(filter_func) 

idx = 0 
values = ['GBP'] * len(combined_data.index) 
combined_data.insert(idx, 'Currency', values) 

# print(combined_data.columns.values)

# I had to melt. 'Currency' is an arbitrary choice for id_vars: for some reason,
# print(combined_data.columns.values) shows 'Currency' alongside 0.08333333, etc.
combined_data = pd.melt(combined_data, id_vars=['Currency'])
print(combined_data)

Answers

2

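Could you add the currency identifier after the melt instead? Resetting the index first turns the dates into an ordinary column, which can then be passed to melt as id_vars: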

# Copy up to this stage 
combined_data = combined_data.groupby(level=0).filter(filter_func) 

# My code from here 
combined_data.reset_index(inplace=True, drop=False) 
combined_data.rename(columns={'index': 'Date'}, inplace=True) 

# This line assumes you want datetime, ignore if you don't 
combined_data['Date'] = pd.to_datetime(combined_data['Date']) 

result = pd.melt(combined_data, id_vars=['Date']) 

result['Currency'] = 'GBP' 

result.head()

Date Maturity value Currency 
0 2005-01-04 0.08333333 4.709456 GBP 
1 2005-01-05 0.08333333 4.713099 GBP 
2 2005-01-06 0.08333333 4.707237 GBP 
3 2005-01-07 0.08333333 4.705043 GBP 
4 2005-01-10 0.08333333 4.697150 GBP 
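
The dates vanish in the original attempt because melt only reshapes columns and does not carry the row index into its result by default. Here is a minimal sketch of both behaviours on a made-up frame with two dates and two maturities; the name `wide` and its values are illustrative, not the Bank of England data.

```python
import pandas as pd

# Hypothetical miniature of combined_data: dates in the index, maturities as columns.
wide = pd.DataFrame(
    {0.5: [4.55, 4.56], 1.0: [4.46, 4.47]},
    index=pd.to_datetime(["2005-01-04", "2005-01-05"]),
)
wide.columns.name = "Maturity"

# Melting directly discards the date index.
lost = pd.melt(wide)

# Moving the index into a column first keeps the dates.
flat = wide.reset_index().rename(columns={"index": "Date"})
kept = pd.melt(flat, id_vars=["Date"])
kept["Currency"] = "GBP"

print(lost.columns.tolist())
print(kept.columns.tolist())
```

Because the index here has no name, reset_index calls the new column 'index', which is why the rename to 'Date' is needed.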
+0

Great. Could I ask one more question? Is there a way to change the column name 'value' to 'Yield_pct'? – Jojo

+3

Sure. I personally like using a dictionary, because it is easy to see what the name was before: `result.rename(columns={'value': 'Yield_pct'}, inplace=True)` – bastewart
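
As an alternative sketch, melt can name the value column directly through its value_name parameter, which avoids the separate rename. The frame `flat` below is made up for illustration, and the final column selection only reorders the result into the layout asked for in the question.

```python
import pandas as pd

# Hypothetical wide frame with the dates already in a column.
flat = pd.DataFrame({
    "Date": pd.to_datetime(["2005-01-04", "2005-01-05"]),
    0.5: [4.55, 4.56],
    1.0: [4.46, 4.47],
})

# var_name / value_name set the names of the two melted columns.
result = pd.melt(flat, id_vars=["Date"], var_name="Maturity", value_name="Yield_pct")
result["Currency"] = "GBP"

# Reorder to Maturity, Date, Currency, Yield_pct.
result = result[["Maturity", "Date", "Currency", "Yield_pct"]]
print(result)
```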

0

Try stacking the result after first resetting your index so that it includes Currency. The output looks like this:

cd = combined_data.reset_index().set_index(['index', 'Currency']) 
cd_new = cd.stack() 
>>> cd_new 
index  Currency Maturity 
2005-01-04 GBP  0.083333 4.709456 
         0.166667 4.633861 
         0.250000 4.586271 
         0.333333 4.567017 
         0.416667 4.559578 
         0.500000 4.553227 
         0.583333 4.543976 
         0.666667 4.530881 
         0.750000 4.514742 
         0.833333 4.497187 
         0.916667 4.479690 
         1.000000 4.463105 
         1.083333 4.447843 
         1.166667 4.434076 
         1.250000 4.421868 
... 
2015-05-29 GBP  18.0  2.453898 
         18.5  2.475052 
         19.0  2.494679 
         19.5  2.512787 
         20.0  2.529393 
         20.5  2.544519 
         21.0  2.558198 
         21.5  2.570467 
         22.0  2.581368 
         22.5  2.590947 
         23.0  2.599250 
         23.5  2.606327 
         24.0  2.612229 
         24.5  2.617008 
         25.0  2.620715 
Length: 259457, dtype: float64 

cd_new.xs('2015-05-29') 
Currency Maturity 
GBP  0.333333 0.452339 
      0.416667 0.441134 
      0.500000 0.430168 
      0.583333 0.419990 
      0.666667 0.411208 
      0.750000 0.404424 
      0.833333 0.400017 
      0.916667 0.398140 
      1.000000 0.398806 
      1.083333 0.401943 
      1.166667 0.407427 
      1.250000 0.415095 
      1.333333 0.424762 
      1.416667 0.436233 
      1.500000 0.449322 
... 
GBP  18.0  2.453898 
      18.5  2.475052 
      19.0  2.494679 
      19.5  2.512787 
      20.0  2.529393 
      20.5  2.544519 
      21.0  2.558198 
      21.5  2.570467 
      22.0  2.581368 
      22.5  2.590947 
      23.0  2.599250 
      23.5  2.606327 
      24.0  2.612229 
      24.5  2.617008 
      25.0  2.620715 
Length: 97, dtype: float64
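
If a flat table is wanted in the end, the stacked Series can be turned back into ordinary columns with reset_index. Here is a small sketch on invented data; the name `cd` mirrors the snippet above, while `wide`, `flat` and the values are illustrative assumptions.

```python
import pandas as pd

# Hypothetical miniature of combined_data with the Currency column already inserted.
wide = pd.DataFrame(
    {"Currency": ["GBP", "GBP"], 0.5: [4.55, 4.56], 1.0: [4.46, 4.47]},
    index=pd.to_datetime(["2005-01-04", "2005-01-05"]),
)

cd = wide.reset_index().set_index(["index", "Currency"])
cd.columns.name = "Maturity"

# stack() moves the maturity columns into an inner index level, giving a Series.
stacked = cd.stack()

# Name the values and flatten the MultiIndex back into ordinary columns.
flat = stacked.rename("Yield_pct").reset_index().rename(columns={"index": "Date"})
print(flat)
```

Keeping the MultiIndex form is convenient for selections such as xs by date, while the flattened form matches the four-column layout requested in the question.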