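Appending a DataFrame to an index in pandas

My data has a lot of nesting. I have 6 time periods (but we don't need to worry about those), each time period has 19 quantiles, and each quantile has a 51x51 covariance matrix (for all US states and DC). Represented as a dictionary, I would have: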
    # Illustrative only: data_for_0_05 etc. stand for each quantile's input data.
    my_data = {
        'time_pd_1': {
            0.05: pd.DataFrame(data=cov_var(data_for_0_05), columns=states, index=states),
            0.10: pd.DataFrame(data=cov_var(data_for_0_10), columns=states, index=states),
            # ...
            0.90: pd.DataFrame(data=cov_var(data_for_0_90), columns=states, index=states),
            0.95: pd.DataFrame(data=cov_var(data_for_0_95), columns=states, index=states),
        },
        'time_pd_2': {
            0.05: pd.DataFrame(data=cov_var(data_for_0_05), columns=states, index=states),
            0.10: pd.DataFrame(data=cov_var(data_for_0_10), columns=states, index=states),
            # ...
            0.90: pd.DataFrame(data=cov_var(data_for_0_90), columns=states, index=states),
            0.95: pd.DataFrame(data=cov_var(data_for_0_95), columns=states, index=states),
        },
        # ...
        'time_pd_6': {
            0.05: pd.DataFrame(data=cov_var(data_for_0_05), columns=states, index=states),
            0.10: pd.DataFrame(data=cov_var(data_for_0_10), columns=states, index=states),
            # ...
            0.90: pd.DataFrame(data=cov_var(data_for_0_90), columns=states, index=states),
            0.95: pd.DataFrame(data=cov_var(data_for_0_95), columns=states, index=states),
        },
    }
Simple enough, but that is not how the data gets created. I have two `for` loops that do the work:
    for tpd in time_periods:
        for q in quantiles:
            # One states-by-states covariance frame per (time period, quantile)
            tdf = pd.DataFrame(data=cov_var(data_for_q), index=states, columns=states)
If I print `tdf`, it looks like this:
    ST                 Alabama       Alaska      Arizona  ...  West Virginia  Wisconsin      Wyoming
    ST
    Alabama         288.867628    50.000000  -100.062576  ...      37.719317          0   -75.000000
    Alaska           50.000000   280.929272  -229.365427  ...      57.514555          0  -136.365512
    Arizona        -100.062576  -229.365427   946.563177  ...    -113.805612          0   291.897723
    ...                    ...          ...          ...  ...            ...        ...          ...
    West Virginia    37.719317    57.514555  -113.805612  ...     342.195976          0  -214.243277
    Wisconsin         0.000000     0.000000     0.000000  ...       0.000000          0     0.000000
    Wyoming         -75.000000  -136.365512   291.897723  ...    -214.243277          0   684.146619
Now, what I would like is something like this:
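    cov = {}
    for tpd in time_periods:
        # One frame per time period, indexed by the quantile labels
        cov[tpd] = pd.DataFrame(index=[str(round(q,2)) for q in quantiles])
        for q in quantiles:
            tdf = pd.DataFrame(data=cov_var(data_for_q), index=states, columns=states)
            # Intended: nest the whole covariance frame under its quantile label
            cov[tpd].loc[str(round(q,2)), :] = tdf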
So if I print `cov[tpd]`, it should look like:
          ST                 Alabama       Alaska      Arizona  ...  West Virginia  Wisconsin      Wyoming
    q     ST
          Alabama         288.867628    50.000000  -100.062576  ...      37.719317          0   -75.000000
          Alaska           50.000000   280.929272  -229.365427  ...      57.514555          0  -136.365512
          Arizona        -100.062576  -229.365427   946.563177  ...    -113.805612          0   291.897723
    0.05  ...                    ...          ...          ...  ...            ...        ...          ...
          West Virginia    37.719317    57.514555  -113.805612  ...     342.195976          0  -214.243277
          Wisconsin         0.000000     0.000000     0.000000  ...       0.000000          0     0.000000
          Wyoming         -75.000000  -136.365512   291.897723  ...    -214.243277          0   684.146619
          Alabama         288.867628    50.000000  -100.062576  ...      37.719317          0   -75.000000
          Alaska           50.000000   280.929272  -229.365427  ...      57.514555          0  -136.365512
          Arizona        -100.062576  -229.365427   946.563177  ...    -113.805612          0   291.897723
    0.10  ...                    ...          ...          ...  ...            ...        ...          ...
          West Virginia    37.719317    57.514555  -113.805612  ...     342.195976          0  -214.243277
          Wisconsin         0.000000     0.000000     0.000000  ...       0.000000          0     0.000000
          Wyoming         -75.000000  -136.365512   291.897723  ...    -214.243277          0   684.146619
    ...   ...                    ...          ...          ...  ...            ...        ...          ...
    ...   ...                    ...          ...          ...  ...            ...        ...          ...
          Alabama         288.867628    50.000000  -100.062576  ...      37.719317          0   -75.000000
          Alaska           50.000000   280.929272  -229.365427  ...      57.514555          0  -136.365512
          Arizona        -100.062576  -229.365427   946.563177  ...    -113.805612          0   291.897723
    0.90  ...                    ...          ...          ...  ...            ...        ...          ...
          West Virginia    37.719317    57.514555  -113.805612  ...     342.195976          0  -214.243277
          Wisconsin         0.000000     0.000000     0.000000  ...       0.000000          0     0.000000
          Wyoming         -75.000000  -136.365512   291.897723  ...    -214.243277          0   684.146619
          Alabama         288.867628    50.000000  -100.062576  ...      37.719317          0   -75.000000
          Alaska           50.000000   280.929272  -229.365427  ...      57.514555          0  -136.365512
          Arizona        -100.062576  -229.365427   946.563177  ...    -113.805612          0   291.897723
    0.95  ...                    ...          ...          ...  ...            ...        ...          ...
          West Virginia    37.719317    57.514555  -113.805612  ...     342.195976          0  -214.243277
          Wisconsin         0.000000     0.000000     0.000000  ...       0.000000          0     0.000000
          Wyoming         -75.000000  -136.365512   291.897723  ...    -214.243277          0   684.146619
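Having this final structure would make my life so much easier that I'm willing to buy a beer for whoever gets me there. Short of that, I have tried all sorts of things: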
    cov[tpd].loc[str(round(q,2)), :] = tdf  # Raises ValueError: Incompatible indexer with DataFrame
    cov[tpd].loc[str(round(q,2)), :].append(tdf)  # Almost gives me the frame I need, but removes the index level q, and inserts a column 0 with NaNs
    cov[tpd].loc[str(round(q,2)), :].join(tdf, how='outer')  # Raises AttributeError: 'Series' object has no attribute 'join'
    pd.merge(cov[tpd].loc[str(round(q,2)), :], tdf, how='outer')  # Raises AttributeError: 'Series' object has no attribute 'columns'
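I understand all of the error messages, and I also have a possible solution: pre-create the DataFrame `cov[tpd]` the way I want it, then use indexing to insert the output of `cov_var()`. However, that takes some extra lines of code to build the MultiIndex for `cov[tpd]` and then insert the data. Does anyone know a better way?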
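For reference, the pre-creation approach described above might look roughly like the sketch below. The three-state list, the shortened quantile list, and the seeded `cov_var` stand-in are all assumptions made so the example runs on its own; they are not the real inputs. The block is assigned as a raw NumPy array so that pandas does not try to align it against the target index.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real inputs (assumptions, not the actual data).
states = ["Alabama", "Alaska", "Arizona"]
quantiles = [0.05, 0.10, 0.95]
time_periods = ["time_pd_1", "time_pd_2"]


def cov_var(seed):
    """Placeholder for the custom covariance function: a symmetric matrix."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(len(states), len(states)))
    return a @ a.T


# Build the (q, ST) MultiIndex once and reuse it for every time period.
q_labels = [str(round(q, 2)) for q in quantiles]
index = pd.MultiIndex.from_product([q_labels, states], names=["q", "ST"])

cov = {}
for t, tpd in enumerate(time_periods):
    # Empty (all-NaN) float frame with the final shape.
    cov[tpd] = pd.DataFrame(index=index, columns=states, dtype=float)
    for i, label in enumerate(q_labels):
        block = cov_var(seed=10 * t + i)
        # Select every row under this quantile label and fill it with the
        # plain array, so no index alignment is attempted.
        cov[tpd].loc[label, :] = block
```

With this layout, `cov["time_pd_1"].loc["0.05"]` gives back the states-by-states block for that quantile.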
Note: `cov_var()` is a simple covariance function that I wrote myself because my situation is a bit special, so I can't use a built-in function like `np.cov()`.
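One possible alternative to pre-creating the frame, offered here only as a sketch, is to collect the per-quantile frames in a list and let `pd.concat` build the outer index level through its `keys` and `names` arguments. As before, the short `states` and `quantiles` lists and the seeded `cov_var` are hypothetical placeholders rather than the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real inputs (assumptions, not the actual data).
states = ["Alabama", "Alaska", "Arizona"]
quantiles = [0.05, 0.10, 0.95]
time_periods = ["time_pd_1", "time_pd_2"]


def cov_var(seed):
    """Placeholder for the custom covariance function: a symmetric matrix."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(len(states), len(states)))
    return a @ a.T


q_labels = [str(round(q, 2)) for q in quantiles]

cov = {}
for t, tpd in enumerate(time_periods):
    frames = []
    for i, q in enumerate(quantiles):
        tdf = pd.DataFrame(cov_var(seed=10 * t + i), index=states, columns=states)
        frames.append(tdf)
    # keys become the outer index level; names label both index levels.
    cov[tpd] = pd.concat(frames, keys=q_labels, names=["q", "ST"])

# Optionally, stack the time periods as a third, outermost level.
all_periods = pd.concat(cov, names=["time_pd", "q", "ST"])
```

Each `cov[tpd]` then has a two-level `(q, ST)` row index with the states as columns, and `cov[tpd].loc["0.05"]` returns the block for a single quantile.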