Applying a transformation to a DataFrame with a MultiIndex in Python pandas

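I am trying to apply a simple function to mostly numeric data in pandas. The data is a set of matrices indexed by time. I want to represent this with a hierarchical/multi-level index, and then use a split-apply-combine style operation to group the data, apply an operation, and combine the results into a DataFrame. I want the results of these operations to be DataFrames, not Series objects.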

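Below is a simple example of two matrices (two time points) represented as a multi-level DataFrame. I want to subtract a matrix from each time point, then collapse the data by taking the mean, and get back a DataFrame that preserves the original column names of the data.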

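Everything I have tried either fails or gives a strange result. I tried to follow http://pandas.pydata.org/pandas-docs/stable/groupby.html, since I think this is basically a split-apply-combine operation, but the documentation is hard to follow and the examples are dense.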

How can this be done in pandas? I have annotated the relevant failing lines in my code:

import pandas 
import numpy as np 

t1 = pandas.DataFrame([[0, 0, 0], 
         [0, 1, 1], 
         [5, 5, 5]], columns=[1, 2, 3], index=["A", "B", "C"]) 
t2 = pandas.DataFrame([[10, 10, 30], 
         [5, 1, 1], 
         [2, 2, 2]], columns=[1, 2, 3], index=["A", "B", "C"]) 
m = np.ones([3,3]) 
c = pandas.concat([t1, t2], keys=["t1", "t2"], names=["time", "name"]) 
# print("c: ", c)

# How to view just the 'time' column values?
# print(c.ix["time"])  # fails
# print(c["time"])  # fails

# How to group matrix by time, subtract value from each matrix, and then 
# take the mean across the columns and get a dataframe back? 
result = c.groupby(level="time").apply(lambda x: np.mean(x - m, axis=1)) 

# Why does 'result' appear to have TWO "time" columns?! 
print(result)

# Why is 'result' a series and not a dataframe? 
print(type(result))

# Attempt to get a dataframe back 
df = pandas.DataFrame(result) 

# Why does 'df' have a weird '0' outer (hierarchical) column?? 
print(df)
#                       0
# time time name
# t1   t1   A   -1.000000
#           B   -0.333333
#           C    4.000000
# t2   t2   A   15.666667
#           B    1.333333
#           C    1.000000

In summary, the operation I want to perform is:

for each time point: 
    subtract m from time point matrix 
    collapse the result matrix across the columns by taking the mean (preserving the row labels "A", "B", "C")
return result as dataframe

Source

2015-06-02 lgd

How to view just the 'time' column values?

In [11]: c.index.levels[0].values 
Out[11]: array(['t1', 't2'], dtype=object)
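The `levels` attribute lists each distinct label once. If the goal is the 'time' label for every row, `get_level_values` may be closer to what the question asks. A small sketch, rebuilding `c` from the question so it runs on its own:

```python
import pandas as pd

t1 = pd.DataFrame([[0, 0, 0], [0, 1, 1], [5, 5, 5]],
                  columns=[1, 2, 3], index=["A", "B", "C"])
t2 = pd.DataFrame([[10, 10, 30], [5, 1, 1], [2, 2, 2]],
                  columns=[1, 2, 3], index=["A", "B", "C"])
c = pd.concat([t1, t2], keys=["t1", "t2"], names=["time", "name"])

# One 'time' label per row, in row order
per_row = c.index.get_level_values("time")
print(list(per_row))           # ['t1', 't1', 't1', 't2', 't2', 't2']

# Distinct labels only
print(list(per_row.unique()))  # ['t1', 't2']

# Select a single time point; the 'time' level is dropped from the result
first = c.xs("t1", level="time")
print(first)
```

`c["time"]` fails because 'time' is an index level rather than a column, which is why the lookup goes through `c.index`.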

How to group the matrices by time, subtract a value from each matrix, then take the mean across the columns and get a DataFrame back?

Your attempt was quite close:

In [46]: c.groupby(level='time').apply(lambda x: x - m).mean(axis=1) 
Out[46]: 
time name 
t1 A  -1.000000 
     B  -0.333333 
     C  4.000000 
t2 A  15.666667 
     B  1.333333 
     C  1.000000 
dtype: float64
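The chained call above still yields a Series, and depending on the pandas version `apply` may prepend the group key so that 'time' appears twice in the index. A sketch of one way to get a DataFrame with a single named column, assuming `group_keys=False` to keep the original index and `row_mean` as an arbitrary column name (both are choices made here, not part of the answer):

```python
import numpy as np
import pandas as pd

t1 = pd.DataFrame([[0, 0, 0], [0, 1, 1], [5, 5, 5]],
                  columns=[1, 2, 3], index=["A", "B", "C"])
t2 = pd.DataFrame([[10, 10, 30], [5, 1, 1], [2, 2, 2]],
                  columns=[1, 2, 3], index=["A", "B", "C"])
m = np.ones([3, 3])
c = pd.concat([t1, t2], keys=["t1", "t2"], names=["time", "name"])

# group_keys=False asks groupby not to prepend the group key to the index,
# so each 3x3 block keeps its original (time, name) labels
shifted = c.groupby(level="time", group_keys=False).apply(lambda x: x - m)

# mean(axis=1) collapses columns 1, 2, 3 into one value per row (a Series);
# to_frame wraps it in a one-column DataFrame with the chosen column name
result = shifted.mean(axis=1).to_frame(name="row_mean")
print(result)
print(type(result))
```

The difference from the question's version is only where the mean is taken: here the subtraction happens per group and the mean runs once on the recombined frame.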

Source

2015-06-02 23:00:00 chrisaycock

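Thanks, but your solution still returns a Series. I would like to get a DataFrame back. Could you also explain the difference between your call and mine? If I try `pandas.DataFrame(c.groupby(level='time').apply(lambda x: x - m).mean(axis=1))` I still get the strange extra column "0" as an outer column/index – lgd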

@lgd Without an explicit column name, the DataFrame gets the default column name '0'. You can supply a name via `pandas.DataFrame(..., columns=['something_here'])`. – chrisaycock

When I defined the DataFrames "t1" and "t2", I gave the columns the names [1, 2, 3] – lgd
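The names 1, 2, 3 label the columns that `mean(axis=1)` averages away, so the result holds one unnamed value per row, and wrapping an unnamed Series in a DataFrame falls back to the default label 0. A small illustration on a cut-down frame, with `row_mean` as a made-up column name:

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 1], [5, 5, 5]],
                  columns=[1, 2, 3], index=["B", "C"])

# Averaging across axis=1 collapses columns 1, 2, 3 into a single Series
s = df.mean(axis=1)
print(s.name)  # None

# With no name to use, the DataFrame constructor labels the column 0
default_cols = list(pd.DataFrame(s).columns)
print(default_cols)  # [0]

# Naming the column explicitly avoids the 0 label
named_cols = list(s.to_frame(name="row_mean").columns)
print(named_cols)  # ['row_mean']
```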
