I am running a groupby on a multi-indexed DataFrame similar to the one in this
question: Multiindexed Pandas groupby, ignore a level?
                                        0         1  ...
categories features subfeatures
cat1       feature1 subfeature1 -0.224487 -0.227524
                    subfeature2 -0.591399 -0.799228
           feature2 subfeature1  1.190110 -1.365895  ...
                    subfeature2  0.720956 -1.325562
cat2       feature1 subfeature1  1.856932       NaN
                    subfeature2 -1.354258 -0.740473
           feature2 subfeature1  0.234075 -1.362235  ...
                    subfeature2  0.013875  1.309564
cat3       feature1 subfeature1       NaN       NaN
                    subfeature2 -1.260408  1.559721  ...
           feature2 subfeature1  0.419246  0.084386
                    subfeature2  0.969270  1.493417
...                                   ...       ...
It can be generated with the following code:
import pandas as pd, numpy as np
np.random.seed(seed=90)
results = np.random.randn(3,2,2,2)
results[2,0,0,:] = np.nan
results[1,0,0,1] = np.nan
results = results.reshape((-1,2))
index = pd.MultiIndex.from_product([["cat1", "cat2", "cat3"],
                                    ["feature1", "feature2"],
                                    ["subfeature1", "subfeature2"]],
                                   names=["categories", "features", "subfeatures"])
df = pd.DataFrame(results, index=index)
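I am trying to select only the groups where the maximum difference between the two subfeature arrays is greater than some threshold, but I am running into problems with groupby: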
df.groupby(level=['categories','features'])
This gives me the following groups:
{('cat1', 'feature1'): [('cat1', 'feature1', 'subfeature1'),
('cat1', 'feature1', 'subfeature2')],
('cat1', 'feature2'): [('cat1', 'feature2', 'subfeature1'),
('cat1', 'feature2', 'subfeature2')],
('cat2', 'feature1'): [('cat2', 'feature1', 'subfeature1'),
('cat2', 'feature1', 'subfeature2')],
('cat2', 'feature2'): [('cat2', 'feature2', 'subfeature1'),
('cat2', 'feature2', 'subfeature2')],
('cat3', 'feature1'): [('cat3', 'feature1', 'subfeature1'),
('cat3', 'feature1', 'subfeature2')],
('cat3', 'feature2'): [('cat3', 'feature2', 'subfeature1'),
('cat3', 'feature2', 'subfeature2')]}
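Is there any way to group so that the subfeatures level is ignored by the groupby function? The reason is that I need subfeature1 and subfeature2 together; in separate groups they are worthless.
So ideally I would like groupby to return something like this: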
{('cat1', 'feature1'): [('cat1', 'feature1')],
('cat1', 'feature2'): [('cat1', 'feature2')],
('cat2', 'feature1'): [('cat2', 'feature1')],
('cat2', 'feature2'): [('cat2', 'feature2')],
('cat3', 'feature1'): [('cat3', 'feature1')],
('cat3', 'feature2'): [('cat3', 'feature2')]}
How can I do this?
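Is it possible that there are duplicates among the values? For example, ('cat1', 'feature1') appears twice in the list of values. – tlnagy
What are you trying to do? You almost never need to use .groups directly. Those are not duplicates; each group has two rows. – Jeff
I am comparing the arrays of numbers to the right of the subfeatures column. I want to compare the subfeature1 array with the subfeature2 array for each (category, feature) group. – tlnagy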
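For what it is worth, here is a minimal sketch of the comparison described in the comments, not a definitive answer. It assumes "maximum difference" means the largest absolute element-wise difference between the subfeature1 row and the subfeature2 row of each (categories, features) group. The threshold value and the helper name max_abs_diff are hypothetical and only serve the illustration. Grouping by the first two levels already keeps both subfeature rows in the same group, so groupby(...).filter() can keep or drop each pair as a unit.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
np.random.seed(seed=90)
results = np.random.randn(3, 2, 2, 2)
results[2, 0, 0, :] = np.nan
results[1, 0, 0, 1] = np.nan
results = results.reshape((-1, 2))
index = pd.MultiIndex.from_product(
    [["cat1", "cat2", "cat3"],
     ["feature1", "feature2"],
     ["subfeature1", "subfeature2"]],
    names=["categories", "features", "subfeatures"],
)
df = pd.DataFrame(results, index=index)

THRESHOLD = 1.0  # hypothetical value, not taken from the question


def max_abs_diff(group):
    """Largest absolute gap between the two subfeature rows of one group.

    NaN entries are skipped; if every difference is NaN the result is NaN.
    """
    sub1 = group.xs("subfeature1", level="subfeatures")
    sub2 = group.xs("subfeature2", level="subfeatures")
    # Both pieces are one-row frames indexed by (categories, features),
    # so the subtraction aligns them column by column.
    return (sub1 - sub2).abs().max(axis=1).iloc[0]


# Each group holds both subfeature rows; filter keeps or drops them together.
# A NaN result compares as False, so all-NaN groups are dropped.
selected = df.groupby(level=["categories", "features"]).filter(
    lambda g: max_abs_diff(g) > THRESHOLD
)

# Equivalent vectorized form without groupby, useful as a cross-check.
sub1 = df.xs("subfeature1", level="subfeatures")
sub2 = df.xs("subfeature2", level="subfeatures")
max_diff = (sub1 - sub2).abs().max(axis=1)
keep = max_diff[max_diff > THRESHOLD].index

print(selected)
print(list(keep))
```

The filter version returns the original rows, with both subfeatures, for every qualifying group, while the xs version returns only the (categories, features) keys. Which one is more convenient depends on what happens to the selected groups afterwards.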