Pandas MultiIndex lookup with NumPy arrays

I am working with a pandas DataFrame that represents a graph. The DataFrame is indexed by a MultiIndex that gives the node endpoints of each edge.

Setup:

import pandas as pd 
import numpy as np 
import itertools as it 
edges = list(it.combinations([1, 2, 3, 4], 2)) 

# Define a dataframe to represent a graph 
index = pd.MultiIndex.from_tuples(edges, names=['u', 'v']) 
df = pd.DataFrame.from_dict({ 
    'edge_id': list(range(len(edges))), 
    'edge_weight': np.random.RandomState(0).rand(len(edges)), 
}) 
df.index = index 
print(df) 
     edge_id  edge_weight
u v
1 2        0       0.5488
  3        1       0.7152
  4        2       0.6028
2 3        3       0.5449
  4        4       0.4237
3 4        5       0.6459

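I want to be able to index into the graph using a subset of edges, which is why I chose a MultiIndex. I can do this as long as the input to df.loc is a list of tuples.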

# Select subset of graph using list-of-tuple indexing 
edge_subset1 = [edges[x] for x in [0, 3, 2]] 
df.loc[edge_subset1] 
     edge_id  edge_weight
u v
1 2        0       0.5488
2 3        3       0.5449
1 4        2       0.6028

However, when my list of edges is a numpy array (as it often is) or a list of lists, I cannot seem to use df.loc.

# Why can't I do this if `edge_subset2` is a numpy array? 
edge_subset2 = np.array(edge_subset1) 
df.loc[edge_subset2] 
TypeError: unhashable type: 'numpy.ndarray'

This would be fine if I could just call arr.tolist(), but that results in a seemingly different error.

# Why can't I do this if `edge_subset2` is a numpy array? 
# or if `edge_subset3` is a list-of-lists? 
edge_subset3 = edge_subset2.tolist() 
df.loc[edge_subset3] 
TypeError: '[1, 2]' is an invalid key

Having to write list(map(tuple, arr.tolist())) every time I want to select a subset is a real pain. It would be nice if there were another way to do this.
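For reference, the workaround described above can be wrapped in a small function so the conversion is written only once. This is a minimal sketch: `loc_edges` is a hypothetical helper name, not part of pandas, and it assumes the array has one row per edge and one column per index level.

```python
import itertools as it

import numpy as np
import pandas as pd


def loc_edges(frame, edge_array):
    """Hypothetical helper: select rows by an (n, 2) array of (u, v) labels."""
    # Convert each row of the array into a hashable tuple before calling .loc.
    keys = list(map(tuple, np.asarray(edge_array).tolist()))
    return frame.loc[keys]


# Same graph as in the setup, reduced to the edge_id column.
edges = list(it.combinations([1, 2, 3, 4], 2))
df = pd.DataFrame(
    {"edge_id": list(range(len(edges)))},
    index=pd.MultiIndex.from_tuples(edges, names=["u", "v"]),
)

subset = loc_edges(df, np.array([[1, 2], [2, 3], [1, 4]]))
print(subset)
```

This only hides the conversion; it does not avoid it.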

The main questions are:

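Why can't I use a numpy array with .loc? Is it because a dictionary is used under the hood to map MultiIndex labels to positional indices?
Why does a list of lists give a different error? Maybe it is really the same problem, just caught in a different way?
Is there another (ideally less verbose) way, one I am not aware of, to look up a subset of a DataFrame with a numpy array of MultiIndex labels?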

Source

2017-01-05 Erotemic

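Note that `df.edge_id[edge_subset2]` works, which means this style of indexing is for some reason supported on Series but not on DataFrames. Oddly, `df.edge_id.loc[edge_subset2]` also fails (for no obvious reason, given that it works without `loc`). I suggest filing this with pandas here: https://github.com/pandas-dev/pandas/issues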

Dictionary keys must be hashable (in practice, immutable), which is why you cannot use a list of lists to access a MultiIndex.
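The hashability point can be checked in plain Python, independently of pandas. The sketch below only shows which key types Python can hash; whether pandas uses a dictionary internally for this lookup is not something it demonstrates. `is_hashable` is a hypothetical helper written for illustration.

```python
import numpy as np


def is_hashable(obj):
    """Hypothetical helper: report whether Python can hash `obj`."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


results = {
    "tuple": is_hashable((1, 2)),              # immutable, hashable
    "list": is_hashable([1, 2]),               # mutable, not hashable
    "ndarray": is_hashable(np.array([1, 2])),  # mutable, not hashable
}
print(results)  # {'tuple': True, 'list': False, 'ndarray': False}
```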

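To access MultiIndex data with loc, you need to convert your numpy array into a list of tuples, since tuples are immutable. One way to do so is with map, as you mentioned.

If you want to avoid map and you are reading the edges from a CSV file, you can read them into a DataFrame and then call to_records with index set to False. Another way is to build a MultiIndex from the ndarray, but you have to transpose it before passing it so that each level gets its own array: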

import pandas as pd 

df1 = df.loc[pd.MultiIndex.from_arrays(edge_subset2.T)] 


print(df1) 

#outputs
          edge_id    edge_weight
------  ---------  -------------
(1, 2)          0       0.548814
(2, 3)          3       0.544883
(1, 4)          2       0.602763
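The to_records route mentioned earlier might look roughly like the following. This is a sketch under assumptions: the small `edge_frame` stands in for whatever `pd.read_csv` would return, and its column names `u` and `v` are chosen here for illustration.

```python
import itertools as it

import numpy as np
import pandas as pd

# Same graph as in the question, reduced to the edge_id column.
edges = list(it.combinations([1, 2, 3, 4], 2))
df = pd.DataFrame(
    {"edge_id": list(range(len(edges)))},
    index=pd.MultiIndex.from_tuples(edges, names=["u", "v"]),
)

# Stand-in for edges read from a CSV file; column names are assumed.
edge_frame = pd.DataFrame(np.array([[1, 2], [2, 3], [1, 4]]), columns=["u", "v"])

# to_records(index=False) gives a record array without the row index;
# tolist() then turns each record into a plain tuple.
keys = edge_frame.to_records(index=False).tolist()
selected = df.loc[keys]
print(selected)
```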

I found the article on advanced multi-indexing in the pandas documentation very helpful.

Source

2017-01-31 13:33:02 sgDysregulation
