Python pandas: cannot do slice indexing

I am trying to work with a pandas MultiIndex DataFrame that looks like this:

                      end ref|alt
chrom start
chr1  3000714  3000715     T|G
      3001065  3001066     G|T
      3001110  3001111     G|C
      3001131  3001132     G|A

I would like to be able to do this:

df.loc[('chr1', slice(3000714, 3001110))]

It fails with the following error:

cannot do slice indexing on with these indexers [1204741] of

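df.index.levels[1].dtype returns dtype('int64'), so an integer slice should work, right?

Also, any advice on how to do this efficiently would be valuable, because the DataFrame has 12 million rows and I need to run this kind of slice query against it about 70 million times.

Source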

2016-06-09 Mike Dacre

I think you need to add ,: at the end. This means you are slicing the rows but want all of the columns:

print(df.loc[('chr1', slice(3000714, 3001110)), :])
                      end ref|alt
chrom start
chr1  3000714  3000715     T|G
      3001065  3001066     G|T
      3001110  3001111     G|C
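
As a minimal, self-contained sketch of the fix above: the snippet below rebuilds the four sample rows from the question and applies the row tuple plus trailing colon. How df was originally created is not shown in the post, so the construction via set_index is an assumption. The index is sorted first because label slicing on a MultiIndex generally expects a sorted index.

```python
import pandas as pd

# Rebuild the four sample rows from the question.
# Assumption: the original post does not show how df was constructed.
df = (
    pd.DataFrame(
        {
            "chrom": ["chr1"] * 4,
            "start": [3000714, 3001065, 3001110, 3001131],
            "end": [3000715, 3001066, 3001111, 3001132],
            "ref|alt": ["T|G", "G|T", "G|C", "G|A"],
        }
    )
    .set_index(["chrom", "start"])
    .sort_index()  # keep the MultiIndex sorted so label slices are well defined
)

# The tuple is the row indexer; the trailing ":" selects every column.
# Label slices in .loc include both endpoints.
subset = df.loc[("chr1", slice(3000714, 3001110)), :]
print(subset)
```

The trailing colon makes it explicit that the tuple addresses the two row levels, so it cannot be read as a (row, column) pair.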

Another solution is to pass axis=0 to loc:

print(df.loc(axis=0)[('chr1', slice(3000714, 3001110))])
                      end ref|alt
chrom start
chr1  3000714  3000715     T|G
      3001065  3001066     G|T
      3001110  3001111     G|C

But if you only need the rows for 3000714 and 3001110:

print(df.loc[('chr1', [3000714, 3001110]), :])
                      end ref|alt
chrom start
chr1  3000714  3000715     T|G
      3001110  3001111     G|C

idx = pd.IndexSlice
print(df.loc[idx['chr1', [3000714, 3001110]], :])
                      end ref|alt
chrom start
chr1  3000714  3000715     T|G
      3001110  3001111     G|C
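
To make the relationship between these variants concrete, here is a small sketch that runs them side by side on the sample rows. The DataFrame construction is an assumption (the post does not show it), and the variable names by_tuple, by_axis, by_idx and picked are made up for illustration. It also shows that pd.IndexSlice accepts the start:stop slice syntax directly, which avoids spelling out slice(...).

```python
import pandas as pd

# Assumption: sample rows rebuilt by hand; the post does not show df's construction.
df = (
    pd.DataFrame(
        {
            "chrom": ["chr1"] * 4,
            "start": [3000714, 3001065, 3001110, 3001131],
            "end": [3000715, 3001066, 3001111, 3001132],
            "ref|alt": ["T|G", "G|T", "G|C", "G|A"],
        }
    )
    .set_index(["chrom", "start"])
    .sort_index()
)

idx = pd.IndexSlice

# Three spellings of the same inclusive range query on the "start" level.
by_tuple = df.loc[("chr1", slice(3000714, 3001110)), :]
by_axis = df.loc(axis=0)[("chr1", slice(3000714, 3001110))]
by_idx = df.loc[idx["chr1", 3000714:3001110], :]

# A list instead of a slice selects only the listed labels.
picked = df.loc[idx["chr1", [3000714, 3001110]], :]

print(by_tuple.equals(by_axis), by_tuple.equals(by_idx))
print(picked)
```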

Timings:

In [21]: %timeit (df.loc[('chr1', slice(3000714, 3001110)),:]) 
1000 loops, best of 3: 757 µs per loop 

In [22]: %timeit (df.loc(axis=0)[('chr1', slice(3000714, 3001110))]) 
1000 loops, best of 3: 743 µs per loop 

In [23]: %timeit (df.loc[('chr1', [3000714, 3001110]),:]) 
1000 loops, best of 3: 824 µs per loop 

In [24]: %timeit (df.loc[pd.IndexSlice['chr1', [3000714, 3001110]],:]) 
The slowest run took 5.35 times longer than the fastest. This could mean that an intermediate result is being cached. 
1000 loops, best of 3: 826 µs per loop

Source

2016-06-09 05:06:59 jezrael

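Great, that works perfectly. Thanks for the good explanation. I also realized that in my case, because my first index level is much smaller than the second (23 entries in the `level[0]` index and 12.6 million in the `level[1]` index), I get a much bigger speedup by splitting the DataFrame into a dictionary keyed on the first index level. On my full DataFrame, the `df.loc(axis=0)[('chr1', slice(3000714, 3001110))]` approach takes 218 ms per loop, whereas building the dictionary and running `dfs['chr1'].loc[3000714:3001110]` takes only 95.7 µs per loop. Thanks again! –

@jezrael, how would I select a range of a DataFrame from one index value to another? I set users.index = np.arange(0, len(users)), but nothing comes back: users.loc[start:end:] gives an empty DataFrame, even though the users DataFrame has content. – Eliethesaiyan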
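
The comment above does not show how the dictionary was built, so the following is only one plausible sketch of the idea: split the MultiIndex DataFrame into one single-level DataFrame per chromosome, then slice each with a plain integer range. The use of xs and the sample construction are assumptions; only the name dfs and the final lookup come from the comment.

```python
import pandas as pd

# Assumption: sample rows rebuilt by hand; the real data has about 12 million rows.
df = (
    pd.DataFrame(
        {
            "chrom": ["chr1"] * 4,
            "start": [3000714, 3001065, 3001110, 3001131],
            "end": [3000715, 3001066, 3001111, 3001132],
            "ref|alt": ["T|G", "G|T", "G|C", "G|A"],
        }
    )
    .set_index(["chrom", "start"])
    .sort_index()
)

# One DataFrame per chromosome. xs() drops the "chrom" level by default,
# so each value is indexed by "start" alone.
dfs = {
    chrom: df.xs(chrom, level="chrom")
    for chrom in df.index.unique(level="chrom")
}

# Per-query work is now a dict lookup plus a slice on a single-level integer index.
hits = dfs["chr1"].loc[3000714:3001110]
print(hits)
```

The split is paid once up front, which suits the situation described in the question, where the same table is queried many millions of times.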
