I think you need to add ,: at the end. It means you are slicing rows but want all columns:
print (df.loc[('chr1', slice(3000714, 3001110)),:])
                   end ref|alt
chrom start
chr1  3000714  3000715     T|G
      3001065  3001066     G|T
      3001110  3001111     G|C
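For anyone who wants to reproduce this, here is a minimal sketch. The sample frame is hypothetical: it is modelled on the output above, and the rows at 3001200 and on chr2 are invented only to show where the slice stops. It assumes the index levels are named chrom and start and that the index is sorted, which slicing on a MultiIndex requires.

```python
import pandas as pd

# Hypothetical sample data modelled on the printed output;
# the 3001200 and chr2 rows are invented for illustration.
df = pd.DataFrame(
    {
        "chrom": ["chr1", "chr1", "chr1", "chr1", "chr2"],
        "start": [3000714, 3001065, 3001110, 3001200, 3000714],
        "end": [3000715, 3001066, 3001111, 3001201, 3000715],
        "ref|alt": ["T|G", "G|T", "G|C", "A|C", "C|T"],
    }
).set_index(["chrom", "start"]).sort_index()  # sorted index is needed for slicing

# The row selector is a tuple: (label on level 0, slice over level 1).
# The trailing ":" selects every column. Label slices include both endpoints.
subset = df.loc[("chr1", slice(3000714, 3001110)), :]
print(subset)
```

The 3001200 row and the chr2 row are left out, because the slice covers only chr1 positions up to and including 3001110.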
Another solution is to pass axis=0 to loc:
print (df.loc(axis=0)[('chr1', slice(3000714, 3001110))])
                   end ref|alt
chrom start
chr1  3000714  3000715     T|G
      3001065  3001066     G|T
      3001110  3001111     G|C
But if you only need 3000714 and 3001110:
print (df.loc[('chr1', [3000714, 3001110]),:])
                   end ref|alt
chrom start
chr1  3000714  3000715     T|G
      3001110  3001111     G|C
idx = pd.IndexSlice
print (df.loc[idx['chr1', [3000714, 3001110]],:])
                   end ref|alt
chrom start
chr1  3000714  3000715     T|G
      3001110  3001111     G|C
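The two list-based forms above should select the same rows, since pd.IndexSlice only builds the indexing tuple for you. A small sketch to check that, using a hypothetical frame shaped like the one in the output (the level names chrom and start are taken from the printed header):

```python
import pandas as pd

# Hypothetical sample data modelled on the printed output.
df = pd.DataFrame(
    {
        "chrom": ["chr1", "chr1", "chr1"],
        "start": [3000714, 3001065, 3001110],
        "end": [3000715, 3001066, 3001111],
        "ref|alt": ["T|G", "G|T", "G|C"],
    }
).set_index(["chrom", "start"]).sort_index()

idx = pd.IndexSlice

# A list on the second level picks exact labels instead of a range.
by_tuple = df.loc[("chr1", [3000714, 3001110]), :]
by_idx = df.loc[idx["chr1", [3000714, 3001110]], :]

# Both spellings describe the same selection.
print(by_tuple.equals(by_idx))
```

IndexSlice is mostly a readability aid; it becomes more useful when you mix slices and lists, for example idx['chr1', 3000714:3001110].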
Timings:
In [21]: %timeit (df.loc[('chr1', slice(3000714, 3001110)),:])
1000 loops, best of 3: 757 µs per loop
In [22]: %timeit (df.loc(axis=0)[('chr1', slice(3000714, 3001110))])
1000 loops, best of 3: 743 µs per loop
In [23]: %timeit (df.loc[('chr1', [3000714, 3001110]),:])
1000 loops, best of 3: 824 µs per loop
In [24]: %timeit (df.loc[pd.IndexSlice['chr1', [3000714, 3001110]],:])
The slowest run took 5.35 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 826 µs per loop
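Great, that's perfect. Thanks for the good explanation. I also realized that in my case, because my first index level is much smaller than the second (23 items in the level[0] index versus 12.6 million in the level[1] index), I get a much bigger speedup by splitting the dataframe into a dictionary keyed on the first index level. On my full dataframe, the "df.loc(axis=0)[('chr1', slice(3000714, 3001110))]" method takes 218 ms per loop, while building the dictionary and running "dfs['chr1'].loc[3000714:3001110]" takes only 95.7 µs per loop. Thanks again! –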
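The comment does not show how the dictionary was built, so the following is only one possible sketch of that idea. It assumes groupby on the first level plus droplevel, and it reuses the name dfs from the comment; the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the real frame has 23 chromosomes
# and millions of positions.
df = pd.DataFrame(
    {
        "chrom": ["chr1", "chr1", "chr1", "chr2"],
        "start": [3000714, 3001065, 3001110, 3000714],
        "end": [3000715, 3001066, 3001111, 3000715],
        "ref|alt": ["T|G", "G|T", "G|C", "C|T"],
    }
).set_index(["chrom", "start"]).sort_index()

# Split once on the first level. Each value is a frame indexed only
# by "start", so later lookups use a plain single-level index.
dfs = {
    chrom: group.droplevel("chrom")
    for chrom, group in df.groupby(level="chrom")
}

# Ordinary label slice on the per-chromosome frame (both ends inclusive).
subset = dfs["chr1"].loc[3000714:3001110]
print(subset)
```

The split is a one-time cost, so this only pays off when many range queries are run against the same data.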
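@jezrael, how would I select the rows of a dataframe from one index to another, within that range? I set users.index = np.arange(0, len(users)), but it returns nothing... users.loc[start:end:] gives an empty dataframe, even though the users dataframe contains data – Eliethesaiyan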