2013-02-02 71 views
72

How to get the last N rows of a pandas DataFrame?

I have two pandas DataFrames, df1 and df2 (df1 is a vanilla DataFrame, df2 is indexed by 'STK_ID' & 'RPT_Date'):

>>> df1 
    STK_ID RPT_Date TClose sales discount 
0 000568 20060331 3.69 5.975  NaN 
1 000568 20060630 9.14 10.143  NaN 
2 000568 20060930 9.49 13.854  NaN 
3 000568 20061231 15.84 19.262  NaN 
4 000568 20070331 17.00 6.803  NaN 
5 000568 20070630 26.31 12.940  NaN 
6 000568 20070930 39.12 19.977  NaN 
7 000568 20071231 45.94 29.269  NaN 
8 000568 20080331 38.75 12.668  NaN 
9 000568 20080630 30.09 21.102  NaN 
10 000568 20080930 26.00 30.769  NaN 

>>> df2 
       TClose sales discount net_sales cogs 
STK_ID RPT_Date            
000568 20060331 3.69 5.975  NaN  5.975 2.591 
     20060630 9.14 10.143  NaN  10.143 4.363 
     20060930 9.49 13.854  NaN  13.854 5.901 
     20061231 15.84 19.262  NaN  19.262 8.407 
     20070331 17.00 6.803  NaN  6.803 2.815 
     20070630 26.31 12.940  NaN  12.940 5.418 
     20070930 39.12 19.977  NaN  19.977 8.452 
     20071231 45.94 29.269  NaN  29.269 12.606 
     20080331 38.75 12.668  NaN  12.668 3.958 
     20080630 30.09 21.102  NaN  21.102 7.431 

I can get the last 3 rows of df2 with:

>>> df2.ix[-3:] 
       TClose sales discount net_sales cogs 
STK_ID RPT_Date            
000568 20071231 45.94 29.269  NaN  29.269 12.606 
     20080331 38.75 12.668  NaN  12.668 3.958 
     20080630 30.09 21.102  NaN  21.102 7.431 

whereas df1.ix[-3:] gives all the rows:

>>> df1.ix[-3:] 
    STK_ID RPT_Date TClose sales discount 
0 000568 20060331 3.69 5.975  NaN 
1 000568 20060630 9.14 10.143  NaN 
2 000568 20060930 9.49 13.854  NaN 
3 000568 20061231 15.84 19.262  NaN 
4 000568 20070331 17.00 6.803  NaN 
5 000568 20070630 26.31 12.940  NaN 
6 000568 20070930 39.12 19.977  NaN 
7 000568 20071231 45.94 29.269  NaN 
8 000568 20080331 38.75 12.668  NaN 
9 000568 20080630 30.09 21.102  NaN 
10 000568 20080930 26.00 30.769  NaN 

Why? How can I get the last 3 rows of df1 (the DataFrame without an explicit index)? I am on pandas 0.10.1.

+3

You can use `df[-3:]` to produce the result you want. WesM has treated this as a bug. Not sure if/when it is going to get resolved: http://stackoverflow.com/questions/14035817/slicing-pandas-dataframe-with-negative-index-with-ix-method – Zelazny7

+0

Thanks for the info – bigbug

+0

@Zelazny7 You can use irows (integer rows?) to do this more intuitively. On a DataFrame whose index contains negative integers, `df[-3:]` behaves **crazily**. –

Answers

194

Don't forget DataFrame.tail! For example: df1.tail(10)
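For the question as asked (last 3 rows), a minimal sketch follows. The frame is a hypothetical cut-down stand-in for df1 with only three of its columns, not the asker's actual data.

```python
import pandas as pd

# Hypothetical miniature of df1: default integer index 0..4
df1 = pd.DataFrame({
    "STK_ID": ["000568"] * 5,
    "RPT_Date": ["20070930", "20071231", "20080331", "20080630", "20080930"],
    "TClose": [39.12, 45.94, 38.75, 30.09, 26.00],
})

# tail(n) is purely positional, so the index labels do not matter
last3 = df1.tail(3)
print(last3)
```

tail returns the rows with their original index labels (2, 3, 4 here), so nothing is renumbered.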

32

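This happens because df1 has an integer index: ix selects by label rather than by position, so -3: means "every label from -3 onward", which matches all rows. This is by design: see integer indexing in pandas "gotchas"*.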

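*In newer versions of pandas, prefer loc or iloc, which remove ix's ambiguity between position and label (ix itself was deprecated in 0.20 and removed in 1.0):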

df.iloc[-3:] 

See the docs.
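To make the label/position distinction concrete, here is a small sketch on a made-up single-column frame (the column name "a" and the values are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30, 40, 50]})  # default labels 0..4

# iloc is positional: -3 counts from the end
by_position = df.iloc[-3:]

# loc is label-based: the slice end is inclusive
by_label = df.loc[2:3]

# A label slice starting at -3 on a sorted integer index selects every
# label >= -3, i.e. all rows. This mirrors what ix did in the question.
all_rows = df.loc[-3:]

print(by_position.index.tolist(), by_label.index.tolist(), len(all_rows))
```

So with a default integer index, only iloc (or tail) gives "the last 3 rows"; a label slice from -3 gives everything.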

As Wes points out, in this specific case you should just use tail!

It should also be noted that in pandas before 0.14, iloc raises an IndexError on an out-of-bounds access, while .head() and .tail() do not:

>>> pd.__version__ 
'0.12.0' 
>>> df = pd.DataFrame([{"a": 1}, {"a": 2}]) 
>>> df.iloc[-5:] 
... 
IndexError: out-of-bounds on slice (end) 
>>> df.tail(5) 
    a 
0 1 
1 2 
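For comparison, a sketch of the same frame on pandas 0.14 or later, where (as far as I know) an out-of-bounds slice is simply clipped. Only a single out-of-range position still raises:

```python
import pandas as pd

df = pd.DataFrame([{"a": 1}, {"a": 2}])

# The slice reaches past the start of the frame and is clipped to what exists
sliced = df.iloc[-5:]
tailed = df.tail(5)

# A scalar position outside the frame is still an error
try:
    df.iloc[5]
    raised = False
except IndexError:
    raised = True

print(len(sliced), sliced.equals(tailed), raised)
```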

Old answer (deprecated method):

You can use the DataFrame irow method to get around this ambiguity:

In [11]: df1.irow(slice(-3, None)) 
Out[11]: 
    STK_ID RPT_Date TClose sales discount 
8  568 20080331 38.75 12.668  NaN 
9  568 20080630 30.09 21.102  NaN 
10  568 20080930 26.00 30.769  NaN 

Note: Series has a similar iget method.
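Since irow and iget are deprecated, here is a rough sketch of the iloc equivalents, including the MultiIndex case from the question. The data is a hypothetical subset of df1, and df2 is rebuilt from it with set_index purely for illustration.

```python
import pandas as pd

# Hypothetical subset of the question's data
df1 = pd.DataFrame({
    "STK_ID": ["000568"] * 4,
    "RPT_Date": ["20071231", "20080331", "20080630", "20080930"],
    "TClose": [45.94, 38.75, 30.09, 26.00],
})

# Positional stand-in for df1.irow(slice(-3, None))
last3_rows = df1.iloc[-3:]

# Positional stand-in for Series.iget(-1)
last_close = df1["TClose"].iloc[-1]

# Same idea on a frame indexed by STK_ID and RPT_Date, like df2
df2 = df1.set_index(["STK_ID", "RPT_Date"])
last3_multi = df2.iloc[-3:]

print(last3_rows)
print(last_close)
print(last3_multi)
```

iloc ignores the index entirely, so the same call works whether the frame has the default integer index or a MultiIndex.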

+0

@DavidWolever I can't reproduce your IndexError on 0.14.1; df.iloc[-5:] works fine with your example. Which version of pandas are you using? –
