2017-01-17 65 views
1

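pandas df.loc: keep only the first item

I want to get the value of one column based on the value in another column of the same row.

For example:

Given biz_id = 123, I want to retrieve the corresponding biz_name.

DataFrame: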

biz_id biz_name 
123  chew 
456  bite 
123  chew 

Code:

df['biz_name'].loc[df['biz_id'] == 123] 

This returns:

chew 
chew 

How do I get just a single value, 'chew', as a string?

Answers

1

You can use iloc or iat to select the first value of the Series:

print (df.loc[df['biz_id'] == 123, 'biz_name'].iloc[0]) 
chew 

Or:

print (df.loc[df['biz_id'] == 123, 'biz_name'].iat[0]) 
chew 

With query:

print (df.query('biz_id == 123')['biz_name'].iloc[0]) 
chew 

Or convert to a list or a numpy array and select the first value:

print (df.loc[df['biz_id'] == 123, 'biz_name'].tolist()[0]) 
chew 

print (df.loc[df['biz_id'] == 123, 'biz_name'].values[0]) 
chew 
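
The approaches above can be tried end to end with a small reproducible sketch. It rebuilds the DataFrame from the question (the integer dtype of biz_id is an assumption based on the comparison with 123), and the variable names are illustrative only.

```python
import pandas as pd

# Rebuild the small DataFrame from the question (biz_id assumed to be integer).
df = pd.DataFrame({
    "biz_id": [123, 456, 123],
    "biz_name": ["chew", "bite", "chew"],
})

# Boolean-mask selection returns a Series with one entry per matching row.
matches = df.loc[df["biz_id"] == 123, "biz_name"]

first_iloc = matches.iloc[0]        # first element by position
first_iat = matches.iat[0]          # scalar accessor by position
first_list = matches.tolist()[0]    # via a plain Python list
first_values = matches.values[0]    # via the underlying NumPy array
first_query = df.query("biz_id == 123")["biz_name"].iloc[0]

print(first_iloc, type(first_iloc))
```

All five variants rely on at least one row matching. If the mask selects nothing, indexing position 0 raises an IndexError, so it may be worth checking matches.empty first.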

Timings:

In [18]: %timeit (df.loc[df['biz_id'] == 123, 'biz_name'].iloc[0]) 
1000 loops, best of 3: 399 µs per loop 

In [19]: %timeit (df.loc[df['biz_id'] == 123, 'biz_name'].iat[0]) 
The slowest run took 4.16 times longer than the fastest. This could mean that an intermediate result is being cached. 
1000 loops, best of 3: 391 µs per loop 

In [20]: %timeit (df.query('biz_id == 123')['biz_name'].iloc[0]) 
The slowest run took 4.39 times longer than the fastest. This could mean that an intermediate result is being cached. 
1000 loops, best of 3: 1.75 ms per loop 

In [21]: %timeit (df.loc[df['biz_id'] == 123, 'biz_name'].tolist()[0]) 
The slowest run took 4.18 times longer than the fastest. This could mean that an intermediate result is being cached. 
1000 loops, best of 3: 384 µs per loop 

In [22]: %timeit (df.loc[df['biz_id'] == 123, 'biz_name'].values[0]) 
The slowest run took 5.32 times longer than the fastest. This could mean that an intermediate result is being cached. 
1000 loops, best of 3: 370 µs per loop 

In [23]: %timeit (df.loc[df.biz_id.eq(123).idxmax(), 'biz_name']) 
1000 loops, best of 3: 517 µs per loop 
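
Outside IPython, a comparable measurement could be sketched with the standard timeit module. The three-row DataFrame, the candidates dictionary, and the repeat count of 200 are assumptions for illustration. Absolute numbers will vary by machine and pandas version, so no expected output is shown.

```python
import timeit

import pandas as pd

# Tiny DataFrame from the question; real timings depend heavily on data size.
df = pd.DataFrame({
    "biz_id": [123, 456, 123],
    "biz_name": ["chew", "bite", "chew"],
})

# Hypothetical mapping of label -> callable, one per approach being compared.
candidates = {
    "iloc": lambda: df.loc[df["biz_id"] == 123, "biz_name"].iloc[0],
    "iat": lambda: df.loc[df["biz_id"] == 123, "biz_name"].iat[0],
    "query": lambda: df.query("biz_id == 123")["biz_name"].iloc[0],
    "tolist": lambda: df.loc[df["biz_id"] == 123, "biz_name"].tolist()[0],
    "values": lambda: df.loc[df["biz_id"] == 123, "biz_name"].values[0],
    "idxmax": lambda: df.loc[df["biz_id"].eq(123).idxmax(), "biz_name"],
}

runs = 200  # arbitrary repeat count for this sketch
timings = {}
for name, func in candidates.items():
    # Average seconds per call over the chosen number of runs.
    timings[name] = timeit.timeit(func, number=runs) / runs
    print(f"{name}: {timings[name] * 1e6:.1f} µs per call")
```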
2

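Use idxmax to get the index of the first maximum value. Because biz_id.eq(123) is a boolean Series, that is the first row where the condition is True: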

df.loc[df.biz_id.eq(123).idxmax(), 'biz_name'] 

'chew'
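
One caveat with the idxmax approach: when no row matches, the mask is all False and idxmax still returns the first index label, so the lookup silently returns an unrelated row. A minimal sketch of a guard follows. The helper first_biz_name is hypothetical, not part of pandas, and it assumes a unique index so that loc returns a scalar.

```python
import pandas as pd

df = pd.DataFrame({
    "biz_id": [123, 456, 123],
    "biz_name": ["chew", "bite", "chew"],
})


def first_biz_name(frame, biz_id):
    """Return biz_name of the first row with this biz_id, or None if absent.

    Hypothetical helper for illustration; assumes frame has a unique index.
    """
    mask = frame["biz_id"].eq(biz_id)
    if not mask.any():
        # Without this check, idxmax would point at the first row regardless.
        return None
    return frame.loc[mask.idxmax(), "biz_name"]


found = first_biz_name(df, 123)
missing = first_biz_name(df, 999)

# Unguarded lookup for an id that does not exist: picks row 0 anyway.
unguarded = df.loc[df["biz_id"].eq(999).idxmax(), "biz_name"]

print(found, missing, unguarded)
```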