2013-07-01 58 views
8

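Indexing a Python Pandas DataFrame with multiple conditions, like a SQL where statement

I have some familiarity with R and with Python Pandas. I am trying to index a DataFrame to retrieve the rows that satisfy a set of logical conditions, much like SQL's "where" statement. I know how to do this in R with data frames (and with R's data.table package, which is more like a Pandas DataFrame than R's native data frame is).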

Below is some sample code that builds a DataFrame, along with a description of how I would like to index it. Is there an easy way to do this?

import pandas as pd 
import numpy as np 

# generate some data 
mult = 10000 
fruits = ['Apple', 'Banana', 'Kiwi', 'Grape', 'Orange', 'Strawberry']*mult 
vegetables = ['Asparagus', 'Broccoli', 'Carrot', 'Lettuce', 'Rutabaga', 'Spinach']*mult 
animals = ['Dog', 'Cat', 'Bird', 'Fish', 'Lion', 'Mouse']*mult 
xValues = np.random.normal(loc=80, scale=2, size=6*mult) 
yValues = np.random.normal(loc=79, scale=2, size=6*mult) 

data = {'Fruit': fruits, 
     'Vegetable': vegetables, 
     'Animal': animals, 
     'xValue': xValues, 
     'yValue': yValues,} 

df = pd.DataFrame(data) 

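# shuffle each column independently to break the repeating fruit/vegetable/animal pattern
# (a permuted copy is assigned back; in-place np.random.shuffle on a column may not update df in recent pandas)
df['Fruit'] = np.random.permutation(df['Fruit'].to_numpy())
df['Vegetable'] = np.random.permutation(df['Vegetable'].to_numpy())
df['Animal'] = np.random.permutation(df['Animal'].to_numpy())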

df.head(30) 

# filter sets 
fruitsInclude = ['Apple', 'Banana', 'Grape'] 
vegetablesExclude = ['Asparagus', 'Broccoli'] 

# subset1: All rows and columns where: 
# (Fruit in fruitsInclude) AND (Vegetable not in vegetablesExclude)

# subset2: All rows and columns where: 
# (Fruit in fruitsInclude) AND [(Vegetable not in vegetablesExclude) OR (Animal == 'Dog')]

# subset3: All rows and specific columns where above logical conditions are true. 

All help and input is welcome and highly appreciated!

Thanks, Randall


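Wow. Exactly what I needed. Thanks for the quick and direct answer. Note that I had misspelled vegetablesExclude; it should be spelled with a c. I corrected it in the code above, so it should be ready to copy and paste for testing. Thanks again. Randall. – user2537610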

Answers

14
# Note: the original answer used df.ix, which has since been removed from pandas;
# df.loc accepts the same boolean masks.

# subset1: All rows and columns where:
# (Fruit in fruitsInclude) AND (Vegetable not in vegetablesExclude)
df.loc[df['Fruit'].isin(fruitsInclude) & ~df['Vegetable'].isin(vegetablesExclude)]

# subset2: All rows and columns where:
# (Fruit in fruitsInclude) AND [(Vegetable not in vegetablesExclude) OR (Animal == 'Dog')]
df.loc[df['Fruit'].isin(fruitsInclude) & (~df['Vegetable'].isin(vegetablesExclude) | (df['Animal']=='Dog'))]

# subset3: All rows and specific columns where above logical conditions are true.
# As written, this filters rows only (adding Animal == 'Dog' as a third AND condition);
# to restrict columns, pass a list of column names as a second argument to df.loc.
df.loc[df['Fruit'].isin(fruitsInclude) & ~df['Vegetable'].isin(vegetablesExclude) & (df['Animal']=='Dog')]
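
The filters above can get hard to read once several conditions are combined, and subset3 in the question also asks for specific columns. The following is a minimal sketch, not part of the original answer, that names each boolean mask once and passes a column list to .loc. The small random frame and the wanted_columns list are illustrative assumptions; only the column names and the two filter lists come from the question.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with the same column names as the question.
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    'Fruit': rng.choice(['Apple', 'Banana', 'Kiwi', 'Grape', 'Orange', 'Strawberry'], size=n),
    'Vegetable': rng.choice(['Asparagus', 'Broccoli', 'Carrot', 'Lettuce', 'Rutabaga', 'Spinach'], size=n),
    'Animal': rng.choice(['Dog', 'Cat', 'Bird', 'Fish', 'Lion', 'Mouse'], size=n),
    'xValue': rng.normal(loc=80, scale=2, size=n),
    'yValue': rng.normal(loc=79, scale=2, size=n),
})

fruitsInclude = ['Apple', 'Banana', 'Grape']
vegetablesExclude = ['Asparagus', 'Broccoli']

# Name each condition once so the combined filters stay readable.
fruit_ok = df['Fruit'].isin(fruitsInclude)
veg_ok = ~df['Vegetable'].isin(vegetablesExclude)
is_dog = df['Animal'] == 'Dog'

# subset1 and subset2: all columns, rows selected by the combined masks.
subset1 = df.loc[fruit_ok & veg_ok]
subset2 = df.loc[fruit_ok & (veg_ok | is_dog)]

# subset3: same rows as subset2, restricted to a chosen list of columns.
# wanted_columns is an illustrative choice, not something the question specifies.
wanted_columns = ['Fruit', 'xValue', 'yValue']
subset3 = df.loc[fruit_ok & (veg_ok | is_dog), wanted_columns]
```

Each comparison needs its own parentheses when combined inline, because & and | bind more tightly than == in Python; naming the masks first sidesteps that.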

You just barely beat me to it! This is exactly the same solution I came up with. +1 – spencerlyon2
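
For readers coming from SQL, the same row filters can also be written with DataFrame.query, which reads closer to a where clause. This is an alternative sketch rather than the approach used in the answer; the six-row frame is made up for illustration, and the @ prefix refers to Python variables in the calling scope.

```python
import pandas as pd

# Tiny hypothetical frame, using the column names from the question.
df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Kiwi', 'Grape', 'Apple', 'Orange'],
    'Vegetable': ['Carrot', 'Asparagus', 'Lettuce', 'Broccoli', 'Broccoli', 'Spinach'],
    'Animal': ['Cat', 'Dog', 'Dog', 'Bird', 'Fish', 'Dog'],
})
fruitsInclude = ['Apple', 'Banana', 'Grape']
vegetablesExclude = ['Asparagus', 'Broccoli']

# subset1: (Fruit in fruitsInclude) AND (Vegetable not in vegetablesExclude)
q1 = df.query("(Fruit in @fruitsInclude) and (Vegetable not in @vegetablesExclude)")

# subset2: (Fruit in fruitsInclude) AND [(Vegetable not in vegetablesExclude) OR (Animal == 'Dog')]
q2 = df.query(
    "(Fruit in @fruitsInclude) and "
    "((Vegetable not in @vegetablesExclude) or (Animal == 'Dog'))"
)

# The equivalent boolean-mask form, for comparison.
mask2 = df['Fruit'].isin(fruitsInclude) & (
    ~df['Vegetable'].isin(vegetablesExclude) | (df['Animal'] == 'Dog')
)
```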


If I only want the index, is there a shorter way than this: `df.ix[df['Fruit'].isin(fruitsInclude)].index`? – Zhubarb


@Zhubarb: `df.index[df['Fruit'].isin(fruitsInclude)]` is shorter and (on my machine, ~33%) faster than `df.ix[df['Fruit'].isin(fruitsInclude)].index`. – unutbu
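
The two ways of getting just the index can be checked against each other. This is a small sketch on a made-up five-row frame, with df.loc standing in for the removed df.ix; it only shows that both expressions select the same labels and makes no timing claim.

```python
import pandas as pd

# Hypothetical five-row frame; only the Fruit column is needed here.
df = pd.DataFrame({'Fruit': ['Apple', 'Kiwi', 'Banana', 'Orange', 'Grape']})
fruitsInclude = ['Apple', 'Banana', 'Grape']

mask = df['Fruit'].isin(fruitsInclude)

# Index the Index object directly with the boolean mask.
idx_direct = df.index[mask]

# Build the filtered frame first, then take its index.
idx_via_rows = df.loc[mask].index
```

Indexing df.index directly avoids constructing the intermediate filtered DataFrame, which is a plausible reason for the speed difference reported in the comment.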