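Indexing a Python Pandas DataFrame with multiple conditions, like a SQL "where" statement
I have some familiarity with R and Python Pandas. I am trying to index a DataFrame to retrieve the rows that meet a set of logical conditions, much like the SQL "where" statement. I know how to do this in R with data frames (and with R's data.table package, which is closer to a Pandas DataFrame than R's native data frame is).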
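Below is some sample code that builds a DataFrame, followed by a description of how I would like to index it. Is there an easy way to do this?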
import pandas as pd
import numpy as np
# generate some data
mult = 10000
fruits = ['Apple', 'Banana', 'Kiwi', 'Grape', 'Orange', 'Strawberry']*mult
vegetables = ['Asparagus', 'Broccoli', 'Carrot', 'Lettuce', 'Rutabaga', 'Spinach']*mult
animals = ['Dog', 'Cat', 'Bird', 'Fish', 'Lion', 'Mouse']*mult
xValues = np.random.normal(loc=80, scale=2, size=6*mult)
yValues = np.random.normal(loc=79, scale=2, size=6*mult)
data = {'Fruit': fruits,
        'Vegetable': vegetables,
        'Animal': animals,
        'xValue': xValues,
        'yValue': yValues}
df = pd.DataFrame(data)
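# shuffle each column independently to break the repeating fruit/vegetable/animal structure
# (assign a permuted copy; shuffling a Series in place is not reliable across pandas versions)
df['Fruit'] = np.random.permutation(df['Fruit'].to_numpy())
df['Vegetable'] = np.random.permutation(df['Vegetable'].to_numpy())
df['Animal'] = np.random.permutation(df['Animal'].to_numpy())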
df.head(30)
# filter sets
fruitsInclude = ['Apple', 'Banana', 'Grape']
vegetablesExclude = ['Asparagus', 'Broccoli']
# subset1: All rows and columns where:
# (Fruit in fruitsInclude) AND (Vegetable not in vegetablesExclude)
# subset2: All rows and columns where:
# (Fruit in fruitsInclude) AND [(Vegetable not in vegetablesExclude) OR (Animal == 'Dog')]
# subset3: All rows and specific columns where the above logical conditions are true.
All help and input is welcome and highly appreciated!
Thanks, Randall
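One way the three subsets described in the question could be expressed is with boolean masks built from `Series.isin`, combined with `&`, `|` and `~`. The following is only a minimal sketch: it uses a small hand-made frame instead of the random data above so that the result is easy to check by eye, and the mask names (`in_fruits`, `not_excluded`, `is_dog`) and the columns picked for subset3 are illustrative assumptions, not part of the original question.

```python
import pandas as pd

# Small illustrative frame with the same column names as the question.
# The values are made up for this sketch.
df = pd.DataFrame({
    'Fruit':     ['Apple', 'Banana', 'Kiwi', 'Grape', 'Apple', 'Orange'],
    'Vegetable': ['Carrot', 'Asparagus', 'Carrot', 'Broccoli', 'Spinach', 'Lettuce'],
    'Animal':    ['Cat', 'Dog', 'Dog', 'Bird', 'Dog', 'Fish'],
    'xValue':    [80.1, 79.5, 81.2, 78.9, 80.7, 79.9],
    'yValue':    [79.0, 78.2, 80.3, 77.5, 79.8, 78.8],
})

fruitsInclude = ['Apple', 'Banana', 'Grape']
vegetablesExclude = ['Asparagus', 'Broccoli']

# One boolean Series per condition (hypothetical helper names).
in_fruits = df['Fruit'].isin(fruitsInclude)
not_excluded = ~df['Vegetable'].isin(vegetablesExclude)
is_dog = df['Animal'] == 'Dog'

# subset1: (Fruit in fruitsInclude) AND (Vegetable not in vegetablesExclude)
subset1 = df[in_fruits & not_excluded]

# subset2: (Fruit in fruitsInclude) AND [(Vegetable not in vegetablesExclude) OR (Animal == 'Dog')]
subset2 = df[in_fruits & (not_excluded | is_dog)]

# subset3: same rows as subset2, restricted to chosen columns via .loc
# (the column choice here is arbitrary).
subset3 = df.loc[in_fruits & (not_excluded | is_dog), ['Fruit', 'xValue']]

print(subset1)
print(subset2)
print(subset3)
```

The parentheses matter because `&` and `|` bind more tightly than comparisons such as `==` in Python, so each comparison should be wrapped or stored in its own variable before being combined. The same masks should apply unchanged to the larger `df` built in the question.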
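Wow. Exactly what I needed. Thanks for the quick and direct answer. Note that I misspelled vegetablesExclude; it should have been spelled with a "c". I have corrected it in the code above, so it should be possible to copy and paste it to test. Thanks again. Randall. – user2537610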