How to select DataFrame columns based on a lookup in another DataFrame?

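I'm working with a dataset that has a large number of predictors, and I want to easily test different composite-variable groupings by using a control file. For starters, the control file would indicate whether or not to include each variable. Here is an example: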

import pandas as pd

control = pd.DataFrame({'Variable': ['Var1','Var2','Var3'], 
        'Include': [1,0,1]}) 

control 
Out[48]: 
   Include Variable
0        1     Var1
1        0     Var2
2        1     Var3

data = pd.DataFrame({'Sample':['a','b','c'], 
        'Var1': [1,0,0], 
        'Var2': [0,1,0], 
        'Var3': [0,0,1]}) 

data 
Out[50]: 
  Sample  Var1  Var2  Var3
0      a     1     0     0
1      b     0     1     0
2      c     0     0     1

So the processed result should be a new DataFrame that looks like data, but with the Var2 column dropped:

data2 
Out[51]: 
  Sample  Var1  Var3
0      a     1     0
1      b     0     0
2      c     0     1

I can get this working by selectively dropping columns with .iterrows():

data2 = data.copy()
for index, row in control.iterrows():
    if row['Include'] != 1:
        z = row['Variable']
        # inplace expects a real boolean, not the string "True"
        data2.drop(z, axis=1, inplace=True)
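For comparison, the same loop can likely be collapsed into one call: gather all excluded variable names first, then pass them to `DataFrame.drop` in a single call via its `columns=` keyword. This is a minimal sketch that assumes `control` and `data` look exactly like the example frames above (they are redefined so the snippet runs on its own).

```python
import pandas as pd

control = pd.DataFrame({'Variable': ['Var1', 'Var2', 'Var3'],
                        'Include': [1, 0, 1]})
data = pd.DataFrame({'Sample': ['a', 'b', 'c'],
                     'Var1': [1, 0, 0],
                     'Var2': [0, 1, 0],
                     'Var3': [0, 0, 1]})

# Collect every Variable whose Include flag is not 1 ...
excluded = control.loc[control['Include'] != 1, 'Variable']
# ... and drop them all at once. drop() returns a new frame,
# so data itself is left untouched and no copy() is needed.
data2 = data.drop(columns=excluded)
print(data2)
```

Because `drop` is not in place by default, the original `data` keeps all four columns.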

This works, but it seems there should be a way to do this on the whole DataFrame at once. For example:

data2 = data[control['Include'] == 1]

However, this filters rows based on the Include values, rather than columns.
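The reason is that `data[boolean_series]` is row selection in pandas. To filter columns with a mask, the mask has to be indexed by `data`'s column names and passed as the second axis of `.loc`. A minimal sketch, assuming that any column not listed in `control` (here only `Sample`) should always be kept:

```python
import pandas as pd

control = pd.DataFrame({'Variable': ['Var1', 'Var2', 'Var3'],
                        'Include': [1, 0, 1]})
data = pd.DataFrame({'Sample': ['a', 'b', 'c'],
                     'Var1': [1, 0, 0],
                     'Var2': [0, 1, 0],
                     'Var3': [0, 0, 1]})

# Map each Variable name to True/False depending on its Include flag.
include = control.set_index('Variable')['Include'] == 1
# Align the mask with data's columns; columns missing from control
# (e.g. 'Sample') are filled with True so they are kept.
col_mask = include.reindex(data.columns, fill_value=True).astype(bool)
# .loc[:, mask] applies the mask to columns; data[mask] would filter rows.
data2 = data.loc[:, col_mask]
print(data2)
```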

Any suggestions appreciated.


2016-12-04 user1355179

Select the necessary headers from the control frame, and use them to index data:

headers = control[control['Include']==1]['Variable'] 
all_headers = ['Sample'] + list(headers) 
data[all_headers] 
# Sample Var1 Var3 
#0  a  1  0 
#1  b  0  0 
#2  c  0  1
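The same idea can be wrapped in a small reusable function so that different control files can be tried quickly. This is only a sketch: the name `select_included` and its `id_cols` parameter are hypothetical, and it assumes that variables listed in `control` but absent from `data` should be skipped rather than raise a `KeyError` (plain `data[all_headers]` would raise).

```python
import pandas as pd

def select_included(data, control, id_cols=('Sample',)):
    """Hypothetical helper: keep id_cols plus the Variables with Include == 1.

    Variables flagged in control but missing from data are silently skipped.
    """
    wanted = control.loc[control['Include'] == 1, 'Variable'].tolist()
    present = [v for v in wanted if v in data.columns]
    return data[list(id_cols) + present]

data = pd.DataFrame({'Sample': ['a', 'b', 'c'],
                     'Var1': [1, 0, 0],
                     'Var2': [0, 1, 0],
                     'Var3': [0, 0, 1]})
# 'Var4' does not exist in data and is ignored by the helper.
control = pd.DataFrame({'Variable': ['Var1', 'Var2', 'Var3', 'Var4'],
                        'Include': [1, 0, 1, 1]})
print(select_included(data, control))
```

Whether skipping unknown variables is desirable is a design choice; raising an error instead would catch typos in the control file.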

A side note: if possible, consider using boolean True and False instead of 0 and 1 in the Include column. Converting the headers to a list first is the trick -


2016-12-04 06:05:16 DyZ

Thanks @DyZ, that was a very quick solution. What is the correct way to test a boolean in this case? Still == 1? – user1355179

It would be like this: headers = control[control['Include']]['Variable'] – DyZ
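A runnable sketch of the boolean variant, assuming the Include column is stored as True/False as the side note suggests. The boolean column is used directly as the row mask (written here with `.loc`, which is equivalent to the chained indexing in the comment). It also shows why the headers are converted to a list: with a Series on the right, `['Sample'] + headers` would dispatch to the Series' own element-wise `+` rather than list concatenation.

```python
import pandas as pd

control = pd.DataFrame({'Variable': ['Var1', 'Var2', 'Var3'],
                        'Include': [True, False, True]})
data = pd.DataFrame({'Sample': ['a', 'b', 'c'],
                     'Var1': [1, 0, 0],
                     'Var2': [0, 1, 0],
                     'Var3': [0, 0, 1]})

# The boolean column is itself the mask, so no "== 1" comparison is needed.
headers = control.loc[control['Include'], 'Variable']
# Convert to a plain list so '+' concatenates lists instead of
# invoking pandas arithmetic on the Series.
data2 = data[['Sample'] + headers.tolist()]
print(data2)
```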

This rebuilds the DataFrame using numpy:

import numpy as np
import pandas as pd

# get data columns values which is a numpy array
dcol = data.columns.values
# find the positions where control Include are non-zero
non0 = np.nonzero(control.Include.values)
# slice control.Variable to get names of Variables to include
icld = control.Variable.values[non0]
# search the column names of data for the included Variables
# and the Sample column to get the positions to slice
# (searchsorted assumes dcol is sorted, as it is in this example)
srch = dcol.searchsorted(np.append('Sample', icld))
# reconstruct the dataframe using the srch slice we've created
pd.DataFrame(data.values[:, srch], data.index, dcol[srch])
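Two caveats apply to the positional approach above: `searchsorted` only returns correct positions when `data.columns` is sorted, and rebuilding from `data.values` turns mixed-type columns into a single object-dtype array. A possible alternative, sketched below, keeps the same positional idea but swaps in `Index.get_indexer` (which does not need sorted labels) and `iloc` (which preserves per-column dtypes). The column order of `data` here is a hypothetical shuffle used only to demonstrate the unsorted case.

```python
import numpy as np
import pandas as pd

control = pd.DataFrame({'Variable': ['Var1', 'Var2', 'Var3'],
                        'Include': [1, 0, 1]})
# Same values as the question's data, columns deliberately out of order.
data = pd.DataFrame({'Var3': [0, 0, 1],
                     'Sample': ['a', 'b', 'c'],
                     'Var2': [0, 1, 0],
                     'Var1': [1, 0, 0]})

# Names to keep: 'Sample' followed by the Variables whose Include is non-zero.
icld = control['Variable'].to_numpy()[np.nonzero(control['Include'].to_numpy())]
names = np.append('Sample', icld)

# get_indexer returns each name's position in data.columns, or -1 if absent.
pos = data.columns.get_indexer(names)
if (pos < 0).any():
    raise KeyError(f"columns not found: {list(names[pos < 0])}")

# iloc selects by position and keeps each column's original dtype.
data2 = data.iloc[:, pos]
print(data2)
```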


2016-12-04 07:02:36 piRSquared

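The approach using raw numpy routines is more verbose and will just confuse beginners; stick with the more user-friendly pandas indexing routines (especially since the tags don't call for a numpy solution). – Jeff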
