Check whether a column value exists in multiple DataFrames

I have 4 Excel files, 'a1.xlsx', 'a2.xlsx', 'a3.xlsx' and 'a4.xlsx', all with the same format.

For example, a1.xlsx looks like this:

id code name 
1  100 abc 
2  200 zxc 
... ... ...

I want to read these files into pandas DataFrames and check whether the same value in the code column appears in more than one Excel file.

Something like this: if code=100 exists in 'a1.xlsx' and 'a3.xlsx', and code=200 exists only in 'a1.xlsx', the final DataFrame should look like this:

code filename 
100 a1.xlsx,a3.xlsx 
200 a1.xlsx 
... .... 
and so on

All the files are in one directory, and I am trying to iterate over them in a loop:

import pandas as pd
import os

# list all files in the directory: os.walk yields (dirpath, dirnames, filenames)
x = next(os.walk('path/to/files/'))[2]
os.chdir('path/to/files/')

for i in range(0, len(x)):
    df = pd.read_excel(x[i])  # df is overwritten on every iteration
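One way to continue the loop is to record, for every `code`, the set of files it appears in, and build the final DataFrame once after the loop. This is a minimal sketch assuming every file has a `code` column; the helper names `load_frames` and `collect_codes` are hypothetical, and the in-memory frames in the usage part stand in for the real Excel files.

```python
import os
from collections import defaultdict

import pandas as pd


def load_frames(directory):
    """Hypothetical helper: read every .xlsx file in `directory` into a dict
    mapping filename -> DataFrame (pd.read_excel needs an engine such as openpyxl)."""
    frames = {}
    for fname in sorted(os.listdir(directory)):
        if fname.endswith('.xlsx'):
            frames[fname] = pd.read_excel(os.path.join(directory, fname))
    return frames


def collect_codes(frames):
    """Return one row per code with the comma-joined names of the files containing it."""
    seen = defaultdict(set)  # code -> set of filenames
    for fname, df in frames.items():
        for code in df['code'].unique():
            seen[code].add(fname)
    rows = [(code, ','.join(sorted(names))) for code, names in sorted(seen.items())]
    return pd.DataFrame(rows, columns=['code', 'filename'])


# Usage with hypothetical in-memory stand-ins for a1.xlsx and a3.xlsx:
frames = {
    'a1.xlsx': pd.DataFrame({'id': [1, 2], 'code': [100, 200], 'name': ['abc', 'zxc']}),
    'a3.xlsx': pd.DataFrame({'id': [1], 'code': [100], 'name': ['abc']}),
}
result = collect_codes(frames)
print(result)
```

Using a set per code means a code that appears several times in the same file is still listed once for that file. With real files, `collect_codes(load_frames('path/to/files/'))` would replace the in-memory dict.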

How do I continue from here? Any hints?


2017-10-06 Shubham

Use:

import glob
import os
import pandas as pd

# get all .xlsx filenames
files = glob.glob('path/to/files/*.xlsx')
# list comprehension: read each file and assign a new column with its filename, e.g. 'a1.xlsx'
# (use os.path.splitext(os.path.basename(fp))[0] instead to drop the extension)
dfs = [pd.read_excel(fp).assign(filename=os.path.basename(fp)) for fp in files]
# one big df from the list of dfs
df = pd.concat(dfs, ignore_index=True)
# join the filenames of all rows sharing the same code
df1 = df.groupby('code')['filename'].apply(', '.join).reset_index()
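To answer the "does it exist in more than one file" part directly, the concatenated frame can also be counted per code and filtered. This sketch assumes a combined `df` with `code` and `filename` columns like the one built by `pd.concat` above; the small frame here is hypothetical sample data. `drop_duplicates` keeps one row per (code, filename) pair, so a code repeated inside one file does not repeat that filename in the joined string.

```python
import pandas as pd

# Hypothetical combined data, shaped like the result of pd.concat above.
df = pd.DataFrame({
    'code': [100, 200, 100, 100, 300],
    'filename': ['a1.xlsx', 'a1.xlsx', 'a3.xlsx', 'a3.xlsx', 'a4.xlsx'],
})

# One row per (code, filename) pair, so repeats within a file count once.
pairs = df[['code', 'filename']].drop_duplicates()
grouped = pairs.groupby('code')['filename']

# Joined file list per code, plus the number of distinct files containing it.
summary = grouped.agg(','.join).to_frame('filename')
summary['n_files'] = grouped.nunique()
summary = summary.reset_index()

# Only the codes that appear in more than one file.
in_multiple = summary[summary['n_files'] > 1]
print(summary)
print(in_multiple)
```

Here code 100 is the only one reported in `in_multiple`, because it is the only code found in two distinct files.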


2017-10-06 05:58:12 jezrael

Works! I have to wait 9 minutes before I can accept :p – Shubham
