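Finding data in a DataFrame using parameters from a string

I have a large CSV file (roughly 2000 entries) containing a column of files (column 0), each described by several parameters (the remaining columns). The first column, (i), is only there for readability; it is not explicitly included in the CSV file: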

(i) Filename; File extension; Month created; Year created; Author; Notes; 
0 file1; txt; 07; 2015; AB; NaN; 
1 file2; txt; 07; 2015; AB; NaN; 
2 file2b; txt; 07; 2015; AB; some notes; 
3 file3; txt; 06; 2013; CD; some text; 
4 file4; txt; 06; 2012; EF; other text; 
5 file5; txt; 05; 2011; EF; NaN; 
...

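I have read the whole file into a DataFrame (called files_df) with pandas.read_csv(). What I want to do now is retrieve all files that match certain criteria. For example, getting all files created in July 2015 by author AB that have no notes should find the files in rows 0 and 1, but none of the others.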
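For string comparisons such as `== '07'` to match, the columns have to be read as text; by default pandas would parse `07` and `2015` as integers. The following is a minimal sketch of loading such a file, assuming a `;` separator with no trailing delimiter and empty fields for missing notes. The `csv_text` sample is hypothetical and only mirrors the table above.

```python
import io

import pandas as pd

# Hypothetical sample mirroring the table above (without the index column).
csv_text = """Filename;File extension;Month created;Year created;Author;Notes
file1;txt;07;2015;AB;
file2;txt;07;2015;AB;
file2b;txt;07;2015;AB;some notes
file3;txt;06;2013;CD;some text
file4;txt;06;2012;EF;other text
file5;txt;05;2011;EF;
"""

# dtype=str keeps '07' as text instead of converting it to the integer 7.
# Empty Notes fields are still read as NaN.
files_df = pd.read_csv(io.StringIO(csv_text), sep=';', dtype=str)
```

With a real file, the `io.StringIO(csv_text)` argument would be replaced by the file path.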

I can already retrieve the files with:

files_df.loc[(files_df['Month created'] == '07') & 
      (files_df['Year created'] == '2015') & 
      (files_df['Author'] == 'AB') & 
      (files_df['Notes'].isnull())]

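This works, but how can I fill in that filter expression automatically in Python? I have already stored the combinations to filter on as keys and values in a dictionary, but I can't think of a way to fill in the expression automatically. Can anyone point me in the right direction?

(I haven't worked much with Python. A dictionary was simply the first type that came to mind, and I'm not attached to it if a different type is better suited for this.)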

[Edit for clarification:]

A typical input looks like this:

parameters = {'Month created': {'07'}, 
       'Year created': {'2015'}, 
       'Author': {'AB'}, 
       'Notes': {}}

What I would like to do is write something like this:

def read_files(parameters): 
    files = files_df.loc[ 
      # how to fill parameter keys & values here??? 
      ] 
    return files

Source

2015-10-16 fukiburi

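What do you want to "fill in", and what do you want to fill it with? –

Thanks for asking; it seems my question wasn't written very clearly. I have edited the post. – fukiburi

After some trial and error, I found this solution. It looks like a bit of a hack, but...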

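import pandas as pd

def read_files(files_df, parameters):
    # Start with every row selected, then narrow the mask one column at a time.
    # Assumes each value in `parameters` is a single scalar such as '07'.
    idx = pd.Series(True, index=files_df.index)
    for key, value in parameters.items():
        idx &= (files_df[key] == value)
    # Additionally keep only the files that have no notes.
    idx &= files_df['Notes'].isnull()
    return files_df.loc[idx]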
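The input shown in the question stores each value as a set, with an empty container for Notes, which a plain `==` comparison does not handle. A possible variant, sketched here under the assumption that an empty value means "this column must be empty", uses `Series.isin` for non-empty sets. The name `read_files_by_sets` and the small sample frame are hypothetical.

```python
import pandas as pd


def read_files_by_sets(files_df, parameters):
    """Return the rows matching every entry in `parameters` (hypothetical helper)."""
    mask = pd.Series(True, index=files_df.index)
    for column, allowed in parameters.items():
        if allowed:
            # Non-empty set: the cell must equal one of the allowed values.
            mask &= files_df[column].isin(allowed)
        else:
            # Empty set or dict: assumed to mean the cell must be missing.
            mask &= files_df[column].isnull()
    return files_df.loc[mask]


# Small sample frame for illustration only.
files_df = pd.DataFrame({
    'Filename': ['file1', 'file2', 'file2b', 'file3'],
    'Month created': ['07', '07', '07', '06'],
    'Year created': ['2015', '2015', '2015', '2013'],
    'Author': ['AB', 'AB', 'AB', 'CD'],
    'Notes': [None, None, 'some notes', 'some text'],
})

parameters = {'Month created': {'07'},
              'Year created': {'2015'},
              'Author': {'AB'},
              'Notes': {}}

result = read_files_by_sets(files_df, parameters)
print(result['Filename'].tolist())  # ['file1', 'file2']
```

Because each set may hold several values, the same helper also covers queries such as "created in June or July" without further changes.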

Source

2015-10-21 08:34:19 fukiburi
