Specifying a boolean filter expression to a Python script

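I have a CSV (comma-separated values) file containing student information. The column headers look like StudentId, StudentFirstName, StudentLastName, StudentZipCode, StudentHeight, StudentCommuteMethod, etc., and the subsequent rows contain the information for individual students. I now want to write a Python 2.5 script that takes a filter condition as a command-line argument and returns the set of students (rows) that match it. For example, the filter condition could be something like the following (in pseudocode):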

"StudentCommuteMethod = Bus AND StudentZipCode = 12345"

and the Python script could be invoked as:

MyPythonScript.py -filter "<above string>" -i input.csv

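This should return the list of all students (rows) who live in the area with zip code 12345 and commute by bus. The filter can also be arbitrarily complex and may include any number of AND and OR operators.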

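Questions:

1. What is the best format in which the program can let the user specify the filter condition (as a command-line argument)? The format should be simple for simple expressions, yet powerful enough to express every kind of condition.
   - The formats I have thought of are (1) SQL and (2) the Python language itself. In either case, I don't know how to make Python apply such a filter at runtime. That is, how do I take an expression from the command line and apply it to a row to get True or False?
2. I would like a UI for expressing the filter condition visually. Perhaps something that allows entering one simple two-operand condition per line, plus some way of combining them with AND and OR. It should be able to emit a filter expression in the format chosen in question 1. Are there any open-source projects I could reuse?
3. If you think there is a better way to solve this problem than a command-line expression plus a UI, please feel free to mention it. Ultimately, the user (an electrical engineer who doesn't know much about programming) should be able to enter the filter expression easily.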

Thanks!

Note: I have no control over the input or output formats (including the CSV file).

2011-09-09 LeoNeo

You seem to be re-creating a database in Python. Why not use a database? – jozzas

Install OpenOffice, import the file, and use its AutoFilter feature. – 2011-09-09 05:06:03

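A very general question; a simple web search gives the following answers: 1. See how to parse command-line arguments: http://docs.python.org/py3k/library/argparse.html 2. Several UI libraries with Python bindings are available: http://wiki.python.org/moin/GuiProgramming 3. "An electrical engineer who doesn't know much about programming": so what? What does programming have to do with using this? – steabert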

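You are definitely trying to re-implement SQL in Python. I believe you would be better off using a relational database and running SQL queries.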
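
To make the database suggestion concrete, here is a minimal sketch written for Python 3 using the standard-library sqlite3 module. It assumes the filter is given as the body of an SQL WHERE clause; the function name filter_csv_with_sql, the table name students, and the sample data are made up for illustration. All values are loaded as text, so string literals in the filter need SQL single quotes. The user-supplied clause is executed as-is, so this is only appropriate for trusted users.

```python
import csv
import io
import sqlite3


def filter_csv_with_sql(csv_file, where_clause):
    """Load CSV rows into an in-memory SQLite table and return the matching rows."""
    reader = csv.reader(csv_file)
    headers = next(reader)  # first line holds the column names

    conn = sqlite3.connect(":memory:")
    # Assumption: headers come from a trusted file and contain no double quotes.
    columns = ", ".join('"%s"' % h for h in headers)
    conn.execute("CREATE TABLE students (%s)" % columns)

    placeholders = ", ".join("?" for _ in headers)
    conn.executemany("INSERT INTO students VALUES (%s)" % placeholders, reader)

    # The WHERE clause is user-supplied SQL and is executed unchanged.
    rows = conn.execute("SELECT * FROM students WHERE " + where_clause).fetchall()
    conn.close()
    return headers, rows


# Hypothetical sample data standing in for input.csv
sample = io.StringIO(
    "StudentId,StudentZipCode,StudentCommuteMethod\n"
    "1,12345,Bus\n"
    "2,12345,Train\n"
    "3,45467,Bus\n"
)
headers, rows = filter_csv_with_sql(
    sample, "StudentCommuteMethod = 'Bus' AND StudentZipCode = '12345'"
)
print(rows)
```

With this approach the filter syntax is exactly the one in the question's pseudocode, apart from the quotes around string values.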

However, regarding question 1, you can simply let the user enter a Python expression and eval() it against each row of data.

Here is a working example that uses exec to bind the column values to local variables (a nasty hack, I admit). CSV parsing is omitted for brevity.

import optparse, sys

# Assume your CSV data is read into a list of dictionaries
sheet = [
    {'StudentId': 1, 'StudentFirstName': 'John', 'StudentLastName': 'Doe', 'StudentZipCode': '12345', 'StudentCommuteMethod': 'Bus'},
    {'StudentId': 2, 'StudentFirstName': 'Bob', 'StudentLastName': 'Chen', 'StudentZipCode': '12345', 'StudentCommuteMethod': 'Bus'},
    {'StudentId': 3, 'StudentFirstName': 'Jane', 'StudentLastName': 'Smith', 'StudentZipCode': '12345', 'StudentCommuteMethod': 'Train'},
    {'StudentId': 4, 'StudentFirstName': 'Dave', 'StudentLastName': 'Burns', 'StudentZipCode': '45467', 'StudentCommuteMethod': 'Bus'},
]

# Options parsing
parser = optparse.OptionParser()
parser.add_option('--filter', type='string', dest='filter')
options, args = parser.parse_args()

# Filter option is required
if options.filter is None:
    print >> sys.stderr, 'error: no filter expression given'
    sys.exit(1)

# Process rows and build result set
result = []
for row in sheet:
    # Bind each column to a local variable (StudentId, StudentFirstName, etc.);
    # this allows evaluating Python expressions on a row, for example:
    # 'StudentCommuteMethod == "Bus" and StudentZipCode == "12345"'
    for col, val in row.iteritems():
        exec '%s = %s' % (col, repr(val))

    # Apply filter to the row
    if eval(options.filter):
        result.append(row)

# Print out result set
for row in result:
    print row

I tested it with the following filter expressions:

./MyPythonScript.py --filter 'StudentCommuteMethod == "Bus" and StudentZipCode == "12345"' 
./MyPythonScript.py --filter 'StudentCommuteMethod == "Bus" or StudentZipCode == "12345"'

(Beware of shell quoting rules when running the program from the command line.)
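
Because eval() will run any Python code the user types, a more restrictive variant may be preferable if the filter string is not fully trusted. The following is only a sketch, written for Python 3.8 or later (not the Python 2.5 of the question): it parses the expression with the standard ast module and evaluates only column names, constants, comparisons, and and/or/not. The name evaluate_filter is made up for illustration.

```python
import ast
import operator

# Comparison operators the sketch is willing to evaluate
_COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
}


def evaluate_filter(expression, row):
    """Evaluate a restricted boolean expression against one row (a dict)."""

    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.BoolOp):  # "a and b", "a or b"
            values = (visit(v) for v in node.values)
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not visit(node.operand)
        if isinstance(node, ast.Compare):  # "a == b", "a < b <= c"
            left = visit(node.left)
            for op, comparator in zip(node.ops, node.comparators):
                if type(op) not in _COMPARE:
                    raise ValueError("unsupported comparison operator")
                right = visit(comparator)
                if not _COMPARE[type(op)](left, right):
                    return False
                left = right
            return True
        if isinstance(node, ast.Name):  # column name, looked up in the row
            return row[node.id]
        if isinstance(node, ast.Constant):  # string or number literal
            return node.value
        # Calls, attribute access, subscripts, etc. are all rejected
        raise ValueError("unsupported syntax: %s" % type(node).__name__)

    return visit(ast.parse(expression, mode="eval"))


row = {"StudentCommuteMethod": "Bus", "StudentZipCode": "12345"}
print(evaluate_filter('StudentCommuteMethod == "Bus" and StudentZipCode == "12345"', row))
```

The filter syntax stays the same as in the test commands above; anything outside the whitelist, such as a function call, raises ValueError instead of being executed.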

2011-09-09 07:19:57

Thanks for your help! – LeoNeo

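This is a slight variation on Danilo's suggestion. You can avoid the exec that binds variables for each row by passing the row dictionary to eval as the locals, and the dictionaries returned by csv.DictReader work nicely for this: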

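import csv, optparse

infile = open('datafile.csv')
reader = csv.DictReader(infile)

parser = optparse.OptionParser()
parser.add_option('--filter', type='string', dest='filter')
options, args = parser.parse_args()

for row in reader:
    # Pass the row as the locals dictionary; the empty dict serves as globals,
    # so eval() adds __builtins__ there instead of to the row being printed.
    if eval(options.filter, {}, row):
        print row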

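This assumes that the first line of the input file contains the column headers, and any header you want to use in an expression must be a valid Python identifier.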
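
If some headers are not valid identifiers (for example, they contain spaces), one option is to rename the keys of each row before evaluating the filter. This is a minimal sketch for Python 3; the helper names to_identifier and normalize_row are made up for illustration, and it does not handle two headers that collapse to the same name.

```python
import keyword
import re


def to_identifier(header):
    """Turn a CSV column header into a name usable in a filter expression."""
    # Replace every character that is not a letter, digit, or underscore
    name = re.sub(r"\W", "_", header.strip())
    # Identifiers cannot be empty or start with a digit
    if not name or name[0].isdigit():
        name = "_" + name
    # Avoid clashing with reserved words such as "class"
    if keyword.iskeyword(name):
        name += "_"
    return name


def normalize_row(row):
    """Return a copy of the row whose keys are valid Python identifiers."""
    return {to_identifier(key): value for key, value in row.items()}


print(normalize_row({"Student Zip Code": "12345", "class": "A"}))
```

The user would then write the filter against the renamed columns, for example Student_Zip_Code == "12345".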

2011-09-09 07:35:57 babbageclunk

Thanks for the suggestion about 'eval'; I hadn't tried passing an appropriate dictionary as the locals. –
