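This is a fairly thorough approach; it is a bit long, but not as complicated as it looks!
I am assuming Python 3.x, though it should work in Python 2.x with very few changes. I make extensive use of generators to stream the data through rather than holding it all in memory.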
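To illustrate the streaming idea, here is a minimal sketch of a generator pipeline. The names `read_numbers` and `doubled` are made up for this example and are not part of the solution; the point is only that each stage pulls one item at a time from the stage before it.

```python
def read_numbers(lines):
    """Yield one parsed integer per non-blank line, lazily."""
    for line in lines:
        line = line.strip()
        if line:
            yield int(line)

def doubled(numbers):
    """Transform each item as it streams past; nothing is stored."""
    return (n * 2 for n in numbers)

# Building the pipeline does no work yet; items are only
# produced when a consumer (here, list()) asks for them.
pipeline = doubled(read_numbers(["1", "", "2", "3"]))
result = list(pipeline)
print(result)
```

Any iterable of lines would work in place of the list, including an open file object.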
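First, we define the expected data type for each field. Some fields do not map onto Python's built-in data types, so I start by defining a few custom data types for them: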
import time

class Date:
    def __init__(self, s):
        """
        Parse a date provided as "yy/mm/dd"
        """
        if s.strip():
            self.date = time.strptime(s, "%y/%m/%d")
        else:
            self.date = time.gmtime(0.)

    def __str__(self):
        """
        Return a date as "yy/mm/dd"
        """
        return time.strftime("%y/%m/%d", self.date)

def Int(s):
    """
    Parse a string to integer ("" => 0)
    """
    if s.strip():
        return int(s)
    else:
        return 0

class Year:
    def __init__(self, s):
        """
        Parse a year provided as "yyyy"
        """
        if s.strip():
            self.date = time.strptime(s, "%Y")
        else:
            self.date = time.gmtime(0.)

    def __str__(self):
        """
        Return a year as "yyyy"
        """
        return time.strftime("%Y", self.date)
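One consequence of the blank-input fallback is worth seeing directly. The sketch below uses only the standard `time` calls that `Date` and `Year` rely on; the sample date string is invented for illustration.

```python
import time

# Non-blank input: parse a two-digit-year date and format it back.
parsed = time.strptime("14/03/27", "%y/%m/%d")
date_text = time.strftime("%y/%m/%d", parsed)

# Blank input: the classes fall back to the Unix epoch,
# so a blank field is written out as the epoch date or year.
epoch = time.gmtime(0)
epoch_text = time.strftime("%y/%m/%d", epoch)
epoch_year = time.strftime("%Y", epoch)

print(date_text, epoch_text, epoch_year)
```

So a blank date round-trips as `70/01/01` and a blank year as `1970`. If that is not what you want in the output, the fallback is the place to change it.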
Now we set up a table defining what type each field should be:
# Expected data-type of each field:
# data_types[section][field] = type
data_types = {
    "CU": {
        "Customer_ID": Int,
        "Last_Name": str,
        "First_Name": str,
        "Street_Address": str,
        "City": str
    },
    "VE": {
        "License_Plate#": str,
        "Make": str,
        "Model": str,
        "Year": Year,
        "Owner_ID": Int
    },
    "SE": {
        "Vehicle_ID": str,
        "Service_Code": Int,
        "Date_Scheduled": Date
    }
}
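Next we parse the input file; this is by far the most complicated bit! It is a finite-state machine implemented as a generator function, yielding one section at a time: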
# Customized error-handling
# (derived from Exception rather than BaseException,
#  so that an ordinary "except Exception" catches them)
class TransactionError         (Exception):        pass
class EntryNotInSectionError   (TransactionError): pass
class MalformedLineError       (TransactionError): pass
class SectionNotTerminatedError(TransactionError): pass
class UnknownFieldError        (TransactionError): pass
class UnknownSectionError      (TransactionError): pass

def read_transactions(fname):
    """
    Read a transaction file
    Return a series of ("section", {"key": "value"})
    """
    section, accum = None, {}
    with open(fname) as inf:
        for line_no, line in enumerate(inf, 1):
            line = line.strip()
            if not line:
                # blank line - skip it
                pass
            elif line == "//":
                # end of section - return any accumulated data
                if accum:
                    yield (section, accum)
                section, accum = None, {}
            elif line[:3] == "IN ":
                # start of section
                if accum:
                    raise SectionNotTerminatedError(
                        "Line {}: Preceding {} section was not terminated"
                        .format(line_no, section)
                    )
                else:
                    section = line[3:].strip()
                    if section not in data_types:
                        raise UnknownSectionError(
                            "Line {}: Unknown section type {}"
                            .format(line_no, section)
                        )
            else:
                # data entry: "key=value"
                if section is None:
                    raise EntryNotInSectionError(
                        "Line {}: '{}' should be in a section"
                        .format(line_no, line)
                    )
                pair = line.split("=")
                if len(pair) != 2:
                    raise MalformedLineError(
                        "Line {}: '{}' could not be parsed as a key/value pair"
                        .format(line_no, line)
                    )
                key, val = pair
                if key not in data_types[section]:
                    raise UnknownFieldError(
                        "Line {}: unrecognized field name {} in section {}"
                        .format(line_no, key, section)
                    )
                accum[key] = val.strip()
    # end of file - nothing should be left over
    if accum:
        raise SectionNotTerminatedError(
            "End of file: Preceding {} section was not terminated"
            .format(section)
        )
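To make the state machine easier to follow, here is a stripped-down sketch of the same idea with all validation removed. The sample text is invented (only the field names come from the table above), and `iter_sections` is a hypothetical helper that works on any iterable of lines rather than a file name.

```python
# Invented sample in the format the parser expects:
# "IN <section>" opens a section, "//" closes it.
SAMPLE = """\
IN CU
Customer_ID=1001
Last_Name=Smith
//

IN VE
Make=Ford
Year=2009
//
"""

def iter_sections(lines):
    """Simplified reader: no error checking, just the state machine."""
    section, accum = None, {}
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # blank line: ignore
        if line == "//":                  # end of section: emit and reset
            if accum:
                yield section, accum
            section, accum = None, {}
        elif line.startswith("IN "):      # start of section
            section = line[3:].strip()
        else:                             # "key=value" entry
            key, _, val = line.partition("=")
            accum[key.strip()] = val.strip()

sections = list(iter_sections(SAMPLE.splitlines()))
for name, fields in sections:
    print(name, fields)
```

The state is just the pair `(section, accum)`: `section is None` means "between sections", and every line either changes the state or adds to `accum`. The full version adds an error for each transition that should not happen.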
Now that the file has been read, the rest is easier. We do type conversion on each field, using the lookup table we defined above:
def format_field(section, key, value):
    """
    Cast a field value to the appropriate data type
    """
    return data_types[section][key](value)

def format_section(section, accum):
    """
    Cast all values in a section to the appropriate data types
    """
    return (section, {key: format_field(section, key, value) for key, value in accum.items()})
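The casting step is plain table-driven dispatch: look up a callable by section and field name, then call it on the raw string. A self-contained sketch, using a cut-down table and the hypothetical names `to_int`, `field_types` and `cast_section`:

```python
def to_int(s):
    """Blank strings become 0, mirroring the Int helper above."""
    return int(s) if s.strip() else 0

# Cut-down lookup table, for illustration only.
field_types = {
    "CU": {"Customer_ID": to_int, "Last_Name": str},
}

def cast_section(section, raw):
    """Apply the registered converter to every raw string value."""
    converters = field_types[section]
    return section, {key: converters[key](value) for key, value in raw.items()}

result = cast_section("CU", {"Customer_ID": "42", "Last_Name": "Smith"})
blank = cast_section("CU", {"Customer_ID": " "})
print(result)
print(blank)
```

Supporting a new field type then only means adding a callable and a table entry; the casting code itself does not change.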
and write the results back to a file:
def write_transactions(fname, transactions):
    with open(fname, "w") as outf:
        for section, accum in transactions:
            # start section
            outf.write("IN {}\n".format(section))
            # write key/value pairs in order by key
            keys = sorted(accum.keys())
            for key in keys:
                outf.write(" {}={}\n".format(key, accum[key]))
            # end section
            outf.write("//\n")
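To see what the output looks like without touching the disk, the same logic can be pointed at an in-memory buffer. This is a sketch only: `write_sections` is a hypothetical variant that takes an open file-like object instead of a file name, and the record is invented.

```python
import io

def write_sections(out, sections):
    """Same layout as write_transactions, but writes to any file-like object."""
    for section, fields in sections:
        out.write("IN {}\n".format(section))
        for key in sorted(fields):           # keys in alphabetical order
            out.write(" {}={}\n".format(key, fields[key]))
        out.write("//\n")

buf = io.StringIO()
write_sections(buf, [("CU", {"Last_Name": "Smith", "Customer_ID": 1001})])
output = buf.getvalue()
print(output)
```

Because `str.format` calls `str()` on each value, the custom `Date` and `Year` objects are written using their `__str__` methods with no extra code.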
All the machinery is in place; we just need to call it:
def main():
    INPUT = "transaction.txt"
    OUTPUT = "customer.diff"
    transactions = read_transactions(INPUT)
    cleaned_transactions = (format_section(section, accum) for section, accum in transactions)
    write_transactions(OUTPUT, cleaned_transactions)

if __name__ == "__main__":
    main()
Hope that helps!
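Why not just 'if field_names[i]'? Wouldn't 'field_names[i]' evaluate to True? – benjamin
Sorry, only 'Home_Phone=' should become 'Home_Phone=0' and 'Business_Phone=' should become 'Business_Phone=0'; it should also be able to change 'Customer_ID'. –
@benjamin I have tried both, but neither worked :( – Amon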