2014-03-19

Okay, so I have a transaction file, and I need to correct errors in it and write the result to a new file:

IN CU 
    Customer_ID= 
    Last_Name=Johnston 
    First_Name=Karen 
    Street_Address=291 Stone Cr 
    City=Toronto 
// 
IN VE 
    License_Plate#=LSR976 
    Make=Cadillac 
    Model=Seville 
    Year=1996 
    Owner_ID=779 
// 
IN SE 
    Vehicle_ID=LSR976 
    Service_Code=461 
    Date_Scheduled=00/12/19 

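IN means insert, and CU (for customer) tells us which file we are writing to; in this case that's customer.diff. The problem I'm having is that I need to go through each line and check the value of each field (e.g. Customer_ID). See how Customer_ID is left blank? I need to replace any blank numeric field with the value 0, so in this case, for example, Customer_ID=0. Here's what I have so far, but nothing is changing: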

def insertion():
    field_names = {'Customer_ID=': 'Customer_ID=0',
                   'Home_Phone=': 'Home_Phone=0', 'Business_Phone=': 'Business_Phone=0'}

    with open('xactions.two.txt', 'r') as from_file:
        search_lines = from_file.readlines()

    if search_lines[3:5] == 'CU':
        for i in search_lines:
            if field_names[i] == True:
                with open('customer.diff', 'w') as to_file:
                    to_file.write(field_names[i])

Thanks

Why not just 'if field_names[i]'? 'field_names[i]' won't evaluate to 'True'. – benjamin

Sorry, the 'Home_Phone=':'Home_Phone=0','Business_Phone=':'Business_Phone=0' entries are there so it can change more than just 'Customer_ID'. –

@benjamin I've tried both, but neither worked :( – Amon
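To see why nothing changes, here is a small sketch that evaluates the question's conditions in isolation. The `search_lines` list is an assumed excerpt of what `readlines()` would return for the file above; it is not taken from the asker's actual file.

```python
# Assumed excerpt of what readlines() returns for the sample file
search_lines = ["IN CU \n", "    Customer_ID= \n", "    Last_Name=Johnston \n"]

# A list slice is never equal to a str, so the original `if` is always False
print(search_lines[3:5] == 'CU')  # False

# Comparing the second word of the header line works instead
print(search_lines[0].split()[1] == 'CU')  # True

# Raw lines keep indentation, trailing spaces and the newline,
# so they must be stripped before they can match a dictionary key
field_names = {'Customer_ID=': 'Customer_ID=0'}
print(search_lines[1] in field_names)          # False
print(search_lines[1].strip() in field_names)  # True
```

Because the first check is always False, the loop body (and the `field_names[i] == True` test) never runs at all.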

Answers

Why not try something simpler? I haven't tested this code.

def insertion():
    # Blank numeric fields and what each should become
    field_names = {'Customer_ID=': 'Customer_ID=0',
                   'Home_Phone=': 'Home_Phone=0',
                   'Business_Phone=': 'Business_Phone=0'}

    with open('xactions.two.txt', 'r') as from_file:
        with open('customer.diff', 'w') as to_file:
            for line in from_file:
                # drop the newline and any trailing spaces
                line = line.rstrip()
                # a field is blank only if nothing follows the "="
                if line.strip() in field_names:
                    to_file.write(line + "0")
                else:
                    to_file.write(line)
                to_file.write("\n")
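A sketch of the same line-by-line idea as a pure function, so it can be tried on a list of strings without touching any files. The name `fill_blank_fields` is hypothetical. It only fills a field when nothing follows the `=`, so a line such as `Customer_ID=12` is left alone, and it writes the replacement text stored in `field_names` while keeping the original indentation.

```python
def fill_blank_fields(lines, field_names):
    """Yield each line, replacing blank fields listed in field_names.

    `lines` is any iterable of text lines; `field_names` maps a blank
    field such as 'Customer_ID=' to its replacement text.
    """
    for line in lines:
        line = line.rstrip()  # drop the newline and trailing spaces
        if line.strip() in field_names:
            # keep the original indentation, swap in the replacement
            indent = line[:len(line) - len(line.lstrip())]
            yield indent + field_names[line.strip()]
        else:
            yield line

field_names = {'Customer_ID=': 'Customer_ID=0',
               'Home_Phone=': 'Home_Phone=0',
               'Business_Phone=': 'Business_Phone=0'}
sample = ["IN CU \n", "    Customer_ID= \n", "    Last_Name=Johnston \n", "//\n"]
for out in fill_blank_fields(sample, field_names):
    print(out)
```

To write the result to a file, something like `to_file.write("\n".join(fill_blank_fields(from_file, field_names)) + "\n")` should work.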

I get an error telling me "'dict' object has no attribute 'iter_keys'" – Amon

Indeed, it should be iterkeys, not iter_keys. Thanks @Matthew – benjamin

It still gives me the same attribute error
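The remaining AttributeError is most likely a Python version issue: `dict.iterkeys()` exists only in Python 2 and was removed in Python 3, while `iter_keys()` never existed. A minimal check, assuming Python 3:

```python
field_names = {'Customer_ID=': 'Customer_ID=0', 'Home_Phone=': 'Home_Phone=0'}

# Python 3 removed dict.iterkeys(); there was never an iter_keys()
print(hasattr(field_names, 'iterkeys'))   # False on Python 3
print(hasattr(field_names, 'iter_keys'))  # False

# Iterating over a dict yields its keys on both Python 2 and 3
for field in field_names:
    print(field)
```

Iterating the dict directly (or calling `.keys()`) avoids the version difference altogether.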

Here's a fairly comprehensive approach; it's a bit long, but it isn't as complicated as it looks!

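I'm assuming Python 3.x, but it should work in Python 2.x with few changes. I make heavy use of generators to stream the data through rather than holding it all in memory.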
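To illustrate what "streaming with generators" means here, a small sketch (the name `read_stripped` is hypothetical): each stage pulls one item at a time from the stage below, so only the line currently being processed is held in memory.

```python
def read_stripped(lines):
    """Yield each line with surrounding whitespace removed, one at a time."""
    for line in lines:
        yield line.strip()

source = iter(["IN CU \n", "    Customer_ID= \n", "//\n"])
# Chain a second lazy stage on top of the first
upper = (line.upper() for line in read_stripped(source))

print(next(upper))   # IN CU
# Only one line has been pulled from the source so far
print(list(source))  # ['    Customer_ID= \n', '//\n']
```

The answer's `main()` chains its stages the same way: nothing is read from the input until the writer asks for the next section.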

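First, we define the expected data type for each field. Some fields don't map onto a built-in Python data type, so I start by defining a few custom data types for them: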

import time

class Date:
    def __init__(self, s):
        """
        Parse a date provided as "yy/mm/dd"
        """
        if s.strip():
            self.date = time.strptime(s, "%y/%m/%d")
        else:
            self.date = time.gmtime(0.)

    def __str__(self):
        """
        Return a date as "yy/mm/dd"
        """
        return time.strftime("%y/%m/%d", self.date)

def Int(s):
    """
    Parse a string to integer ("" => 0)
    """
    if s.strip():
        return int(s)
    else:
        return 0

class Year:
    def __init__(self, s):
        """
        Parse a year provided as "yyyy"
        """
        if s.strip():
            self.date = time.strptime(s, "%Y")
        else:
            self.date = time.gmtime(0.)

    def __str__(self):
        """
        Return a year as "yyyy"
        """
        return time.strftime("%Y", self.date)
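One consequence of the `time.gmtime(0.)` fallback is worth spelling out: a blank `Date` or `Year` is not written back as 0 but as the Unix epoch. This sketch uses only the standard `time` calls the classes rely on:

```python
import time

# Two-digit years: %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999
parsed = time.strptime("00/12/19", "%y/%m/%d")
print(parsed.tm_year)                     # 2000
print(time.strftime("%y/%m/%d", parsed))  # 00/12/19

# The fallback used for a blank field is the Unix epoch
epoch = time.gmtime(0.)
print(time.strftime("%y/%m/%d", epoch))   # 70/01/01
print(time.strftime("%Y", epoch))         # 1970
```

So a blank `Date_Scheduled` comes out as `70/01/01` and a blank `Year` as `1970`; if downstream code expects a literal 0 there, those two types would need a different fallback.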

Now we build a table that defines what type each field should be:

# Expected data-type of each field:
# data_types[section][field] = type
data_types = {
    "CU": {
        "Customer_ID":    Int,
        "Last_Name":      str,
        "First_Name":     str,
        "Street_Address": str,
        "City":           str
    },
    "VE": {
        "License_Plate#": str,
        "Make":           str,
        "Model":          str,
        "Year":           Year,
        "Owner_ID":       Int
    },
    "SE": {
        "Vehicle_ID":     str,
        "Service_Code":   Int,
        "Date_Scheduled": Date
    }
}
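The table works as a two-level dispatch: look up the section, then the field, and call whatever converter is stored there. A cut-down sketch (the variable `table` is a hypothetical stand-in for `data_types`, and `Int` is repeated so the snippet runs on its own):

```python
def Int(s):
    """Parse a string to an integer, treating a blank string as 0."""
    return int(s) if s.strip() else 0

# A cut-down version of the table: section -> field -> converter
table = {
    "CU": {"Customer_ID": Int, "Last_Name": str},
    "VE": {"Owner_ID": Int, "Make": str},
}

# Look up the converter, then call it on the raw string
converter = table["CU"]["Customer_ID"]
print(converter(""))     # 0
print(converter("779"))  # 779

# A membership test on the inner dict flags fields the table doesn't know
print("Home_Phone" in table["CU"])  # False
```

Adding a new field is then just a new entry in the table rather than another `if` branch.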

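Next we parse the input file; this is by far the most complicated part! It is implemented as a finite state machine inside a generator function, which yields one section at a time: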

# Customized error-handling
# (derived from Exception rather than BaseException, so that an
# ordinary "except Exception" handler can catch them)
class TransactionError(Exception): pass
class EntryNotInSectionError(TransactionError): pass
class MalformedLineError(TransactionError): pass
class SectionNotTerminatedError(TransactionError): pass
class UnknownFieldError(TransactionError): pass
class UnknownSectionError(TransactionError): pass

def read_transactions(fname):
    """
    Read a transaction file
    Return a series of ("section", {"key": "value"})
    """
    section, accum = None, {}
    with open(fname) as inf:
        for line_no, line in enumerate(inf, 1):
            line = line.strip()

            if not line:
                # blank line - skip it
                pass
            elif line == "//":
                # end of section - return any accumulated data
                if accum:
                    yield (section, accum)
                section, accum = None, {}
            elif line[:3] == "IN ":
                # start of section
                if accum:
                    raise SectionNotTerminatedError(
                        "Line {}: Preceding {} section was not terminated"
                        .format(line_no, section)
                    )
                else:
                    section = line[3:].strip()
                    if section not in data_types:
                        raise UnknownSectionError(
                            "Line {}: Unknown section type {}"
                            .format(line_no, section)
                        )
            else:
                # data entry: "key=value"
                if section is None:
                    raise EntryNotInSectionError(
                        "Line {}: '{}' should be in a section"
                        .format(line_no, line)
                    )
                pair = line.split("=")
                if len(pair) != 2:
                    raise MalformedLineError(
                        "Line {}: '{}' could not be parsed as a key/value pair"
                        .format(line_no, line)
                    )
                key, val = pair[0].strip(), pair[1].strip()
                if key not in data_types[section]:
                    raise UnknownFieldError(
                        "Line {}: unrecognized field name {} in section {}"
                        .format(line_no, key, section)
                    )
                accum[key] = val

        # end of file - nothing should be left over
        if accum:
            raise SectionNotTerminatedError(
                "End of file: Preceding {} section was not terminated"
                .format(section)
            )
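Stripped of its error checks, the parser's core is a two-state machine: either outside a section or inside one, accumulating fields. A simplified sketch under that assumption (the name `iter_sections` is hypothetical; it reads from a list instead of a file and raises a plain `ValueError` instead of the custom exceptions):

```python
def iter_sections(lines):
    """Yield (section, {key: value}) pairs from transaction-file lines."""
    section, accum = None, {}
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # blank line: skip it
        if line == "//":                  # end of section: emit it
            if accum:
                yield section, accum
            section, accum = None, {}
        elif line.startswith("IN "):      # start of a new section
            section = line[3:].strip()
        else:                             # "key=value" inside a section
            key, _, value = line.partition("=")
            accum[key.strip()] = value.strip()
    if accum:                             # leftover data: missing "//"
        raise ValueError("section {} was not terminated".format(section))

sample = ["IN CU", "    Customer_ID=", "    Last_Name=Johnston", "//",
          "IN VE", "    Year=1996", "//"]
for section, fields in iter_sections(sample):
    print(section, fields)
# CU {'Customer_ID': '', 'Last_Name': 'Johnston'}
# VE {'Year': '1996'}
```

Note that the sample file in the question ends its `IN SE` section without a `//`, so a parser that insists on terminators (as this one and the answer's both do) will reject that last section.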

Now that the file has been read, the rest is easier. We convert each field to its type, using the lookup table we defined above:

def format_field(section, key, value): 
    """ 
    Cast a field value to the appropriate data type 
    """ 
    return data_types[section][key](value) 

def format_section(section, accum): 
    """ 
    Cast all values in a section to the appropriate data types 
    """ 
    return (section, {key:format_field(section, key, value) for key,value in accum.items()}) 
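The conversion step boils down to one dict comprehension that applies the matching converter to every value. A self-contained sketch with a hypothetical cut-down `types` table standing in for one section of `data_types`:

```python
def Int(s):
    """Parse a string to an integer, treating a blank string as 0."""
    return int(s) if s.strip() else 0

# Hypothetical stand-in for data_types["CU"]
types = {"Customer_ID": Int, "Last_Name": str, "City": str}

accum = {"Customer_ID": "", "Last_Name": "Johnston", "City": "Toronto"}
# Apply each field's converter to its raw string value
cleaned = {key: types[key](value) for key, value in accum.items()}
print(cleaned)  # {'Customer_ID': 0, 'Last_Name': 'Johnston', 'City': 'Toronto'}
```

This is where the blank `Customer_ID` from the question actually becomes 0.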

Then we write the results back to a file:

def write_transactions(fname, transactions):
    with open(fname, "w") as outf:
        for section, accum in transactions:
            # start section
            outf.write("IN {}\n".format(section))
            # write key/value pairs in order by key
            keys = sorted(accum.keys())
            for key in keys:
                outf.write(" {}={}\n".format(key, accum[key]))
            # end section
            outf.write("//\n")
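If you'd rather check the output format before writing a real file, one option is a writer that takes a file-like object instead of a file name; then `io.StringIO` can stand in for the file. A sketch with a hypothetical `write_sections` that uses the same layout as `write_transactions`:

```python
import io

def write_sections(outf, transactions):
    """Write (section, {key: value}) pairs in the transaction-file layout."""
    for section, accum in transactions:
        outf.write("IN {}\n".format(section))
        for key in sorted(accum):  # keys in alphabetical order
            outf.write(" {}={}\n".format(key, accum[key]))
        outf.write("//\n")

buf = io.StringIO()
write_sections(buf, [("CU", {"Last_Name": "Johnston", "Customer_ID": 0})])
print(buf.getvalue())
```

Because the keys are sorted, the fields come out in alphabetical order rather than the order they had in the input file.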

With all the machinery in place, we just need to call it:

def main(): 
    INPUT = "transaction.txt" 
    OUTPUT = "customer.diff" 
    transactions = read_transactions(INPUT) 
    cleaned_transactions = (format_section(section, accum) for section,accum in transactions) 
    write_transactions(OUTPUT, cleaned_transactions) 

if __name__=="__main__": 
    main() 

Hope that helps!
