2012-08-10 17 views

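The code (reproduced below) reads in a file, performs an operation, and outputs a subset of the original file to a new file. How can I tweak it slightly so that it instead outputs everything from the initial file to the output file, but adds a "flag" column with a value of "1" where the row is one that is currently being output (the subset of rows we're most interested in)? The other rows (the ones currently only in the input file) would have a blank or "0" in the new "flag" column. Output as CSV: change existing code to add a "flag" column instead of

This problem comes up often enough for me that it would save me a lot of time just to have a general way of doing this.

Any help is much appreciated!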

import csv
inname = "aliases.csv"
outname = "output.csv"

def first_word(value):
    return value.split(" ", 1)[0]

with open(inname, "r", encoding = "utf-8") as infile:
    with open(outname, "w", encoding = "utf-8") as outfile:
        in_csv = csv.reader(infile)
        out_csv = csv.writer(outfile)

        column_names = next(in_csv)
        out_csv.writerow(column_names)

        id_index = column_names.index("id")
        name_index = column_names.index("name")

        try:
            row_1 = next(in_csv)
            written_row = False

            for row_2 in in_csv:
                if first_word(row_1[name_index]) == first_word(row_2[name_index]) and row_1[id_index] != row_2[id_index]:
                    if not written_row:
                        out_csv.writerow(row_1)

                    out_csv.writerow(row_2)
                    written_row = True
                else:
                    written_row = False

                row_1 = row_2
        except StopIteration:
            # No data rows!
            pass

Answer


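I always use DictReader and DictWriter when working with CSVs, mostly because they are a bit more explicit (which makes things easier for me :)). Below is a highly stylized version of what you could do. The changes I made include: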

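  • Using csv.DictReader() and csv.DictWriter() instead of csv.reader and csv.writer. These differ in that they represent each row as a dictionary rather than a list, so a row looks like {'column_name': 'value', 'column_name_2': 'value2'}. This means every row carries its column header data and can be treated like a dictionary.
  • Using sample column names to show how the reading and writing work. I made a sample CSV with two columns, name and number; when writing, the flag is set by the same comparison as in your code, with number standing in for id.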
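
To make the first point concrete, here is a minimal sketch of how the two readers represent the same row. The sample data is made up for illustration and is read from an in-memory string rather than a file.

```python
import csv
import io

# Hypothetical sample data, not taken from the asker's file
sample = "name,number\nalpha one,1\nalpha two,2\n"

# csv.reader yields each row (including the header) as a list of strings
list_rows = list(csv.reader(io.StringIO(sample)))

# csv.DictReader takes the column names from the first line and yields
# each following row as a mapping from column name to value
dict_rows = [dict(row) for row in csv.DictReader(io.StringIO(sample))]

print(list_rows[1])   # ['alpha one', '1']
print(dict_rows[0])   # {'name': 'alpha one', 'number': '1'}
```

With the list form you have to remember that the name lives at index 0; with the dict form you ask for row['name'] directly.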

With that in mind, here is the example:

import csv

input_csv = 'aliases.csv'
output_csv = 'output.csv'

def first_word(value):
    return value.split(' ', 1)[0]

# newline='' lets the csv module handle line endings itself
with open(input_csv, 'r', newline='') as infile:
    # Specify the fieldnames in your aliases CSV
    input_fields = ('name', 'number')

    # Set up the DictReader, which will read the file into an iterable
    # where each row is a {column_name: value} dictionary
    reader = csv.DictReader(infile, fieldnames=input_fields)

    # Now open the output file
    with open(output_csv, 'w', newline='') as outfile:
        # Define the new 'flag' field
        output_fields = ('name', 'number', 'flag')
        writer = csv.DictWriter(outfile, fieldnames=output_fields)

        # Write the column names
        writer.writeheader()

        # Skip the first row (which is the column headers) and then store the
        # first row dictionary (None if the file has no data rows)
        next(reader, None)
        first_row = next(reader, None)

        # Now begin your iteration through the input, writing all fields as they
        # appear, but using some logic to write the 'flag' field
        # This is where the dictionary comes into play - each row is actually a
        # dictionary, so you can use dictionary syntax to assign to it
        for next_row in reader:
            # Set up the variables for your comparison
            first_name = first_word(first_row['name'])
            next_name = first_word(next_row['name'])
            first_id = first_row['number']
            next_id = next_row['number']

            # Compare the current row to the previous row
            if first_name == next_name and first_id != next_id:
                # Here we are adding an element to our row dictionaries - 'flag'
                # Both rows of the matching pair are flagged
                first_row['flag'] = 'Y'
                next_row['flag'] = 'Y'
            # Now we write the entire first_row dictionary to the row
            # (rows without a 'flag' key get an empty flag cell)
            writer.writerow(first_row)

            # Change the reference, just like you did
            first_row = next_row

        # The loop writes each row one step late, so write the last row here
        if first_row is not None:
            writer.writerow(first_row)

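Thanks for your post. However, this doesn't work for me. First, I get a syntax error on the line: row['flag'] = 'Y'. As I suspected, that isn't a valid operation. Right, I understand that we want to add a 'Y' to the "flag" column, but it looks like this uses a list as if it were a dictionary or something like that. I'm not sure, but the syntax doesn't work; it only works when I change the assignment operator into the equality operator, and that makes no sense. Also, the writer.writerow(row) statement doesn't work. – user1590499 2012-08-11 00:48:47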
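
For what it's worth, the behaviour described in this comment is what one would expect if row came from csv.reader rather than csv.DictReader. That is an assumption, since the exact code that was run is not shown. A list row only accepts integer indexes, so assigning to a string key raises a TypeError at run time, while a dict row accepts it. A minimal sketch with made-up values:

```python
# Shape of a row produced by csv.reader (a list)
list_row = ['alpha one', '1']
# Shape of a row produced by csv.DictReader (a dict)
dict_row = {'name': 'alpha one', 'number': '1'}

try:
    list_row['flag'] = 'Y'      # lists cannot be indexed by a string
    list_error = None
except TypeError as exc:
    list_error = exc

list_row.append('Y')            # the list way to add a column
dict_row['flag'] = 'Y'          # the dict way to add a column
```

Replacing = with == makes the error disappear only because the line then becomes a comparison that is evaluated and thrown away; nothing is stored in the row.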


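Another issue here is that I'm not sure this does what I want. The input file has many other fields, and I want them in the output file, but my logical comparison is based on the values in particular columns. Looking at this, it seems to me that we would only get 3 columns: "name", "number" and "flag". Is that right? – user1590499 2012-08-11 00:52:58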
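
One way to keep every input column, sketched under the assumption that the first line of the file holds the column names: leave out the fieldnames argument so that DictReader reads the names from the header, then reuse reader.fieldnames plus 'flag' for the writer. The in-memory sample and its city column are hypothetical, and the flag logic is left as a placeholder.

```python
import csv
import io

# Hypothetical input with an extra column beyond id and name
sample = "id,name,city\n1,alpha one,Oslo\n2,alpha two,Rome\n"

# No fieldnames argument: the header line supplies the column names
reader = csv.DictReader(io.StringIO(sample))

out = io.StringIO()
writer = csv.DictWriter(
    out,
    fieldnames=list(reader.fieldnames) + ['flag'],
    lineterminator='\n',
)
writer.writeheader()
for row in reader:
    row['flag'] = ''   # placeholder; the real comparison would set this
    writer.writerow(row)

result = out.getvalue()
```

Because the output field list is derived from the input header, the same code works unchanged for files with any number of columns.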


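As an addendum, I think the logic for how I'd like to make this adjustment is: add a new column to the row (which is a Python list) before writing it out. Since we can't modify the input file in place, we can create a new file and then replace the old file with the new one. It seems like it should be simple, but I don't know how to go about it, since I'm not creating any lists in the first place... – user1590499 2012-08-11 00:55:45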
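
The list-based approach described in this comment could look roughly like the sketch below. It is not the answerer's code: the function name add_flag_column and its parameters are made up for illustration, and it assumes the file is small enough to read into memory. It keeps every column, applies the same adjacent-row test as the code in the question, and appends "1" or "0" to each row.

```python
import csv

def first_word(value):
    return value.split(" ", 1)[0]

def add_flag_column(inname, outname, name_col="name", id_col="id"):
    """Copy every row of inname to outname, appending a 'flag' column."""
    with open(inname, "r", encoding="utf-8", newline="") as infile:
        rows = list(csv.reader(infile))
    if not rows:
        return  # empty input: nothing to write
    header, data = rows[0], rows[1:]
    name_index = header.index(name_col)
    id_index = header.index(id_col)

    # Same test as the loop in the question: flag both rows of every
    # adjacent pair that shares a first word but has different ids
    flagged = set()
    for i in range(1, len(data)):
        prev, cur = data[i - 1], data[i]
        if (first_word(prev[name_index]) == first_word(cur[name_index])
                and prev[id_index] != cur[id_index]):
            flagged.update((i - 1, i))

    with open(outname, "w", encoding="utf-8", newline="") as outfile:
        out_csv = csv.writer(outfile)
        # Each row is a plain list, so adding a column is list concatenation
        out_csv.writerow(header + ["flag"])
        for i, row in enumerate(data):
            out_csv.writerow(row + ["1" if i in flagged else "0"])
```

Calling add_flag_column("aliases.csv", "output.csv") would then produce the flagged copy. If the new file should take the place of the old one afterwards, os.replace(outname, inname) is one way to do that once the output has been checked.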
