Python: Reading and writing CSV files

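I'm trying to read data from a CSV file (A), extract data from it, and write it to a different CSV file (B). In the new file B, I want two columns: column 1 lists the names from column 1 of file A, and column 2 lists how many times each name appears in column 1 of file A. For example, suppose file A looks like this (the ':' isn't really there; the values are in two columns):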

Animal: Gender 
Rabbit: Male 
Dog: Male 
Rabbit: Female 
Cat: Male 
Cat: Male 
Dog: Female 
Dog: Male 
Turtle: Male

I want file B to look like this (again, really in separate columns, without the ':'):

Animal: Count 
Cat: 2 
Dog: 3 
Rabbit: 2 
Turtle: 1

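This is the first time I've done something like this, and this is what I have so far, but the data isn't being written to file B and the counting isn't done correctly. Can anyone help me sort this out?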

import csv 
ReadData=csv.reader(open('C:\Users\..\FileA.csv','rb'), delimiter=',') 

def column(ReadData, i): 
    return [row[i] for row in ReadData] 

for line in ReadData: 
    WriteData=csv.writer(open('C:\Users\..\FileB.csv','wb'), 
         delimiter=' ', quotechar=':', quoting=csv.QUOTE_ALL) 
    print column(ReadData,1)

Thanks a lot for your help!


2012-07-25 owl

This [link](http://stackoverflow.com/editing-help) explains how to edit/post with markup. – Levon 2012-07-25 22:30:15

Thanks for the quick reply! I've been looking at that link, but I'm having trouble with the indentation spaces... I may be missing something... – owl 2012-07-25 22:31:41

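For code, just (1) paste it, (2) highlight/select the code block, then (3) press Control-K. It shifts the block to the right (by 4 columns, I think) and makes it display properly as code. – Levon 2012-07-25 22:32:42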

I'll answer the counting part of your question; maybe you can combine it with the CSV part.

l = [ 
    ('Animal','Gender'), 
    ('Rabbit','Male'), 
    ('Dog','Male'), 
    ('Rabbit','Female'), 
    ('Cat','Male'), 
    ('Cat','Male'), 
    ('Dog','Female'), 
    ('Dog','Male'), 
    ('Turtle','Male') 
    ] 

d = {} 
for k,v in l: 
    if not k in d:
        d[k] = 1
    else:
        d[k] += 1

for k in d: 
    print "%s: %d" % (k,d[k])

I didn't filter out the header row; the output of this code is:

Turtle: 1 
Cat: 2 
Rabbit: 2 
Animal: 1 
Dog: 3
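To combine this counting loop with the CSV part of the question, here is a minimal Python 3 sketch. It assumes file A is comma-separated with a header row; `io.StringIO` objects stand in for FileA.csv and FileB.csv (hypothetical in-memory files) so the example runs on its own.

```python
import csv
import io

# Hypothetical in-memory stand-in for FileA.csv (comma-separated, header first).
file_a = io.StringIO(
    "Animal,Gender\n"
    "Rabbit,Male\n"
    "Dog,Male\n"
    "Rabbit,Female\n"
    "Cat,Male\n"
    "Cat,Male\n"
    "Dog,Female\n"
    "Dog,Male\n"
    "Turtle,Male\n"
)
# Hypothetical in-memory stand-in for FileB.csv.
file_b = io.StringIO()

reader = csv.reader(file_a)
next(reader)  # skip the header row so 'Animal' is not counted

# Same counting logic as above, keyed on column 1 of each row.
d = {}
for row in reader:
    animal = row[0]
    if animal not in d:
        d[animal] = 1
    else:
        d[animal] += 1

# Default csv.writer settings: ',' delimiter, quote only when needed.
writer = csv.writer(file_b)
writer.writerow(["Animal", "Count"])
for animal in sorted(d):
    writer.writerow([animal, d[animal]])

print(file_b.getvalue())
```

With real files, you would open them in Python 3 with `open(path, newline='')` for both reading and writing, as the csv module documentation recommends.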

Edit:

You can replace this:

if not k in d: 
    d[k] = 1 
else: 
    d[k] += 1

with this:

d[k] = d.setdefault(k,0) + 1


2012-07-25 22:50:06 ChipJust

You should use [defaultdict](http://docs.python.org/library/collections.html#defaultdict-examples). – BrtH 2012-07-25 22:51:48

I'd recommend using 'collections.defaultdict(int)'; failing that, at least make use of 'dict.setdefault'... – 2012-07-25 22:52:00

@Jon, yes, I updated the post to show how to use setdefault. – ChipJust 2012-07-25 23:01:02
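Following the `collections.defaultdict(int)` suggestion, the counting loop could look like this minimal sketch. `int()` returns 0, so a key that is not yet present starts at 0 and no membership test is needed. The list `l` is the sample data from the answer, minus the header row.

```python
from collections import defaultdict

# Sample data from the answer, without the ('Animal', 'Gender') header row.
l = [
    ('Rabbit', 'Male'),
    ('Dog', 'Male'),
    ('Rabbit', 'Female'),
    ('Cat', 'Male'),
    ('Cat', 'Male'),
    ('Dog', 'Female'),
    ('Dog', 'Male'),
    ('Turtle', 'Male'),
]

d = defaultdict(int)  # missing keys are created with int() == 0
for k, v in l:
    d[k] += 1

for k in sorted(d):
    print("%s: %d" % (k, d[k]))
```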

For counting in Python >= 2.7, see this example for collections.Counter. For collections.defaultdict, see here.
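The linked examples are not reproduced here, but a minimal sketch of the `collections.Counter` approach looks like this: pass any iterable of keys and it counts them, and `most_common()` returns `(key, count)` pairs ordered by count. The pairs below are the question's sample rows.

```python
from collections import Counter

# The question's sample rows (header row omitted).
rows = [
    ('Rabbit', 'Male'), ('Dog', 'Male'), ('Rabbit', 'Female'),
    ('Cat', 'Male'), ('Cat', 'Male'), ('Dog', 'Female'),
    ('Dog', 'Male'), ('Turtle', 'Male'),
]

# Count only the first column (the animal name).
counts = Counter(animal for animal, gender in rows)

print(counts['Dog'])          # 3
print(counts['Horse'])        # 0: missing keys count as zero, no KeyError
print(counts.most_common(1))  # [('Dog', 3)]
```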

In your call to csv.writer, quotechar=':' is probably a mistake: it would make WriteData.writerow(['Hello World', 12345]) emit ":Hello World: :12345:", as if the colons were quotation marks.
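One way to see the effect is to write into an in-memory buffer. This Python 3 sketch uses `io.StringIO` in place of the output file; the second writer simply uses the csv module's defaults (comma delimiter, minimal quoting), which is one reasonable choice for a two-column file B rather than the only one.

```python
import csv
import io

# The writer settings from the question: ':' acts as the quote character.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=' ', quotechar=':', quoting=csv.QUOTE_ALL)
writer.writerow(['Hello World', 12345])
print(repr(buf.getvalue()))   # ':Hello World: :12345:\r\n'

# Default settings: ',' delimiter, a field is quoted only if it needs to be.
buf2 = io.StringIO()
csv.writer(buf2).writerow(['Hello World', 12345])
print(repr(buf2.getvalue()))  # 'Hello World,12345\r\n'
```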

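Also note that your function column(ReadData, i) consumes ReadData; subsequent iterations over ReadData will probably return an empty list (untested). This isn't a problem for your code (at least not yet).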
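The exhaustion is easy to demonstrate: a `csv.reader` is a one-pass iterator, so once `column` has looped over it, nothing is left. This Python 3 sketch uses an in-memory `io.StringIO` as a stand-in for FileA.csv; reading the rows into a list first is one way to allow several passes.

```python
import csv
import io

def column(rows, i):
    return [row[i] for row in rows]

# In-memory stand-in for FileA.csv.
data = io.StringIO("Animal,Gender\nRabbit,Male\nDog,Female\n")
reader = csv.reader(data)

first = column(reader, 0)   # ['Animal', 'Rabbit', 'Dog']
second = column(reader, 1)  # []: the reader is already exhausted

# Rewind and materialize the rows once; a list can be iterated repeatedly.
data.seek(0)
rows = list(csv.reader(data))
animals = column(rows, 0)   # ['Animal', 'Rabbit', 'Dog']
genders = column(rows, 1)   # ['Gender', 'Male', 'Female']
```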

Here is a solution without the csv module (after all, these files don't look much like CSV):

import collections 

inputfile = file("A") 

counts = collections.Counter() 

for line in inputfile: 
    animal = line.split(':')[0] 
    counts[animal] += 1 

for animal, count in counts.iteritems(): 
    print '%s: %s' % (animal, count)


2012-07-25 22:51:44 tiwo

Better written as 'animals = (line.split(':')[0] for line in inputfile); counts = collections.Counter(animals)' – 2012-07-25 22:54:32
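In Python 3, where `file()` and `iteritems()` no longer exist, the generator-expression version from this comment might look like the following sketch. `io.StringIO` stands in for file "A"; skipping the header line and ignoring blank lines are additions that are not in the original code.

```python
import collections
import io

# In-memory stand-in for file "A" in the colon-separated layout from the question.
inputfile = io.StringIO(
    "Animal: Gender\n"
    "Rabbit: Male\n"
    "Dog: Male\n"
    "Rabbit: Female\n"
    "Cat: Male\n"
    "Cat: Male\n"
    "Dog: Female\n"
    "Dog: Male\n"
    "Turtle: Male\n"
)
next(inputfile)  # skip the "Animal: Gender" header line

# Lazily take the part before ':' on every non-blank line.
animals = (line.split(':')[0] for line in inputfile if line.strip())
counts = collections.Counter(animals)

for animal, count in sorted(counts.items()):
    print('%s: %s' % (animal, count))
```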

@Jon: Yes, that's right. – tiwo 2012-07-25 23:00:10

Thanks so much for all the input! I'll give it a try! – owl 2012-07-25 23:03:49

Take a look at the itertools module and the groupby function. For example:

from itertools import groupby 

animals = [ 
    ('Rabbit', 'Male'), 
    ('Dog', 'Male'), 
    ('Rabbit', 'Female'), 
    ('Cat', 'Male'), 
    ('Cat', 'Male'), 
    ('Dog', 'Female'), 
    ('Dog', 'Male'), 
    ('Turtle', 'Male') 
    ] 

def get_group_key(animal_data): 
    return animal_data[0] 

animals = sorted(animals, key=get_group_key) 
animal_groups = groupby(animals, get_group_key) 

grouped_animals = [] 
for animal_type in animal_groups: 
    grouped_animals.append((animal_type[0], len(list(animal_type[1])))) 

print grouped_animals 

# Output: [('Cat', 2), ('Dog', 3), ('Rabbit', 2), ('Turtle', 1)]


2012-07-25 22:57:06

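If the rows for an animal are not all contiguous, this produces incorrect results (see 'Rabbit' in the results above). Note that 'sum(1 for _ in iterable)' is one way to get the length of an iterator without materializing a list or other sequence. – 2012-07-25 22:59:38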
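The contiguity point can be seen by grouping the unsorted list, and the `sum(1 for _ in iterable)` idiom counts each group without building a list. A minimal sketch; `operator.itemgetter(0)` is used in place of the answer's `get_group_key` but returns the same thing (the animal name).

```python
from itertools import groupby
from operator import itemgetter

animals = [
    ('Rabbit', 'Male'), ('Dog', 'Male'), ('Rabbit', 'Female'),
    ('Cat', 'Male'), ('Cat', 'Male'), ('Dog', 'Female'),
    ('Dog', 'Male'), ('Turtle', 'Male'),
]
key = itemgetter(0)  # equivalent to get_group_key: the animal name

# Without sorting, groupby only merges *adjacent* rows with equal keys.
unsorted_counts = [(name, sum(1 for _ in group))
                   for name, group in groupby(animals, key)]
# [('Rabbit', 1), ('Dog', 1), ('Rabbit', 1), ('Cat', 2), ('Dog', 2), ('Turtle', 1)]

# Sorting first makes each animal's rows contiguous.
sorted_counts = [(name, sum(1 for _ in group))
                 for name, group in groupby(sorted(animals, key=key), key)]
# [('Cat', 2), ('Dog', 3), ('Rabbit', 2), ('Turtle', 1)]
```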

Thanks for your help! I'll try all the suggestions one by one. – owl 2012-07-25 23:03:03

@Jon Yes, I missed sorting the data. Good point about not materializing the list. – 2012-07-25 23:04:25

Depending on the size and complexity of the data, you might want to consider using pandas; information is at http://pandas.pydata.org/ and it's available on PyPI.

Note that this may be overkill, but I thought I'd throw it into the mix.

from pandas import DataFrame 

# rows is processed from string in the OP 
rows = [['Rabbit', ' Male'], ['Dog', ' Male'], ['Rabbit', ' Female'], ['Cat', ' Male'], ['Cat', ' Male'], ['Dog', ' Female'], ['Dog', ' Male'], ['Turtle', ' Male']] 

df = DataFrame(rows, columns=['animal', 'gender'])

>>> df.groupby('animal').agg(len)
        gender
animal
Cat          2
Dog          3
Rabbit       2
Turtle       1

>>> df.groupby(['animal', 'gender']).agg(len)
animal  gender
Cat     Male      2
Dog     Female    1
        Male      2
Rabbit  Female    1
        Male      1
Turtle  Male      1
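Instead of typing `rows` by hand, the data can be loaded straight from the file. A minimal sketch, assuming file A is comma-separated with an `Animal,Gender` header; `io.StringIO` objects stand in for the real files, and `groupby(...).size()` is used here to count rows per animal as an alternative to `agg(len)`.

```python
import io

import pandas as pd

# In-memory stand-in for FileA.csv; with a real file, pass its path to read_csv.
file_a = io.StringIO(
    "Animal,Gender\n"
    "Rabbit,Male\n"
    "Dog,Male\n"
    "Rabbit,Female\n"
    "Cat,Male\n"
    "Cat,Male\n"
    "Dog,Female\n"
    "Dog,Male\n"
    "Turtle,Male\n"
)
df = pd.read_csv(file_a)  # the header row becomes the column names

# Rows per animal as a two-column DataFrame: Animal, Count.
counts = df.groupby("Animal").size().reset_index(name="Count")

# In-memory stand-in for FileB.csv; index=False drops the 0..n-1 row labels.
file_b = io.StringIO()
counts.to_csv(file_b, index=False)
print(file_b.getvalue())
```

For wider data, such as the 16-column file mentioned in the comments, `df.groupby(["Animal", "Gender"]).size()` counts each combination in the same way.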


2012-07-25 23:10:44

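Thanks for sharing! Do you know if there's a way to avoid typing out 'rows' by hand as in your code? My actual data has hundreds of "animals" with 16 columns... – owl 2012-07-25 23:18:54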

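@owl Just assign the result to a variable... 'pandas' is built on 'numpy' arrays, so if you're familiar with those, you already have a way to do numerical computation efficiently... there's a bit of a learning curve, but it's worth it... – 2012-07-25 23:24:32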

Thanks for introducing this! I'm working through all the answers and haven't gotten to yours yet, but I'll try it! – owl 2012-07-25 23:35:23
