Count duplicate values in a specific column of a CSV file and return the count in another column (python2)

I'm trying to count the duplicate values in one column of a CSV file and write the counts to another CSV column in Python.

For example, my CSV file:

KeyID GeneralID 
145258 KL456 
145259 BG486 
145260 HJ789 
145261 KL456

What I want to achieve is to count how many rows share the same GeneralID and insert that count into a new CSV column. For example:

KeyID Total_GeneralID 
145258 2 
145259 1 
145260 1 
145261 2

I tried using the split method to split each line into columns, but it doesn't work very well.

My code:

case_id_list_data = []

with open(file_path_1, "rU") as g:
    for line in g:
        case_id_list_data.append(line.split('\t'))
        #print case_id_list_data[0][0] #the result is dissatisfying
        #I'm stuck here..
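A minimal sketch of how the split-based attempt could be finished, assuming the file is tab-separated with the header on its first line. add_count_column is a hypothetical helper, and the sample list stands in for the open file g; it counts with a plain dict, so it also runs on Python 2.6.

```python
def add_count_column(lines, key_index=1, delimiter='\t'):
    """Hypothetical helper: return text rows with a count column appended.

    `lines` is any iterable of text lines (e.g. an open file); the first
    line is treated as the header.
    """
    # rstrip the line ending first; otherwise the last field keeps its '\n'
    # and the appended count lands after the line break
    rows = [line.rstrip('\r\n').split(delimiter) for line in lines]
    header, data = rows[0], rows[1:]

    # count each GeneralID with a plain dict (works on Python 2.6 too)
    counts = {}
    for row in data:
        key = row[key_index]
        counts[key] = counts.get(key, 0) + 1

    # rebuild each line with its count as the new last column
    out = [delimiter.join(header + ['Total_GeneralID'])]
    for row in data:
        out.append(delimiter.join(row + [str(counts[row[key_index]])]))
    return out


sample = ['KeyID\tGeneralID\n', '145258\tKL456\n', '145259\tBG486\n',
          '145260\tHJ789\n', '145261\tKL456\n']
for text in add_count_column(sample):
    print(text)
```

With the real file, the open file object can be passed instead of sample, e.g. add_count_column(g) inside the with block above.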

2017-04-25 yunaranyancat

And if you're averse to pandas and want to stay within the standard library:

Code:

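import csv
from collections import Counter  # Counter requires Python 2.7+

with open('file1', 'rU') as f:
    reader = csv.reader(f, delimiter='\t')
    header = next(reader)  # first row: KeyID, GeneralID
    lines = [line for line in reader]
    # how many times each GeneralID (column index 1) occurs
    counts = Counter([l[1] for l in lines])

# append each row's GeneralID count as a new last column
new_lines = [l + [str(counts[l[1]])] for l in lines]
with open('file2', 'wb') as f:  # Python 2: csv files are opened in binary mode
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(header + ['Total_GeneralID'])
    writer.writerows(new_lines)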

Result:

KeyID GeneralID Total_GeneralID 
145258 KL456 2 
145259 BG486 1 
145260 HJ789 1 
145261 KL456 2
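The answer above targets Python 2 ('rU' and 'wb' modes). As a hedged sketch for Python 3, assuming the same tab-separated files, the csv module expects text mode opened with newline='' instead: the 'U' mode was removed from open() in Python 3.11, and in Python 3 csv.writer cannot write to a file opened with 'wb'. The function name add_total_column and its parameters are illustrative, not part of the original answer.

```python
import csv
from collections import Counter


def add_total_column(src, dst, key_col=1, delimiter='\t'):
    # Python 3: open CSV files in text mode with newline='' so the csv
    # module handles line endings itself
    with open(src, newline='') as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)
        lines = list(reader)
    # number of occurrences of each value in the key column (GeneralID)
    counts = Counter(row[key_col] for row in lines)

    with open(dst, 'w', newline='') as f:
        writer = csv.writer(f, delimiter=delimiter)
        writer.writerow(header + ['Total_GeneralID'])
        writer.writerows(row + [str(counts[row[key_col]])] for row in lines)
```

Usage mirrors the original: add_total_column('file1', 'file2').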

2017-04-25 05:42:16

Which Python version are you using to import from collections? I'm using Python v2.6.6, and 'from collections import Counter' fails with 'ImportError: cannot import name Counter' – yunaranyancat

Counter is 2.7+, but you can get the source code here: http://code.activestate.com/recipes/576611-counter-class/ –
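If pulling in the recipe is inconvenient, a different and simpler route on Python 2.6 is collections.defaultdict (available since 2.5), which covers what the standard-library answer needs from Counter here: counting and looking up by key. A minimal sketch; count_values is a hypothetical name:

```python
from collections import defaultdict  # Python 2.5+, unlike Counter (2.7+)


def count_values(values):
    """Count occurrences of each value, like Counter(values) on 2.7+."""
    counts = defaultdict(int)  # missing keys start at 0
    for value in values:
        counts[value] += 1
    return counts


general_ids = ['KL456', 'BG486', 'HJ789', 'KL456']
counts = count_values(general_ids)
print(counts['KL456'])  # 2
print(counts['BG486'])  # 1
```

In the standard-library answer, counts = Counter([l[1] for l in lines]) would become counts = count_values([l[1] for l in lines]); the counts[l[1]] lookups stay the same.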

import pandas as pd
# read your csv into a DataFrame (add sep='\t' if the file is tab-separated)
df = pd.read_csv('file_path_1')
# generate Total_GeneralID by counting the values in the GeneralID column
# once, then looking up the count for each row's GeneralID
counts = df.GeneralID.value_counts()
df['Total_GeneralID'] = df.GeneralID.apply(lambda x: counts[x])
df = df[['KeyID', 'Total_GeneralID']]
print(df)
    KeyID Total_GeneralID
0 145258    2
1 145259    1
2 145260    1
3 145261    2

2017-04-25 05:29:36 Allen

You can use the pandas library:

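First, read the file with read_csv.
Count the values of the GeneralID column with value_counts, and rename the resulting Series to the output column name.
Then join it to the original DataFrame on GeneralID.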

import pandas as pd 

df = pd.read_csv('file') 
s = df['GeneralID'].value_counts().rename('Total_GeneralID') 
df = df.join(s, on='GeneralID') 
print (df) 
    KeyID GeneralID Total_GeneralID 
0 145258  KL456    2 
1 145259  BG486    1 
2 145260  HJ789    1 
3 145261  KL456    2

2017-04-25 05:37:11 jezrael

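You can split the task into three steps: 1. read the CSV file, 2. generate the values for the new column, 3. write the values back into the file.

import csv
import fileinput
import sys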

# 1. Read CSV file
# This opens the CSV and reads the values from it.
with open("dev.csv") as filein:
    reader = csv.reader(filein, skipinitialspace=True)
    # transpose rows into columns: xs = KeyID column, ys = GeneralID column
    xs, ys = zip(*reader)

result = ["Total_GeneralID"]

# 2. Generate the new column's values
# This loop counts how often each row's GeneralID occurs (index 0 is the header).
for i in range(1, len(ys)):
    result.append(ys.count(ys[i]))

# 3. Add the values back to the file
# With inplace=True, fileinput redirects stdout into dev.csv, so each
# written line replaces the original line, with its count appended.
for ind, line in enumerate(fileinput.input("dev.csv", inplace=True)):
    sys.stdout.write("{}, {}\n".format(line.rstrip(), result[ind]))

I didn't use a temporary file or any advanced modules such as pandas.

2017-04-25 06:39:38

Can you show another approach using csv.DictReader? Or is it the same thing? And what does 'xs, ys = zip(*reader)' do? – yunaranyancat

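In Python 2, zip() returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences. With zip(*reader), every row is passed as a separate argument, so the rows are transposed into columns: xs gets the first column (KeyID) and ys the second (GeneralID), each including its header. –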
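A small sketch of what xs, ys = zip(*reader) produces, using in-memory sample rows in place of dev.csv (assumed to have exactly two comma-separated columns):

```python
import csv

# sample rows standing in for dev.csv
lines = ['KeyID,GeneralID', '145258,KL456', '145259,BG486',
         '145260,HJ789', '145261,KL456']
reader = csv.reader(lines)

# zip(*reader) is zip(row0, row1, ...): it groups all first fields
# together, then all second fields, i.e. it transposes rows into columns
xs, ys = zip(*reader)
print(xs)  # ('KeyID', '145258', '145259', '145260', '145261')
print(ys)  # ('GeneralID', 'KL456', 'BG486', 'HJ789', 'KL456')

# ys[0] is the header, which is why the answer's loop starts at index 1
print(ys.count('KL456'))  # 2
```

Unpacking into exactly two names is what ties this to a two-column file: with a third column, zip yields three tuples and the assignment raises ValueError.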

Use csv.reader instead of the split() method; it's easier.

Thanks
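csv.reader takes care of the delimiter and line endings that split() leaves to you. For the csv.DictReader question in the comments above, a minimal sketch, with sample lines standing in for the open file and assuming a tab-separated file (Counter needs Python 2.7+): each row becomes a dict keyed by the header names, so the GeneralID column is looked up by name rather than by index.

```python
import csv
from collections import Counter

# sample tab-separated input; in practice this would be an open file object
lines = ['KeyID\tGeneralID\n', '145258\tKL456\n', '145259\tBG486\n',
         '145260\tHJ789\n', '145261\tKL456\n']

# DictReader takes the field names from the first row and yields one
# dict per data row
rows = list(csv.DictReader(lines, delimiter='\t'))
counts = Counter(row['GeneralID'] for row in rows)

# store each row's count under the new column name and print KeyID + count
for row in rows:
    row['Total_GeneralID'] = counts[row['GeneralID']]
    print('%s\t%d' % (row['KeyID'], row['Total_GeneralID']))
```

Compared with csv.reader the logic is the same; DictReader only changes how columns are addressed.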

2017-04-25 06:48:28