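How to transfer "common" columns between two CSV files

I'm fairly new to programming, and I would like this program to transfer the common columns between the two input files, file1.csv and file2.csv.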
file1.csv looks like this:
ID,Nickname,Gender,SubjectPrefix,SubjectFirstName,Whatever1A,Whaterver2A,SubjectLastName
1,J.,M,Dr.,Jason,,,Allan
2,B.,M,Mr.,Brian,,,Welch
file2.csv looks like this:
nickname,gender,city,id,prefix_name,first_name,Whatever1B,last_name,Whatever2B,Whatever3B,Whatever4B
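Question:
How can I compare the headers of file1.csv and file2.csv to identify the "common" columns, and then transfer those columns between the files? "Common" columns either have similar names that differ only in case (e.g. ID and id, Nickname and nickname), or have different names but store the same data (e.g. SubjectPrefix and prefix_name, SubjectFirstName and first_name).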
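The matching rule described above (same name ignoring case, or a known synonym) could be sketched like this. It is a minimal sketch, assuming the synonyms are written by hand in a dictionary; `ALIASES` and `match_columns` are hypothetical names, the `subjectlastname` entry is inferred from the expected output rather than stated in the question, and file1.csv columns with no counterpart in file2.csv are simply dropped.

```python
# Hypothetical hand-written synonym table: lowercased file1 name -> file2 name.
# Pairs that only differ in case (ID/id, Nickname/nickname) need no entry.
# The "subjectlastname" entry is an assumption based on the expected output.
ALIASES = {
    "subjectprefix": "prefix_name",
    "subjectfirstname": "first_name",
    "subjectlastname": "last_name",
}


def match_columns(header1, header2, aliases=ALIASES):
    """Return (file1_index, file2_name) pairs for every "common" column.

    A file1 column matches when its lowercased name (or its alias, if one
    is defined) equals a lowercased file2 column name. Pairs follow
    file1's column order; unmatched file1 columns are skipped.
    """
    # Map lowercased file2 names back to their original spelling.
    lookup = {name.lower(): name for name in header2}
    pairs = []
    for index, name in enumerate(header1):
        key = name.lower()
        target = aliases.get(key, key).lower()  # fall back to the name itself
        if target in lookup:
            pairs.append((index, lookup[target]))
    return pairs


header1 = "ID,Nickname,Gender,SubjectPrefix,SubjectFirstName,Whatever1A,Whaterver2A,SubjectLastName".split(",")
header2 = "nickname,gender,city,id,prefix_name,first_name,Whatever1B,last_name,Whatever2B,Whatever3B,Whatever4B".split(",")
print(match_columns(header1, header2))
```

The indices point into file1's rows and the names come from file2's header, so the result is enough to build both the output header and each output row.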
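Output:
The output should look like this:
id,nickname,gender,prefix_name,first_name,last_name
1,J.,M,Dr.,Jason,Allan
2,B.,M,Mr.,Brian,Welch
Note: the transferred columns "id", "nickname" and "gender" are the ones whose names are similar in the file1.csv and file2.csv headers, while the columns "prefix_name" and "first_name" correspond to "SubjectPrefix" and "SubjectFirstName" respectively.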
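Once the pairs of matching columns are known, producing that output amounts to picking those columns out of each row of file1.csv and writing them under file2.csv's names. Below is a minimal sketch using only the standard `csv` module; to keep it self-contained, the column mapping is hard-coded and file1.csv is replaced by an in-memory string, and `select_and_rename` is a hypothetical helper name.

```python
import csv
import io

# Hard-coded for illustration: file1 column name -> output (file2) column name.
# The insertion order of this dict sets the output column order.
COLUMN_MAP = {
    "ID": "id",
    "Nickname": "nickname",
    "Gender": "gender",
    "SubjectPrefix": "prefix_name",
    "SubjectFirstName": "first_name",
    "SubjectLastName": "last_name",
}

# Stand-in for the contents of file1.csv.
FILE1 = """ID,Nickname,Gender,SubjectPrefix,SubjectFirstName,Whatever1A,Whaterver2A,SubjectLastName
1,J.,M,Dr.,Jason,,,Allan
2,B.,M,Mr.,Brian,,,Welch
"""


def select_and_rename(src, dst, column_map):
    """Copy only the mapped columns from src to dst under their new names."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=list(column_map.values()),
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({new: row[old] for old, new in column_map.items()})


out = io.StringIO()
select_and_rename(io.StringIO(FILE1), out, COLUMN_MAP)
print(out.getvalue(), end="")
```

With real files, the same function could be called as `select_and_rename(open("file1.csv", newline=""), open("output.csv", "w", newline=""), COLUMN_MAP)` (ideally inside a `with` block); working by column name via `DictReader` avoids tracking indices by hand.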
I tried this code:
import csv

csv_file1 = "file1.csv"
csv_file2 = "file2.csv"

# Python 3: use open() (the Python 2 built-in file() no longer exists)
with open(csv_file1, newline='') as f1, open(csv_file2, newline='') as f2:
    data1 = list(csv.reader(f1))
    data2 = list(csv.reader(f2))

file1_header = data1[0][:]  # get the header from file1
file2_header = data2[0][:]  # get the header from file2
lowered_file1_header = [item.lower() for item in file1_header]  # lowercase file1 header
lowered_file2_header = [item.lower() for item in file2_header]  # lowercase file2 header anyways

col_index_dict = {}
for column in lowered_file1_header:
    if column == "subjectprefix":  # identify "subjectprefix" column in file1.csv
        col_index_dict[column] = lowered_file1_header.index(column)
    elif column == "subjectfirstname":  # identify "subjectfirstname" column in file1.csv
        col_index_dict[column] = lowered_file1_header.index(column)
    elif column in lowered_file2_header:  # identify the columns with same naming (compare lowercased names)
        col_index_dict[column] = lowered_file1_header.index(column)
    else:
        col_index_dict[column] = -1  # mark the not matching columns

# Build header (dict keys must be converted to a list in Python 3)
output = [list(col_index_dict.keys())]
is_header = True
for row in data1:
    if is_header is False:
        rowData = []
        for column in col_index_dict:
            column_index = col_index_dict[column]
            if column_index != -1:
                rowData.append(row[column_index])
            else:
                rowData.append('')
        output.append(rowData)
    else:
        is_header = False  # skip the header row
print(output)
Any ideas on how to solve this?
Take a look at this idea here... set(list(x)) is not needed... Learn how filtering with pandas' "isin" works. https://people.duke.edu/~ccc14/sta-663/IntroductionToPythonSolutions.html – Merlin
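The pandas approach suggested in the comment could look roughly like this. It is a sketch, assuming pandas is installed: rename file1's differently named columns with a hand-written (hypothetical) synonym table, lowercase all names so that ID/id and the like line up, then keep only the columns that `Index.isin` reports as present in file2's header. The data is inlined with `io.StringIO` so the example runs without the real files.

```python
import io

import pandas as pd

# Stand-ins for file1.csv and file2.csv's header row.
FILE1 = """ID,Nickname,Gender,SubjectPrefix,SubjectFirstName,Whatever1A,Whaterver2A,SubjectLastName
1,J.,M,Dr.,Jason,,,Allan
2,B.,M,Mr.,Brian,,,Welch
"""
FILE2_HEADER = ["nickname", "gender", "city", "id", "prefix_name", "first_name",
                "Whatever1B", "last_name", "Whatever2B", "Whatever3B", "Whatever4B"]

# Hypothetical synonym table for columns whose names differ beyond case.
RENAME = {"SubjectPrefix": "prefix_name", "SubjectFirstName": "first_name",
          "SubjectLastName": "last_name"}

# Read every field as text so values such as "1" are kept verbatim.
df1 = pd.read_csv(io.StringIO(FILE1), dtype=str)
df1 = df1.rename(columns=RENAME)          # apply the synonyms
df1.columns = df1.columns.str.lower()     # so ID/id, Nickname/nickname match

# Boolean mask over df1's columns: True where the name also appears in file2.
common = df1.columns.isin([c.lower() for c in FILE2_HEADER])
result = df1.loc[:, common]
print(result.to_csv(index=False), end="")
```

Here `isin` filters the column labels rather than row values, which keeps file1's column order in the result.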