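Python to extract and sort data from a file

I am trying to extract data from a large CSV file in the format below, where 'x' stands for data in text or integer form. Each group has a unique ID, but the groups do not always have the same number of rows, and not every color appears in every group. Each data value is separated from its color by a comma.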
id, x
red, x
green, x
blue, x
black, x
id, x
yellow, x
green,
blue, x
black, x
id, x
red, x
green, x
blue, x
black, x
id, x
red, x
green, x
blue, x
id, x
red, x
green, x
blue, x
black, x
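I want to rearrange the data into a column format. The ID should be the first column, and all values should be comma-separated. My goal is for the script to read the first word of each line and place that line's value in the appropriate column.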
line 0 - ID - red - green - blue - yellow - black
line 1 - x, x, x, , x,
line 2 - , x, x, x, x,
line 3 - x, x, x, , x,
line 4 - x, x, x, , ,
line 5 - x, x, x, , x,
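The rule described above (read the first word of each line, then drop the value into the matching column) can be sketched as follows. This is only a minimal sketch under stated assumptions: every group starts with an `id` line, labels are compared case-insensitively, and the names `COLUMNS` and `pivot_groups` are hypothetical helpers, not part of any library.

```python
# Column order taken from the desired header row; the names are assumptions.
COLUMNS = ["id", "red", "green", "blue", "yellow", "black"]


def pivot_groups(lines, columns=COLUMNS):
    """Turn 'label, value' lines into one row per group.

    A line whose label equals columns[0] ("id") starts a new row.
    """
    rows = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        label, _, value = line.partition(",")
        label = label.strip().lower()
        value = value.strip()
        if label == columns[0]:
            # A new id line opens a fresh row with every column empty.
            current = dict.fromkeys(columns, "")
            rows.append(current)
        if current is None or label not in current:
            continue  # data before the first id, or an unknown label
        current[label] = value
    return [[row[name] for name in columns] for row in rows]


sample = [
    "id, 1", "red, a", "green, b", "blue, c", "black, d",
    "id, 2", "yellow, e", "green,", "blue, f", "black, g",
]
for row in pivot_groups(sample):
    print(",".join(row))
# prints:
# 1,a,b,c,,d
# 2,,,f,e,g
```

Missing colors simply stay empty, so groups with different numbers of rows still produce rows of equal width.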
Here is my attempt so far...
readfile = open("db-short.txt", "r")
datafilelines = readfile.readlines()
writefile = open("sample.csv", "w")
temp_data_list = ["",]*7
td_index = 0
for line_with_return in datafilelines:
    line = line_with_return.replace('\n','')
    if not line == '':
        if not (line.startswith("ID") or
                line.startswith("RED") or
                line.startswith("GREEN") or
                line.startswith("BLUE") or
                line.startswith("YELLOW") or
                line.startswith("BLACK")):
            temp_data_list[td_index] = line
            td_index += 1
            temp_data_list[6] = line
        if (line.startswith("BLACK") or line.startswith("BLACK")):
            temp_data_list[5] = line
        if (line.startswith("YELLOW") or line.startswith("YELLOW")):
            temp_data_list[4] = line
        if (line.startswith("BLUE") or line.startswith("BLUE")):
            temp_data_list[3] = line
        if (line.startswith("GREEN") or line.startswith("GREEN")):
            temp_data_list[2] = line
        if (line.startswith("RED") or line.startswith("RED")):
            temp_data_list[1] = line
        if (line.startswith("ID") or line.find("ID") > 0):
            temp_data_list[0] = line
    if line == '':
        temp_data_str = ""
        for temp_data in temp_data_list:
            temp_data_str += temp_data + ","
        temp_data_str = temp_data_str[0:-1] + "\n"
        writefile.write(temp_data_str)
        temp_data_list = ["",]*7
        td_index = 0
if temp_data_list[0]:
    temp_data_str = ""
    for temp_data in temp_data_list:
        temp_data_str += temp_data + ","
    temp_data_str = temp_data_str[0:-1] + "\n"
    writefile.write(temp_data_str)
readfile.close()
writefile.close()
What have you tried so far? The standard library 'csv' module might be a good place to start. –
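To illustrate the 'csv' suggestion, here is a rough sketch using `csv.reader` and `csv.DictWriter` from the standard library. It assumes each input row is a `label, value` pair and that an `id` row opens a new group; the function name `convert` and the `FIELDS` list are hypothetical, and the in-memory `io.StringIO` objects stand in for real files.

```python
import csv
import io

# Output column order; assumed from the header row the question asks for.
FIELDS = ["id", "red", "green", "blue", "yellow", "black"]


def convert(infile, outfile, fields=FIELDS):
    """Read 'label, value' rows and write one CSV row per id group."""
    reader = csv.reader(infile, skipinitialspace=True)
    writer = csv.DictWriter(
        outfile,
        fieldnames=fields,
        restval="",             # missing colors become empty cells
        extrasaction="ignore",  # labels outside `fields` are dropped
        lineterminator="\n",
    )
    writer.writeheader()
    row = None
    for record in reader:
        if not record:
            continue  # blank line in the input
        label = record[0].strip().lower()
        value = record[1].strip() if len(record) > 1 else ""
        if label == fields[0]:
            if row is not None:
                writer.writerow(row)  # flush the previous group
            row = {}
        if row is not None:
            row[label] = value
    if row is not None:
        writer.writerow(row)  # flush the final group


source = io.StringIO("id, 1\nred, a\ngreen, b\nid, 2\nyellow, e\ngreen,\n")
target = io.StringIO()
convert(source, target)
print(target.getvalue(), end="")
# prints:
# id,red,green,blue,yellow,black
# 1,a,b,,,
# 2,,,,e,
```

With real files, the same function could be called on `open("db-short.txt", newline="")` and `open("sample.csv", "w", newline="")`, the file names used in the question.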
I know you said you want a Python solution, but have you considered R? It is designed for exactly these kinds of tasks. – Stedy
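I'll confess I'm new to programming. I tried to use this: http://ubuntuforums.org/showpost.php?p=6159649&postcount=4 but I kept getting this error: IndexError: list assignment index out of range. I now understand that this is because of how the data is formatted. I'll take a look at R. –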
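One plausible reading of that error, assuming the real file looks like the sample in the question (lowercase labels, no blank lines between groups): none of the uppercase `startswith` checks match, so every line is treated as unlabelled and the running index eventually passes the end of the 7-slot list. The sketch below reproduces only that part of the logic; `first_overflow` is a hypothetical helper written for illustration.

```python
# Uppercase prefixes as tested in the attempt from the question.
LABELS = ("ID", "RED", "GREEN", "BLUE", "YELLOW", "BLACK")


def first_overflow(lines, size=7):
    """Return the index of the first line that overflows the slot list, or None."""
    slots = [""] * size
    td_index = 0
    for number, line in enumerate(lines):
        if line == "":
            td_index = 0  # the attempt resets its index on blank lines
            continue
        if not line.startswith(LABELS):
            try:
                slots[td_index] = line
            except IndexError:
                return number  # list assignment index out of range
            td_index += 1
    return None


lowercase = ["id, 1", "red, a", "green, b", "blue, c", "black, d",
             "id, 2", "yellow, e", "green,"]
print(first_overflow(lowercase))                       # prints 7
print(first_overflow([s.upper() for s in lowercase]))  # prints None
```

Comparing labels case-insensitively, or starting a new row on each `id` line instead of on a blank line, would avoid the overflow under these assumptions.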