2011-12-06

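FasterCSV + trying to find unique items

I have a CSV file, and I'm trying to find all the unique values in column 2 onward for rows where column 1 has the same value, then merge them into a new CSV file. I know that sounds confusing, so here's an example: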

Sample of the original file, foo.csv:

"Boom Lifts","Model Number","Manufacturer","Platform Height","Horizontal Outreach","Lift Capacity" 
"Boom Lifts","Model Number","Platform Height","Horizontal Outreach","Up & Over Height","Platform Capacity" 
"Boom Lifts","Model Number","Platform Height","Horizontal Outreach","Up & Over Height" 
"Pusharound Lifts","Model Number","Manufacturer","Platform Height","Stowed Height" 
"Scissor Lifts","Model Number","Manufacturer","Platform Height","Stowed Height","Overall Dimensions","Platform Extension" 
"Scissor Lifts","Overall Dimensions","Platform Size","Platform Extension","Lift Capacity" 

The desired result, bar.csv:

"Boom Lifts","Model Number","Manufacturer","Platform Height","Horizontal Outreach","Lift Capacity","Up & Over Height","Platform Capacity",,, 
"Pusharound Lifts","Model Number","Manufacturer","Platform Height","Stowed Height" 
"Scissor Lifts","Model Number","Manufacturer","Platform Height","Stowed Height","Overall Dimensions","Platform Size","Platform Extension","Lift Capacity" 

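Each row is a different length, it's a very large file (over 5k lines), and I'm at a complete loss as to how to do the matching/string manipulation. And yes, some of the lines have trailing commas where there were "empty cells". I've been using FasterCSV, so if there's a way to do it with that, great.

Any pointers? Preferably ones that won't bring my MBP to a grinding halt?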

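So, a) the first column can be treated as a key, and b) all subsequent columns can be treated as values in a list, and in the end you want that list to contain only unique values...? The last row in bar.csv repeats "Overall Dimensions" and "Platform Extension". Are duplicate values OK? – buruzaemon

My bad, Overall Dimensions and Platform Extension should not be repeated. I'd like to use FasterCSV to read in one file, foo.csv, and spit out another, bar.csv. Thanks. – MarkL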

Answer

Assuming you can read it into a two-dimensional array with FasterCSV:

a = [
    ["Boom Lifts","Model Number","Manufacturer","Platform Height","Horizontal Outreach","Lift Capacity"],
    ["Boom Lifts","Model Number","Platform Height","Horizontal Outreach","Up & Over Height","Platform Capacity"],
    ["Boom Lifts","Model Number","Platform Height","Horizontal Outreach","Up & Over Height"],
    ["Pusharound Lifts","Model Number","Manufacturer","Platform Height","Stowed Height"],
    ["Scissor Lifts","Model Number","Manufacturer","Platform Height","Stowed Height","Overall Dimensions","Platform Extension"],
    ["Scissor Lifts","Overall Dimensions","Platform Size","Platform Extension","Lift Capacity"]
]

a.group_by {|e| e[0]}.map {|e| e.flatten.uniq} 

gives you:

[
    ["Boom Lifts", "Model Number", "Manufacturer", "Platform Height", "Horizontal Outreach", "Lift Capacity", "Up & Over Height", "Platform Capacity"],
    ["Pusharound Lifts", "Model Number", "Manufacturer", "Platform Height", "Stowed Height"],
    ["Scissor Lifts", "Model Number", "Manufacturer", "Platform Height", "Stowed Height", "Overall Dimensions", "Platform Extension", "Platform Size", "Lift Capacity"]
]

It won't be instantaneous, but it shouldn't bring your MBP down.
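For the file-to-file part of the question, here is a rough sketch of how the same idea might look end to end. It assumes Ruby 1.9 or later, where FasterCSV became the standard `csv` library; with the old gem on 1.8, `FasterCSV.foreach` and `FasterCSV.open` should be near drop-in replacements, though I haven't checked that. `merge_csv` is a made-up helper name, not part of the library. Instead of `group_by` plus `flatten.uniq`, it reads row by row and builds a hash keyed on the first column, so the whole file never has to sit in one array, and it drops the nil/empty cells that trailing commas produce.

```ruby
require "csv"

# Hypothetical helper (not part of the CSV library): merge rows that share
# the same first column, keeping each later value once, in first-seen order.
def merge_csv(input_path, output_path)
  merged = Hash.new { |hash, key| hash[key] = [] }

  CSV.foreach(input_path) do |row|
    key, *values = row
    next if key.nil?                          # skip blank lines
    # Trailing commas parse as nil (or "" if quoted); drop those cells.
    cells = values.compact.reject(&:empty?)
    merged[key] |= cells                      # array union removes duplicates
  end

  # force_quotes keeps every field quoted, as in the sample files.
  CSV.open(output_path, "w", force_quotes: true) do |out|
    merged.each { |key, values| out << [key, *values] }
  end

  merged
end
```

You would call it as `merge_csv("foo.csv", "bar.csv")`. One difference from the desired bar.csv above: this version never writes trailing empty cells, so a line like the "Boom Lifts" one would come out without the `,,,` at the end.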