2012-07-26 74 views
2

How do I convert rows into repeated column-based data?

I'm trying to take a dataset that looks like this:

Source format of data

and transform the records into this format:

Destination Format

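The resulting format would have two columns: one for the old column names and one for the values. If the source has 10,000 rows, the new format should have 10,000 groups of data.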

I'm open to any approach: Excel formulas, SQL (MySQL), or straight Ruby code would all work for me. What is the best way to solve this?

+0

Display formatting is usually an application-level concern and should be handled by your application code (Ruby). – mellamokb 2012-07-26 17:50:26

+0

What is the point of converting this data into the new format? Is it just for human readability, or is it for importing into another system? – barancw 2012-07-26 17:51:02

+0

Take a look at this Railscast: http://railscasts.com/episodes/362-exporting-csv-and-excel You aren't using Rails, but it is still helpful. – LanguagesNamedAfterCofee 2012-07-26 18:05:01

Answers

1

Just for fun:

# Input file format is tab separated values 

# name search_term address code 
# Jim jim jim_address 123 
# Bob bob bob_address 124 
# Lisa lisa lisa_address 126 
# Mona mona mona_address 129 


infile = File.open("inputfile.tsv")

# The first line holds the column names
headers = infile.readline.strip.split("\t")
puts headers.inspect
of = File.new("outputfile.tsv", "w")
infile.each_line do |line|
  # chomp so the last value does not carry the trailing newline
  row = line.chomp.split("\t")
  headers.each_with_index do |key, index|
    of.puts "#{key}\t#{row[index]}"
  end
end

of.close
infile.close



# A nicer way; on my machine it does 1.6M rows in about 17 sec

File.open("inputfile.tsv") do |in_file|
  headers = in_file.readline.strip.split("\t")
  File.open("outputfile.tsv", "w") do |out_file|
    in_file.each_line do |line|
      row = line.chomp.split("\t")
      headers.each_with_index do |key, index|
        # End every pair with a newline so each one lands on its own line
        out_file << key << "\t" << row[index] << "\n"
      end
    end
  end
end
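
If the values might contain quoted fields or embedded tabs, splitting on "\t" by hand can break. A minimal sketch of the same idea using Ruby's standard CSV library follows. The helper name unpivot_tsv is made up for illustration, and it assumes the input follows CSV quoting rules with a tab as the separator:

```ruby
require "csv"
require "stringio"

# Hypothetical helper: writes one "header<TAB>value" line per column
# for every data row read from input_io.
def unpivot_tsv(input_io, output_io)
  reader = CSV.new(input_io, col_sep: "\t", headers: true)
  writer = CSV.new(output_io, col_sep: "\t", row_sep: "\n")
  reader.each do |row|
    # CSV::Row#each yields each header together with its field
    row.each { |header, value| writer << [header, value] }
  end
  output_io
end

# Small in-memory sample shaped like the input file described above
sample = "name\tsearch_term\taddress\tcode\n" \
         "Jim\tjim\tjim_address\t123\n" \
         "Bob\tbob\tbob_address\t124\n"

result = unpivot_tsv(StringIO.new(sample), StringIO.new).string
puts result
```

For real files, pass two File objects opened with File.open instead of the StringIO instances.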
+0

Very similar to my solution... – holaSenor 2012-07-26 19:58:34

8

You can add an ID column to the left of your data and use the Reverse PivotTable method.

  • Press Alt + D + P to open the PivotTable wizard, then follow these steps:

    1. Multiple Consolidation Ranges 
    2a. I will create the page fields 
    2b. Range: eg. sheet1!A1:A4 
        How Many Page Fields: 0 
    3. Existing Worksheet: H1 
    
  • In the PivotTable:

    Uncheck Row and Column from the Field List 
    Double-Click the Grand Total as shown 
    

enter image description here
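
For comparison, the shape this produces (one line per ID, column name and value) can be sketched in Ruby. This is only an illustration of the resulting layout, not what Excel does internally. The helper name reverse_pivot and the sample rows are assumptions, and the first column is assumed to be the added ID column:

```ruby
# Hypothetical helper: the first row holds the headers and the first
# column of every row is treated as the row ID.
def reverse_pivot(table)
  headers, *rows = table
  value_headers = headers.drop(1)
  rows.flat_map do |id, *values|
    # Pair each remaining header with its value and tag the pair with the ID
    value_headers.zip(values).map { |column, value| [id, column, value] }
  end
end

# Made-up sample data with an ID column added on the left
table = [
  ["id", "name", "search_term", "address", "code"],
  [1, "Jim", "jim", "jim_address", 123],
  [2, "Bob", "bob", "bob_address", 124]
]

long_rows = reverse_pivot(table)
long_rows.each { |row| puts row.join("\t") }
```

Keeping the ID in every output line makes it possible to tell which source row a value came from, which the plain two-column format loses.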

0
# source_path and destination_path are placeholders; set them to your own file paths.
File.open(destination_path, 'a') do |d| # open the destination file for appending
  File.open(source_path, 'r') do |s| # open the source file for reading
    headers = s.readline.strip.split("\t") # use the first row of the source file as headers
    s.each_line do |line| # iterate over each remaining line of the source
      current_line = line.strip.split("\t") # create an array from the current line
      current_line.each_with_index do |cell, count| # iterate over each cell with its index
        # build each new line as a quoted header, a tab and a quoted value
        final_new_line = "\"#{headers[count]}\"\t\"#{cell}\"\n"
        d.write(final_new_line) # write the new line to the destination file
      end
    end
  end
end