2013-05-28 49 views

How to group items by a specific column from a tab-delimited file

I have the following records in a tab-delimited text file:

sku title Product Type       
19686940 This is test Title1 toys       
19686941 This is test Title2 toys       
19686942 This is test Title3 toys       
20519300 This is test Title1 toys2      
20519301 This is test Title2 toys2 
20580987 This is test Title1 toys3      
20580988 This is test Title2 toys3      
20582176 This is test Title1 toys4 

How can I group the items by Product Type and find all the unique words in title for each group?

Output format:

Product Type Unique_words 
------------ ------------ 
toys   This is test Title1 Title2 Title3 
toys2   This is test Title1 Title2 
toys3   This is test Title1 Title2 
toys4   This is test Title1 

Update
So far I have written code that reads the file and stores the records in an array:

class Product 
    attr_reader :sku, :title, :productType 
    def initialize(sku,title,productType) 
     @sku = sku 
     @title = title 
     @productType = productType 
    end 

    def sku 
     @sku 
    end 

    def title 
     @title 
    end 

    def productType 
     @productType 
    end 
end 

class FileReader 
    def ReadFile(m_FilePath) 
    array = Array.new 
    lines = IO.readlines(m_FilePath) 

    lines.each_with_index do |line, i| 
     current_row = line.chomp.split("\t") 
     product = Product.new(current_row[0],current_row[1],current_row[2]) 

     array.push product 
    end 

    array # return the products; each_with_index on its own returns lines 
    end 
end 

filereader_method = FileReader.new.method("ReadFile") 
Reading = filereader_method.to_proc 

puts Reading.call("Input.txt") 

Can you provide some sample output?

Show us what you have tried so far, and which parts are not working.

Please see the updated question.

Answer

To do the grouping, you can use Enumerable#group_by:

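Product = Struct.new(:sku, :title, :product_type) 

def products_by_type(file_path) 
    # File.foreach closes the file once every line has been read 
    File.foreach(file_path) 
     .drop(1) # skip the header row 
     .map{ |line| Product.new(*line.chomp.split("\t").map(&:strip)) } 
     .group_by{ |product| product.product_type } 
end 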
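
To see what group_by hands back, here is a minimal sketch on in-memory data. The Row struct and the sample rows are hypothetical stand-ins for parsed lines of the file, not part of the original answer.

```ruby
# Hypothetical in-memory rows standing in for parsed lines of the file.
Row = Struct.new(:sku, :title, :product_type)

rows = [
  Row.new("19686940", "This is test Title1", "toys"),
  Row.new("19686941", "This is test Title2", "toys"),
  Row.new("20519300", "This is test Title1", "toys2")
]

# group_by returns a Hash that maps each block result to the array of
# elements that produced it, in order of first appearance.
grouped = rows.group_by(&:product_type)

grouped.each do |type, members|
  puts "#{type}: #{members.map(&:sku).join(', ')}"
end
```

Each key is a product type and each value is the list of rows of that type, which is the shape the later formatting step relies on.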

The beauty of Ruby is that you have so many options. You could also check out the CSV library and OpenStruct, since this is just a data object:

require 'csv' 
require 'ostruct' 

def products_by_type(file_path) 
    csv_opts = { col_sep: "\t", 
       headers: true, 
       header_converters: [:downcase, :symbol] } 

    # Options must be passed as keyword arguments on Ruby 3 and later 
    CSV.foreach(file_path, **csv_opts) 
    .map{ |row| OpenStruct.new row.to_hash } 
    .group_by{ |product| product.product_type } 
end 
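
As a rough illustration of what the header converters do, the sketch below parses a small tab-separated string instead of a file. The sample string is hypothetical and only mirrors the layout of the question's data.

```ruby
require 'csv'

# Hypothetical sample mirroring the question's file, with real tab separators.
sample = "sku\ttitle\tProduct Type\n" \
         "19686940\tThis is test Title1\ttoys\n" \
         "20519300\tThis is test Title1\ttoys2\n"

# With headers: true, CSV.parse returns a CSV::Table whose rows are keyed
# by the converted header names.
table = CSV.parse(sample, col_sep: "\t", headers: true,
                          header_converters: [:downcase, :symbol])

puts table.headers.inspect
table.each { |row| puts row.to_hash.inspect }
```

The "Product Type" header becomes the symbol :product_type, which is why the grouped objects can respond to product_type.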

Or use the idiom of setting attributes from hash keys, which removes the call to #to_hash on row above:

require 'csv' 

class Product 
    attr_accessor :sku, :title, :product_type 

    def initialize(data) 
    # Call the matching setter for each header/value pair 
    data.each{ |key, value| public_send("#{key}=", value) } 
    end 
end 

def products_by_type(file_path) 
    csv_opts = { 
     #... 
    } 

    CSV.foreach(file_path, **csv_opts) 
    .map{ |row| Product.new row } 
    .group_by{ |product| product.product_type } 
end 

Then, based on the resulting hash, format the output as needed:

def unique_title_words(products) 
    products.flat_map{ |product| product.title.scan(/\w+/) } 
      .uniq 
end 

puts "Product Type\tUnique Words" 
products_by_type("./file.txt").each do |type, products| 
    puts "#{type}\t#{unique_title_words(products).join(' ')}" 
end 
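
For reference, the whole pipeline can also be sketched without any helper classes. This is a simplified variant rather than the approach above: it works on plain arrays, uses hypothetical sample lines in place of the file, and assumes Ruby 2.4 or later for Hash#transform_values.

```ruby
# Hypothetical sample data; in practice the lines would come from File.foreach.
lines = [
  "sku\ttitle\tProduct Type",
  "19686940\tThis is test Title1\ttoys",
  "19686941\tThis is test Title2\ttoys",
  "20519300\tThis is test Title1\ttoys2"
]

# Skip the header, split each line into [sku, title, type], group by type,
# then reduce each group to the unique words found in its titles.
report = lines.drop(1)
              .map { |line| line.chomp.split("\t") }
              .group_by { |_sku, _title, type| type }
              .transform_values { |rows| rows.flat_map { |_sku, title, _type| title.split }.uniq }

report.each { |type, words| puts "#{type}\t#{words.join(' ')}" }
```

Splitting titles on whitespace keeps the words in order of first appearance, so the output follows the same word order as the requested format.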