2011-09-14

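How to load data into a database using ActiveRecord

I have a Ruby script that extracts information from a file (GenBank), and I want to load this data into a database. I have created the models and the schema, along with a connection script: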

require 'active_record' 
def establish_connection(db_location= "protein.db.sqlite3") 
    ActiveRecord::Base.establish_connection(
    :adapter => "sqlite3", 
    :database => db_location, 
    :pool => 5, 
    :timeout => 5000 
) 
end 

Here is my script, which outputs the data:

require 'rubygems' 
require 'bio' 
require 'snp_db_models' 
establish_connection 

snp_positions_file = File.open("snp_position.txt") 
outfile = File.open("output.txt", "w") 
genome_sequence = Bio::FlatFile.open(Bio::EMBL, "ref.embl").next_entry 

snp_positions = Array.new 
snp_positions_file.gets # header line 
while line = snp_positions_file.gets 
    snp_details = line.chomp.split("\t") 
    snp_seq = snp_details[1] 
    snp_positions << snp_details[1].to_i 
end 


mean_snp_per_base = snp_positions.size/genome_sequence.sequence_length.to_f 
puts "Mean snps per base: #{mean_snp_per_base}" 

#outfile = File.open("/Volumes/DataRAID/Projects/GAS/fastq_files/bowtie_results/snp_annotation/genes_with_higher_snps.tsv", "w") 
outfile.puts("CDS start\tCDS end\tStrand\tGene\tLocus_tag\tnote\tsnp_ID\ttranslation_seq\tProduct\tNo_of_snps_per_gene\tsnp_rate_vs_mean") 

genome_sequence.features do |feature|
    if feature.feature !~ /gene/i && feature.feature !~ /source/i
        start_pos = feature.locations.locations.first.from
        end_pos = feature.locations.locations.first.to

        number_of_snps_in_gene = (snp_positions & (start_pos..end_pos).to_a).size # intersect finds number of times snp occurs within cds location
        mean_snp_per_base_in_gene = number_of_snps_in_gene.to_f/(end_pos - start_pos)

        outfile.print "#{start_pos}\t"
        outfile.print "#{end_pos}\t"
        if feature.locations.locations.first.strand == 1
            outfile.print "forward\t"
        else
            outfile.print "reverse\t"
        end

        qualifiers = feature.to_hash

        ["gene", "locus_tag", "note", "snp_id", "translation", "product"].each do |qualifier|
            if qualifiers.has_key?(qualifier) # if there is gene and product in the file
                # puts "#{qualifier}: #{qualifiers[qualifier]}"

                outfile.print "#{qualifiers[qualifier].join(",")}\t"
            else
                outfile.print " \t"
            end
        end

        outfile.print "#{number_of_snps_in_gene}\t"
        outfile.print "%.2f" % (mean_snp_per_base_in_gene/mean_snp_per_base)
        outfile.puts
    end
end
outfile.close 

How can I load the data from the output file (output.txt) into the database? Do I need to do something like a Marshal dump?

Thanks in advance

Mark

+0

Retagged as Ruby, as per your comment.

Answers

0

You can write a rake task to do this. Save it in lib/tasks and give it a .rake extension.

desc "rake task to load data into db" 
task :load_data_db => :environment do 
    ... 
end 

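Because the Rails environment is loaded, you can access your models directly, just as you would in any Rails model or controller. It will connect to whichever database corresponds to the environment variable set when the rake task is run.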

+0

I haven't created a web application, so I don't have a lib/tasks directory. I have only created a database using ActiveRecord, and I want to dump this data into it – Mark

+0

This question should be tagged 'Ruby', not 'Ruby on Rails'. See apneadiving's approach.

0

In a plain script, your models are unknown.

You have to define the bare minimum needed to use them, just as in a Rails app. Simply declare them:

class Foo < ActiveRecord::Base

end

Otherwise, in a Rails context, use Rake tasks, which know the details of the Rails application.

+1

Thanks, but how do I dump the data into the database? – Mark

+0

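You should create one model for each kind of data, plus the migrations needed to create the tables and columns in the database. Once that is done, just use the ActiveRecord syntactic sugar 'Foo.create(:bar => "value", :baz => 123)' – apneadiving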
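The Foo.create approach could be applied to the script's tab-separated output roughly as follows. This is a minimal sketch, not the original poster's code: the attribute names in COLUMNS, the helper rows_from_tsv, the sample row, and the model name Gene in the closing comment are all hypothetical, and it assumes the file starts with a header line and that every row follows the column order of the header written by the script.

```ruby
require 'stringio'

# Hypothetical attribute names, one per column of the script's header line.
COLUMNS = [:cds_start, :cds_end, :strand, :gene, :locus_tag, :note,
           :snp_id, :translation_seq, :product, :snps_per_gene,
           :snp_rate_vs_mean]

# Hypothetical helper: turn tab-separated lines into attribute hashes.
# The first line is treated as a header and skipped.
def rows_from_tsv(io)
  io.gets # header line
  rows = []
  io.each_line do |line|
    values = line.chomp.split("\t", -1) # -1 keeps trailing empty fields
    next if values.all? { |v| v.strip.empty? } # skip blank lines
    attrs = {}
    COLUMNS.each_with_index do |name, i|
      value = values[i].to_s.strip
      attrs[name] = value.empty? ? nil : value # blank placeholder becomes nil
    end
    rows << attrs
  end
  rows
end

# Small in-memory sample (made-up row) standing in for output.txt.
sample = StringIO.new(
  "CDS start\tCDS end\tStrand\tGene\tLocus_tag\tnote\tsnp_ID\ttranslation_seq\tProduct\tNo_of_snps_per_gene\tsnp_rate_vs_mean\n" \
  "100\t400\tforward\tdnaA\tSPy_0001\t \t \tMKV\treplication initiator\t3\t1.25\n"
)
rows = rows_from_tsv(sample)

# With a model declared (Gene is a hypothetical name), each hash could be
# passed to create, ideally inside one transaction:
#   Gene.transaction { rows.each { |attrs| Gene.create(attrs) } }
```

The values stay as strings here; ActiveRecord would normally cast them to the column types defined in the schema. For the real file, File.open("output.txt") { |f| rows_from_tsv(f) } could replace the StringIO sample.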

+0

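I have created the models and the tables, etc. The output of the script above is a file with hundreds of lines and 9 columns. I want to dump it automatically. Should I write a script that reads each line of the output file and dumps it, or use something like Marshal dump? – Mark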
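On the Marshal question: Marshal.dump serializes a Ruby object into an opaque byte string that Marshal.load can read back, so on its own it does not fill table columns that can be queried. A small stdlib-only sketch of that round trip, using a made-up record hash:

```ruby
# A made-up record resembling one line of the script's output.
record = { :cds_start => 100, :cds_end => 400, :strand => "forward" }

# Marshal.dump returns a binary String, not rows and columns.
blob = Marshal.dump(record)

# Marshal.load rebuilds an equal Ruby object from that String.
restored = Marshal.load(blob)

puts blob.class         # String
puts restored == record # true
```

Storing such a blob in a single column would keep the data but make it hard to query by gene or position, which is why creating one row per line of the file is the more usual route.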
