2010-08-27 69 views
0

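Download files directly from an RSS feed using Ruby - handling redirects

I'm writing a program in Ruby that downloads files from an RSS feed to my local hard drive. I previously wrote this application in Perl, and figured that a good way to learn Ruby would be to recreate the program in Ruby.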

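In the Perl program (which works), I am able to download the original file directly from the server it lives on (keeping the original file name), and that works fine. In the Ruby program (which does not work), I have to "stream" the data from the file I want into a new file that I create on my hard drive. Unfortunately, this does not work: the "streamed" data always comes back blank. My assumption is that Perl handles some kind of redirect in order to retrieve the file directly, and Ruby does not.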

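I'm going to post both programs (they are relatively small) in the hope that this helps solve my problem. If you have questions, let me know. As a side note, I pointed this program at a more static URL (a jpeg) and it downloaded the file just fine. That is why I reason that some kind of redirect is causing the problem.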

Ruby code (not working)

 

require 'net/http'; 
require 'open-uri'; 
require 'rexml/document'; 
require 'sqlite3'; 
# Create new SQLite3 database connection 
db_connection = SQLite3::Database.new('fiend.db'); 
# Make sure I can reference records in the query result by column name instead of index number 
db_connection.results_as_hash = true; 
# Grab all TV shows from the shows table 
query = ' 
    SELECT 
     id, 
     name, 
     current_season, 
     last_episode 
    FROM 
     shows 
    ORDER BY 
     name 
'; 
# Run through each record in the result set 
db_connection.execute(query) { |show| 
    # Pad the current season number with a zero for later use in a search query 
    season = '%02d' % show['current_season'].to_s; 
    # Calculate the next episode number and pad with a zero 
    next_episode = '%02d' % (Integer(show['last_episode']) + 1).to_s; 
    # Store the name of the show 
    name = show['name']; 
    # Generate the URL of the RSS feed that will hold the list of torrents 
    feed_url = URI.encode("http://btjunkie.org/rss.xml?query=#{name} S#{season}E#{next_episode}&o=52"); 
    # Generate a simple string that denotes the show, season and episode number being retrieved 
    episode_id = "#{name} S#{season}E#{next_episode}"; 
    puts "Loading feed for #{name}.."; 
    # Store the response from the download of the feed 
    feed_download_response = Net::HTTP.get_response(URI.parse(feed_url)); 
    # Store the contents of the response (in this case, XML data) 
    xml_data = feed_download_response.body; 
    puts "Feed Loaded. Parsing items.." 
    # Create a new REXML Document and pass in the XML from the Net::HTTP response 
    doc = REXML::Document.new(xml_data); 
    # Loop through each <item> in the feed 
    doc.root.each_element('//item') { |item| 
     # Find and store the URL of the torrent we wish to download 
     torrent_url = item.elements['link'].text + '/download.torrent'; 
     puts "Downloading #{episode_id} from #{torrent_url}"; 
     ## This is where crap stops working 
     # Open Connection to the host 
     Net::HTTP.start(URI.parse(torrent_url).host, 80) { |http| 
      # Create a torrent file to dump the data into 
      File.open("#{episode_id}.torrent", 'wb') { |torrent_file| 
       # Try to grab the torrent data 
       data = http.get(torrent_url[19..torrent_url.size], "User-Agent" => "Mozilla/4.0").body; 
       # Write the data to the torrent file (the data is always coming back blank) 
       torrent_file.write(data); 
       # Close the torrent file 
       torrent_file.close(); 
      } 

     } 
     break; 
    } 
} 
 

Perl code (working)

 

use strict; 
use XML::Parser; 
use LWP::UserAgent; 
use HTTP::Status; 
use DBI; 
my $dbh = DBI->connect("dbi:SQLite:dbname=fiend.db", "", "", { RaiseError => 1, AutoCommit => 1 }); 
my $userAgent = new LWP::UserAgent; # Create new user agent 
$userAgent->agent("Mozilla/4.0"); # Spoof our user agent as Mozilla 
$userAgent->timeout(20); # Set timeout limit for request 
my $currentTag = ""; # Stores what tag is currently being parsed 
my $torrentUrl = ""; # Stores the data found in any node 
my $isDownloaded = 0; # 1 or zero that states whether or not we've downloaded a particular episode 
my $shows = $dbh->selectall_arrayref("SELECT id, name, current_season, last_episode FROM shows ORDER BY name"); 
my $id = 0; 
my $name = ""; 
my $season = 0; 
my $last_episode = 0; 
foreach my $show (@$shows) { 
    $isDownloaded = 0; 
    ($id, $name, $season, $last_episode) = (@$show); 
    $season = sprintf("%02d", $season); # Append a zero to the season (e.g. 6 becomes 06) 
    $last_episode = sprintf("%02d", ($last_episode + 1)); # Append a zero to the last episode (e.g. 6 becomes 06) and increment it by one 
    print("Checking $name S" . $season . "E" . "$last_episode \n"); 
    my $request = new HTTP::Request(GET => "http://btjunkie.org/rss.xml?query=$name S" . $season . "E" . $last_episode . "&o=52"); # Retrieve the torrent feed 
    my $rssFeed = $userAgent->request($request); # Store the feed in a variable for later access 
    if($rssFeed->is_success) { # We retrieved the feed 
     my $parser = new XML::Parser(); # Make a new instance of XML::Parser 
     $parser->setHandlers # Set the functions that will be called when the parser encounters different kinds of data within the XML file. 
     (
      Start => \&startHandler, # Handles start tags 
      End => \&endHandler, # Handles end tags 
      Char => \&DataHandler # Handles data inside of start and end tags 
     ); 
     $parser->parsestring($rssFeed->content); # Parse the feed 
    } 
} 

# 
# Called every time XML::Parser encounters a start tag 
# @param: $parseInstance {object} | Instance of the XML::Parser. Passed automatically when feed is parsed. 
# @param: $element {string} | The name of the XML element being parsed (e.g. "title"). Passed automatically when feed is parsed. 
# @attributes {array} | An array of all of the attributes of $element 
# @returns: void 
# 
sub startHandler { 
    my($parseInstance, $element, %attributes) = @_; 
    $currentTag = $element; 
} 
# 
# Called every time XML::Parser encounters anything that is not a start or end tag (i.e, all the data in between tags) 
# @param: $parseInstance {object} | Instance of the XML::Parser. Passed automatically when feed is parsed. 
# @param: $element {string} | The name of the XML element being parsed (e.g. "title"). Passed automatically when feed is parsed. 
# @attributes {array} | An array of all of the attributes of $element 
# @returns: void 
# 
sub DataHandler { 
    my($parseInstance, $element, %attributes) = @_; 
    if($currentTag eq "link" && $element ne "\n") { 
     $torrentUrl = $element; 
    } 
} 
# 
# Called every time XML::Parser encounters an end tag 
# @param: $parseInstance {object} | Instance of the XML::Parser. Passed automatically when feed is parsed. 
# @param: $element {string} | The name of the XML element being parsed (e.g. "title"). Passed automatically when feed is parsed. 
# @attributes {array} | An array of all of the attributes of $element 
# @returns: void 
# 
sub endHandler { 
    my($parseInstance, $element, %attributes) = @_; 
    if($element eq "item" && $isDownloaded == 0) { # We just finished parsing an element so let's attempt to download a torrent 
     print("DOWNLOADING: $torrentUrl" . "/download.torrent \n"); 
     system("echo.|lwp-download " . $torrentUrl . "/download.torrent"); # We echo the "return" key into the command to force it to skip any file-overwrite prompts 
     if(unlink("download.torrent.html")) { # We tried to download a 'locked' torrent 
      $isDownloaded = 0; # Forces program to download next torrent on list from current show 
     } 
     else { 
      $isDownloaded = 1; 
      $dbh->do("UPDATE shows SET last_episode = '$last_episode' WHERE id = '$id'"); # Update DB with new show information 
     } 
    } 
} 
 

Answers

1

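Yes, the URL you are retrieving appears to return a 302 (redirect). Net::HTTP requires (and allows) you to handle redirects yourself. You would normally use a recursive technique like the one AboutRuby mentioned (although this thread, http://www.ruby-forum.com/topic/142745, suggests that you should look not only at the "Location" header but also for a META REFRESH in the response).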
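As a rough illustration of checking both places, here is a minimal sketch. The helper name `redirect_target` and its parameters are hypothetical (they do not come from the linked thread), and the regular expression is only a heuristic for spotting a META REFRESH tag, not a real HTML parser.

```ruby
require 'uri'

# Hypothetical helper: work out where a response is pointing, if anywhere.
# base_url - the URL that was just requested
# location - value of the Location header, or nil if absent
# body     - the response body (may be nil or empty)
def redirect_target(base_url, location, body)
  # A Location header wins; it may be relative, so resolve it against base_url.
  return URI.join(base_url, location).to_s if location && !location.empty?

  # Otherwise look for <meta http-equiv="refresh" content="0; url=...">.
  meta = body.to_s[/<meta[^>]+http-equiv=["']?refresh["']?[^>]*>/i]
  return nil unless meta

  target = meta[/url\s*=\s*["']?([^"'\s>]+)/i, 1]
  target && URI.join(base_url, target).to_s
end
```

With a Net::HTTP response in hand, this could be called as `redirect_target(url, response['location'], response.body)`; a `nil` result means neither kind of redirect was found.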

open-uri will handle redirects for you, if you are not interested in the low-level interaction:

require 'open-uri' 

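# On Ruby 3.0+ Kernel#open no longer opens URLs; URI.open (Ruby 2.5+) is the open-uri entry point.
File.open("#{episode_id}.torrent", 'wb') { |torrent_file| torrent_file.write URI.open(torrent_url).read } 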
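Since the question mentions that the Perl version keeps the server's original file name, the same idea can be sketched with open-uri. This assumes Ruby 2.5 or later (for `URI.open`); the helper names `filename_for` and `download_keeping_name` are made up for illustration, and the `User-Agent` value is simply the one used in the question.

```ruby
require 'open-uri'
require 'uri'

# Hypothetical helper: pick a local file name from the final (post-redirect) URL,
# falling back to a caller-supplied name when the path has no usable basename.
def filename_for(final_url, fallback)
  name = File.basename(URI.parse(final_url).path.to_s)
  (name.empty? || name == '/') ? fallback : name
end

# Hypothetical helper: download `url` via open-uri and save it under the
# file name taken from wherever the request ended up.
def download_keeping_name(url, fallback)
  URI.open(url, 'User-Agent' => 'Mozilla/4.0') do |remote|
    # base_uri reflects the URL after any redirects open-uri followed.
    target = filename_for(remote.base_uri.to_s, fallback)
    File.open(target, 'wb') { |file| file.write(remote.read) }
    target
  end
end
```

In the question's loop this might be called as `download_keeping_name(torrent_url, "#{episode_id}.torrent")`, so the episode-based name is only used when the final URL has no file name of its own.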
0

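get_response returns an instance of a class from the HTTPResponse hierarchy. It is usually an HTTPSuccess, but if there is a redirect it will be an HTTPRedirection. A simple recursive method that follows redirects solves this. How to handle it properly is covered in the Net::HTTP docs under the heading "Following Redirection".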
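A minimal sketch of such a recursive method, along the lines of what the docs describe, might look like the following. The method name `fetch_following_redirects` and the default limit of 5 are assumptions chosen for illustration; it does not set a User-Agent header or look for META REFRESH tags.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: GET a URL, following up to `limit` HTTP redirects.
# Returns the final Net::HTTPSuccess response.
def fetch_following_redirects(url, limit = 5)
  raise ArgumentError, 'too many HTTP redirects' if limit.zero?

  response = Net::HTTP.get_response(URI.parse(url))

  case response
  when Net::HTTPSuccess
    response
  when Net::HTTPRedirection
    # Location may be relative, so resolve it against the URL just requested.
    next_url = URI.join(url, response['location']).to_s
    fetch_following_redirects(next_url, limit - 1)
  else
    # Raises an exception for any other (non-2xx) response.
    response.value
  end
end
```

In the question's code, the body of the returned response could then be written to the torrent file, for example `torrent_file.write(fetch_following_redirects(torrent_url).body)`.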