Sitemap generator in Ruby

I have a website, for example http://example.com
I want to generate a sitemap as a list of URIs, such as:

http://example.com/main
http://example.com/tags
http://example.com/tags/foo
http://example.com/tags/bar

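I found a good application for this: iGooMap
iGooMap can generate the required URI list as a text file (rather than an XML file).
Here is a visual representation of what I am trying to achieve: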

Here is what I would like to have

I would like to generate this kind of sitemap in Ruby (not Rails).
I found SiteMapGenerator, but it only generates an .xml file, whereas I need a text file.

Is there a Ruby solution for creating a list of links for a given site?


2012-11-07 Sergey Blohin

What you want is not a sitemap generator in Ruby, but a web spider in Ruby. I recommend Anemone:

require 'anemone'

links = []

# Crawl the site and record the URL of every page visited
Anemone.crawl("http://www.foo.com/") do |anemone|
  anemone.on_every_page do |page|
    links << page.url
  end
end

# Write one URL per line
File.open('./link_list.txt', 'wb') { |f| f.write links.join("\n") }

This produces a file named link_list.txt with the following contents:

http://www.foo.com/ 
http://www.foo.com/digimedia_privacy_policy.html

There are also Wombat, Spidr, Pioneer, and others.

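Edit: As @ChrisCummings suggested, it is probably a better idea to use a Set instead of an Array to prevent duplicates. I would also recommend sorting the links alphabetically, which makes the output file easier for a human to read: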

require 'anemone'
require 'set'

links = Set.new                 # Set will prevent duplicates

Anemone.crawl("http://www.foo.com/") do |anemone|
  anemone.on_every_page do |page|
    links << page.url.to_s      # to_s needed in order to sort
  end
end

File.open('./link_list.txt', 'wb') do |f|
  f.write links.sort.join("\n") # call to sort added
end
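The two changes in the edit above (deduplicating with a Set and converting to strings before sorting) can be tried without a crawler. The following is a minimal sketch using only the standard library; the helper name `unique_sorted_links` and the sample URLs are made up for illustration, and the `URI` objects merely stand in for what `page.url` is assumed to return.

```ruby
require 'set'
require 'uri'

# Hypothetical helper: collect URLs as strings in a Set (dropping
# duplicates), then return them as a sorted Array.
def unique_sorted_links(uris)
  links = Set.new
  uris.each { |uri| links << uri.to_s }
  links.sort
end

# Made-up URLs standing in for crawler output
visited = [
  URI('http://example.com/tags'),
  URI('http://example.com/main'),
  URI('http://example.com/tags') # duplicate
]

puts unique_sorted_links(visited)

# Sorting the URI objects directly is expected to raise, since URI
# objects are not comparable with <=>; hence the to_s call above.
begin
  visited.sort
rescue ArgumentError => e
  puts "sort failed: #{e.message}"
end
```

Strings sort lexicographically, so the resulting file lists pages grouped by path prefix.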


2012-11-07 12:11:19

Two teas for this gentleman! This is exactly what I was looking for. Thank you very much!

You're welcome ;-)

I would use a Set instead of an Array as my collection class, to avoid duplicate URLs. http://www.ruby-doc.org/stdlib-2.0.0/libdoc/set/rdoc/Set.html

You can extend sitemap_generator with a custom adapter, for example:

require 'sitemap_generator'
require 'nokogiri'
require 'fileutils'

module SitemapGenerator
  class TextFileAdapter
    def write(location, raw_data)
      # Ensure that the directory exists
      dir = location.directory
      if !File.exist?(dir)
        FileUtils.mkdir_p(dir)
      elsif !File.directory?(dir)
        raise SitemapError.new("#{dir} should be a directory!")
      end

      # Pull every <loc> out of the generated XML, one URL per line
      doc = Nokogiri::XML(raw_data)
      txt = doc.css('url loc').map(&:text).join("\n")

      File.open(location.path, 'wb') do |f|
        f.write(txt)
      end
    end
  end
end

SitemapGenerator::Sitemap.default_host = 'http://example.com'
SitemapGenerator::Sitemap.create(
  :adapter => SitemapGenerator::TextFileAdapter.new,
  :sitemaps_namer => SitemapGenerator::SitemapNamer.new(:sitemap, :extension => '.txt')
) do
  add '/home', :changefreq => 'daily', :priority => 0.9
  add '/contact_us', :changefreq => 'weekly'
end
SitemapGenerator::Sitemap.ping_search_engines

This results in the file public/sitemap1.txt:

http://example.com 
http://example.com/home 
http://example.com/contact_us
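The XML-to-text step at the heart of the adapter can also be tried on its own. The sketch below is not part of sitemap_generator: it swaps Nokogiri for REXML (which ships with standard Ruby installs as a bundled gem), and the method name `sitemap_xml_to_text` and the sample XML are invented for illustration.

```ruby
require 'rexml/document'

# Hypothetical helper: return the text of every <loc> element in a
# sitemap XML string, one URL per line.
def sitemap_xml_to_text(xml)
  doc = REXML::Document.new(xml)
  locs = []
  # Walk all descendant elements and keep those named "loc"
  doc.root.each_recursive do |element|
    locs << element.text.strip if element.name == 'loc'
  end
  locs.join("\n")
end

# Made-up sitemap content in the usual urlset/url/loc layout
sample = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>http://example.com/home</loc><changefreq>daily</changefreq></url>
    <url><loc>http://example.com/contact_us</loc></url>
  </urlset>
XML

puts sitemap_xml_to_text(sample)
```

Matching on the element name rather than an XPath expression sidesteps the default XML namespace that sitemap files declare.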


2012-11-07 11:12:25

Thnx, but do I need to add every URI using the "add '/%urn%', :changefreq => 'weekly'" template? I need all the URIs to be generated automatically into the text file.

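Example: the site has two pages, /foo and /bar. The /foo page has a link to the /bar page. <!-- file /foo.html --> Bar. I need a sitemap.txt with the following content: http://example.com/foo/ http://example.com/bar/ This case is also known as "spidering".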

Please see my second answer.
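The /foo and /bar case described in the comments can be sketched as a tiny breadth-first spider. This is only an illustration of the idea, not how Anemone works internally: the pages live in a made-up in-memory hash instead of being fetched over HTTP, links are extracted with a deliberately naive regex, and the names `PAGES`, `extract_links`, and `crawl` are hypothetical.

```ruby
require 'set'
require 'uri'

# Made-up in-memory site: path => HTML body (stands in for HTTP fetches)
PAGES = {
  '/foo/' => '<a href="/bar/">Bar</a>',
  '/bar/' => '<a href="/foo/">Foo</a> <a href="http://other.example.org/">Elsewhere</a>'
}.freeze

# Extract href values with a naive regex; a real crawler would use an
# HTML parser instead.
def extract_links(html)
  html.scan(/href="([^"]+)"/).flatten
end

# Breadth-first crawl from start_url, following only same-host links.
# Returns the sorted list of URLs discovered.
def crawl(start_url, pages)
  start = URI(start_url)
  seen  = Set.new([start.to_s])
  queue = [start]

  until queue.empty?
    current = queue.shift
    html = pages[current.path]
    next if html.nil? # unknown page: nothing to follow

    extract_links(html).each do |href|
      target = URI.join(current.to_s, href)
      next unless target.host == start.host
      # Set#add? returns nil when the URL was already recorded
      queue << target if seen.add?(target.to_s)
    end
  end

  seen.sort
end

puts crawl('http://example.com/foo/', PAGES)
```

Starting from /foo/, the sketch reaches /bar/ through the link on the first page and ignores the link to the other host, which is the behaviour the sitemap.txt in the comment asks for.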
