Different robots.txt for staging server on Heroku

I have staging and production apps on Heroku.

For crawlers, I set up a robots.txt file.

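After that, I received this message from Google:

Dear Webmaster, the host name of your site, https://www.myapp.com/, does not match any of the "Subject Names" in your SSL certificate, which were:
*.herokuapp.com
herokuapp.com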

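Googlebot read the robots.txt on my staging app and sent this message, because I had not set anything up to keep crawlers from reading the file.

So I am thinking of changing the .gitignore file between staging and production, but I can't figure out how to do that.

What is the best practice for achieving this?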

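EDIT

I googled this and found this article: http://goo.gl/2ZHal

The article says to set up basic Rack authentication, after which you no longer need to care about robots.txt.

I didn't know that basic auth could keep Googlebot out. This solution seems better than juggling the .gitignore file.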
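To make the basic-auth idea concrete, here is a minimal sketch of a Rack-style middleware in plain Ruby. The class name `StagingBasicAuth` and the credentials are hypothetical, and this is not the linked article's code; in a real app the `Rack::Auth::Basic` middleware that ships with Rack would normally be used instead. The point is only that a crawler sending no credentials gets a 401 and never reaches any page.

```ruby
# Minimal Rack-style basic-auth middleware (illustrative only).
# StagingBasicAuth, the user name and the password are made-up names.
class StagingBasicAuth
  def initialize(app, user, password)
    @app = app
    # Expected header value: "Basic " + Base64("user:password").
    @expected = 'Basic ' + ["#{user}:#{password}"].pack('m0')
  end

  def call(env)
    # Plain == is enough for a sketch; it is not a constant-time comparison.
    if env['HTTP_AUTHORIZATION'] == @expected
      @app.call(env)
    else
      headers = {
        'Content-Type' => 'text/plain',
        'WWW-Authenticate' => 'Basic realm="Staging"'
      }
      [401, headers, ["Authentication required\n"]]
    end
  end
end

# A stand-in for the real application.
inner_app = ->(_env) { [200, { 'Content-Type' => 'text/plain' }, ['Hello']] }
protected_app = StagingBasicAuth.new(inner_app, 'staging', 'secret')

# A request without credentials, as a crawler would send it.
p protected_app.call({}).first

# A request carrying the expected Authorization header.
credentials = 'Basic ' + ['staging:secret'].pack('m0')
p protected_app.call({ 'HTTP_AUTHORIZATION' => credentials }).first
```

Enabling something like this only on staging would leave the production robots.txt untouched.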

Source

2012-08-05 Atsuhiro Teshima

If git is the way you want to go, you can maintain a 'staging' **branch** with a modified '.gitignore' and push that branch to the staging site on heroku. – deefour 2012-08-05 02:59:28

Hmm, I'm still new to git and need to learn more about branches before trying that approach. Thanks. – 2012-08-05 08:21:15

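What about serving /robots.txt dynamically from a controller action instead of having a static file? Depending on the environment, you allow or disallow search engines to index your application.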
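A minimal sketch of that idea in plain Ruby, so it runs without Rails. The method name `robots_txt_for` and both constants are hypothetical, and the production rules shown are just a placeholder that allows everything.

```ruby
# Pick the robots.txt body for a given environment name (illustrative only).
# robots_txt_for, PRODUCTION_ROBOTS and BLOCK_ALL_ROBOTS are made-up names.
PRODUCTION_ROBOTS = "User-agent: *\nDisallow:\n"   # empty Disallow allows everything
BLOCK_ALL_ROBOTS  = "User-agent: *\nDisallow: /\n" # blocks the whole site

def robots_txt_for(env_name)
  env_name.to_s == 'production' ? PRODUCTION_ROBOTS : BLOCK_ALL_ROBOTS
end

# In a Rails 3 controller action this could be rendered as plain text, e.g.
#   render :text => robots_txt_for(Rails.env), :content_type => 'text/plain'
# (newer Rails versions use `render plain:` instead).
puts robots_txt_for('staging')
```

For the route to be reached, the static public/robots.txt would likely need to be removed, since files in public/ are normally served before the router is consulted.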

Source

2012-08-05 07:27:36 iltempo

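I searched a bit and found several resources that I will try. Thanks for the suggestion. – 2012-08-05 08:42:53

A great solution for Rails 3 is to use Rack. Here is a good post that outlines the process: Serving Different Robots.txt Using Rack. In summary, you add this to your routes.rb: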

# config/routes.rb 
require 'robots_generator' # Rails 3 does not autoload files in lib 
match "/robots.txt" => RobotsGenerator

Then create a new file at lib/robots_generator.rb:

# lib/robots_generator.rb
class RobotsGenerator
  # Use the config/robots.txt in production.
  # Disallow everything for all other environments.
  # http://avandamiri.com/2011/10/11/serving-different-robots-using-rack.html
  def self.call(env)
    body = if Rails.env.production?
      File.read Rails.root.join('config', 'robots.txt')
    else
      "User-agent: *\nDisallow: /"
    end

    # Heroku can cache content for free using Varnish.
    headers = { 'Cache-Control' => "public, max-age=#{1.month.seconds.to_i}" }

    [200, headers, [body]]
  rescue Errno::ENOENT
    [404, {}, ['# A robots.txt is not configured']]
  end
end

Finally, make sure to move robots.txt into your config folder (or wherever you specify in your RobotsGenerator class).
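Because a Rack endpoint is just an object that responds to `call`, its behaviour can be checked by calling it directly. The sketch below is a Rails-free variant of the class above, written only for illustration: `PlainRobotsEndpoint` and its constructor arguments are hypothetical, the environment name and file path are passed in instead of coming from `Rails.env` and `Rails.root`, and the `Content-Type` header is an addition that is not in the answer's code.

```ruby
require 'tmpdir'

# Rails-free variant of the endpoint above, for illustration only.
# PlainRobotsEndpoint and its constructor arguments are hypothetical.
class PlainRobotsEndpoint
  def initialize(env_name, robots_path)
    @env_name = env_name
    @robots_path = robots_path
  end

  def call(_env)
    body = if @env_name == 'production'
      File.read(@robots_path)
    else
      "User-agent: *\nDisallow: /"
    end
    [200, { 'Content-Type' => 'text/plain' }, [body]]
  rescue Errno::ENOENT
    # Same fallback as the answer: a missing file becomes a 404.
    [404, { 'Content-Type' => 'text/plain' }, ['# A robots.txt is not configured']]
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'robots.txt')

  # Production without the file: the rescue branch answers.
  p PlainRobotsEndpoint.new('production', path).call({}).first

  # Production with the file: its contents are served.
  File.write(path, "User-agent: *\nDisallow: /admin\n")
  p PlainRobotsEndpoint.new('production', path).call({})[2]

  # Any other environment: everything is disallowed and the file is never read.
  p PlainRobotsEndpoint.new('staging', path).call({})[2]
end
```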

Source

2012-11-27 05:22:32 stereoscott

Actually, the article says to put this file in lib/robots_generator.rb, and also to move robots.txt from public/ to config/ – linojon 2013-09-24 18:16:02

@linojon Thanks, I just updated the answer – stereoscott 2013-09-24 22:28:29

Best solution! – Attenzione 2014-07-04 11:05:36
