Blocking all bots except a few with Nginx

I want to block all http_user_agents that identify themselves as bots, but still allow Googlebot. I tried the following code:

map $http_user_agent $bad_bot { 
default 1; 
~*^Lynx 0; # Let Lynx go through 
~*^google); 
libwww-perl      1; 
~(?i)(libwww|Wget|LWP::Simple|BBBike|java|crawl|spider|bot) 1; 
}

However, this blocks all access, even for Googlebot.
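
One likely reason everything is blocked is that `default 1;` marks every user agent that matches no other entry as a bad bot. Also, Googlebot's user agent string starts with `Mozilla/5.0`, so a pattern anchored with `^google` does not match it. The map below is a minimal sketch of one way to express the stated intent, not a tested drop-in fix. It assumes that unknown agents should be allowed and relies on nginx using the first matching regular expression in order of appearance.

```nginx
# Sketch only: must be placed in the http context.
map $http_user_agent $bad_bot {
    default      0;   # assumption: agents that match nothing are allowed
    ~*googlebot  0;   # listed first so it wins over the generic "bot" pattern below
    libwww-perl  1;   # exact string match, checked before any regex
    ~*(libwww|wget|lwp::simple|bbbike|java|crawl|spider|bot) 1;
}
```

Keep in mind that the User-Agent header is set by the client, so any client can claim to be Googlebot and pass this check.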

2013-10-12 Troy Lone

Just check $http_user_agent against your list of bad bots and return HTTP 403 if it is on your blacklist:

location / {
    if ($http_user_agent ~ (libwww|Wget|LWP|damnBot|BBBike|java|spider|crawl)) {
        return 403;
    }
}

Note: in an if block, ~ performs a case-sensitive match. If you want the blacklist to be case-insensitive, use ~* instead of ~.
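
For illustration, the case-insensitive variant described in the note could look like the sketch below. The pattern list is the same one as in the answer, written in lowercase only to underline that case no longer matters.

```nginx
location / {
    # ~* matches case-insensitively, so "wget", "Wget" and "WGET" are all rejected
    if ($http_user_agent ~* (libwww|wget|lwp|damnbot|bbbike|java|spider|crawl)) {
        return 403;
    }
}
```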

2013-10-12 22:25:02 edigu

Here is my nginx logic:

map $http_user_agent $limit_bots { 
    default 0; 
    ~*(google|bing|yandex|msnbot) 1; 
    ~*(AltaVista|Googlebot|Slurp|BlackWidow|Bot|ChinaClaw|Custo|DISCo|Download|Demon|eCatch|EirGrabber|EmailSiphon|EmailWolf|SuperHTTP|Surfbot|WebWhacker) 1; 
    ~*(Express|WebPictures|ExtractorPro|EyeNetIE|FlashGet|GetRight|GetWeb!|Go!Zilla|Go-Ahead-Got-It|GrabNet|Grafula|HMView|Go!Zilla|Go-Ahead-Got-It) 1; 
    ~*(rafula|HMView|HTTrack|Stripper|Sucker|Indy|InterGET|Ninja|JetCar|Spider|larbin|LeechFTP|Downloader|tool|Navroad|NearSite|NetAnts|tAkeOut|WWWOFFLE) 1; 
    ~*(GrabNet|NetSpider|Vampire|NetZIP|Octopus|Offline|PageGrabber|Foto|pavuk|pcBrowser|RealDownload|ReGet|SiteSnagger|SmartDownload|SuperBot|WebSpider) 1; 
    ~*(Teleport|VoidEYE|Collector|WebAuto|WebCopier|WebFetch|WebGo|WebLeacher|WebReaper|WebSauger|eXtractor|Quester|WebStripper|WebZIP|Wget|Widow|Zeus) 1; 
    ~*(Twengabot|htmlparser|libwww|Python|perl|urllib|scan|Curl|email|PycURL|Pyth|PyQ|WebCollector|WebCopy|webcraw) 1; 
} 

location / {
    if ($limit_bots = 1) {
        return 403;
    }
}
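
The answer does not say where each piece goes. The `map` directive is only valid in the `http` context, while the `location` block belongs inside a `server` block. The skeleton below is a rough sketch of that layout. The shortened pattern list and the `listen` port are placeholders chosen for illustration, not part of the original answer.

```nginx
events {}

http {
    # map is evaluated lazily, only when $limit_bots is actually used
    map $http_user_agent $limit_bots {
        default 0;
        ~*(wget|libwww|httrack) 1;   # placeholder list, replace with your own
    }

    server {
        listen 80;                   # placeholder port

        location / {
            if ($limit_bots = 1) {
                return 403;
            }
        }
    }
}
```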

2014-07-18 08:40:10

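People should note that this approach will stop Google from indexing your site, because it matches both the word "google" and the word "bot" (Google's HTTP_USER_AGENT contains the word "bot"). – bmiskie

@bmiskie Good point, thanks. In my case, that is exactly what I want to do! – emc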
