Reppy behaves strangely with some entries

I am testing Reppy against google.com's robots.txt, but it behaves oddly with some entries.

These are the entries in question:
Disallow: /alerts/ (I expect False here.)
Allow: /alerts/$ (I expect True here.)

Instead I get True for the first and False for the second, even though I get the correct results for other entries.

>>> import reppy 
>>> from reppy.cache import RobotsCache 
>>> robots = RobotsCache() 
>>> rules = robots.fetch("http://google.com") 
>>> rules.allowed('/search', 't') 
False  
>>> rules.allowed('/search/about', 't') 
True 

# The results above are as expected; the ones below are not

>>> rules.allowed('/alerts/', 't') 
True # FALSE is expected here 
>>> rules.allowed('/alerts/$', 't') 
False # TRUE is expected here 
>>>

I would appreciate it if someone could give me a hint on how to fix this.

2016-04-19 abT

The $ is not a literal $; it means "end of URL".

Take a look at this documentation:

Google, Bing, Yahoo, and Ask support a limited form of "wildcards" for path values. These are:

* designates 0 or more instances of any valid character

$ designates the end of the URL

So these rules:

Allow: /alerts/manage 
Allow: /alerts/remove 
Disallow: /alerts/ 
Allow: /alerts/$

mean that you can access /alerts/manage, /alerts/remove, and /alerts/ itself, but not any other children of /alerts/ (for example, /alerts/foo). So the results you are seeing are accurate.

This returns True because the path matches Allow: /alerts/$:

>>> rules.allowed('/alerts/', 't') 
True

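This returns False because the path contains a literal $ character, so it no longer matches Allow: /alerts/$ and falls under Disallow: /alerts/: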

>>> rules.allowed('/alerts/$', 't') 
False
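To make the matching concrete, here is a minimal sketch of how such rules could be evaluated. It is not Reppy's actual code: `pattern_to_regex` and `allowed` are hypothetical helpers, and the precedence rule (longest matching pattern wins, Allow wins ties, no match means allowed) is an assumption based on how Google documents its own matching.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches zero or more characters; a trailing '$' anchors the
    pattern to the end of the URL. Everything else is literal.
    """
    anchored = pattern.endswith('$')
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with '.*' for each '*'.
    body = '.*'.join(re.escape(part) for part in pattern.split('*'))
    return re.compile(body + ('$' if anchored else ''))


def allowed(path, rules):
    """Return True if path is allowed under a list of (directive, pattern).

    Assumed precedence: the longest matching pattern wins, Allow wins
    a tie, and a path that matches no rule is allowed.
    """
    best = None  # (pattern length, is_allow) of the best match so far
    for directive, pattern in rules:
        # re.match anchors at the start, so patterns act as prefixes.
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == 'allow')
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


RULES = [
    ('Allow', '/alerts/manage'),
    ('Allow', '/alerts/remove'),
    ('Disallow', '/alerts/'),
    ('Allow', '/alerts/$'),
]

for path in ('/alerts/', '/alerts/$', '/alerts/foo', '/alerts/manage'):
    print(path, allowed(path, RULES))
```

Under these assumptions, '/alerts/' is allowed because Allow: /alerts/$ is the longest matching pattern, while '/alerts/$' and '/alerts/foo' only match Disallow: /alerts/ and are therefore rejected.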

2016-04-19 12:52:56 larsks

Thanks! That makes it completely clear. – abT
