Re-crawling URLs in Nutch 1.3

I set up re_crawler to crawl a website every day, but it fetches the site 3 times. Which property should I set in Nutch? Thanks.

2011-10-24 mina

You have probably found your own solution in the months since, but here is an answer for the community. nutch-default.xml defines three relevant properties:

<property> 
<name>db.default.fetch.interval</name> 
<value>30</value> 
<description>(DEPRECATED) The default number of days between re-fetches of a page. 
</description> 
</property> 

<property> 
<name>db.fetch.interval.default</name> 
<value>2592000</value> 
<description>The default number of seconds between re-fetches of a page (30 days). 
</description> 
</property> 

<property> 
<name>db.fetch.interval.max</name> 
<value>7776000</value> 
<description>The maximum number of seconds between re-fetches of a page 
(90 days). After this period every page in the db will be re-tried, no 
matter what is its status. 
</description>
</property>

These can be overridden in nutch-site.xml.
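
As a rough sketch of such an override, the nutch-site.xml below sets the default re-fetch interval to one day. The value 86400 (24 × 60 × 60 seconds) is an assumption chosen to match the daily crawl described in the question; it is not taken from the original answer, so adjust it to your own schedule.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides the value from nutch-default.xml (2592000 s = 30 days). -->
  <property>
    <name>db.fetch.interval.default</name>
    <!-- Assumed value: 86400 seconds = 1 day -->
    <value>86400</value>
    <description>The default number of seconds between re-fetches of a page (1 day).
    </description>
  </property>
</configuration>
```

This sketch leaves db.fetch.interval.max at its 90-day default and does not touch the deprecated db.default.fetch.interval.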


2012-01-16 15:57:55 jpee
