How to make Apache Nutch crawl forever

I am using Apache Nutch (2.2.1) for crawling. What changes do I need to make if I want it to crawl forever? Please guide me through it in full, as I am not very familiar with Nutch.

2014-10-20 Shafiq

If you want to crawl forever, here is the script you need:

#!/bin/bash

# Inject the seed URLs once; "urls" is the seed directory.
./bin/nutch inject urls

# Repeat the generate/fetch/parse/update cycle indefinitely.
while true
do
    # -topN caps how many URLs are fetched in each crawling round; adjust as needed.
    ./bin/nutch generate -topN 10000
    ./bin/nutch fetch -all
    ./bin/nutch parse -all
    ./bin/nutch updatedb
done
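
As a rough sketch of a slightly more defensive variant: the loop above keeps going even when a step fails and never pauses between rounds. The version below stops a round at the first failing step and sleeps before the next one. The variable names `NUTCH`, `TOPN`, `PAUSE` and `MAX_ROUNDS` are hypothetical knobs introduced only for this illustration, not Nutch settings, and the Nutch commands are the same ones used in the script above.

```shell
#!/bin/bash
# Hypothetical knobs for this sketch (not Nutch options):
NUTCH="${NUTCH:-./bin/nutch}"   # path to the nutch launcher
TOPN="${TOPN:-10000}"           # URLs to fetch per round
PAUSE="${PAUSE:-60}"            # seconds to sleep between rounds
MAX_ROUNDS="${MAX_ROUNDS:-0}"   # 0 means loop forever

# Run one generate/fetch/parse/updatedb cycle.
# The && chain stops at the first step that exits non-zero.
crawl_round() {
    "$NUTCH" generate -topN "$TOPN" &&
    "$NUTCH" fetch -all &&
    "$NUTCH" parse -all &&
    "$NUTCH" updatedb
}

# Repeat crawl_round until MAX_ROUNDS is reached (or forever if it is 0).
crawl_forever() {
    local round=0
    while [ "$MAX_ROUNDS" -eq 0 ] || [ "$round" -lt "$MAX_ROUNDS" ]; do
        round=$((round + 1))
        if ! crawl_round; then
            echo "round $round failed; retrying after ${PAUSE}s" >&2
        fi
        sleep "$PAUSE"
    done
}
```

To use it, inject the seeds once with `"$NUTCH" inject urls` and then call `crawl_forever`. Leaving `MAX_ROUNDS` at 0 keeps the loop running until the process is stopped.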

Hope this helps.

李全安待辦事項


2014-12-07 17:37:03
