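Nutch experts: which Nutch command do I need to run from the command line after I update the URL filter text file?
If I change a file such as robots.txt, the regular-expression urlfilter.txt, or any similar resource, which command do I need to invoke?
I can't tell from the Nutch usage instructions. My guess is that this is the parser's job, but I'm not sure.
Karthik
From the usage instructions: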
# echo " crawl           one-step crawler for intranets"
echo " inject          inject new urls into the database"
echo " hostinject      creates or updates an existing host table from a text file"
echo " generate        generate new batches to fetch from crawl db"
echo " fetch           fetch URLs marked during generate"
echo " parse           parse URLs marked during fetch"
echo " updatedb        update web table after parsing"
echo " updatehostdb    update host table after parsing"
echo " readdb          read/dump records from page database"
echo " readhostdb      display entries from the hostDB"
echo " elasticindex    run the elasticsearch indexer"
echo " solrindex       run the solr indexer on parsed batches"
echo " solrdedup       remove duplicates from solr"
echo " parsechecker    check the parser for a given url"
echo " indexchecker    check the indexing filters for a given url"
echo " plugin          load a plugin and run one of its classes main()"
echo " nutchserver     run a (local) Nutch server on a user defined port"
echo " junit           runs the given JUnit test"
echo " or"
echo " CLASSNAME       run the class named CLASSNAME"
echo "Most commands print help when invoked w/o parameters."