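I am currently using crawler4j as my web crawler of choice while I try to teach myself how web crawlers work. I have started a crawl and, as far as I can tell, it returns quickly. What I am struggling with is accessing the data the crawler stores in the crawlStorageFolder (see below), which contains .lck and .jdb files: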
import edu.uci.ics.crawler4j.crawler.CrawlConfig;

public class Controller {
    public static void main(String[] args) throws Exception {
        /*
         * crawlStorageFolder is a folder where intermediate crawl data is
         * stored.
         */
        String crawlStorageFolder = "/data/crawl/root";

        /*
         * numberOfCrawlers shows the number of concurrent threads that should
         * be initiated for crawling.
         */
        int numberOfCrawlers = 7;

        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder(crawlStorageFolder);
        // Rest of the controller setup is not shown here.
    }
}
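The problem is with the crawled data in /data/crawl/root. The only things I can find there are two .lck files and one .jdb file. I assume this crawlStorageFolder location is where the data is stored, but I cannot open the files. Would anyone be willing to help me understand how I can access the data, so that I can store it in a database and eventually display it on my website? Any help would be greatly appreciated.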
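
For reference, here is a minimal sketch of the direction I think this has to go. As far as I understand, the .jdb and .lck files are Berkeley DB files that crawler4j uses for its own bookkeeping, and each fetched page is handed to the `visit(Page page)` method of a `WebCrawler` subclass, so the data would be captured there rather than read back from the storage folder. `PageRecord` and `PageStore` are hypothetical names, not part of crawler4j, and the in-memory list is only a stand-in for a real database table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder for the fields one might keep per crawled page.
final class PageRecord {
    final String url;
    final String title;
    final String text;

    PageRecord(String url, String title, String text) {
        this.url = url;
        this.title = title;
        this.text = text;
    }
}

// Hypothetical in-memory stand-in for a database table of crawled pages.
final class PageStore {
    private final List<PageRecord> records = new ArrayList<>();

    // Synchronized because several crawler threads (numberOfCrawlers)
    // may call save() at the same time.
    synchronized void save(String url, String title, String text) {
        records.add(new PageRecord(url, title, text));
    }

    synchronized int size() {
        return records.size();
    }

    // Returns a read-only copy so callers cannot modify the stored list.
    synchronized List<PageRecord> snapshot() {
        return Collections.unmodifiableList(new ArrayList<>(records));
    }
}

// Assumed call site inside a crawler4j WebCrawler subclass:
//
//   @Override
//   public void visit(Page page) {
//       if (page.getParseData() instanceof HtmlParseData) {
//           HtmlParseData html = (HtmlParseData) page.getParseData();
//           store.save(page.getWebURL().getURL(), html.getTitle(), html.getText());
//       }
//   }
```

In a real setup, `save` would presumably issue an INSERT through JDBC instead of appending to a list, and the website would then read from that table.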