What does StatisticsDB do in Crawler4j?

I am trying to understand Crawler4j, the open-source web crawler. While reading the code I ran into a few doubts, listed below.

Questions:

1. What does StatisticsDB do in the Counters class? Please explain the following piece of code:

public Counters(Environment env, CrawlConfig config) throws DatabaseException {
    super(config);

    this.env = env;
    this.counterValues = new HashMap<String, Long>();

    /*
     * When crawling is set to be resumable, we have to keep the statistics
     * in a transactional database to make sure they are not lost if crawler
     * is crashed or terminated unexpectedly.
     */
    if (config.isResumableCrawling()) {
        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        dbConfig.setTransactional(true);
        dbConfig.setDeferredWrite(false);
        statisticsDB = env.openDatabase(null, "Statistics", dbConfig);

        OperationStatus result;
        DatabaseEntry key = new DatabaseEntry();
        DatabaseEntry value = new DatabaseEntry();
        Transaction tnx = env.beginTransaction(null, null);
        Cursor cursor = statisticsDB.openCursor(tnx, null);
        result = cursor.getFirst(key, value, null);

        while (result == OperationStatus.SUCCESS) {
            if (value.getData().length > 0) {
                String name = new String(key.getData());
                long counterValue = Util.byteArray2Long(value.getData());
                counterValues.put(name, counterValue);
            }
            result = cursor.getNext(key, value, null);
        }
        cursor.close();
        tnx.commit();
    }
}

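As far as I understand, it saves the crawled URLs, so that if the crawler crashes mid-crawl it does not need to start again from the beginning. Could you please explain the above code line by line?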

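2. Crawler4j uses SleepyCat to store intermediate information, but I could not find a good link that explains SleepyCat. Please point me to some good resources where I can learn its basics. (I do not know what the Cursor used in the code above means.)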

Please help me. Looking forward to your response.

2013-05-17 devsda

If this answered your question, please upvote/accept it – Julien

@JulienS. It answered my question. – devsda

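Basically, Crawler4j restores its existing statistics by reading every record from the database into memory. The code is slightly off, though: it opens a transaction but makes no modification to the DB, so the lines dealing with tnx could be removed.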
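To make the "read every record" idea concrete, here is a minimal in-memory sketch of what a cursor scan does. It is not Berkeley DB code: `SketchCursor`, `loadCounters`, the `TreeMap` store and the counter names are all hypothetical stand-ins, and the 8-byte big-endian value encoding is an assumption. A cursor is simply a pointer to one record at a time that you advance until no records are left.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** In-memory illustration of a cursor scan; not Berkeley DB JE code. */
class CursorScanSketch {

    /** Hypothetical cursor: points at one record at a time, in key order. */
    static final class SketchCursor {
        private final Iterator<Map.Entry<String, byte[]>> records;
        private Map.Entry<String, byte[]> current;

        SketchCursor(TreeMap<String, byte[]> store) {
            this.records = store.entrySet().iterator();
        }

        /** Advances to the next record; returns false when none are left. */
        boolean next() {
            if (!records.hasNext()) {
                return false;
            }
            current = records.next();
            return true;
        }

        String key() {
            return current.getKey();
        }

        byte[] value() {
            return current.getValue();
        }
    }

    /** Copies every non-empty record into a map, mirroring the constructor's loop. */
    static Map<String, Long> loadCounters(TreeMap<String, byte[]> store) {
        Map<String, Long> counterValues = new HashMap<>();
        SketchCursor cursor = new SketchCursor(store);
        while (cursor.next()) {
            if (cursor.value().length > 0) {
                // Assumption: values are 8-byte big-endian longs.
                counterValues.put(cursor.key(), ByteBuffer.wrap(cursor.value()).getLong());
            }
        }
        return counterValues;
    }

    public static void main(String[] args) {
        TreeMap<String, byte[]> store = new TreeMap<>();
        // Illustrative counter names, not taken from crawler4j.
        store.put("pages-processed", ByteBuffer.allocate(Long.BYTES).putLong(120L).array());
        store.put("pages-scheduled", ByteBuffer.allocate(Long.BYTES).putLong(450L).array());
        store.put("empty-record", new byte[0]); // skipped by the length check
        System.out.println(loadCounters(store));
    }
}
```

The scan only reads, which is why no transaction is needed for it in this sketch.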

Line-by-line commentary:

// Create a database configuration object
DatabaseConfig dbConfig = new DatabaseConfig();
// Set some parameters: allow creation, make the DB transactional, and disable deferred writes
dbConfig.setAllowCreate(true);
dbConfig.setTransactional(true);
dbConfig.setDeferredWrite(false);
// Open the database named "Statistics" with the configuration created above
statisticsDB = env.openDatabase(null, "Statistics", dbConfig);

OperationStatus result;
// Create empty entries that the cursor will fill with each record's key and value
DatabaseEntry key = new DatabaseEntry();
DatabaseEntry value = new DatabaseEntry();
// Start a transaction
Transaction tnx = env.beginTransaction(null, null);
// Open a cursor on the DB
Cursor cursor = statisticsDB.openCursor(tnx, null);
// Position the cursor on the first key/value record
result = cursor.getFirst(key, value, null);
// Loop for as long as a record was read successfully
while (result == OperationStatus.SUCCESS) {
    // If the value at the current cursor position is not empty, read the counter's
    // name and value and add them to the counterValues HashMap
    if (value.getData().length > 0) {
        String name = new String(key.getData());
        long counterValue = Util.byteArray2Long(value.getData());
        counterValues.put(name, counterValue);
    }
    // Move the cursor to the next record
    result = cursor.getNext(key, value, null);
}
// Release the cursor
cursor.close();
// Commit the transaction; nothing was modified, so this only ends the transaction
tnx.commit();
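The database stores keys and values as raw byte arrays, which is why the loop converts each value back to a long. Below is a small sketch of such a conversion. The class and method names are hypothetical, and the 8-byte big-endian layout is an assumption; I have not reproduced crawler4j's actual `Util.byteArray2Long` here, so treat this as an illustration of the idea only.

```java
/** Sketch of a long <-> byte[] round trip; names and layout are assumptions. */
class CounterCodecSketch {

    /** Encodes a long as 8 bytes, most significant byte first. */
    static byte[] longToBytes(long value) {
        byte[] bytes = new byte[Long.BYTES];
        for (int i = Long.BYTES - 1; i >= 0; i--) {
            bytes[i] = (byte) (value & 0xFF); // take the lowest 8 bits
            value >>>= 8;                     // then shift them out
        }
        return bytes;
    }

    /** Decodes 8 big-endian bytes back into a long. */
    static long bytesToLong(byte[] bytes) {
        if (bytes.length != Long.BYTES) {
            throw new IllegalArgumentException("expected " + Long.BYTES + " bytes");
        }
        long value = 0L;
        for (byte b : bytes) {
            // Mask with 0xFFL so the byte is treated as unsigned.
            value = (value << 8) | (b & 0xFFL);
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] stored = longToBytes(1234L);
        System.out.println(bytesToLong(stored)); // prints 1234
    }
}
```

On a related note, `new String(key.getData())` decodes the key with the platform default charset; passing an explicit charset such as `StandardCharsets.UTF_8` on both the write and read side would make the stored names portable across machines.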

I also answered a similar question here. As for SleepyCat, is this what you are talking about?

2013-06-07 08:54:07 Julien
