2013-05-17

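I am trying to understand Crawler4j, the open-source web crawler. While reading it I ran into some doubts, listed below. In short: what does StatisticsDB do in Crawler4j?

Questions:

  1. What does StatisticsDB do in the Counters class? Please explain the following part of the code: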

    public Counters(Environment env, CrawlConfig config) throws DatabaseException { 
        super(config); 
    
        this.env = env; 
        this.counterValues = new HashMap<String, Long>(); 
    
        /* 
        * When crawling is set to be resumable, we have to keep the statistics 
        * in a transactional database to make sure they are not lost if crawler 
        * is crashed or terminated unexpectedly. 
        */ 
        if (config.isResumableCrawling()) { 
         DatabaseConfig dbConfig = new DatabaseConfig(); 
         dbConfig.setAllowCreate(true); 
         dbConfig.setTransactional(true); 
         dbConfig.setDeferredWrite(false); 
         statisticsDB = env.openDatabase(null, "Statistics", dbConfig); 
    
         OperationStatus result; 
         DatabaseEntry key = new DatabaseEntry(); 
         DatabaseEntry value = new DatabaseEntry(); 
         Transaction tnx = env.beginTransaction(null, null); 
         Cursor cursor = statisticsDB.openCursor(tnx, null); 
         result = cursor.getFirst(key, value, null); 
    
         while (result == OperationStatus.SUCCESS) { 
          if (value.getData().length > 0) { 
           String name = new String(key.getData()); 
           long counterValue = Util.byteArray2Long(value.getData()); 
           counterValues.put(name, counterValue); 
          } 
          result = cursor.getNext(key, value, null); 
         } 
         cursor.close(); 
         tnx.commit(); 
        } 
    } 
    

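As far as I understand, this saves the crawled URLs, which helps if the crawler crashes, because the web crawler then does not need to start crawling from the beginning. Could you please explain the above code line by line?

2. Crawler4j uses SleepyCat to store intermediate information, but I have not found a good link that explains SleepyCat. Please point me to some good resources where I can learn the basics of SleepyCat. (I do not know what the Cursor used in the above code means.)

Please help me. Looking forward to your response.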

If it answers your question, please upvote/accept it. – Julien

@JulienS. It answered my question. – devsda

Answer

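Basically, Crawler4j loads the existing statistics by reading every key/value record stored in the database. In fact, the code is slightly off: it opens a transaction even though it only reads from the database and never modifies it. Therefore, the lines dealing with tnx could be removed.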
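To make the stored format concrete: each record in the Statistics database is a pair of byte arrays, with the counter name as the key and the counter value as the data. The following is a minimal sketch of that encoding, not Crawler4j's actual code. It assumes the value is an 8-byte big-endian long (the real Util.byteArray2Long may differ), and CounterCodec, longToBytes, bytesToLong and the counter name are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class CounterCodec {
    // Hypothetical helper: encode a counter value as 8 big-endian bytes
    // (ByteBuffer is big-endian by default).
    public static byte[] longToBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Hypothetical helper: decode 8 big-endian bytes back into a long.
    public static long bytesToLong(byte[] data) {
        return ByteBuffer.wrap(data).getLong();
    }

    public static void main(String[] args) {
        // What would be written: the counter name as key, the counter value as data.
        // "pages-processed" is a made-up counter name.
        byte[] key = "pages-processed".getBytes(StandardCharsets.UTF_8);
        byte[] value = longToBytes(1234L);

        // What the loading code does with each record it reads back.
        String name = new String(key, StandardCharsets.UTF_8);
        long counter = bytesToLong(value);
        System.out.println(name + " = " + counter);
    }
}
```

Passing an explicit charset to getBytes and new String avoids depending on the platform default, which the original new String(key.getData()) call does.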

Commented line by line:

    // Create a database configuration object
    DatabaseConfig dbConfig = new DatabaseConfig();
    // Set some parameters: allow creation, make the DB transactional, and do not use deferred write
    dbConfig.setAllowCreate(true);
    dbConfig.setTransactional(true);
    dbConfig.setDeferredWrite(false);
    // Open the database called "Statistics" with the configuration created above
    statisticsDB = env.openDatabase(null, "Statistics", dbConfig);

    OperationStatus result;
    // Create the entries that will receive each key and value
    DatabaseEntry key = new DatabaseEntry();
    DatabaseEntry value = new DatabaseEntry();
    // Start a transaction
    Transaction tnx = env.beginTransaction(null, null);
    // Open a cursor on the DB
    Cursor cursor = statisticsDB.openCursor(tnx, null);
    // Position the cursor on the first key/value record
    result = cursor.getFirst(key, value, null);
    // Loop while the cursor keeps returning records
    while (result == OperationStatus.SUCCESS) {
        // If the value at the current cursor position is not empty, read the counter's
        // name and value and add them to the counterValues HashMap
        if (value.getData().length > 0) {
            String name = new String(key.getData());
            long counterValue = Util.byteArray2Long(value.getData());
            counterValues.put(name, counterValue);
        }
        // Advance the cursor to the next record
        result = cursor.getNext(key, value, null);
    }
    // Release the cursor
    cursor.close();
    // Commit the transaction; nothing was written, so this only ends the read transaction
    tnx.commit();
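Since the question also asks what a Cursor is: it is essentially an iterator over the records of a database, where getFirst positions it on the first record and getNext moves it forward until no record is left. The sketch below mirrors the loading loop using only the Java standard library, with a TreeMap standing in for the Statistics database. It is an analogy under stated assumptions, not the Berkeley DB API: CursorSketch, loadCounters and the record names are hypothetical, and values are assumed to be 8-byte big-endian longs.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

class CursorSketch {
    // Walk every record of the stand-in store in key order and rebuild the counters map.
    // Assumes every non-empty value holds at least 8 bytes.
    public static Map<String, Long> loadCounters(TreeMap<String, byte[]> store) {
        Map<String, Long> counterValues = new HashMap<>();
        // The iterator plays the role of the cursor: the first next() corresponds
        // to getFirst, and every later next() corresponds to getNext.
        Iterator<Map.Entry<String, byte[]>> cursor = store.entrySet().iterator();
        while (cursor.hasNext()) {
            Map.Entry<String, byte[]> record = cursor.next();
            byte[] data = record.getValue();
            // Skip empty values, as the original loop does.
            if (data.length > 0) {
                counterValues.put(record.getKey(), ByteBuffer.wrap(data).getLong());
            }
        }
        return counterValues;
    }

    public static void main(String[] args) {
        TreeMap<String, byte[]> store = new TreeMap<>();
        store.put("pages-processed", ByteBuffer.allocate(Long.BYTES).putLong(42L).array());
        store.put("empty-record", new byte[0]);
        // Only the non-empty record ends up in the map.
        System.out.println(loadCounters(store));
    }
}
```

The real cursor differs in that it reads from disk and holds locks until it is closed, which is why the original code calls cursor.close() once the loop finishes.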

I also answered a similar question here. Regarding SleepyCat, is this what you are talking about?