Java multithreaded parser

I am writing a multithreaded parser. The parser class is shown below.

public class Parser extends HTMLEditorKit.ParserCallback implements Runnable { 

    private static List<Item> itemList = Collections.synchronizedList(new ArrayList<Item>()); 
    private boolean h2Tag = false; 
    private int count; 
    private static int threadCount = 0; 

    public static List<Item> parse() { 
     for (int i = 1; i <= 1000; i++) { //1000 pages of the same type need to be parsed 

      while (threadCount == 20) { //limit the number of simultaneous threads 
       try { 
        Thread.sleep(50); 
       } catch (InterruptedException ex) { 
        ex.printStackTrace(); 
       } 
      } 

      Thread thread = new Thread(new Parser()); 
      thread.setName(Integer.toString(i)); 
      threadCount++; //increase the number of working threads 
      thread.start();    
     } 

     return itemList; 
    } 

    public void run() { 
     //The code here builds the page link from the thread name 
     //(the loop index i that was passed in), opens the connection, 
     //starts parsing, etc. 
     //Nothing special in general, so I won't paste it here. 

     threadCount--; //reduce the number of running threads when current stops 
    } 

    private static void addItem(Item item) { 
     itemList.add(item); 
    } 

    //This method retrieves the necessary information after the H2 tag is detected 
    @Override 
    public void handleText(char[] data, int pos) { 
     if (h2Tag) { 
      String itemName = new String(data).trim(); 

      //Item is the object holding the information we extract from the web page 
      Item item = new Item(); 
      item.setName(itemName); 
      item.setId(count); 
      addItem(item); 

      //Print information about the item to the console 
      System.out.println(count + " = " + itemName); 
     } 
    } 

    @Override 
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) { 
     if (HTML.Tag.H2 == t) { 
      h2Tag = true; 
     } 
    } 

    @Override 
    public void handleEndTag(HTML.Tag t, int pos) { 
     if (HTML.Tag.H2 == t) { 
      h2Tag = false; 
     } 
    } 
}

The parser is run from another class like this:

List<Item> list = Parser.parse();

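All of this works, but there is one problem. When parsing finishes, the final list `itemList` contains 980 elements instead of 1000, yet the console shows all 1000 items. In other words, it looks as if some threads, for some reason, never called addItem from handleText.

I have already tried changing the type of itemList to ArrayList, CopyOnWriteArrayList, and Vector. I also made addItem synchronized and replaced the call with a synchronized block. All of this changed the number of elements only slightly; I never get the full thousand.

I also tried parsing a smaller number of pages (ten). The resulting list is empty, but the console shows all 10.

If I remove the multithreading, everything works fine but, of course, slowly. That is no good.

If I reduce the number of concurrent threads, the number of items in the list gets close to the expected 1000; if I increase it, the count moves further from 1000. So I think the threads are competing to write to the list. But then why does synchronization not help?

What is the problem?

Source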

2011-10-16 Alex

I don't know whether this is the problem, but right now `threadCount` is not updated in a thread-safe way. Increments and/or decrements can get lost. – harold
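
To illustrate the point about lost updates, here is a minimal sketch, not the asker's actual code. `CounterDemo` and its `run` method are hypothetical names; the only idea taken from the comment is that the counter should be updated atomically. It uses `AtomicInteger`, so concurrent increments and decrements cannot be lost the way they can with `threadCount++` on a plain `int`.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CounterDemo {
    // Hypothetical stand-in for the question's static threadCount field.
    static final AtomicInteger threadCount = new AtomicInteger();

    // Starts `threads` workers that each increment and then decrement the
    // counter `rounds` times, waits for all of them, and returns the final value.
    static int run(int threads, int rounds) {
        threadCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    threadCount.incrementAndGet(); // atomic replacement for threadCount++
                    threadCount.decrementAndGet(); // atomic replacement for threadCount--
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // wait for this worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return threadCount.get();
    }

    public static void main(String[] args) {
        System.out.println(run(8, 100_000)); // prints 0
    }
}
```

With a plain `int` field the same loop can end on a non-zero value, because `++` and `--` are each a separate read, modify, and write.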

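How many cores do you have? If you care about speed, the fastest approach is probably to use as many threads as you have cores (assuming the process is CPU-bound). You would probably be better off using an ExecutorService with a fixed thread pool size and Callable(s) designed to collect the task results. –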
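
A rough sketch of what this comment suggests, under stated assumptions: `CallableDemo`, `parsePage`, and `parseAll` are hypothetical names, and `parsePage` is only a placeholder for the real download-and-parse step. Each task is a `Callable` that returns its own result, so no shared list is needed, and `invokeAll` does not return until every task has completed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CallableDemo {
    // Hypothetical placeholder for fetching and parsing page number `page`.
    static String parsePage(int page) {
        return "item-" + page;
    }

    // Parses pages 1..pages on a fixed pool and returns the results in page order.
    static List<String> parseAll(int pages, int poolSize) {
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (int i = 1; i <= pages; i++) {
                final int page = i;
                tasks.add(() -> parsePage(page));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until all tasks are done and keeps the task order.
            for (Future<String> f : executor.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown(); // let the pool threads exit
        }
    }

    public static void main(String[] args) {
        // Pool size equal to the core count, which suits CPU-bound work.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(parseAll(1000, cores).size()); // prints 1000
    }
}
```

For work that mostly waits on the network, a pool larger than the core count may be the better choice; the core-count default here only reflects the CPU-bound assumption in the comment.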

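By the time your parse() call returns, all 1000 of your threads have been started, but there is no guarantee that they have finished. In fact they have not, and that is the problem you are seeing. I strongly recommend that you do not write this yourself and instead use the tools the SDK provides for this kind of work.

The documentation for Thread Pools and ThreadPoolExecutor is a good starting point, for example. Again, do not implement this yourself unless you are absolutely sure you have to, because writing multithreaded code like this is a pain.

Your code should look something like this: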

ExecutorService executor = Executors.newFixedThreadPool(20); 
List<Future<?>> futures = new ArrayList<Future<?>>(1000); 
for (int i = 0; i < 1000; i++) { 
    futures.add(executor.submit(new Runnable() {...})); 
} 
for (Future<?> f : futures) { 
    f.get(); 
}

Source

2011-10-16 17:11:01 Stephan

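Thanks. I know my code is terrible. :( I will read the documentation you pointed to and make this code more correct. – Alex

Using `Runtime.getRuntime().availableProcessors()` might be a better choice for the pool size. –

There is nothing wrong with the code as such; it does what you coded it to do. The problem is in the last batch of iterations. All the earlier iterations work fine, but for the last ones, from 980 to 1000, the threads are created and the main thread returns the list without waiting for them to finish. So if you run 20 threads at a time, you will get some odd number between 980 and 1000.

Now, you could try adding Thread.sleep(50) before returning the list. In that case your main thread waits for a while, and by then the other threads may have finished processing.

Or you can use one of Java's synchronization APIs. Instead of Thread.sleep(), use a CountDownLatch, which lets you wait until the threads have finished processing; after that you can create new threads.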
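
A minimal sketch of the CountDownLatch idea, assuming one latch count per page. `LatchDemo` and `parse` are hypothetical names, and adding the page number to the list stands in for the real parsing work. The main thread blocks in `await()` until every worker has called `countDown()`, so the list is complete when it is returned.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class LatchDemo {
    // Starts one thread per page and returns only after all of them are done.
    static List<Integer> parse(int pages) {
        List<Integer> items = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(pages);
        for (int i = 1; i <= pages; i++) {
            final int page = i;
            new Thread(() -> {
                try {
                    items.add(page); // placeholder for the real parsing work
                } finally {
                    done.countDown(); // signal completion even if parsing fails
                }
            }).start();
        }
        try {
            done.await(); // block until the count reaches zero
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(parse(100).size()); // prints 100
    }
}
```

This sketch does not limit how many threads run at once; the question's limit of 20 would still need a thread pool or a similar mechanism.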

2011-10-16 17:19:07

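With an arbitrary wait time the other threads only **may** have finished in time; that is not guaranteed. If you need to be sure that all threads have finished, you have to wait for each one of them. – Stephan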
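
One way to wait for every thread, sketched with hypothetical names (`JoinDemo`, `parse`): keep a reference to each started thread and call `join()` on all of them before returning. Unlike a fixed sleep, this returns as soon as the last worker finishes and never earlier.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class JoinDemo {
    // Starts one thread per page and joins each one before returning the list.
    static List<Integer> parse(int pages) {
        List<Integer> items = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int i = 1; i <= pages; i++) {
            final int page = i;
            Thread t = new Thread(() -> items.add(page)); // placeholder for parsing
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // returns only once this thread has terminated
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(parse(100).size()); // prints 100
    }
}
```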

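@Stephan: please check my update. It also covers an approach that uses `CountDownLatch`, with which the process waits for the other threads to finish processing. –

@MJ Thank you very much. I added `Thread.sleep(1000)` before returning the list, and the program works fine. – Alex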
