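Conduit: Multiple Stream Consumers

I wrote a program that counts the frequencies of n-grams in a corpus. I already have a function that consumes a stream of tokens and produces n-grams of a single order: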
ngram :: Monad m => Int -> Conduit t m [t]
trigrams = ngram 3
countFreq :: (Ord t, Monad m) => Consumer [t] m (Map [t] Int)
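For reference, what these two pieces compute can be sketched on plain lists. This is only an illustration of the intended semantics, not the conduit code from my program; `ngramsList` and `countFreqList` are hypothetical names introduced here.

```haskell
import Data.List (tails)
import qualified Data.Map.Strict as Map

-- All contiguous windows of length n (hypothetical list analogue of `ngram`).
ngramsList :: Int -> [t] -> [[t]]
ngramsList n xs
  | n <= 0    = []
  | otherwise = [w | t <- tails xs, let w = take n t, length w == n]

-- Count how often each n-gram occurs (hypothetical list analogue of `countFreq`).
countFreqList :: Ord t => [[t]] -> Map.Map [t] Int
countFreqList = Map.fromListWith (+) . map (\g -> (g, 1))
```

For example, `countFreqList (ngramsList 2 (words "a b a b"))` counts the bigram `["a","b"]` twice and `["b","a"]` once.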
At the moment I can only connect one stream consumer to a stream source:
tokens --- trigrams --- countFreq
How do I connect multiple stream consumers to the same stream source? I would like to have something like this:
           .--- unigrams --- countFreq
           |--- bigrams  --- countFreq
tokens ----|--- trigrams --- countFreq
           '--- ...      --- countFreq
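Without any threads, this kind of fan-out can probably be expressed with `sequenceSinks` from `Data.Conduit`, which feeds every input value to each sink in a collection. The following is a minimal sketch under a few assumptions: it uses the newer `ConduitT` / `.|` API rather than the `Conduit` / `Consumer` names above, the `ngram` and `countFreq` bodies are stand-ins written for this example (with `n >= 1` assumed), and `countAll` is a hypothetical name. It runs the consumers in lockstep, not in parallel.

```haskell
import Control.Monad (when)
import Data.Conduit (ConduitT, await, yield, runConduitPure, sequenceSinks, (.|))
import qualified Data.Conduit.List as CL
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Void (Void)

-- Sliding-window n-grams; a stand-in for the real `ngram` (assumes n >= 1).
ngram :: Monad m => Int -> ConduitT t [t] m ()
ngram n = go []
  where
    go window = do
      next <- await
      case next of
        Nothing -> return ()
        Just x  -> do
          let window' = take n (x : window)   -- newest token first
          when (length window' == n) $ yield (reverse window')
          go window'

-- Fold the incoming n-grams into a frequency map; a stand-in for `countFreq`.
countFreq :: (Ord t, Monad m) => ConduitT [t] Void m (Map [t] Int)
countFreq = CL.fold (\m g -> Map.insertWith (+) g 1 m) Map.empty

-- One sink per order; every token is passed to all of them (hypothetical helper).
countAll :: (Ord t, Monad m) => [Int] -> ConduitT t Void m [Map [t] Int]
countAll orders = sequenceSinks [ngram n .| countFreq | n <- orders]
```

A usage example would be `runConduitPure (CL.sourceList (words "a b a b") .| countAll [1, 2])`, which returns one frequency map per requested order, in the same order as the list of orders.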
A plus would be to run each consumer in parallel.

EDIT: Thanks to Petr, I came up with this solution:
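spawnMultiple orders = do
    -- Broadcast channel: written by the producer, read through per-sink duplicates.
    chan <- atomically newBroadcastTMChan

    -- One result slot and one worker thread per n-gram order.
    results <- forM orders $ \_ -> newEmptyMVar
    threads <- forM (zip results orders) $
                        forkIO . uncurry (sink chan)

    -- Producer: tokenize the file and push the tokens into the channel.
    forkIO . runResourceT $ sourceFile "test.txt"
                         $$ javascriptTokenizer
                         =$ sinkTMChan chan

    -- Block until every sink has delivered its frequency map.
    forM results readMVar
  where
    sink chan result n = do
        chan' <- atomically $ dupTMChan chan
        freqs <- runResourceT $ sourceTMChan chan'
                             $$ ngram n
                             =$ frequencies
        putMVar result freqs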
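One detail that may be worth checking in this version: a duplicate of a broadcast channel only receives items written after the duplication. Since `dupTMChan` is called inside each forked sink thread, it seems possible for the producer to write tokens before a sink has duplicated the channel. Below is a minimal sketch of duplicating in the parent thread before forking. It is an assumption-laden illustration, not a drop-in replacement: it uses plain `TChan` from `stm` instead of `TMChan`, a `Nothing` value as a made-up end-of-stream marker, simple folds instead of conduits, and the hypothetical name `fanOut`.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, readMVar)
import Control.Concurrent.STM
import Control.Monad (forM, forM_)

-- Fan a list out to several folding consumers, one thread per consumer.
-- `Nothing` marks the end of the stream (stand-in for closing a TMChan).
fanOut :: [a] -> [(r -> a -> r, r)] -> IO [r]
fanOut items folds = do
  chan <- newBroadcastTChanIO
  results <- forM folds $ \(step, z) -> do
    -- Duplicate in the parent thread, before anything has been written,
    -- so that no consumer can miss the first items.
    mine <- atomically (dupTChan chan)
    out  <- newEmptyMVar
    _ <- forkIO $ do
      let loop acc = do
            next <- atomically (readTChan mine)
            case next of
              Nothing -> putMVar out acc
              Just x  -> loop $! step acc x
      loop z
    return out
  -- Producer: write every item, then the end-of-stream marker.
  forM_ items $ atomically . writeTChan chan . Just
  atomically (writeTChan chan Nothing)
  -- Wait for all consumers to finish.
  mapM readMVar results
```

For instance, `fanOut [1..10] [((+), 0), (\n _ -> n + 1, 0)]` computes a sum and a count of the same stream in two threads.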
Do you want all of your 'grams' to receive a value whenever 'tokens' produces one?