2014-01-09 85 views

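Displaying words and their stems using stemming in Haskell

Hi, I am new to Haskell and functional programming.

I want to find the words in a string that carry a suffix to be stemmed, and display each such word together with the word that remains after the suffix is removed.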

e.g. if the string is: "he is a good fisher man. he is fishing and catched two fish"
the output should be: [(fisher,fish), (fishing,fish), (catched,catch)]

This is what I tried:

hasEnding endings w = any (`isSuffixOf` w) endings 
wordsWithEndings endings ws = filter (hasEnding endings) ws 
wordsEndingEdOrIng ws = wordsWithEndings ["ed","ing","er"] . words $ ws 


stemming :: String -> String 
stemming []  = [] 
stemming (x:"ing") = [x] 
stemming (x:"ed") = [x] 
stemming (x:"er") = [x] 
stemming (x:xs) = x : stemming xs 

removestemmings :: String -> String 
removestemmings = unwords . map stemming . words 


findwords = wordsEndingEdOrIng .removestemmings 

This is not working: it gives the result [].

Can anyone help me get this right?

Your function 'removestemmings' actually removes the endings "ed", "ing" and "er". As a result, the filter afterwards finds no words with those endings.

Answer

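Your findwords function is doing exactly what you tell it to do. First it removes the stems from every word, and then it filters out every word that does not have a stem, which by that point is all of them.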
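To make the failure concrete, here is a minimal sketch that runs the two steps of the original pipeline separately. The definitions of `hasEnding` and `stemming` are copied from the question; the names `stripped` and `stillSuffixed` and the shortened input sentence are only illustrative.

```haskell
import Data.List (isSuffixOf)

-- Copied from the question.
hasEnding :: [String] -> String -> Bool
hasEnding endings w = any (`isSuffixOf` w) endings

-- Copied from the question.
stemming :: String -> String
stemming []        = []
stemming (x:"ing") = [x]
stemming (x:"ed")  = [x]
stemming (x:"er")  = [x]
stemming (x:xs)    = x : stemming xs

-- Step 1 (illustrative name): strip the suffixes from every word.
stripped :: [String]
stripped = map stemming (words "he is fishing and catched two fish")
-- ["he","is","fish","and","catch","two","fish"]

-- Step 2 (illustrative name): keep only the words that still end in a
-- suffix. The suffixes are already gone, so nothing is left.
stillSuffixed :: [String]
stillSuffixed = filter (hasEnding ["ed", "ing", "er"]) stripped
-- []
```

Because the filter runs after the suffixes have been removed, it has nothing left to match, which is why the result is always the empty list.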

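What you want to do instead is remove all the stems, zip the resulting list with the original list of words, and then filter the zipped list by whether the original word had a stem: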

import Data.List (isSuffixOf)

-- Operate on a single word only: True if the word ends in one of the suffixes.
hasStem :: String -> Bool
hasStem w = any (`isSuffixOf` w) ["ed", "ing", "er"]

-- Let this function work on a list of words instead.
-- `stemming` is the function defined in the question.
removeStemmings :: [String] -> [String]
removeStemmings = map stemming

-- findWords now takes a sentence, splits it into words, removes the stemmings,
-- zips the result with the original word list, and keeps only the pairs whose
-- original word had a stem.
findWords :: String -> [(String, String)]
findWords sentence = filter (hasStem . fst) . zip ws $ removeStemmings ws
    where ws = words sentence

> findWords "he is a good fisher man. he is fishing and catched two fish" 
[("fisher","fish"),("fishing","fish"),("catched","catch")]
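As an alternative, the check and the removal can be folded into a single step, so that a word is paired with its stem only if a suffix was actually stripped. This is a sketch and not the approach used above: `stripSuffix`, `stemPair` and `findWords'` are hypothetical names introduced here, and `stripSuffix` is written by hand on top of `Data.List.stripPrefix`. It also assumes a word should be skipped when stripping would leave an empty stem, which differs slightly from the zip-and-filter version for a word such as "ed".

```haskell
import Data.List (stripPrefix)
import Data.Maybe (listToMaybe, mapMaybe)

-- Hypothetical helper: remove a suffix if present, built from stripPrefix
-- by reversing both strings.
stripSuffix :: String -> String -> Maybe String
stripSuffix suf w = reverse <$> stripPrefix (reverse suf) (reverse w)

-- Pair a word with its stem if one of the suffixes matches.
-- The first matching suffix wins; empty stems are rejected (an assumption).
stemPair :: String -> Maybe (String, String)
stemPair w = listToMaybe
    [ (w, stem)
    | suf <- ["ed", "ing", "er"]
    , Just stem <- [stripSuffix suf w]
    , not (null stem)
    ]

-- Split the sentence into words and keep only those that could be stemmed.
findWords' :: String -> [(String, String)]
findWords' = mapMaybe stemPair . words
```

On the example sentence this produces the same three pairs as `findWords`. The design difference is that no second pass over the original words is needed, since `mapMaybe` drops the words for which no suffix matched.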