2017-08-02 66 views
2

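Haskell text encoder

I'm new to Haskell and would like some direction for solving my problem. I want a text-encoding function that lists the unique words of a text and represents each word of the text by its index in that list. For example, for the input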

["The more I like, the more I love.","The more I love, the more I hate."] 

the output might be

(["The", "more", "I", "like", "the", "love.", "love,", "hate."], 
    [1, 2, 3, 4, 5, 2, 3, 6, 1, 2, 3, 7, 1, 2, 3, 8]) 

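I have also written the part that removes duplicates:

removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates = rdHelper []
  where
    -- seen holds the unique elements found so far, in order of first appearance
    rdHelper seen [] = seen
    rdHelper seen (x:xs)
      | x `elem` seen = rdHelper seen xs
      | otherwise     = rdHelper (seen ++ [x]) xs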
+0

Why does your output keep "love," with its comma but list "like" without one, even though "like" is followed by a comma in the sentence?

Answers

1

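You can simply traverse the list of words, accumulating the unique words and their indexes. If a word is already in the accumulated list, append its index to the accumulated list of indexes. If it is not in the list yet, append the word and a new index (the length of the word list + 1).

To be honest, the Haskell code is easier to understand than my description: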

import Data.List (findIndex)

-- Fold step: record the word's 1-based index, adding the word first if it is new.
build :: ([String], [Int]) -> String -> ([String], [Int])
build (seen, indexes) word =
  case findIndex (== word) seen of
    Just index -> (seen, indexes ++ [index + 1])
    Nothing    -> (seen ++ [word], indexes ++ [length seen + 1])

buildIndexes :: ([String], [Int])
buildIndexes = foldl build ([], []) listOfWords
  where
    listOfWords = words "The more I like, the more I love. The more I love, the more I hate."

Here I used a single concatenated string as the input:

"The more I like, the more I love. The more I love, the more I hate."

Feel free to adapt the code to your needs.
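If the input should come from the caller rather than from a hard-coded string, one option is to pass the list of sentences as an argument. The following is only a minimal sketch of that idea using the same kind of fold; the name `encodeSentences` is hypothetical, and punctuation stays attached to the words because `words` splits on whitespace only.

```haskell
import Data.List (elemIndex, foldl')

-- Hypothetical helper (not from the original answer): encode a
-- caller-supplied list of sentences. Returns the unique words in order
-- of first appearance and the 1-based index of every word in the input.
encodeSentences :: [String] -> ([String], [Int])
encodeSentences = foldl' step ([], []) . concatMap words
  where
    step (seen, indexes) word =
      case elemIndex word seen of
        Just i  -> (seen, indexes ++ [i + 1])
        Nothing -> (seen ++ [word], indexes ++ [length seen + 1])
```

For the two sentences from the question, `encodeSentences` returns `(["The","more","I","like,","the","love.","love,","hate."],[1,2,3,4,5,2,3,6,1,2,3,7,5,2,3,8])`, since "The" and "the" are different strings.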

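By the way, it may be more efficient to insert elements at the head of the lists and then reverse the resulting lists. Note that the word list is then stored newest-first, so the index of an existing word has to be computed from the end of the list.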

import Data.List (findIndex)

-- Same fold step, but both lists are built in reverse by prepending.
build :: ([String], [Int]) -> String -> ([String], [Int])
build (seen, indexes) word =
  case findIndex (== word) seen of
    -- seen is newest-first, so position i from the front is word number (length seen - i)
    Just index -> (seen, (length seen - index) : indexes)
    Nothing    -> (word : seen, (length seen + 1) : indexes)

buildIndexes :: ([String], [Int])
buildIndexes = (reverse listOfUniqueWords, reverse listOfIndexes)
  where
    listOfWords = words "The more I like, the more I love. The more I love, the more I hate."
    (listOfUniqueWords, listOfIndexes) = foldl build ([], []) listOfWords
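
Prepending avoids the repeated `++`, but every word is still looked up with a linear `findIndex`. If that lookup matters, one possible variation is to keep the seen words in a `Data.Map` next to the reversed lists. This is a sketch only, not part of the original answer: `encodeWithMap` is a hypothetical name, and the `containers` package is assumed to be available.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (foldl')

-- Hypothetical variant: a Map from word to its 1-based index gives
-- logarithmic lookups, while the two lists are built by prepending
-- and reversed once at the end.
encodeWithMap :: [String] -> ([String], [Int])
encodeWithMap ws = (reverse revWords, reverse revIndexes)
  where
    (_, revWords, revIndexes) = foldl' step (Map.empty, [], []) ws
    step (dict, uniq, idxs) w =
      case Map.lookup w dict of
        Just i  -> (dict, uniq, i : idxs)
        Nothing ->
          let i = Map.size dict + 1   -- next unused index
          in (Map.insert w i dict, w : uniq, i : idxs)
```

It takes a list of words, so a sentence would be passed as `encodeWithMap (words sentence)`.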
+0

I would like the function to take its input from a list supplied by the user rather than from a hard-coded string. – dsvjksv

1

I think Data.Map and Data.Set are the ideal tools for doing this job efficiently. My implementation is as follows:

import qualified Data.Map.Lazy as Map
import qualified Data.Set as Set

-- Returns the sorted unique words and, for each input string, the 1-based indexes of its words.
encode :: [String] -> ([String], [[Int]])
encode wss = (Map.keys dict, map (map index . words) wss)
  where
    uniqueWords = Set.toList . Set.unions . map (Set.fromList . words) $ wss
    dict        = Map.fromList (zip uniqueWords [1 ..])
    index w     = Map.findWithDefault 0 w dict

*Main> encode ["Are you allright", "Hey there how are you", "Hello there", "Do you like coffee"] 
(["Are","Do","Hello","Hey","allright","are","coffee","how","like","there","you"],[[1,11,5],[4,10,8,6,11],[3,10],[2,11,9,7]])
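
One way to sanity-check a result like this is to turn it back into text. The helper below is a hypothetical sketch, not part of the answer; it assumes 1-based indexes into the returned word list and rejoins the words with single spaces.

```haskell
-- Hypothetical inverse of an encoder that returns 1-based indexes.
decode :: ([String], [[Int]]) -> [String]
decode (dict, rows) = map (unwords . map lookupWord) rows
  where
    lookupWord i
      | i >= 1 && i <= length dict = dict !! (i - 1)
      | otherwise                  = "?"   -- placeholder for an index outside the word list
```

Applying `decode` to the output of an encoder with this shape should give back the original strings, as long as they contained only single spaces between words.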