I need to find a word in some .txt files whose first occurrence is not preceded by a space. Here are the possible cases:
-- * should succeed
t1 = "hello\t999\nworld\t\900"
t2 = "world\t\900\nhello\t999\n"
t3 = "world world\t\900\nhello\t999\n"
-- * should fail
t4 = "world\t\900\nhello world\t999\n"
t5 = "hello world\t999\nworld\t\900"
t6 = "world hello\t999\nworld\t\900"
Right now t6 succeeds even though it should fail, because my parser consumes any character until it reaches hello. Here is my parser:
My solution
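{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative
import Data.Attoparsec.Text.Lazy
import Data.Attoparsec.Combinator
-- only `pack` is needed; an explicit import list avoids clashes with Prelude names
import Data.Text (pack)
import qualified Data.Text.Lazy as L (Text, pack)
-- * should succeed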
t1 = L.pack "hello\t999\nworld\t\900"
t2 = L.pack "world\t\900\nhello\t999\n"
-- * should fail
t3 = L.pack "world\t\900\nhello world\t999\n"
t4 = L.pack "hello world\t999\nworld\t\900"
t5 = L.pack "world hello\t999\nworld\t\900"
p = occur "hello"
-- * discard all text until word `w` occurs, and find its only field `n`
occur :: String -> Parser (String, Int)
occur w = do
  _ <- pUntil w
  _ <- string (pack w)
  _ <- string "\t"
  n <- natural
  _ <- string "\n"
  return (w, read n)
-- * Parse a natural number
natural :: Parser String
natural = many1' digit
-- * skip over all words in Text stream until the word we want
pUntil :: String -> Parser String
pUntil = manyTill anyChar . lookAhead . string . pack
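A parser is *not* the right tool for "find the first occurrence of sequence x". You should parse the entire string into a data structure that stores the (key, value) pairs along with the positions at which they occur. Your current problem is that `t6` contains two key/value pairs (one in the whole string, one in a suffix), so naturally a backtracking parser finds both. Parsing every key unconditionally solves this. With attoparsec you are limited to getting positions as byte indices, but that should be sufficient for your needs. – user2407038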
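A minimal sketch of what the comment suggests, under some assumptions: every line has the form key, tab, value; the key is everything before the first tab; and strict `Text` is used instead of the lazy variant for brevity. The names `record`, `records`, and `lookupField` are hypothetical, and positions are left out. Values are kept as raw text and only the requested one is converted to a number, so a non-numeric field on another line does not break the parse.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Applicative (many, (<|>))
import Data.Attoparsec.Text
  ( Parser, parseOnly, takeWhile1, takeTill, char, decimal
  , endOfLine, endOfInput, isEndOfLine )
import Data.Text (Text)
import qualified Data.Text as T

-- One "key<TAB>value" record; the key is everything before the first tab.
record :: Parser (Text, Text)
record = do
  key <- takeWhile1 (\c -> c /= '\t' && not (isEndOfLine c))
  _   <- char '\t'
  val <- takeTill isEndOfLine
  return (key, val)

-- Every record in the input; each one ends with a newline or the end of input.
records :: Parser [(Text, Text)]
records = many (record <* (endOfLine <|> endOfInput)) <* endOfInput

-- Parse everything first, then look the key up; only a whole key can match.
lookupField :: Text -> Text -> Either String Int
lookupField w input = do
  rs <- parseOnly records input
  v  <- maybe (Left ("key not found: " ++ T.unpack w)) Right (lookup w rs)
  parseOnly (decimal <* endOfInput) v

main :: IO ()
main = do
  print (lookupField "hello" "hello\t999\nworld\t900\n")        -- Right 999
  print (lookupField "hello" "world hello\t999\nworld\t900\n")  -- Left "key not found: hello"
```

Because each key is parsed unconditionally up to its tab, an input like `"world hello\t999"` yields the single key `world hello`, so a lookup of `hello` fails, which is the behaviour wanted for `t6`.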