Parsing the first occurrence of a word not preceded by whitespace

I need to find the first occurrence of a word in some .txt file that is not preceded by a space. Here are the possible cases:

-- * should succeed
t1 = "hello\t999\nworld\t\900" 
t2 = "world\t\900\nhello\t999\n" 
t3 = "world world\t\900\nhello\t999\n" 

-- * should fail 
t4 = "world\t\900\nhello world\t999\n" 
t5 = "hello world\t999\nworld\t\900" 
t6 = "world hello\t999\nworld\t\900"

Right now t6 succeeds even though it should fail, because my parser consumes any character until it reaches hello. Here is my parser:

My solution

{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative

import Data.Attoparsec.Text.Lazy 
import Data.Attoparsec.Combinator 
import Data.Text hiding (foldr) 
import qualified Data.Text.Lazy as L (Text, pack) 



-- * should succeed
t1 = L.pack "hello\t999\nworld\t\900" 
t2 = L.pack "world\t\900\nhello\t999\n" 

-- * should fail 
t3 = L.pack "world\t\900\nhello world\t999\n" 
t4 = L.pack "hello world\t999\nworld\t\900" 
t5 = L.pack "world hello\t999\nworld\t\900" 

p = occur "hello"  

---- * discard all text until word `w` occurs, and find its only field `n` 
occur :: String -> Parser (String, Int) 
occur w = do 
    pUntil w 
    string . pack $ w 
    string "\t" 
    n <- natural 
    string "\n" 
    return (w, read n) 


-- * Parse a natural number 
natural :: Parser String 
natural = many1' digit 

-- * skip over all words in Text stream until the word we want 
pUntil :: String -> Parser String 
pUntil = manyTill anyChar . lookAhead . string . pack

Source

2016-08-15 chibro2

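A parser is *not* the right tool for "find the first occurrence of sequence x within y". You should parse the entire string into a data structure that stores the (key, value) pairs along with the positions where they occur. Your current problem is that 't6' contains two key/value pairs (one in the string as a whole, one in a suffix), so naturally a backtracking parser finds both. Parsing every key unconditionally solves this problem. With attoparsec you are limited to getting the position as a byte index, but that should be sufficient for your needs. – user2407038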
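The suggestion in the comment above could be sketched roughly as follows. This is only an illustration under some assumptions: every line has the shape key, tab, value; strict `Data.Text` is used instead of the lazy variant for brevity; the line number stands in for the "position"; and the names `entry`, `entries` and `firstOccurrence` are made up for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative (many, (<|>))
import Data.Attoparsec.Text
import Data.Maybe (listToMaybe)
import Data.Text (Text)

-- One line: a key (everything up to the tab), a tab, then the value
-- (everything up to the end of the line or the end of the input).
entry :: Parser (Text, Text)
entry = do
  key <- takeWhile1 (\c -> c /= '\t' && c /= '\n')
  _   <- char '\t'
  val <- takeTill (== '\n')
  endOfLine <|> endOfInput
  return (key, val)

-- Parse the whole input unconditionally, line by line.
entries :: Parser [(Text, Text)]
entries = many entry <* endOfInput

-- Look the key up only after everything has been parsed, so "hello"
-- cannot match in the middle of a longer key such as "world hello".
-- The Int is the 1-based line number of the first match.
firstOccurrence :: Text -> Text -> Either String (Maybe (Int, Text))
firstOccurrence key input = do
  es <- parseOnly entries input
  return $ listToMaybe [ (n, v) | (n, (k, v)) <- zip [1 ..] es, k == key ]
```

With this sketch, `firstOccurrence "hello"` applied to the contents of t3 should give `Right (Just (2,"999"))`, while for t6 it should give `Right Nothing`, because the only keys there are "world hello" and "world". Input that does not follow the assumed line shape (for example a blank line) makes the whole parse fail with a `Left`.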

Here is one approach to consider:

{-# LANGUAGE OverloadedStrings #-} 

import Control.Applicative 

import Data.Attoparsec.Text.Lazy 
import Data.Attoparsec.Combinator 
import Data.Text hiding (foldr) 
import qualified Data.Text.Lazy as L (Text, pack) 
import Data.Monoid 

natural = many1' digit 

-- manyTill anyChar (try $ char c <* eof) 

pair0 w = do 
    string (w <> "\t") 
    n <- natural 
    string "\n" 
    return n 

pair1 w = do 
    manyTill anyChar (try $ string ("\n" <> w <> "\t")) 
    n <- natural 
    string "\n" 
    return n 

pair w = pair0 w <|> pair1 w 

-- * should succeed
t1 = "hello\t999\nworld\t\900" 
t2 = "world\t\900\nhello\t999\n" 
t3 = "world world\t\900\nhello\t999\n" 

-- * should fail 
t4 = "world\t\900\nhello world\t999\n" 
t5 = "hello world\t999\nworld\t\900" 
t6 = "world hello\t999\nworld\t\900" 

test t = parseTest (pair "hello") (L.pack t) 

main = do 
    test t1; test t2; test t3 
    test t4; test t5; test t6

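The idea is that pair0 matches a pair with the given key at the very beginning of the input, and pair1 matches a pair immediately after a newline.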

The key is the use of manyTill anyChar (try p), which keeps skipping characters until the parser p succeeds.

(By the way, I learned this use of manyTill and try from reading an answer written by @Cactus.)
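To see what manyTill anyChar (try p) does in isolation, here is a small sketch. It assumes strict `Data.Text` for brevity, and the name `skipToHello` is made up for this example. Note that in attoparsec `try` is provided for Parsec compatibility; attoparsec parsers already backtrack on failure, so the sketch should behave the same without it.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text

-- Skip characters one at a time until the terminator "\nhello\t" matches.
-- manyTill returns the skipped characters and consumes the terminator,
-- so the digits can be read directly afterwards.
skipToHello :: Parser (String, String)
skipToHello = do
  skipped <- manyTill anyChar (try (string "\nhello\t"))
  digits  <- many1' digit
  return (skipped, digits)
```

For example, `parseOnly skipToHello "world\t1\nhello\t999\n"` should return `Right ("world\t1","999")`. On `"world hello\t999\n"` it should return a `Left`, since the terminator never matches and anyChar eventually runs out of input.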

Source

2016-08-15 21:56:07 ErikR
