Haskell: iterating over a string / text file

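I want to read a script file, process it, and write the result to an HTML file. In my script file, whenever there is an @title(this is a title), I will emit the tags [header]this is a title[/header] in my HTML output. So my approach is to first read the script file, put its contents into a string, process the string, and then write the string to the HTML file.

To recognize @title, I need to read the string character by character. When I read '@', I need to look at the following characters to see whether they spell title.

Question: how do I iterate over a string (which is a list of Char) in Haskell?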

Source

2013-02-16 Charlie Victor

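Write a parser. You can get away with simpler hacks in the short term, but you will regret it later. – 2013-02-16 18:20:40

On the subject of parsers, Parsec rules. – 2013-02-16 18:34:14

@CatPlusPlus That's debatable. Performance-wise, Attoparsec will probably beat it most of the time. – 2013-02-16 18:54:18

You can use a simple recursion trick, for example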

findTag [] = undefined  -- placeholder: your end-of-list code goes here
findTag ('@':xs)
    | take 5 xs == "title" = undefined  -- placeholder: your code for @title goes here
    | otherwise            = findTag xs
findTag (_:xs) = findTag xs

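So basically you just pattern match: if the next character (the head of the list) is '@', you check whether the following 5 characters spell "title". If they do, you carry on with your parsing code. If the next character is not '@', you simply continue the recursion. Once the list is empty, the first pattern matches.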
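As a rough sketch of how the recursion above could be filled in for the conversion described in the question: the names `convertTitles` and `convertFile` are hypothetical, and the `@title(...)` input and `[header]...[/header]` output formats are taken from the question, so adjust them if your script format differs.

```haskell
-- Hypothetical helper: copy the input through unchanged, except that every
-- "@title(...)" becomes "[header]...[/header]".
convertTitles :: String -> String
convertTitles [] = []
convertTitles ('@':xs)
  | take 6 xs == "title("
  , (body, ')':rest) <- break (== ')') (drop 6 xs)
  = "[header]" ++ body ++ "[/header]" ++ convertTitles rest
  | otherwise = '@' : convertTitles xs  -- not a complete title tag: keep the '@'
convertTitles (x:xs) = x : convertTitles xs

-- Hypothetical wrapper for the read/process/write pipeline.
-- Assumes src and dst are different files, because readFile is lazy.
convertFile :: FilePath -> FilePath -> IO ()
convertFile src dst = do
  contents <- readFile src
  writeFile dst (convertTitles contents)
```

A tag without a closing parenthesis is left untouched here; whether that is the right behaviour depends on your input.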

Others may have a better solution.

I hope this answers your question.

Edit:

For a bit more flexibility, if you want to find a specific tag, you could do this:

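findTag [] _ = undefined  -- placeholder: your end-of-list code goes here
findTag ('@':xs) tagName
    | take (length tagName) xs == tagName = undefined  -- placeholder: your code for the tag goes here
    | otherwise = findTag xs tagName  -- pass tagName along when recursing
findTag (_:xs) tagName = findTag xs tagName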

That way, if you do

findTag text "title"

you will be looking specifically for title, and you can always change the tag name to anything you like.

Another edit:

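findTag :: String -> String -> String
findTag [] _ = ""  -- end of list: no tag found (an assumed result; adapt as needed)
findTag ('@':xs) tagName
    | take tLength xs == tagName = getTagContents tLength xs
    | otherwise = findTag xs tagName  -- pass tagName along when recursing
    where tLength = length tagName
findTag (_:xs) tagName = findTag xs tagName

-- Skip the tag name and the opening bracket, then read up to the closing bracket.
getTagContents :: Int -> String -> String
getTagContents len = takeWhile (/=')') . drop (len + 1)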

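To be honest, it is getting a bit messy, but here is what is going on:

You first drop the length of the tag name plus one more character for the opening bracket, and then use takeWhile to collect characters up to the closing bracket.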
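A slightly tidier variant of the same idea might look like the sketch below. It is not the code above: it uses `stripPrefix` from `Data.List`, checks for the opening bracket explicitly, and returns a `Maybe` so that a missing or unclosed tag is visible to the caller. The name `tagContents` is hypothetical.

```haskell
import Data.List (stripPrefix, tails)
import Data.Maybe (listToMaybe, mapMaybe)

-- Hypothetical helper: the contents of the first "@tagName(...)" in the input,
-- or Nothing if there is no such tag with a closing parenthesis.
tagContents :: String -> String -> Maybe String
tagContents tagName = listToMaybe . mapMaybe match . tails
  where
    -- Try to read one complete tag at the very start of the given suffix.
    match s = do
      rest <- stripPrefix ('@' : tagName ++ "(") s
      case break (== ')') rest of
        (body, ')':_) -> Just body
        _             -> Nothing
```

For example, `tagContents "title" "ab@title(hello) x"` evaluates to `Just "hello"`, while an input without the tag gives `Nothing`.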

Source

2013-02-16 18:17:05 Attic

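Thanks for your suggestion! – 2013-02-16 18:25:40

Your problem clearly falls into the parsing category. As Daniel Wagner wisely pointed out, for maintainability reasons you are generally better off handling it with a parser.

Another thing: if you want to process textual data efficiently, you are better off using Text instead of String.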
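To illustrate the `Text` suggestion without a parser library, here is a minimal sketch that uses `breakOn` from `Data.Text` (part of the `text` package). The function name `convertTitlesT` is hypothetical, and the `[header]...[/header]` output format is the one from the question.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- Hypothetical helper: rewrite every "@title(...)" as "[header]...[/header]".
convertTitlesT :: T.Text -> T.Text
convertTitlesT input =
  case T.breakOn "@title(" input of
    (before, rest)
      | T.null rest -> before  -- no further tag in the input
      | otherwise ->
          let afterOpen         = T.drop (T.length "@title(") rest
              (body, afterBody) = T.breakOn ")" afterOpen
          in if T.null afterBody
               then input  -- unclosed tag: leave the remainder as it is
               else before <> "[header]" <> body <> "[/header]"
                      <> convertTitlesT (T.drop 1 afterBody)
```

`Data.Text.IO` provides `readFile` and `writeFile` for `Text`, so the read/process/write pipeline from the question carries over directly.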

Here is how you could solve your problem using the Attoparsec parsing library:

-- For autocasting of hardcoded strings to `Text` type 
{-# LANGUAGE OverloadedStrings #-} 

-- Import a way more convenient prelude, excluding symbols conflicting 
-- with the parser library. See 
-- http://hackage.haskell.org/package/classy-prelude 
import ClassyPrelude hiding (takeWhile, try) 
-- Exclude the standard Prelude 
import Prelude() 
import Data.Attoparsec.Text 

-- A parser and an inplace converter for title 
title = do 
    string "@title(" 
    r <- takeWhile $ notInClass ")" 
    string ")" 
    return $ "[header]" ++ r ++ "[/header]" 

-- A parser which parses the whole document to parts which are either 
-- single-character `Text`s or modified titles 
parts = 
    (try endOfInput >> return []) ++ 
    ((:) <$> (try title ++ (singleton <$> anyChar)) <*> parts) 

-- The topmost parser which concats all parts into a single text 
top = concat <$> parts 

-- A sample input 
input = "aldsfj@title(this is a title)sdlfkj@title(this is a title2)"

-- Run the parser and output result 
main = print $ parseOnly top input

This outputs

Right "aldsfj[header]this is a title[/header]sdlfkj[header]this is a title2[/header]"

P.S. ClassyPrelude reimplements ++ as an alias for Monoid's mappend, so if you prefer you can replace it with mappend, <> or Alternative's <|>.
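For readers who would rather not depend on ClassyPrelude, the same parser can be sketched against the standard Prelude, with `<|>` and `many` from `Control.Applicative` in place of `++` and the explicit recursion. This is one possible rewrite rather than the code above; the names `part` and `convert` are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (many, (<|>))
import Data.Attoparsec.Text
  (Parser, anyChar, char, endOfInput, parseOnly, string, takeWhile)
import qualified Data.Text as T
import Prelude hiding (takeWhile)

-- A parser and in-place converter for a title.
title :: Parser T.Text
title = do
  _ <- string "@title("
  r <- takeWhile (/= ')')
  _ <- char ')'
  return ("[header]" <> r <> "[/header]")

-- Either a converted title or a single character as Text.
-- Attoparsec backtracks on failure, so no explicit try is needed here.
part :: Parser T.Text
part = title <|> (T.singleton <$> anyChar)

-- Parse the whole document and concatenate the parts.
top :: Parser T.Text
top = T.concat <$> many part <* endOfInput

-- Hypothetical entry point: run the parser on a complete input.
convert :: T.Text -> Either String T.Text
convert = parseOnly top
```

Applied to the sample input above, `convert` should give the same `Right` value as the original program.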

Source

2013-02-16 22:19:31
