How to restrict nested markup in regular expressions and parser combinators?

I'd like to implement a simple wiki-like markup parser as an exercise in using Scala parser combinators.

I'd like to tackle this step by step, so here is what I want to implement in the first version: simple inline literal markup.

For example, if the input string is:

This is a sytax test ``code here`` . Hello ``World``

the output string should be:

This is a sytax test <code>code here</code> . Hello <code>World</code>

I tried to solve this with RegexParsers. Here is what I have so far:

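import scala.util.parsing.combinator._

object TestParser extends RegexParsers
{
    // Treat spaces as ordinary characters instead of skipping them
    override val skipWhitespace = false

    // Strip the surrounding `` pairs and wrap the rest in <code> tags
    def toHTML(s: String) = "<code>" + s.drop(2).dropRight(2) + "</code>"

    // Any single character outside a literal
    val words = """(.)""".r
    // Inline literal ``...``; note that (.)* is greedy
    val literal = """\B``(.)*``\B""".r ^^ toHTML

    // At each position try a literal first, otherwise consume one character
    val markup = rep(literal | words)

    def run(s: String) = parseAll(markup, s) match {
        case Success(xs, next) => xs.mkString
        case _ => "fail"
    }
}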

println (TestParser.run("This is a sytax test ``code here`` . Hello ``World``"))

With this code, a simple input that contains only one literal works correctly, for example:

This is a sytax test ``code here``.

becomes

This is a sytax test <code>code here</code>.

However, when I run it on the example above, it produces

This is a sytax test <code>code here`` . Hello ``World</code>

I think this is because the regular expression I use:

"""\B``(.)*``\B""".r

allows any character, including ``, between the opening and closing ``.

How should I restrict it so that `` cannot appear (be nested) inside a literal, and fix this problem?


2011-12-04 Brian Hsu

Here is some documentation on non-greedy matching:

http://www.exampledepot.com/egs/java.util.regex/Greedy.html

Basically, it starts at the first `` and matches as far as it can, which means it runs all the way to the `` after World.

By putting a ? after the *, you tell it to make the shortest possible match instead of the longest one.
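To make the greedy/reluctant difference concrete, here is a small sketch using plain `scala.util.matching.Regex` on the question's input (the object name `GreedyVsLazy` is only for illustration). `findAllIn` returns every non-overlapping match, so the greedy pattern should yield one long match while the reluctant one yields two.

```scala
object GreedyVsLazy {
  val input = "This is a sytax test ``code here`` . Hello ``World``"

  // Greedy: .* first runs to the end of the input, then backtracks only until
  // a closing `` fits, so it stops at the LAST `` in the string.
  val greedy = "``.*``".r

  // Reluctant (lazy): .*? grows one character at a time and stops at the
  // FIRST closing `` that lets the whole pattern match.
  val reluctant = "``.*?``".r

  def main(args: Array[String]): Unit = {
    // One match spanning both literals
    println(greedy.findAllIn(input).toList)
    // Two separate matches, one per literal
    println(reluctant.findAllIn(input).toList)
  }
}
```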

Another option is to use [^`]* (anything except `), which forces the match to stop earlier.
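A minimal sketch of the negated-character-class variant, assuming a code span never needs to contain a single backtick of its own (the name `NegatedClass` is hypothetical). Because the body cannot contain any backtick, the match cannot run past the closing `` into the next literal, even though * is still greedy.

```scala
object NegatedClass {
  // [^`]* matches a run of characters containing no backtick at all,
  // so each match ends at the first closing ``.
  // Assumption: code spans never contain a single ` themselves.
  val literal = "``[^`]*``".r

  def main(args: Array[String]): Unit = {
    val input = "This is a sytax test ``code here`` . Hello ``World``"
    println(literal.findAllIn(input).toList)
  }
}
```

The trade-off compared with the reluctant `.*?` is that a literal such as ``a`b`` would no longer match at all.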


2011-12-04 03:53:47 xaxxon

After some trial and error, I found that the following regular expression seems to work:

"""``(.)*?``"""


2011-12-04 02:49:47

I don't know much about regex parsers, but you can use a simple one-liner:

// Lazy .*? stops at the first closing ``; the backticks are then stripped from each match
def addTags(s: String) =
    """(``.*?``)""".r replaceAllIn (
        s, m => "<code>" + m.group(0).replace("``", "") + "</code>")

Test:

scala> addTags("This is a sytax test ``code here`` . Hello ``World``") 
res0: String = This is a sytax test <code>code here</code> . Hello <code>World</code>


2011-12-04 04:22:30
