Regex for LaTeX umlaut escapes?

I am writing a Scala script that gathers information from several sources, including a BibTeX file. I use the jbibtex library to parse the file.

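My BibTeX source file contains LaTeX-style escapes for non-ASCII characters, such as

author = {Fjeld, Morten and Sch\"{a}r, Sissel Guttormsen}

I tried a simple replacement, but failed because I cannot write a proper regex to match the escape.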

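The best pattern I could come up with was \"{a} (the one echoed in the error message below), but the regex engine complains about it.

java.util.regex.PatternSyntaxException: Illegal repetition near index 2 \"{a}

As far as I know, I should escape \ and { in a regex, but not " or }. Still, I tried adding more escaping backslashes in increasingly random places :( but without success.

Any ideas how to match this?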

Update: The solution for the a-umlaut escape turned out to be very simple (thank you, Keppil). It is

replace("\"{a}", "ä")

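But LaTeX also has other character escapes, e.g. \{ss} for ß.

Scala does not let me use "\{ss}" in a string, so I tried the raw string """\{ss}""". Then the whole replacement falls apart.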

object Converter { 

    def cleanLatexEscapes(rawString: String): String = { 
    val aumlauts = rawString.replace("\"{a}", "ä") 
    val oumlauts = aumlauts.replace("\"{o}", "ö") 
    val uumlauts = oumlauts.replace("\"{u}", "ü") 
    val scharfesEs = uumlauts.replace("""\{ss}""", "ß") 

    return scharfesEs 
    } 

} 

import org.scalatest._ 

class ConverterSpec extends FlatSpec { 
    "cleanLatexEscapes" should "clean 'Käseklöße in der Küche'" in { 
    val escaped = """K\"{a}sekl\"{o}\{ss}e in der K\"{u}che""" 
     val cleaned = Converter.cleanLatexEscapes(escaped) 
     assert(cleaned === "Käseklöße in der Küche") 
    } 
}

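cleanLatexEscapes - should clean 'Käseklöße in der Küche' *** FAILED *** "K[\äsekl\öße in der K\]üche" did not equal "K[äseklöße in der K]üche"

What is happening here, and how do I fix it so that it covers both the umlauts and the scharfes-s escape? Also, where do the square brackets in the test output come from?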


2013-11-15 rumtscho

There is no need for a regex here; you can use replace() instead of replaceAll():

val author = "author = {Fjeld, Morten and Sch\"{a}r, Sissel Guttormsen}" 
println(author.replace("\"{a}", "ä"))

If you really want to use replaceAll(), you need to escape { and }:

val author = "author = {Fjeld, Morten and Sch\"{a}r, Sissel Guttormsen}" 
println(author.replaceAll("\"\\{a\\}", "ä"))
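The reason for the original exception is that in java.util.regex an unescaped { opens a repetition quantifier such as {2,3}, so {a} is reported as an "illegal repetition". The following is only a sketch, assuming the input really contains the backslash (as it does when the text is read from a .bib file); the object and method names are made up for illustration, while Pattern.quote, Matcher.quoteReplacement and replaceAll are standard library calls:

```scala
import java.util.regex.{Matcher, Pattern}

// Hypothetical helper names; only Pattern.quote, Matcher.quoteReplacement
// and String.replaceAll come from the Java standard library.
object RegexEscapeSketch {
  // The five characters \"{a} exactly as they appear in a .bib file.
  val escapeSequence: String = """\"{a}"""

  // Option 1: let the library quote the whole literal for us.
  def viaQuote(input: String): String =
    input.replaceAll(Pattern.quote(escapeSequence), Matcher.quoteReplacement("ä"))

  // Option 2: escape by hand. The regex is \\"\{a\}: an escaped backslash,
  // a plain double quote, and escaped braces around the letter.
  def viaManualEscape(input: String): String =
    input.replaceAll("""\\"\{a\}""", "ä")
}
```

Both variants turn the raw text Sch\"{a}r into Schär; the quoted variant is probably easier to maintain because nothing has to be escaped by hand.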

Edit

A literal \ is escaped the same way as ", namely with another backslash. To clean up all the sequences you described above, you can use:

val cleaned = escaped.replace("\"{a}", "ä").replace("\"{o}", "ö").replace("\"{u}", "ü").replace("\\{ss}", "ß");
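The difference between the two literal styles can be checked directly. This sketch (the object name is only for illustration) compares ordinary string literals, where \" and \\ are escape sequences, with a triple-quoted raw string, where backslashes are kept verbatim:

```scala
// Illustrative object name; the values only demonstrate Scala's literal rules.
object LiteralEscapeSketch {
  // Ordinary literal: \" is an escaped quote, so no backslash ends up in the string.
  val withoutBackslash: String = "\"{a}"       // the 4 characters "{a}

  // Ordinary literal: \\ is one backslash and \" is one quote.
  val withBackslash: String = "\\\"{a}"        // the 5 characters \"{a}

  // Raw (triple-quoted) literal: every character is taken verbatim.
  val rawWithBackslash: String = """\"{a}"""   // also the 5 characters \"{a}
}
```

Replacing only "\"{a}" in text that was written as a raw string therefore leaves the leading backslash behind, which matches the \ä visible in the failing test output of the question.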


2013-11-15 09:49:35 Keppil

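Sorry, I had to remove the accepted mark, because this does not work for all escapes, only for the umlauts. Maybe you can extend the answer to cover all of them? I have posted more information in the question. I am still very new to Scala and not sure how the whole escaping mechanism works. – rumtscho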

The replacements should instead be:

object Converter { 

    def cleanLatexEscapes(rawString: String): String = { 
    val aumlauts = rawString.replace("\\\"{a}", "ä") 
    val oumlauts = aumlauts.replace("\\\"{o}", "ö") 
    val uumlauts = oumlauts.replace("\\\"{u}", "ü") 
    val scharfesEs = uumlauts.replace("\\{ss}", "ß") 

    return scharfesEs 
    } 

}
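As the list of escapes grows, the chain of intermediate vals could be replaced by a fold over a lookup table. This is a sketch rather than part of the answer above; the object name TableConverter and the escapes table are assumptions introduced for illustration, and the table covers only the four sequences mentioned in the question:

```scala
// Hypothetical table-driven variant of the Converter above.
object TableConverter {
  // LaTeX escape -> Unicode character; raw strings keep the backslashes verbatim.
  private val escapes: Seq[(String, String)] = Seq(
    """\"{a}""" -> "ä",
    """\"{o}""" -> "ö",
    """\"{u}""" -> "ü",
    """\{ss}""" -> "ß"
  )

  // Apply every literal replacement in turn (String.replace, so no regex is involved).
  def cleanLatexEscapes(rawString: String): String =
    escapes.foldLeft(rawString) { case (text, (latex, unicode)) =>
      text.replace(latex, unicode)
    }
}
```

With this shape, supporting another escape means adding one line to the table instead of another val.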


2013-11-15 11:06:45 barnybug

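The JBibTeX library provides a LaTeX parser class (which converts a LaTeX string into a list of LaTeX commands) and a LaTeX pretty-printer class (which converts a list of LaTeX commands into a Java Unicode string). So there is no need to mess around with regexes here.

The README file contains a working code example.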


2014-05-18 09:44:34 user1808924
