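I'm trying to extract theorems from LaTeX source files using Java. My code almost works, but one test case fails: nested theorems. How can I find nested tags in LaTeX using regular expressions?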
@Test
public void testNestedTheorems() {
String source = "\\begin{theorem}" +
"this is the outer theorem" +
"\\begin{theorem}" +
"this is the inner theorem" +
"\\end{theorem}" +
"\\end{theorem}";
LatexTheoremProofExtractor extractor = new LatexTheoremProofExtractor(source);
extractor.parse();
ArrayList<String> theorems = extractor.getTheorems();
assertNotNull(theorems);
assertEquals(2, theorems.size()); // theorems.size() is 1
assertEquals("this is the outer theorem", theorems.get(0));
assertEquals("this is the inner theorem", theorems.get(1));
}
Here is my theorem extractor, which is called by LatexTheoremProofExtractor#parse:
private void extractTheorems() {
// If this has been called before, return
if(theorems != null) {
return;
}
theorems = new ArrayList<String>();
final Matcher matcher = THEOREM_REGEX.matcher(source);
// Add trimmed matches while you can find them
while (matcher.find()) {
theorems.add(matcher.group(1).trim());
}
}
THEOREM_REGEX is defined as follows:
private static final Pattern THEOREM_REGEX = Pattern.compile(Pattern.quote("\\begin{theorem}")
+ "(.+?)" + Pattern.quote("\\end{theorem}"));
Does anyone have a suggestion for handling nested tags?
With a single regular expression? You can't. – immibis
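To expand on that comment: with the lazy pattern in the question, the first match runs from the first \begin{theorem} to the first \end{theorem}, so the outer and inner text end up in one group and only one theorem is found. java.util.regex has no recursive patterns, so a common workaround is to use a regex only to locate the delimiters and track nesting with a stack. The following is a minimal sketch of that idea, not a drop-in replacement for LatexTheoremProofExtractor. The class name NestedTheoremExtractor and the method extract are hypothetical, and it assumes that each theorem should contain only its own text (excluding nested theorems) and that results are listed in the order the theorems open.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the question's code.
final class NestedTheoremExtractor {

    // Matches either delimiter; group 1 is "begin" or "end".
    private static final Pattern DELIMITER =
            Pattern.compile("\\\\(begin|end)\\{theorem\\}");

    static List<String> extract(String source) {
        // Bodies in the order their \begin{theorem} appears.
        List<StringBuilder> bodies = new ArrayList<>();
        // Currently open theorems, innermost on top.
        Deque<StringBuilder> open = new ArrayDeque<>();

        Matcher matcher = DELIMITER.matcher(source);
        int last = 0; // end of the previous delimiter
        while (matcher.find()) {
            // Text since the previous delimiter belongs to the innermost open theorem.
            if (!open.isEmpty()) {
                open.peek().append(source, last, matcher.start());
            }
            if (matcher.group(1).equals("begin")) {
                StringBuilder body = new StringBuilder();
                bodies.add(body);
                open.push(body);
            } else if (!open.isEmpty()) {
                // An unmatched \end{theorem} is silently ignored (assumption).
                open.pop();
            }
            last = matcher.end();
        }

        // Theorems that are never closed keep whatever text was collected (assumption).
        List<String> result = new ArrayList<>();
        for (StringBuilder body : bodies) {
            result.add(body.toString().trim());
        }
        return result;
    }
}
```

Applied to the source string from testNestedTheorems, extract should return the outer text first and the inner text second. Because the scan works on character offsets rather than on `.`, theorem bodies that span several lines are handled without needing Pattern.DOTALL.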