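Parsing complex BibTeX entries: I am trying to use regular expressions to parse BibTeX files in JavaScript, but I can't seem to find a proper solution. In the example below, bj is an array containing the fields of a bibliography entry. I had to write a fairly long regular expression to account for entries whose values can span multiple lines, that lack curly braces ({}), or that contain syntactically wrong commas (for example, the last field should not end with a comma, but some TeX editors do not complain).
How to parse BibTeX with JavaScript and regular expressions
This is what I use to test my regular expression: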
@inproceedings{Carrel2005,
title = {{Algorithm} for near-optimal autonomous resource management},
author = {Carrel, Ândrew and Palmer, Phil},
notes = nonote ,
booktitle = {8th International Symposium on Artificial {Intelligence,
Robotics}, and Automation in Space},
year = {2005}
blahblah = error,
}
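As you can see, some values are split across two lines and can contain inner curly braces. I have been trying to improve the regular expression as follows: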
var txt = "@inproceedings{Carrel2005, \n" +
" title = {{Algorithm} for near-optimal autonomous resource management}, \n" +
" author = {Carrel, Ândrew and Palmer, Phil}, \n" +
" notes = nonote ,\n" +
" booktitle = {8th International Symposium on Artificial Intelligence, \n" +
" Robotics and Automation in Space}, \n" +
" year = {2005} \n" +
" blahblah = error,\n}";
var bj = txt.match(/\w*[\t ]*=[\t ]*(\{[\u0020-\u0080\u00A1-\u00FF\u0300-\u036F\t\r\n]*?}|[a-zA-Z0-9]+)[\t ]*(,(?!\s*}))?/g);
Explanation:
\w* A word for the field name.
[\t ]*=[\t ]* Any number of spaces or tabs after and before the equal sign.
( Start of group 1.
\{ Option 11: starts by an opening curly brace.
[ Start of character class AAA.
unicode-set Letters (basic Latin plus some extensions)
\t\r\n ... or whitespace.
]*? End of character class AAA (with LAZY repetition)
| End of option 11, start of option 12:
[a-zA-Z0-9]+ One or more characters (no underscore or whitespace allowed).
) End of option 12 and group 1.
[\t ]* Any number of tabs or spaces.
( Start of group 2:
, A literal comma
(?!\s*}) ...if it is not followed by whitespace and closing curly braces.
)? End of group 2. ? denotes it is optional.
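I have not been able to match fields that start with multiple curly braces (such as {{Algorithm} for near...), nor to correctly match those that contain the sequence }, somewhere inside the value.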
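A minimal sketch of the first failure, using a simplified stand-in for the value part of the regex above (`lazyValue` is a hypothetical name, not part of the original pattern). Because the character class also admits `{` and `}`, the lazy quantifier stops at the first closing brace it meets, so a value that opens with two braces is cut short:

```javascript
// Simplified stand-in for the braced-value alternative of the regex above.
// [\s\S] accepts any character, braces included, much like the Unicode class does.
var lazyValue = /\{[\s\S]*?\}/;

var field = "title = {{Algorithm} for near-optimal autonomous resource management},";
var cut = field.match(lazyValue)[0];

// The match ends at the first "}", not at the brace that closes the whole value.
console.log(cut);
```

The same thing happens with an inner `},` sequence: the pattern has no way to tell an inner closing brace from the outer one.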
Trying to write a *parser* with *RegEx* always reminds me of this answer: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 Why don't you write a proper parser? – Marc
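Reliably matching curly braces '{}' with a regular expression is basically impossible. The problem is that there is no way to store how many opening braces you have seen, so there is no way to know when you are done. You will need to write (or use) a proper parser. That probably won't be too difficult with the help of regular expressions. –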
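A minimal sketch of that idea, not a complete BibTeX parser: a regular expression locates each field name, and a plain depth counter finds the brace that closes the value. The names `readBraced` and `parseFields` are hypothetical, and the sketch assumes values are either brace-delimited or bare words; quoted values ("...") and escaped braces are not handled.

```javascript
// Hypothetical helper: text.charAt(start) must be "{".
// Returns the value without its outer braces and the index just past the closing brace.
function readBraced(text, start) {
  var depth = 0;
  for (var i = start; i < text.length; i++) {
    var ch = text.charAt(i);
    if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) {
        return { value: text.slice(start + 1, i), end: i + 1 };
      }
    }
  }
  throw new Error("Unbalanced braces starting at index " + start);
}

// Hypothetical helper: collects "name = value" fields from one entry.
function parseFields(entry) {
  var fields = {};
  var nameRe = /(\w+)[\t ]*=[\t ]*/g;
  var m;
  while ((m = nameRe.exec(entry)) !== null) {
    var pos = nameRe.lastIndex;
    if (entry.charAt(pos) === "{") {
      var braced = readBraced(entry, pos);
      fields[m[1]] = braced.value;
      nameRe.lastIndex = braced.end; // resume scanning after the closing brace
    } else {
      // Bare value such as nonote: read up to a comma, line break or "}".
      var bare = /[^,\r\n}]*/.exec(entry.slice(pos))[0];
      fields[m[1]] = bare.trim();
      nameRe.lastIndex = pos + bare.length;
    }
  }
  return fields;
}

var sample = "@inproceedings{Carrel2005,\n" +
  "  title = {{Algorithm} for near-optimal autonomous resource management},\n" +
  "  notes = nonote ,\n" +
  "  booktitle = {8th International Symposium on Artificial {Intelligence,\n" +
  "    Robotics}, and Automation in Space},\n" +
  "  year = {2005}\n" +
  "  blahblah = error,\n}";
console.log(parseFields(sample));
```

Because the scanner skips past each value before looking for the next field name, a missing comma after a field or a trailing comma before the final brace does not disturb it.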