2011-10-29 38 views

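Finding regular expression literals in a string of JavaScript code

I am doing a sort of crude parsing of JavaScript code, using JavaScript. I'll spare you the details of why I need to do this, but suffice it to say that I don't want to integrate a large chunk of library code: it is unnecessary for my purposes, and it is important that I keep this very lightweight and relatively simple. So please don't suggest that I use JsLint or anything like it. If the answer is more code than you can paste into an answer, it is probably more than I want.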

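My code currently does a good job of detecting quoted sections and comments, and then matching braces, brackets and parens (without being confused by quotes and comments, or by escapes within quotes, of course). That is all I need it to do, and it does it well, with one exception: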

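It can be confused by regular expression literals. So I'm hoping for some help with detecting regular expression literals in a string of JavaScript, so that I can handle them appropriately.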

Something like this:

function getRegExpLiterals (stringOfJavascriptCode) { 
    var output = []; 
    // todo! 
    return output; 
} 

var jsString = "var regexp1 = /abcd/g, regexp1 = /efg/;";
console.log (getRegExpLiterals (jsString));

// should print (zero-based indices):
// [{startIndex: 14, length: 7}, {startIndex: 33, length: 5}]

Do you only need the start position of each regex literal? If that is all you want, it is easy to do. – FailedDev


I need to be sure that it is a regular expression, so just looking for slashes won't do. – rob

Answers


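es5-lexer is a JS lexer that uses a very accurate heuristic to distinguish regular expressions in JS code from division expressions. It also provides a token-level transformation that you can use to make sure the resulting program will be interpreted by a full JS parser in the same way as by the lexer.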

The code that determines whether a / starts a regular expression is in guess_is_regexp.js, and the tests start at scanner_test.js line 401:

var REGEXP_PRECEDER_TOKEN_RE = new RegExp(
    "^(?:" // Match the whole tokens below 
    + "break" 
    + "|case" 
    + "|continue" 
    + "|delete" 
    + "|do" 
    + "|else" 
    + "|finally" 
    + "|in" 
    + "|instanceof" 
    + "|return" 
    + "|throw" 
    + "|try" 
    + "|typeof" 
    + "|void" 
    // Binary operators which cannot be followed by a division operator. 
    + "|[+]" // Match + but not ++. += is handled below. 
    + "|-" // Match - but not --. -= is handled below. 
    + "|[.]" // Match . but not a number with a trailing decimal. 
    + "|[/]" // Match /, but not a regexp. /= is handled below. 
    + "|," // Second binary operand cannot start a division. 
    + "|[*]" // Ditto binary operand. 
    + ")$" 
    // Or match a token that ends with one of the characters below to match 
    // a variety of punctuation tokens. 
    // Some of the single char tokens could go above, but putting them below 
    // allows closure-compiler's regex optimizer to do a better job. 
    // The right column explains why the terminal character to the left can only 
    // precede a regexp. 
    + "|[" 
    + "!" // !   prefix operator operand cannot start with a division 
    + "%" // %   second binary operand cannot start with a division 
    + "&" // &, &&  ditto binary operand 
    + "(" // (   expression cannot start with a division 
    + ":" // :   property value, labelled statement, and operand of ?: 
      //    cannot start with a division 
    + ";" // ;   statement & for condition cannot start with division 
    + "<" // <, <<, << ditto binary operand 
    // !=, !==, %=, &&=, &=, *=, +=, -=, /=, <<=, <=, =, ==, ===, >=, >>=, >>>=, 
    // ^=, |=, ||= 
    // All are binary operands (assignment ops or comparisons) whose right 
    // operand cannot start with a division operator 
    + "=" 
    + ">" // >, >>, >>> ditto binary operand 
    + "?" // ?   expression in ?: cannot start with a division operator 
    + "[" // [   first array value & key expression cannot start with 
      //    a division 
    + "^" //^   ditto binary operand 
    + "{" // {   statement in block and object property key cannot start 
      //    with a division 
    + "|" // |, ||  ditto binary operand 
    + "}" // }   PROBLEMATIC: could be an object literal divided or 
      //    a block. More likely to be start of a statement after 
      //    a block which cannot start with a /. 
    + "~" // ~   ditto binary operand 
    + "]$" 
    // The exclusion of ++ and -- from the above is also problematic. 
    // Both are prefix and postfix operators. 
    // Given that there is rarely a good reason to increment a regular expression 
    // and good reason to have a post-increment operator as the left operand of 
    // a division (x++/y) this pattern treats ++ and -- as division preceders. 
); 
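
To connect the pattern above back to the question, here is a minimal sketch of how a "which token precedes the slash?" test could drive getRegExpLiterals. It is not the es5-lexer API: PRECEDER_RE is a condensed, hand-copied form of the pattern above, and WORD_OR_NUMBER_RE, PUNCTUATOR_RE and the tokenizing loop are assumptions made up for illustration. It skips comments and quoted strings, remembers the last significant token, and treats a / as the start of a regex only when that token matches the preceder pattern.

```javascript
// Condensed copy of the preceder heuristic (an assumption, not library code).
var PRECEDER_RE = new RegExp(
    "^(?:break|case|continue|delete|do|else|finally|in|instanceof|return" +
    "|throw|try|typeof|void|[+]|-|[.]|[/]|,|[*])$" +
    "|[!%&(:;<=>?\\[^{|}~]$");

// Hypothetical helper patterns for a very rough tokenizer.
var WORD_OR_NUMBER_RE = /^(?:[A-Za-z_$][\w$]*|\.?\d[\w.]*)/;
var PUNCTUATOR_RE = /^(?:\+\+|--|[\s\S])/;  // keep ++ and -- as single tokens

function getRegExpLiterals(code) {
  var output = [];
  var lastToken = "";  // last significant token; "" means start of input
  var n = code.length;
  var i = 0;
  while (i < n) {
    var ch = code.charAt(i);
    var next = code.charAt(i + 1);
    var j;
    if (/\s/.test(ch)) {
      i++;
    } else if (ch === "/" && next === "/") {
      // Line comment: skip to end of line, lastToken is unchanged.
      j = code.indexOf("\n", i);
      i = j < 0 ? n : j;
    } else if (ch === "/" && next === "*") {
      // Block comment: skip past the closing "*/".
      j = code.indexOf("*/", i + 2);
      i = j < 0 ? n : j + 2;
    } else if (ch === '"' || ch === "'") {
      // Quoted string: honor backslash escapes; the whole string is one token.
      j = i + 1;
      while (j < n && code.charAt(j) !== ch) {
        j += code.charAt(j) === "\\" ? 2 : 1;
      }
      lastToken = code.slice(i, j + 1);
      i = j + 1;
    } else if (ch === "/" &&
               (lastToken === "" || PRECEDER_RE.test(lastToken))) {
      // Regex literal: a "/" inside a character class does not end it.
      var inClass = false;
      j = i + 1;
      while (j < n) {
        var c = code.charAt(j);
        if (c === "\\") { j++; }                    // skip the escaped char
        else if (c === "[") { inClass = true; }
        else if (c === "]") { inClass = false; }
        else if (c === "/" && !inClass) { break; }  // closing slash
        j++;
      }
      j++;                                          // step past closing slash
      while (j < n && /[A-Za-z]/.test(code.charAt(j))) { j++; }  // flags
      j = Math.min(j, n);
      output.push({ startIndex: i, length: j - i });
      lastToken = "/regexp/";  // placeholder: a regex acts like an operand
      i = j;
    } else {
      // Identifier, keyword, number, or punctuation (including division).
      var rest = code.slice(i);
      var m = WORD_OR_NUMBER_RE.exec(rest) || PUNCTUATOR_RE.exec(rest);
      lastToken = m[0];
      i += m[0].length;
    }
  }
  return output;
}

console.log(getRegExpLiterals("var regexp1 = /abcd/g, regexp1 = /efg/;"));
```

For the sample string from the question this reports the two literals at zero-based offsets 14 and 33, with lengths 7 and 5. The sketch inherits the heuristic's blind spots: a regex directly after ) (for example if (x) /re/.test(y)) is read as division, and an object literal followed by a division is read as a regex.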

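Thanks Mike. I may well use the full lexer in the future; it is an impressive piece of work (as is the prettifier that you also wrote, and which I have used extensively). – rob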


@rob, you're welcome. Happy lexing. –
