2017-09-25 68 views
-3

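I love regular expressions. However, I've just found that I can't use the `s` flag when running a JavaScript RegExp in the browser. I'm curious why this flag isn't included? It would be really helpful. Why does JavaScript RegExp lack the "s" flag?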

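I've seen that there is an external library, XRegExp, which enables this `s` flag (and a few others), but I'm also curious why these extra (useful) flags don't exist in standard JavaScript. I'm also reluctant to include yet another external library...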

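Here is an example where I'm trying to detect the open/closing tags of WordPress shortcodes that may contain newlines (or where I have to insert newlines to improve detection).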

//
// Let's take some input text, e.g. WordPress shortcodes
//
var exampleText = '[buttongroup class="vertical"][button content="Example 1" class="btn-default"][/button][button class="btn-primary"]Example 2[/button][/buttongroup]'

//
// Now let's say I want to extract the shortcodes and their attributes,
// keeping in mind that shortcodes may or may not have closing tags.
//
// Shortcodes which have content between the open/closing tags can contain
// newlines. One of the issues with the flags is that I can't use `s` to make
// the dot character match newlines.
//
// When I run this on regex101.com they support the `s` flag (probably with the
// XRegExp library) and everything seems to work well. However when running this
// in the browser I get the "Uncaught SyntaxError: Invalid regular expression
// flags" error.
//
var reGetButtons = /\[button(?:\s+([^\]]+))?\](?:(.*)\[\/button\])?/gims
var reGetButtonGroups = /\[buttongroup(?:\s+([^\]]+))?\](?:(.*)\[\/buttongroup\])?/gims
//
// Some utility methods to extract attributes:
//

// Get an attribute's value
//
// @param string input
// @param string attrName
// @returns string
function getAttrValue (input, attrName) {
    var attrValue = new RegExp(attrName + '=\"([^\"]+)\"', 'g').exec(input)
    return (attrValue ? window.decodeURIComponent(attrValue[1]) : '')
}

// Get all named shortcode attribute values as an object
//
// @param string input
// @param array shortcodeAttrs
// @returns object
function getAttrsFromString (input, shortcodeAttrs) {
    var output = {}
    for (var index = 0; index < shortcodeAttrs.length; index++) {
        output[shortcodeAttrs[index]] = getAttrValue(input, shortcodeAttrs[index])
    }
    return output
}
//
// Extract all the buttons and get all their attributes and values
//
function replaceButtonShortcodes (input) {
    return input
        //
        // Need this to avoid some tomfoolery.
        // By splitting into newlines I can better detect between open/closing tags,
        // however it goes out the window when newlines are within the
        // open/closing tags.
        //
        // It's possible my RegExps above need some adjustments, but I'm unsure how,
        // or maybe I just need to replace newlines with a special character that I
        // can then swap back with newlines...
        //
        .replace(/\]\[/g, ']\n[')
        // Find and replace the [button] shortcodes
        .replace(reGetButtons, function (all, attr, content) {
            console.log('Detected [button] shortcode!')
            console.log('-- Extracted shortcode components', { all: all, attr: attr, content: content })

            // Build the output button's HTML attributes
            var attrs = getAttrsFromString(attr, ['class','content'])
            console.log('-- Extracted attributes', { attrs: attrs })

            // Return the button's HTML
            return '<button class="btn ' + (typeof attrs.class !== 'undefined' ? attrs.class : '') + '">' + (content ? content : attrs.content) + '</button>'
        })
}
//
// Extract all the button groups like above
//
function replaceButtonGroupShortcodes (input) {
    return input
        // Same as above...
        .replace(/\]\[/g, ']\n[')
        // Find and replace the [buttongroup] shortcodes
        .replace(reGetButtonGroups, function (all, attr, content) {
            console.log('Detected [buttongroup] shortcode!')
            console.log('-- Extracted shortcode components', { all: all, attr: attr, content: content })

            // Build the output button group's HTML attributes
            var attrs = getAttrsFromString(attr, ['class'])
            console.log('-- Extracted attributes', { attrs: attrs })

            // Return the button group's HTML
            return '<div class="btn-group ' + (typeof attrs.class !== 'undefined' ? attrs.class : '') + '">' + (typeof content !== 'undefined' ? content : '') + '</div>'
        })
}
//
// Do all the extraction on our example text and set within the document's HTML
//
var outputText = replaceButtonShortcodes(exampleText)
outputText = replaceButtonGroupShortcodes(outputText)
document.write(outputText)

Using the `s` flag would let me do this easily, but because it isn't supported, I can't take advantage of it.

+1

A warning to anyone tempted to look at 'regexp101.com': __Don't__. It looks like a malware site. – Andy

+1

https://meta.stackoverflow.com/a/293819/47589 – Amy

+0

@Andy: Whereas http://regex101.com (without the p) is not (as far as I know), and I and others have been using it for years. Matt, is that just a typo in your code comment? –

Answers

3

There's no grand logic to it. It just wasn't included, like other regular expression features that other environments have and JavaScript doesn't (so far).

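It's in the process of being added now. It was at Stage 3 when this was first written, so maybe ES2018, maybe not. Update: it reached Stage 4 in 2017, so it will be in ES2018, and the odds are high that you'll see support arriving in cutting-edge browsers as early as this year.

(Look-behind and Unicode property escapes are also on the cards...)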
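
While support is still rolling out, one option is to feature-detect the flag at runtime instead of writing it in a regex literal. The sketch below is an illustration, not part of the original answer; the helper name `supportsDotAllFlag` is made up. It relies on the `RegExp` constructor throwing a `SyntaxError` for flags the engine doesn't recognize, which is the same error the question reports.

```javascript
// Hypothetical helper: returns true when the engine accepts the `s` (dotAll) flag.
// Using the constructor rather than a literal keeps the SyntaxError catchable
// at runtime instead of failing when the script is parsed.
function supportsDotAllFlag () {
  try {
    return new RegExp('.', 's').test('\n')
  } catch (err) {
    return false
  }
}

// Pick a pattern that matches any single character, newlines included,
// falling back to the portable `[\s\S]` class when `s` is unavailable.
var anyChar = supportsDotAllFlag() ? new RegExp('.', 's') : /[\s\S]/

console.log(supportsDotAllFlag())
console.log(anyChar.test('\n')) // true in either branch
```

Either branch yields a pattern that matches a newline, so the calling code doesn't need to know which one it got.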


Side note:

When I run this on regex101.com they support the `s` flag...

Not if you set the regex type to JavaScript via the menu. Click the menu button in the top-left corner and change the "flavor" to JavaScript.

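You probably left it at its default, which is PCRE, and PCRE does support the `s` flag.

They used to make this more obvious. Since they tucked it away in a menu, you're not remotely the first person I've seen who didn't have it set right...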

+0

Thanks for the link! I also managed to solve my problem using the '[^]' hack. –

+1

@Matt Actually '[^]' isn't a hack; it matches "not nothing", i.e. literally anything. –

+2

@MattScheurich You can also use '[\s\S]', which works with other regex engines too. –
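
To make the suggestions in these comments concrete, here is a small sketch of the two workarounds applied to a simplified `[button]` pattern. The sample string, the variable names, and the lazy `*?` quantifier are additions for illustration and do not come from the question's code.

```javascript
// Without the `s` flag, `.` stops at line terminators.
var dotPattern = /\[button\](.*)\[\/button\]/
// `[\s\S]` matches any character, newlines included; the lazy `*?` stops at the
// first closing tag instead of running on to a later one.
var anyPattern = /\[button\]([\s\S]*?)\[\/button\]/
// `[^]` behaves the same way in JavaScript, but other engines may reject it.
var jsOnlyPattern = /\[button\]([^]*?)\[\/button\]/

var sample = '[button]Example\n2[/button][button]Three[/button]'

// The dot version cannot cross the newline, so it skips the first button.
console.log(dotPattern.exec(sample)[1]) // Three
console.log(JSON.stringify(anyPattern.exec(sample)[1])) // "Example\n2"
console.log(jsOnlyPattern.exec(sample)[1] === anyPattern.exec(sample)[1]) // true
```

If the patterns need to run outside JavaScript as well, `[\s\S]` is the safer choice of the two, as the last comment points out.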