2010-06-29 33 views
2

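How can I extract words and phrases from a string using an array in Javascript (JS)?

I've looked at several examples of how to use regular expressions in JS, but I can't seem to find the right syntax for what I need. Basically, I have an array of words: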

commonWords=["she", "he", "him", "liked", "i", "a", "an", "are"] 

and a string:

'She met him where he liked to eat "the best" cheese pizza.' 

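Basically, I want to use non-alpha characters and my array of commonWords as delimiters for extracting phrases. The above would yield something like this: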

'met, where, to eat, the best, cheese pizza' 
+0

The response should be: 'met, where, to eat, the best, cheese pizza'. "liked" is in the commonWords list. – 2010-06-29 10:59:45

+0

Thanks! So true. – 2010-06-29 21:19:16

Answers

1

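From the OP:

"Basically, I want to use non-alpha characters and my array of commonWords as delimiters for extracting phrases."

This does that (unlike some of the other answers ;-)). It returns either a string or an array.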


This returns:

"met, where, to eat, the best, cheese pizza, didn't, Mr, O'Leary" 

and 

["met", "where", "to eat", "the best", "cheese pizza", "didn't", "Mr", "O'Leary"] 
2

You are looking for something like this:

var commonWords=["she", "he", "him", "liked", "i", "a", "an", "are"]; 
var regstr = "\\b(" + commonWords.join("|") + ")\\b"; 
//regex is \b(she|he|him|liked|i|a|an|are)\b 
var regex = new RegExp(regstr, "ig"); 
var str = 'She met him where he liked to eat "the best" cheese pizza.'; 
console.log(str.replace(regex, "")); 

Output:

met where to eat "the best" cheese pizza. 

The split version:

var commonWords=["she", "he", "him", "liked", "i", "a", "an", "are"]; 
var regstr = "\\b(?:" + commonWords.join("|") + ")\\b"; 
var regex = new RegExp(regstr, "ig"); 
var str = 'She met him where he liked to eat "the best" cheese pizza.'; 
var arr = str.split(regex); 
console.log(arr);// ["", " met ", " where ", " ", " to eat "the best" cheese pizza."] 

for (var i = 0; i < arr.length; i++) {
    if (arr[i].match(/^\s*$/)) { // remove empty strings and strings with only spaces
        arr.splice(i--, 1);      // step back so the element shifted into slot i is not skipped
    } else {
        arr[i] = arr[i].replace(/^\s+|\s+$/g, ""); // trim spaces from beginning and end
    }
}

console.log(arr);// ["met", "where", "to eat "the best" cheese pizza."] 
console.log(arr.join(", "));// met, where, to eat "the best" cheese pizza. 
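
The split above only uses the common words as delimiters, so the quoted phrase stays attached to its neighbours. As a hedged sketch of one way to also split on non-alpha characters, the delimiter regex can get a second alternative. The function name `splitIntoPhrases` is an assumption for illustration, and note that this version also treats apostrophes as delimiters, so "didn't" would be broken apart.

```javascript
// Hypothetical helper: the name is an illustrative assumption.
function splitIntoPhrases(str, commonWords) {
  // Escape regex metacharacters so each common word is matched literally.
  var escaped = commonWords.map(function (w) {
    return w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  });

  // Delimiter: a whole common word, or any run of characters
  // that are neither letters nor whitespace.
  var delimiter = new RegExp(
    "\\b(?:" + escaped.join("|") + ")\\b|[^A-Za-z\\s]+", "ig");

  return str.split(delimiter)
    .map(function (s) { return s.replace(/^\s+|\s+$/g, ""); }) // trim each piece
    .filter(function (s) { return s.length > 0; });            // drop empty pieces
}

var commonWords = ["she", "he", "him", "liked", "i", "a", "an", "are"];
var str = 'She met him where he liked to eat "the best" cheese pizza.';
console.log(splitIntoPhrases(str, commonWords).join(", "));
// met, where, to eat, the best, cheese pizza
```

The group is non-capturing on purpose: with a capturing group, `split` would also return the matched delimiters in the result array.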
+1

Nice one. The OP wants `split` rather than `replace`, but it's similar enough. (i.e. remove the capturing group, and probably the empty tokens) – Kobi 2010-06-29 08:09:11

0

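This version is quite verbose, but it also works with "lazy" single and double quotes:

Check whether an array contains an object (like indexOfObject), with a flag for case-insensitive comparison: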

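if (!Array.prototype.containsObject) Array.prototype.containsObject = function (object, caseInsensitive) {
    for (var i = 0; i < this.length; i++) {
        if (this[i] == object) return true;
        // Only strings can be compared case-insensitively.
        if (!(caseInsensitive && (typeof this[i] == 'string') && (typeof object == 'string'))) continue;
        // Compare whole strings and keep scanning on a miss, instead of
        // returning after the first element or matching substrings.
        if (this[i].toLowerCase() == object.toLowerCase()) return true;
    }
    return false;
}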

Push an object onto the array if it is not empty:

if (!Array.prototype.pushIfNotEmpty) Array.prototype.pushIfNotEmpty = function (object) {
    if (typeof object == 'undefined') return;
    // Skips empty strings, empty arrays and null.
    if ((object && object.length) <= 0) return;
    this.push(object);
}

Canonicalize a string:

function canonicalizeString (inString, whitespaceSpecifier) {
    if (typeof inString != 'string') return '';
    if (typeof whitespaceSpecifier != 'string') return '';

    var whitespaceReplacement = whitespaceSpecifier + whitespaceSpecifier;
    // split/join replaces every occurrence; replace() with a string pattern only replaces the first.
    var canonicalString = inString.split(whitespaceSpecifier).join(whitespaceReplacement);
    var i;

    // Protect the spaces inside single-quoted phrases (the [^'s] skips possessives such as 's).
    var singleQuotedTokens = canonicalString.match(/'([^'s][^']*)'/ig) || [];
    for (i = 0; i < singleQuotedTokens.length; i++) canonicalString = canonicalString.replace(singleQuotedTokens[i], singleQuotedTokens[i].split(" ").join(whitespaceReplacement));

    // Protect the spaces inside double-quoted phrases.
    var doubleQuotedTokens = canonicalString.match(/"([^"]*)"/ig) || [];
    for (i = 0; i < doubleQuotedTokens.length; i++) canonicalString = canonicalString.replace(doubleQuotedTokens[i], doubleQuotedTokens[i].split(" ").join(whitespaceReplacement));

    return canonicalString;
}

The fun part:

function getSignificantTokensFromStringWithCommonWords (inString, inCommonWordsArray) {
    if (typeof inString != 'string') return [];
    if (typeof (inCommonWordsArray && inCommonWordsArray.length) != 'number') return [];

    var canonicalString = canonicalizeString(inString, "_");

    // Indexed loops are used throughout: for...in would also visit the
    // methods added to Array.prototype above.
    var commonWords = [];
    for (var i = 0; i < inCommonWordsArray.length; i++) commonWords.pushIfNotEmpty(canonicalizeString(inCommonWordsArray[i], "_"));

    var tokenizedStrings = canonicalString.split(" ");

    // Keep the tokens that are not common words and restore the spaces
    // that canonicalizeString protected inside quoted phrases.
    var responseObject = [];
    for (var j = 0; j < tokenizedStrings.length; j++) {
        if (commonWords.containsObject(tokenizedStrings[j], true)) continue;
        responseObject.push(tokenizedStrings[j].split("__").join(" "));
    }

    return responseObject;
}
+0

You would call `getSignificantTokensFromStringWithCommonWords(inString, inCommonWordsArray)` to use this snippet. – 2010-06-29 09:05:51
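
For comparison, here is a much shorter self-contained sketch of the same idea (keep quoted phrases together, drop common words) that avoids extending `Array.prototype`. It is an illustration under assumptions, not a rewrite of the answer: the name `significantTokens` is hypothetical, and it only handles double quotes, not the "lazy" single quotes the answer supports.

```javascript
// Hypothetical helper: the name is an illustrative assumption.
function significantTokens(text, commonWords) {
  var lowered = commonWords.map(function (w) { return w.toLowerCase(); });
  var tokens = [];

  // Match either a double-quoted phrase (captured without the quotes)
  // or a single word made of letters with optional internal apostrophes.
  var pattern = /"([^"]*)"|[A-Za-z]+(?:'[A-Za-z]+)*/g;
  var match;
  while ((match = pattern.exec(text)) !== null) {
    var token = match[1] !== undefined ? match[1] : match[0];
    // Skip empty quoted strings and common words (case-insensitive).
    if (token && lowered.indexOf(token.toLowerCase()) === -1) {
      tokens.push(token);
    }
  }
  return tokens;
}

var commonWords = ["she", "he", "him", "liked", "i", "a", "an", "are"];
var str = 'She met him where he liked to eat "the best" cheese pizza.';
console.log(significantTokens(str, commonWords));
// ["met", "where", "to", "eat", "the best", "cheese", "pizza"]
```

Like the answer above, this returns individual words plus quoted phrases rather than the multi-word phrases the question asks for.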
