JavaScript regex help

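My client has given me this complex problem and I can't find an answer, so now I'll try asking you.

The request is:

I think one rule might be: dots that appear immediately after a number are not counted as sentence dots. This means that the dots in "8. marts" and "2.567" are not counted as sentence dots. In return, some sentence dots may be overlooked (for example when a sentence ends with a number: "Vi kommer kl. 8"), but that probably doesn't happen very often.

Another might be: if there is a character (a letter or a digit) immediately after a dot, it is not a sentence dot. That would let us avoid counting the dots in "f.eks.", "bl.a." and "cand.mag.".

I hope I can get some help here.

My code: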

<script> 
function word_count(field, count) {

    var wordsNumberOverSeven = 0;
    var wordsNumber = 0;

    var contentText = $('#lix_word_count').val();
    // Global regexes are used here because replace() with a string
    // pattern only replaces the first occurrence.
    contentText = contentText.replace(/[?!]/g, '.');
    contentText = contentText.replace(/[,;:]/g, '');
    contentText = contentText.replace(/\n/g, ' ').replace(/^\s+|\s+$/g, '').replace(/\s\s+/g, ' ');

    // Every dot is counted as a sentence end (this is the part the question is about).
    var matchDots = contentText.split('.').length - 1;
    var match = contentText.split(' ');

    $.each(match, function(){
        if (this.length > 0)
            wordsNumber += 1;

        if (this.length >= 7)
        {
            wordsNumberOverSeven += 1;
        }

    });

    var lixMatWords = wordsNumber/matchDots;
    var lixMatLongWords = (wordsNumberOverSeven * 100)/wordsNumber;

    var lixMatch = Math.round((lixMatWords + lixMatLongWords) *100)/100;
    var lixType = '';

    if (lixMatch <= 24)
        lixType = 'Lixen i din tekst er '+ lixMatch +', dvs. at teksten er meget let at læse.';
    else if (lixMatch <= 34)
        lixType = 'Lixen i din tekst er '+ lixMatch +', dvs. at teksten er let at læse';
    else if (lixMatch <= 44)
        lixType = 'Lixen i din tekst er '+ lixMatch +', dvs. at teksten ligger i midterområdet.';
    else if (lixMatch <= 54)
        lixType = 'Lixen i din tekst er '+ lixMatch +', dvs. at teksten er svær at læse.';
    else
        lixType = 'Lixen i din tekst er '+ lixMatch +', dvs. at teksten er meget svær at læse.';

    /** alert(lixType +'\nDots: '+ matchDots +'\nWords: '+ wordsNumber +'\nLangeord: '+ wordsNumberOverSeven); **/
    alert(lixType);
}
</script>

Source

2011-05-17 ParisNakitaKejser

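Please restate the question so that it is clear what you need. Also, choose a better title; there is no need to include "javascript" and "regex" in the title, because those are tags and that is enough. – 2011-05-17 09:23:49

You need to restate what you want to match, not how the client thinks it should be done. – 2011-05-17 09:31:01

I think we need to see the rest of the rules, or at least a few of them.

It may be better to describe what kind of sentences you want to include rather than what to exclude. If you are looking for complete sentences, the rule might be a period that is preceded by a non-whitespace character and followed by a space, a carriage return or a line feed, or it might be some more complex set of rules. It will probably take more than one regular expression, plus other logic, to sort out the more complex cases.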
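As a rough sketch of the simple rule described above (a period that follows a non-whitespace character and is followed by whitespace or the end of the text), something like the following could count candidate sentence ends. The helper name `countSentenceEnds` is made up for illustration, and abbreviations such as "f.eks." would still be counted, so this is a starting point rather than a full solution.

```javascript
// Hypothetical helper (not from the original posts): counts periods that
// directly follow a non-whitespace character and are followed by
// whitespace or the end of the text.
function countSentenceEnds(text) {
    // \S\.      a period right after a non-whitespace character
    // (?=\s|$)  followed by whitespace or the end of the input
    var matches = text.match(/\S\.(?=\s|$)/g);
    return matches ? matches.length : 0;
}

// The dot inside "2.567" is skipped because a digit follows it.
countSentenceEnds("Prisen er 2.567 kr. Vi ses."); // 2
```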

Source

2011-05-17 09:22:47 RobG

If you want to split sentences based on that rule, then something like

mySentences.match(/(?:[^.0-9]|[0-9]+\.?|\.[a-z0-9])+(?:\.|$)/ig)

should do it.

You will have to expand a-z to include the accented characters in your language, but otherwise that should do it.
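To illustrate what expanding a-z might look like, here is a small sketch that assumes Danish input and adds æ, ø and å to the character class. The variable names and the sample text are made up for this example; other languages would need their own letters.

```javascript
// The pattern from this answer, with plain a-z.
var asciiOnly = /(?:[^.0-9]|[0-9]+\.?|\.[a-z0-9])+(?:\.|$)/ig;
// Same pattern with æ, ø and å added (assumption: Danish text).
var withDanish = /(?:[^.0-9]|[0-9]+\.?|\.[a-zæøå0-9])+(?:\.|$)/ig;

// Made-up sample in which a dot is directly followed by "æ".
var sample = "Se www.ærø.dk i dag.";

// With plain a-z the dot before "æ" is treated as a sentence end,
// so the text is cut into two pieces.
var before = sample.match(asciiOnly);
// With the expanded class the whole text stays in one piece.
var after = sample.match(withDanish);
```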

With plain a-z, the pattern produces the following for your input text.

["I think that one rule might be: Dots which appears immediately after a number, not counted as sentences.", 
" This means that sentence present in the \"8. marts\"and \"2.567\" is not counted as word dots.", 
" In return, each word dots may be overlooked (if now a sentence ends with a number: \"Vi kommer kl.", 
" 8\") but it's probably after all not quite as often.", 
"\n\nAnother might be: If there is one character (a letter or number) immediately after a sentence is not a phrase sentence.", 
" That would make that we avoided counting the sentence present in the \"f.eks.", 
"\", \"bl.a.","\" and \"cand.mag.", 
"\"."]

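So obviously it has a bit of a problem with dots that appear inside quoted sections. You can fix that by walking the array and rejoining pieces whenever a sentence ends inside a quoted section.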

// Given mySentences defined above, walk counting quote characters.
// You could modify the regexp below if your language tends to use
// a different quoting style, e.g. French-style angle quotes.
for (var i = 0; i < mySentences.length - 1; ++i) {
    var quotes = mySentences[i].match(/["\u201c\u201d]/g);
    // If there are an odd number of quotes, combine the next sentence
    // into this one.
    if (quotes && quotes.length % 2) {
        // In English, it is common to end the quoted section after the
        // closing punctuator: Say "hello."
        var next = mySentences[i + 1];
        if (/^["\u201c\u201d]/.test(next)) {
            mySentences[i] += next.substring(0, 1);
            mySentences[i + 1] = next.substring(1);
        } else {
            mySentences[i] += next;
            // Remove the next sentence, which has just been merged into this one.
            mySentences.splice(i + 1, 1);
            --i; // See if there's more to combine into this sentence.
        }
    }
}

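This kind of thing is brittle, though. If you want to know how people who specialize in this sort of thing do it, search for "natural language segmentation".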
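To tie this back to the LIX calculation in the question, the sketch below plugs the sentence pattern from this answer into the same formula the question's code uses (words per sentence plus the percentage of words with seven or more characters). The function name `lixScore` is hypothetical, the quote rejoining step is left out for brevity, and the word handling simply mirrors the question's code, so treat it as an illustration rather than a finished implementation.

```javascript
// Hypothetical helper (not from the original posts).
function lixScore(text) {
    // Same normalisation as the question's code: "?" and "!" become dots,
    // and ",", ";" and ":" are dropped.
    text = text.replace(/[?!]/g, '.').replace(/[,;:]/g, '');

    // Count sentences with the pattern from this answer instead of split('.').
    var pieces = text.match(/(?:[^.0-9]|[0-9]+\.?|\.[a-z0-9])+(?:\.|$)/ig) || [];
    var sentenceCount = pieces.filter(function (s) {
        return /\S/.test(s); // ignore whitespace-only pieces
    }).length;

    var words = text.split(/\s+/).filter(function (w) {
        return w.length > 0;
    });
    var longWords = words.filter(function (w) {
        return w.length >= 7; // same threshold as the question's code
    });

    // Avoid dividing by zero on empty input.
    if (sentenceCount === 0 || words.length === 0) {
        return 0;
    }

    var score = words.length / sentenceCount +
        (longWords.length * 100) / words.length;
    return Math.round(score * 100) / 100;
}

// Two sentences, eight words, no long words: 8 / 2 + 0.
lixScore("Vi kommer 8. marts. Det koster 2.567 kr."); // 4
```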

Source

2011-05-17 11:17:27
