Removing HTML tags

I want to use this code to remove all HTML tags from a document except <a>, <img> and <iframe>:

var regex = "<(?!a)(?!img)(?!iframe)([\s\S]*?)>"; 
var temp; 
while (source.match(regex)) { 
    temp = source.match(regex)[0]; 
    source = source.replace(temp, ""); 
} 
return source;

It works in online regex testers, but for some reason it does not work on my page. For example, it returns the original string when the input is:

"<p class="MsoNormal" style="margin-left:202.5pt;line-height:200%;background:white"><b><span style="font-size: 16pt; line-height: 200%; color: rgb(131, 60, 11); background-image: initial; background-attachment: initial; background-size: initial; background-origin: initial; background-clip: initial; background-position: initial; background-repeat: initial;">test</span></b><span style="font-size:16.0pt; 
line-height:200%;color:#833C0B;letter-spacing:-.15pt;mso-ansi-language:EN-US"><o:p></o:p></span></p>"

Please help!

2015-01-04 levkaster

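Can you say exactly what you are trying to do? –

Isn't '[\s\S]' equivalent to '.'? Have you tried adding 'console.log(temp)' inside your 'while' loop (or setting a breakpoint) to see what is actually happening? – nnnnnn

[\s\S] lets the regex match across multiple lines – levkaster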
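
Following the debugging suggestion above, one thing worth checking is what the pattern actually contains once it has been written as a string. The sketch below is only an illustration: the names `asString`, `asLiteral` and `sample` are made up for it, and the input is a shortened stand-in for the one in the question. It assumes the pattern is meant to behave the same as a regex literal.

```javascript
// The pattern as written in the question. Inside a string literal, "\s" and
// "\S" are not recognised escapes, so they collapse to plain "s" and "S".
var asString = "<(?!a)(?!img)(?!iframe)([\s\S]*?)>";

// The same pattern as a regex literal keeps its backslashes. The g flag lets
// one replace() call handle every match, so no while loop is needed.
var asLiteral = /<(?!a)(?!img)(?!iframe)([\s\S]*?)>/g;

var sample = '<p class="x"><b>test</b></p>';

// Built from the string, the character class is [sS], so no tag matches.
var strippedWithString = sample.replace(new RegExp(asString, 'g'), '');

// Built from the literal, every tag that is not kept is removed.
var strippedWithLiteral = sample.replace(asLiteral, '');

console.log(asString);            // <(?!a)(?!img)(?!iframe)([sS]*?)>
console.log(strippedWithString);  // <p class="x"><b>test</b></p>
console.log(strippedWithLiteral); // test
```

If this is what happens on the page, doubling the backslashes in the string ("[\\s\\S]") or switching to a regex literal should make the page agree with the online tester.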

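You can do this without a regex. Trying to parse HTML with regular expressions is usually not a good idea unless the use case is very simple...

The way I implemented stripHtmlElementsMatching, you can pass it any CSS selector and it will strip every matching element.

So, to remove everything except a, img and iframe, you can pass :not(a):not(img):not(iframe).

PS: The htmlstripping-root custom tag is only there to avoid creating a parser element that interferes with the selector being passed. For example, if I used a div as the parser element and you passed the selector div > div, all divs would be removed even though they are not nested in your HTML string.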

var stripHtmlElementsMatching = (function(doc) {

    // document.registerElement comes from the old Custom Elements v0 API and
    // is not available in current browsers, so only call it where it exists.
    // An unregistered 'htmlstripping-root' element still works as a container.
    if (typeof doc.registerElement === 'function') {
        doc.registerElement('htmlstripping-root');
    }

    return function(text, selector) {
        var parser = doc.createElement('htmlstripping-root'),
            matchingEls, i, len, el;

        // With no selector given, ':not(*)' matches nothing, so nothing is stripped.
        selector = typeof selector == 'string' ? selector : ':not(*)';
        parser.innerHTML = text;

        matchingEls = parser.querySelectorAll(selector);

        // Replace each matching element with its own child nodes (unwrap it).
        for (i = 0, len = matchingEls.length; i < len; i++) {
            el = matchingEls[i];
            el.parentNode.replaceChild(newFragFrom(el.childNodes), el);
        }

        return parser.innerHTML;
    };

    // Move every node of a live NodeList into a new DocumentFragment.
    function newFragFrom(nodes) {
        var frag = doc.createDocumentFragment();

        while (nodes.length) frag.appendChild(nodes[0]);

        return frag;
    }

})(document);

var text = '<p class="MsoNormal" style="margin-left:202.5pt;line-height:200%;background:white"><b><span style="font-size: 16pt; line-height: 200%; color: rgb(131, 60, 11); background-image: initial; background-attachment: initial; background-size: initial; background-origin: initial; background-clip: initial; background-position: initial; background-repeat: initial;">test</span></b><span style="font-size:16.0pt; line-height:200%;color:#833C0B;letter-spacing:-.15pt;mso-ansi-language:EN-US"><o:p></o:p></span></p>';

var tagsToKeep = ['a', 'img', 'iframe'];

// Builds ':not(a):not(img):not(iframe)'.
var sanitizeSelector = tagsToKeep.map(function(tag) {
    return ':not(' + tag + ')';
}).join('');

var sanitizedText = stripHtmlElementsMatching(text, sanitizeSelector);

document.body.appendChild(document.createTextNode(sanitizedText));

2015-01-04 02:14:59 plalx

This is the best I could come up with!

<((?!a)|a\w)(?!\/a)(?!img)(?!iframe)(?!\/iframe)+([\s\S]*?)>

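The first capture group, "not an a, or an a followed by a word character", lets audio, abbr, address and so on all pass through to the match.

Just replace the matches of the above regex with nothing.

See: http://regexr.com/3a5hp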
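
As a rough sketch of the "replace the matches with nothing" step, the pattern can be used as a regex literal with the g flag. Two things here are assumptions made for illustration and are not part of the answer: the helper name `stripTags`, and dropping the `+` that follows the last lookahead, since quantifying a zero-width assertion should not change what matches.

```javascript
// Pattern from the answer above, minus the "+" after the last lookahead
// (an edit made for this sketch only).
var tagPattern = /<((?!a)|a\w)(?!\/a)(?!img)(?!iframe)(?!\/iframe)([\s\S]*?)>/g;

// Hypothetical helper: delete every tag the pattern matches.
function stripTags(source) {
    return source.replace(tagPattern, '');
}

var cleaned = stripTags('<p><a href="#">link</a> <b>bold</b></p>');
console.log(cleaned);  // <a href="#">link</a> bold

// The (?!\/a) lookahead also protects closing tags that merely start
// with "</a", so this closing tag is left behind.
var abbrCase = stripTags('<abbr>x</abbr>');
console.log(abbrCase); // x</abbr>
```

The second call shows a limitation worth knowing about before relying on the pattern: closing tags such as </abbr> or </audio> are not removed.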

2015-01-04 00:43:21 bitten
