Using a regular expression to trim HTML

I need a regular expression that strips the line breaks, spaces and tabs between HTML tags, as shown in the example below:

Source:

<html> 
    <head> 
    <title> 
      Some title 
     </title> 
    </head> 
</html>

Desired result:

<html><head><title>Some title</title></head></html>

Trimming the whitespace before "Some title" is optional. I would appreciate any help.


2009-06-02 Tim Skauge

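How do you know which whitespace to remove? Why do you remove the whitespace *around* "Some title" but not the whitespace *in* it? What are your rules here? – 2009-06-02 17:56:23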

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – 2010-06-17 06:29:29

s/\s*(<[^>]+>)\s*/\1/gs

or, in C#:

Regex.Replace(html, @"\s*(<[^>]+>)\s*", "$1", RegexOptions.Singleline);
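
For anyone who wants to try the pattern outside Perl or C#, here is a minimal Python sketch of the same substitution, run on the sample from the question. The function name `collapse_around_tags` is made up for illustration and is not part of any library.

```python
import re

# The sample markup from the question, with its trailing spaces written out.
SAMPLE = (
    "<html> \n    <head> \n    <title> \n"
    "      Some title \n     </title> \n    </head> \n</html>"
)


def collapse_around_tags(markup: str) -> str:
    # Drop whitespace directly before and after every tag; group 1 is the tag itself.
    return re.sub(r"\s*(<[^>]+>)\s*", r"\1", markup)


print(collapse_around_tags(SAMPLE))
# <html><head><title>Some title</title></head></html>
```

Keep in mind that this also removes whitespace next to inline tags, so `a <b>bold</b> word` comes out as `a<b>bold</b>word`, and that `[^>]+` assumes no `>` appears inside an attribute value.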


2009-06-02 17:58:30

Try this:

s/[^\w\/\d<>]+//gs


2009-06-02 17:56:13 user105033

s/>\s+</></gs
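
This one is more conservative: it only touches whitespace that sits between a closing `>` and the next `<`, so text keeps whatever padding it had. A rough Python equivalent (the function name is hypothetical) makes the difference visible on the question's sample:

```python
import re

# The sample markup from the question, with its trailing spaces written out.
SAMPLE = (
    "<html> \n    <head> \n    <title> \n"
    "      Some title \n     </title> \n    </head> \n</html>"
)


def collapse_between_tags(markup: str) -> str:
    # Only whitespace that lies strictly between two tags is removed.
    return re.sub(r">\s+<", "><", markup)


result = collapse_between_tags(SAMPLE)
print(repr(result))
```

The tags end up adjacent, but the text inside `<title>` still carries its leading and trailing whitespace, which the question says is acceptable.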


2009-06-02 17:58:02

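If the HTML is strict, load it with an XML reader and write it back out without formatting. This preserves the whitespace inside tags, but not between them.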
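
A minimal sketch of that idea in Python, assuming the input is well-formed XML. ElementTree keeps whitespace-only text by default, so the sketch clears it by hand; the function name and the `trim_text` switch are my own additions for illustration, not something the answer specifies.

```python
import xml.etree.ElementTree as ET


def strip_inter_tag_whitespace(markup: str, trim_text: bool = False) -> str:
    root = ET.fromstring(markup)  # raises ParseError if the markup is not well-formed
    for element in root.iter():
        # .text is the text right after the start tag, .tail the text after the end tag.
        for attr in ("text", "tail"):
            value = getattr(element, attr)
            if value is None:
                continue
            if not value.strip():
                setattr(element, attr, None)           # whitespace-only: drop it
            elif trim_text:
                setattr(element, attr, value.strip())  # optional: trim around real text
    return ET.tostring(root, encoding="unicode")


sample = "<html> \n  <head> \n  <title> \n   Some title \n  </title> \n  </head> \n</html>"
print(strip_inter_tag_whitespace(sample, trim_text=True))
# <html><head><title>Some title</title></head></html>
```

With `trim_text` left at `False`, the whitespace around "Some title" is kept, which matches the behaviour described above.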


2009-06-02 17:58:20 Welbog

Not to mention that it doesn't reinvent the wheel. – Pesto 2009-06-02 18:00:46

This removes the whitespace between tags as well as the whitespace between tags and text.

s/(\s*(<))|((>)\s*)/\2\4/g


2009-06-02 19:18:46

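\d in Perl 5.8 and 5.10 does not match only [0-9]; it matches any Unicode character that has the digit property (including "\x{1815}" and "\x{FF15}"). If you mean [0-9], you must either write [0-9] or use the bytes pragma (but that turns all strings into 1-byte characters, which is normally not what you want).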
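
The same pitfall can be reproduced outside Perl. As an illustration only (this is Python, not Perl), `\d` in a Python 3 `str` pattern also matches any Unicode decimal digit unless `re.ASCII` is given; the helper name `digit_matches` is made up for this sketch.

```python
import re

MONGOLIAN_FIVE = "\u1815"  # MONGOLIAN DIGIT FIVE
FULLWIDTH_FIVE = "\uff15"  # FULLWIDTH DIGIT FIVE


def digit_matches(ch: str) -> tuple:
    """Return (Unicode-aware \\d, explicit [0-9], \\d with re.ASCII) for one character."""
    return (
        bool(re.fullmatch(r"\d", ch)),
        bool(re.fullmatch(r"[0-9]", ch)),
        bool(re.fullmatch(r"\d", ch, flags=re.ASCII)),
    )


for ch in (MONGOLIAN_FIVE, FULLWIDTH_FIVE, "5"):
    print(repr(ch), digit_matches(ch))
```

For the two non-ASCII digits only the first check succeeds; for a plain "5" all three do.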

Regular expressions are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?"). What you need is an HTML parser. For examples using various parsers, see "Can you provide an example of parsing HTML with your favorite parser?".
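
To make the parser route concrete, here is a rough sketch using Python's standard `html.parser` module. It re-emits the markup while trimming every text node. The class and function names are hypothetical, and the sketch assumes the page has no `<pre>`, `<script>`, `<style>` or `<textarea>` content, where whitespace and escaping matter; it also trims spaces next to inline tags.

```python
from html import escape
from html.parser import HTMLParser


class WhitespaceTrimmer(HTMLParser):
    """Re-emits the markup it is fed, trimming whitespace around text nodes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Reuse the original tag text so attributes are kept verbatim.
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through unchanged.
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # The parser has decoded character references, so re-escape the text.
        self.parts.append(escape(data.strip(), quote=False))

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def trim_html(markup: str) -> str:
    parser = WhitespaceTrimmer()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at the end of the input
    return "".join(parser.parts)


sample = "<html> \n  <head> \n  <title> \n   Some title \n  </title> \n  </head> \n</html>"
print(trim_html(sample))
# <html><head><title>Some title</title></head></html>
```

Unlike the regex answers, this does not trip over a `>` inside an attribute value, because the tokenizing is left to the parser.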

You may be interested in the HTMLAgilityPack answer.


2009-06-02 21:53:03


I wanted to keep the line breaks, because removing them had messed up my HTML. So I went with the following.

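// Requires: using System.Text.RegularExpressions;
private static string ProcessHTMLFile(string input) 
{ 
    // As written, "()*" can only match the empty string, so this call removes nothing.
    string opt = Regex.Replace(input, @"()*", "", RegexOptions.Singleline); 
    // Remove every tab; line breaks are left untouched.
    opt = Regex.Replace(opt, @"[\t]*", "", RegexOptions.Singleline); 
    return opt; 
}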
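
If the goal is to drop indentation and trailing blanks while keeping every line break, one possible approach is to trim each line separately. This is a sketch under that assumption, in Python rather than C#, with a made-up function name; it is not a reconstruction of the method above.

```python
import re


def trim_lines_keep_newlines(markup: str) -> str:
    # With re.MULTILINE, ^ and $ anchor at every line, so only spaces and tabs
    # at the start or end of a line are removed; the newlines themselves stay.
    return re.sub(r"^[ \t]+|[ \t]+$", "", markup, flags=re.MULTILINE)


sample = "<html> \n    <head> \n\t<title>Some title</title> \n</html>"
print(trim_lines_keep_newlines(sample))
```

Spaces inside a line, such as the one in "Some title", are not affected.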


2010-06-14 05:00:27 Shash

Regex.Replace(input, "<[^>]*>", String.Empty);


2010-06-17 06:18:47 dankyy1

A solution with XSLT looks like this:

<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">  
<xsl:output method="xml" encoding="UTF-8" indent="no"/> 

<xsl:template match="*|@*"> 
    <xsl:copy> 
     <xsl:apply-templates select="@*|node()"/> 
    </xsl:copy> 
</xsl:template> 

<!-- trim whitespaces from the content --> 
<xsl:template match="text()"> 
    <!-- remove from tag to content --> 
    <xsl:variable name="trimmedHead" select="replace(.,'^\s+','')"/> 
    <xsl:variable name="trimmed" select="replace($trimmedHead,'\s+$','')"/> 
    <xsl:value-of select="$trimmed"/> 
</xsl:template> 

<!-- do not trim where text content exist --> 
<xsl:template match="text()"> 
    <xsl:if test="not(matches(.,'^\s+$'))"> 
     <xsl:value-of select="."/> 
    </xsl:if> 
</xsl:template>

</xsl:stylesheet>

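You can choose which of the two text() templates to use; keep only one of them in the stylesheet. The first trims leading and trailing whitespace even when the text node has content; the second removes a text node only when it consists entirely of whitespace or line breaks.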


2012-08-29 19:07:50 FiveO
