Using a regular expression to get tag code in HTML


I want to extract every tag from an HTML document, like the one below (that is, each piece of tag code enclosed in <>). I have tried /<.+>/, but it does not seem to work.

<table class="body wrap" cellpadding="0" cellspacing="0" align="center" style="width: 100%;max-width: 600px;background-color: #f4f4f4;">

How should I do this?

Source

2016-07-12 zonyang

What do you mean by every tag like the one below? Which part of the tag should be included in the match? – 10100111001

Get the entire <table> tag (in this case) and all the other tags in a large HTML document. – zonyang

Try '/<[^<>]+>/' or better '/<.+?>/' – horcrux
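To see why the greedy /<.+>/ from the question fails while the two patterns suggested in the comment above behave as intended, here is a minimal Java sketch. The class name GreedyVsLazy, the helper findAll, and the sample input are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<tr><td>cell</td></tr>"; // assumed sample input

        // Greedy: .+ runs to the LAST '>', so everything is one match.
        System.out.println(findAll("<.+>", html));
        // prints [<tr><td>cell</td></tr>]

        // Lazy: .+? stops at the FIRST '>' it can reach.
        System.out.println(findAll("<.+?>", html));
        // prints [<tr>, <td>, </td>, </tr>]

        // Negated class: can never cross a '<' or '>'.
        System.out.println(findAll("<[^<>]+>", html));
        // prints [<tr>, <td>, </td>, </tr>]
    }
}
```

One difference worth knowing: in Java, '.' does not match line terminators unless Pattern.DOTALL is set, so the lazy version misses a tag whose attributes are spread over several lines, while the negated-class version still matches it.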

Answer

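This should work. The pattern matches opening and self-closing tags and skips closing tags such as </test>.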

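import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HTMLTagMatcher {
    // '<', then one character that is not '/', '<' or '>' (this skips closing
    // tags), then any run of non-bracket characters, then '>'.
    // Using * rather than + lets one-letter tags such as <p> match too.
    private static final String REGEX = "<[^/<>][^<>]*>";
    private static final String INPUT = "<test><blah /><test2></test><best><blargh></best><outside>";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        Matcher match = p.matcher(INPUT);
        // Print each matched tag in the order it appears in the input.
        while (match.find()) {
            System.out.println(match.group());
        }
    }
}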
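The pattern above deliberately leaves out closing tags. If closing tags are wanted as well, which is how I read "all the other tags" in the question, the negated-class pattern from the comments can be applied to the asker's own <table> tag. This is only a sketch: the class name TableTagDemo, the method extractTags, and the rows appended after the <table> tag are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TableTagDemo {
    // Matches any run of non-bracket characters wrapped in '<' and '>',
    // so opening, closing and self-closing tags are all included.
    private static final Pattern TAG = Pattern.compile("<[^<>]+>");

    // Hypothetical helper: returns every tag found in html, in order.
    static List<String> extractTags(String html) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // The <table ...> tag is the one from the question; the rest is made up.
        String html =
            "<table class=\"body wrap\" cellpadding=\"0\" cellspacing=\"0\" align=\"center\" "
            + "style=\"width: 100%;max-width: 600px;background-color: #f4f4f4;\">"
            + "<tr><td>Hi</td></tr></table>";

        // Prints the full <table ...> tag, then <tr>, <td>, </td>, </tr>, </table>.
        for (String tag : extractTags(html)) {
            System.out.println(tag);
        }
    }
}
```

Keep in mind that this is pattern matching, not HTML parsing: an attribute value that itself contains '<' or '>' will cut the match short. For arbitrary real-world pages a proper HTML parser is the safer choice.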

Source

2016-07-12 18:08:30 10100111001
