2013-06-03 92 views
0

Java regex to extract the string between two words

I have a string that looks like this:

<br/><description>Using a combination of remote probes, (TCP/IP, SMB, HTTP, NTP, SNMP, etc...) it is possible to guess the name of the remote operating system in use, and sometimes its version.</description><br/><fname>os_fingerprint.nasl</fname><br/><plugin_modification_date>2012/12/01</plugin_modification_date><br/><plugin_name>OS Identification</plugin_name><br/><plugin_publication_date>2003/12/09</plugin_publication_date><br/><plugin_type>combined</plugin_type><br/><risk_factor>None</risk_factor><br/><solution>n/a</solution><br/><synopsis>It is possible to guess the remote operating system.</synopsis><br/><plugin_output><br/>Remote operating system : Microsoft Windows Server 2008 R2 Enterprise Service Pack 1<br/>Confidence Level : 99<br/>Method : MSRPC<br/><br/> <br/>The remote host is running Microsoft Windows Server 2008 R2 Enterprise Service Pack 1</plugin_output><br/> 

I want to extract what lies between "Remote operating system : " and "<br/>", so that I get "Microsoft Windows Server 2008 R2 Enterprise Service Pack 1".

Remote operating system : Microsoft Windows Server 2008 R2 Enterprise Service Pack 1<br/> 

So I wrote the following regex:

Pattern pattern = Pattern.compile("(?<=\\bRemote operating system :\\b).*?(?=\\b<br/>\\b)"); 

But my regex does not seem to work. Any ideas? Also, is this a good way to extract the OS string, or should I do it another way? Thanks!

Answers

2

Try this pattern: ".*Remote operating system : (.*?)<br/>"

public static void main(String[] args) throws Exception { 
    String s = "<br/><description>Using a combination of remote probes, (TCP/IP, SMB, HTTP, NTP, SNMP, etc...) it is possible to guess the name of the remote operating system in use, and sometimes its version.</description><br/><fname>os_fingerprint.nasl</fname><br/><plugin_modification_date>2012/12/01</plugin_modification_date><br/><plugin_name>OS Identification</plugin_name><br/><plugin_publication_date>2003/12/09</plugin_publication_date><br/><plugin_type>combined</plugin_type><br/><risk_factor>None</risk_factor><br/><solution>n/a</solution><br/><synopsis>It is possible to guess the remote operating system.</synopsis><br/><plugin_output><br/>Remote operating system : Microsoft Windows Server 2008 R2 Enterprise Service Pack 1<br/>Confidence Level : 99<br/>Method : MSRPC<br/><br/> <br/>The remote host is running Microsoft Windows Server 2008 R2 Enterprise Service Pack 1</plugin_output><br/>"; 

    Pattern pattern = Pattern.compile(".*Remote operating system : (.*?)<br/>"); 
    Matcher m = pattern.matcher(s); 
    if (m.find()) {
        System.out.println(m.group(1));
    } else {
        System.out.println("Not found");
    }
}
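
The same idea can be generalised to any pair of delimiters. The following is only a minimal sketch: the class name BetweenDelimiters and the method between are made up for illustration and do not come from the thread. It assumes both delimiters should be matched literally, which is why they are passed through Pattern.quote.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
class BetweenDelimiters {

    // Returns the shortest text between the first occurrence of `start`
    // and the next occurrence of `end`, or empty if there is no such pair.
    static Optional<String> between(String text, String start, String end) {
        // Pattern.quote makes both delimiters match literally.
        // DOTALL lets the captured text span line breaks.
        Pattern p = Pattern.compile(
                Pattern.quote(start) + "(.*?)" + Pattern.quote(end),
                Pattern.DOTALL);
        Matcher m = p.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String s = "<plugin_output><br/>Remote operating system : "
                + "Microsoft Windows Server 2008 R2 Enterprise Service Pack 1<br/>"
                + "Confidence Level : 99<br/>";
        System.out.println(
                between(s, "Remote operating system : ", "<br/>").orElse("Not found"));
    }
}
```

Returning an Optional keeps the "Not found" decision with the caller instead of hard-coding it in the helper.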
0

In your regex there is no space after the ":" and before the \\b.

Try it this way:

Pattern.compile("(?<=\\bRemote operating system : \\b).*?(?=\\b<br/>\\b)"); 
//            ^additional space 

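Without that space, the \\b cannot match at the start of the next word (Microsoft). It can never match as the end of a word either, because ":" is not a word character and so cannot properly end a word.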
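
To see the word-boundary behaviour in isolation, here is a small sketch that runs both patterns against a shortened version of the input. The class name BoundaryDemo and the helper firstMatch are illustrative names, not anything from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class BoundaryDemo {

    // Returns the first match of `regex` in `input`, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "<br/>Remote operating system : Microsoft Windows Server 2008 R2 "
                + "Enterprise Service Pack 1<br/>Confidence Level : 99<br/>";

        // Original pattern: the \b right after ':' sits between ':' and ' ',
        // both non-word characters, so the lookbehind cannot succeed.
        String original = "(?<=\\bRemote operating system :\\b).*?(?=\\b<br/>\\b)";

        // Fixed pattern: with the space included, \b falls between ' ' and 'M'.
        String fixed = "(?<=\\bRemote operating system : \\b).*?(?=\\b<br/>\\b)";

        System.out.println(firstMatch(original, s)); // prints null
        System.out.println(firstMatch(fixed, s));
    }
}
```

Note that the trailing \\b after <br/> in the lookahead only matches when the character after the tag is a word character (here the "C" of "Confidence"). If the tag could be followed by a space or another tag, that last \\b would probably need to be dropped.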

0
String test = 
     "<br/><description>Using a combination of remote probes, " + 
     "(TCP/IP, SMB, HTTP, NTP, SNMP, etc...) it is possible to guess " + 
     "the name of the remote operating system in use, and sometimes " + 
     "its version.</description><br/><fname>os_fingerprint.nasl</fname>" + 
     "<br/><plugin_modification_date>2012/12/01</plugin_modification_date>" + 
     "<br/><plugin_name>OS Identification</plugin_name><br/>" + 
     "<plugin_publication_date>2003/12/09</plugin_publication_date><br/>" + 
     "<plugin_type>combined</plugin_type><br/><risk_factor>None</risk_factor>" + 
     "<br/><solution>n/a</solution><br/><synopsis>It is possible to guess the " + 
     "remote operating system.</synopsis><br/><plugin_output><br/>Remote operating " + 
     "system : Microsoft Windows Server 2008 R2 Enterprise Service Pack 1<br/>" + 
     "Confidence Level : 99<br/>Method : MSRPC<br/><br/> <br/>The remote host is " + 
     "running Microsoft Windows Server 2008 R2 Enterprise Service Pack 1" + 
     "</plugin_output><br/>"; 
Pattern pattern = Pattern.compile("Remote\\soperating\\ssystem\\s:\\s(.+?)\\<br/>");
Matcher matcher = pattern.matcher(test);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}

Output:

Microsoft Windows Server 2008 R2 Enterprise Service Pack 1 

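Note that, in general, it is unwise to use regular expressions against markup languages. Here, however, you are using a regex to pick out one specific text string that merely happens to sit inside the markup, so I would guess it is fine.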
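
If you would rather avoid a regex altogether, one possible alternative is the JDK's built-in DOM parser. The sketch below rests on two assumptions that the thread does not confirm: that the input is always well-formed XML (the sample happens to be), and that it is acceptable to wrap the fragment in a synthetic <root> element, since it has several top-level elements. The class name PluginOutputParser and the method remoteOs are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical parser-based alternative; names are illustrative only.
class PluginOutputParser {

    static final String PREFIX = "Remote operating system :";

    // Returns the OS name from the first <plugin_output> element, or null
    // if the element or the prefixed line is missing.
    static String remoteOs(String fragment) {
        Document doc;
        try {
            // Assumption: wrapping in <root> yields one well-formed document.
            String xml = "<root>" + fragment + "</root>";
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        // Merge adjacent text nodes so each line is a single node.
        doc.getDocumentElement().normalize();

        NodeList outputs = doc.getElementsByTagName("plugin_output");
        if (outputs.getLength() == 0) {
            return null;
        }
        // The <br/> elements split the output into separate text nodes.
        NodeList children = outputs.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue().trim();
                if (text.startsWith(PREFIX)) {
                    return text.substring(PREFIX.length()).trim();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String s = "<synopsis>It is possible to guess the remote operating system."
                + "</synopsis><br/><plugin_output><br/>Remote operating system : "
                + "Microsoft Windows Server 2008 R2 Enterprise Service Pack 1<br/>"
                + "Confidence Level : 99<br/>Method : MSRPC<br/></plugin_output><br/>";
        System.out.println(remoteOs(s));
    }
}
```

This is more code than the regex, and it fails outright on input that is not well-formed, so for a single known line the regex answers are probably still the more practical choice.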

0

Try the following:

if (str.matches("^.*Remote operating system : ([^<]*).*$")) {
    System.out.println(
        str.replaceAll("^.*Remote operating system : ([^<]*).*$", "$1")
    );
}
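
The snippet above evaluates the same regex twice, once in matches and once in replaceAll. A possible variation, sketched here with the made-up names SinglePassMatch and remoteOs, compiles the pattern once and reads the capture group from a single Matcher. It also adds Pattern.DOTALL, on the assumption that real input may contain line breaks, which "." would otherwise refuse to cross.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical single-pass variant; names are illustrative only.
class SinglePassMatch {

    // Compiled once; DOTALL lets '.' cross line breaks in multi-line input.
    private static final Pattern OS_LINE = Pattern.compile(
            "^.*Remote operating system : ([^<]*).*$", Pattern.DOTALL);

    // Returns the captured OS name, or null if the whole input does not match.
    static String remoteOs(String str) {
        Matcher m = OS_LINE.matcher(str);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "<plugin_output><br/>Remote operating system : "
                + "Microsoft Windows Server 2008 R2 Enterprise Service Pack 1<br/>"
                + "Confidence Level : 99<br/></plugin_output>";
        System.out.println(remoteOs(str));
    }
}
```

Because the leading ".*" is greedy, this picks the last occurrence of "Remote operating system : " if the label appears more than once.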