Parsing a string

<a href="http://www.google.com">Google</a><br/> // the string to parse, with no spaces

I am trying to extract the link http://www.google.com as well as the text Google.

2013-11-21 user2809437

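Why do you want to parse it yourself? There are many great libraries, such as Jsoup, that can take care of it for you. – stevevls

@stevevls It is a requirement of the assignment. – user2809437

Does your professor insist on using regular expressions to parse this HTML? –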

This should do the job.

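String html = "<a href=\"http://www.google.com\">Google</a><br/>";
// Splitting on the double quote yields three parts:
// [<a href=] [http://www.google.com] [>Google</a><br/>]
String[] separate = html.split("\"");
String url = separate[1];
// Drop the leading '>' and keep everything up to the next '<'.
String text = separate[2].substring(1).split("<")[0];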

2013-11-21 01:28:39 Adarsh

You can use a simple regular expression to extract it. Try this.

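import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "<a href=\"http://www.google.com\">Google</a><br/>";
// Group 1 captures the href value, group 2 the link text.
Pattern pattern = Pattern.compile("<a\\s+href=\"([^\"]*)\">([^<]*)</a>");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1)); // http://www.google.com
    System.out.println(matcher.group(2)); // Google
}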
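
The pattern above expects a lower-case tag and no whitespace around the `=`. As a minimal sketch, assuming the tag and attribute names may appear in either case and that whitespace may surround the `=`, the same idea can be loosened with `Pattern.CASE_INSENSITIVE` and a few `\\s*`. The names `AnchorRegexSketch` and `extract` are made up for illustration, and this is still not a general HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchorRegexSketch {
    // Group 1: href value, group 2: link text.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s+href\\s*=\\s*\"([^\"]*)\"\\s*>([^<]*)</a\\s*>",
            Pattern.CASE_INSENSITIVE);

    /** Returns {url, text}, or null when no anchor is found (hypothetical helper). */
    static String[] extract(String html) {
        Matcher m = ANCHOR.matcher(html);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String[] parts = extract("<A HREF = \"http://www.google.com\">Google</A><BR/>");
        if (parts != null) {
            System.out.println(parts[0]);
            System.out.println(parts[1]);
        }
    }
}
```

Returning `null` for "no match" keeps the sketch short; an `Optional` or an exception would work just as well.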

2013-11-21 01:30:03 akaya

I used a filter API in my web crawler, and it works perfectly.

Here is the API code:

public static String filterHref(String hrefLine)
{
    if (!hrefLine.toLowerCase().contains("href"))
        return "";
    // Split case-insensitively so that HREF="..." is handled as well;
    // the limit of 2 keeps everything after the first "href" together.
    String[] hrefSplit = hrefLine.split("(?i)href", 2); // split href="..." alt="...">...<...>

    String link = hrefSplit[1].split("\\s+")[0]; // get href attribute and value
    if (link.contains(">"))
        link = link.substring(0, link.indexOf(">")); // cut off the end of the tag
    link = link.replaceFirst("=", "");
    link = link.replace("\"", "").replace("'", "").trim(); // strip the quotes
    return link;
}
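
A line of HTML can contain more than one link, and the method above returns only the first `href` it finds. A minimal sketch, assuming attribute values are quoted with single or double quotes, that collects every `href` value with a `Matcher.find()` loop; `HrefCollectorSketch` and `collectHrefs` are hypothetical names, not part of the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HrefCollectorSketch {
    // Matches href="..." or href='...'; group 1 is the opening quote,
    // group 2 is the value up to the matching closing quote.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    /** Collects all quoted href values in document order (hypothetical helper). */
    static List<String> collectHrefs(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2).trim());
        }
        return links;
    }

    public static void main(String[] args) {
        String line = "<a href=\"http://www.google.com\">Google</a> <A HREF='/about'>About</A>";
        System.out.println(collectHrefs(line));
    }
}
```

Unquoted attribute values are deliberately not handled here; supporting them would need a second alternative in the pattern.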

2013-11-21 01:39:10
