2014-10-08

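Parsing a string - Http string

I want to do something like the following, so that only the website part of the string is left. I'm having trouble with the quotes inside my string.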

This is what I read into a string:

    <td width="118"><a href="research.html" class="navText style10 style12">

I want to be able to parse this so that I am left with only research.html.

Sometimes I also get a string that contains:

    <a href="http://www.ucalgary.ca" class="style18"><font size="3">University of Calgary</font></a></div>

From this string I want to keep http://www.ucalgary.ca.

What I have so far doesn't work for every case. I'd appreciate your help! My code is:

    public class Parse
    {
        public static void main(String[] args)
        {
            String h = "<a href=\"http://www.departmentofmedicine.com/policy.htm\">";
            // Position of the first double quote in the string
            int n = getIndexOf(h, '"', 0);

            // Take everything from that quote up to the first '>' and drop the quotes
            String[] a = h.substring(n).split(">");
            String url = a[0].replaceAll("\"", "");
            //String value = a[1].replaceAll("</a", "");

            System.out.println(url + " ");
        }

        // Returns the index of the (n+1)-th occurrence of c in str, or -1 if there is none
        public static int getIndexOf(String str, char c, int n)
        {
            int pos = str.indexOf(c, 0);
            while (n-- > 0 && pos != -1)
            {
                pos = str.indexOf(c, pos + 1);
            }
            return pos;
        }
    }
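To see why this only works for the first input: `getIndexOf(h, '"', 0)` finds the first double quote anywhere in the string, not the one after `href=`. The sketch below applies the same logic to the three inputs from the question; `FirstQuoteDemo` and `firstQuoteApproach` are hypothetical names that simply wrap the code from `main`.

```java
public class FirstQuoteDemo {
    // Same logic as Parse.main: start at the first double quote,
    // cut at the first '>', then strip every remaining quote.
    static String firstQuoteApproach(String h) {
        int n = h.indexOf('"');                 // equivalent to getIndexOf(h, '"', 0)
        String[] a = h.substring(n).split(">");
        return a[0].replaceAll("\"", "");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "<a href=\"http://www.departmentofmedicine.com/policy.htm\">",
            "<td width=\"118\"><a href=\"research.html\" class=\"navText style10 style12\">",
            "<a href=\"http://www.ucalgary.ca\" class=\"style18\"><font size=\"3\">University of Calgary</font></a></div>"
        };
        for (String h : inputs) {
            System.out.println(firstQuoteApproach(h));
        }
    }
}
```

It prints `http://www.departmentofmedicine.com/policy.htm`, then `118` (the first quote in the `<td>` input belongs to `width="118"`), then `http://www.ucalgary.ca class=style18` (the first `>` only comes after the `class` attribute). Anchoring the search on `href="` instead of the first quote avoids both problems.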

Look at the Java String methods. They already have methods for stripping and the like. – jgr208 2014-10-08 14:51:28

It's not clear from your input what you want to keep/extract. – ToYonos 2014-10-08 14:53:44

Only departmentofmedicine.com/policy.htm. This input works, but the other inputs I mentioned above don't seem to work! For example, if I use the University of Calgary link as input. – chillax786 2014-10-08 14:58:52

Answers

I would give Pattern and Matcher a try, like this:

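    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "<a href=\"http://www.departmentofmedicine.com/policy.htm\">";

    // Group 1 captures everything after href=" up to the next double quote
    Pattern p = Pattern.compile(".*href=\"([^\"]*).*");
    Matcher m = p.matcher(s);
    if (m.matches()) {
        System.out.println(m.group(1));
    }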
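`matches()` has to match the entire string, which is why the pattern above is wrapped in `.*`. A variant using `Matcher.find()` can drop the wildcards and also collects every link when one string contains several. This is a sketch: `HrefRegex` and `extractHrefs` are hypothetical names, it assumes double-quoted `href` values, and a regex is not a full HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefRegex {
    // href="..." where group 1 is everything up to the next double quote.
    // Assumes double-quoted values; single-quoted or unquoted hrefs are not matched.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    // Returns every href value in the string, in order of appearance.
    static List<String> extractHrefs(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {          // find() looks for the next match anywhere in the input
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(extractHrefs(
            "<td width=\"118\"><a href=\"research.html\" class=\"navText style10 style12\">"));
        System.out.println(extractHrefs(
            "<a href=\"http://www.ucalgary.ca\" class=\"style18\"><font size=\"3\">University of Calgary</font></a></div>"));
    }
}
```

For the question's two problem inputs this prints `[research.html]` and `[http://www.ucalgary.ca]`.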
A shorter version:

    String h = "<a href=\"http://www.departmentofmedicine.com/policy.htm\">";
    String url = h.substring(h.indexOf("http")).replace("\">", "");
    System.out.println(url);

The output will be: http://www.departmentofmedicine.com/policy.htm

Tested on my machine.
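That one-liner relies on two assumptions: the link is absolute (it contains `http`) and `">` immediately follows the URL. The sketch below runs it, wrapped in a hypothetical `httpOneLiner` method, against the question's inputs to show where those assumptions break.

```java
public class HttpIndexDemo {
    // The one-liner from the answer, wrapped in a method.
    static String httpOneLiner(String h) {
        return h.substring(h.indexOf("http")).replace("\">", "");
    }

    public static void main(String[] args) {
        // Absolute URL followed directly by "> : prints just the URL.
        System.out.println(httpOneLiner("<a href=\"http://www.departmentofmedicine.com/policy.htm\">"));

        // Extra attributes and markup after the URL are kept; only the "> pairs are removed.
        System.out.println(httpOneLiner(
            "<a href=\"http://www.ucalgary.ca\" class=\"style18\"><font size=\"3\">University of Calgary</font></a></div>"));

        // Relative link: there is no "http", indexOf returns -1 and substring(-1) throws.
        try {
            httpOneLiner("<td width=\"118\"><a href=\"research.html\" class=\"navText style10 style12\">");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("research.html input: no \"http\" found, substring threw");
        }
    }
}
```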

Also, post what the possible cases are, so that I can give you a better solution.

Solution for all three possibilities:

    //String h1 = "<a href=\"http://www.departmentofmedicine.com/policy.htm\">";
    //String h1 = "<a href=\"ucalgary.ca\" class=\"style18\"><font size=\"3\">University of Calgary</font></a>";
    String h1 = "<td width=\"118\"><a href=\"research.html\" class=\"navText style10 style12\">";

    String marker = "href=\"";
    int start = h1.indexOf(marker) + marker.length(); // first character after href="
    int end = h1.indexOf('"', start);                 // quote that closes the href value
    String url = h1.substring(start, end);

    System.out.println(url);

Uncomment the h1 strings one at a time and check them against your requirements.

Run with each input in turn, the code above prints:

    research.html
    http://www.departmentofmedicine.com/policy.htm
    ucalgary.ca
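When the input has no `href="` at all, `indexOf` returns -1 and the expression above does not report it: it either returns a fragment of unrelated text or throws. Here is a sketch of the same indexOf approach as a small helper that reports "not found" explicitly with `Optional`. `HrefIndexOf`, `firstHref` and `MARKER` are hypothetical names, and like the answer it assumes double-quoted attribute values.

```java
import java.util.Optional;

public class HrefIndexOf {
    private static final String MARKER = "href=\"";

    // Returns the value of the first href="..." attribute, or Optional.empty()
    // when there is no href or its closing quote is missing.
    static Optional<String> firstHref(String html) {
        int markerPos = html.indexOf(MARKER);
        if (markerPos < 0) {
            return Optional.empty();                 // no href="..." in the string
        }
        int start = markerPos + MARKER.length();     // first character of the value
        int end = html.indexOf('"', start);          // quote that closes the value
        if (end < 0) {
            return Optional.empty();                 // unterminated attribute value
        }
        return Optional.of(html.substring(start, end));
    }

    public static void main(String[] args) {
        String[] inputs = {
            "<a href=\"http://www.departmentofmedicine.com/policy.htm\">",
            "<a href=\"http://www.ucalgary.ca\" class=\"style18\"><font size=\"3\">University of Calgary</font></a></div>",
            "<td width=\"118\"><a href=\"research.html\" class=\"navText style10 style12\">",
            "<td width=\"118\">"
        };
        for (String h : inputs) {
            System.out.println(firstHref(h).orElse("(no href)"));
        }
    }
}
```

Returning `Optional` makes the missing-href case visible at the call site; returning `null` would also work but is easier to overlook.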

The output will be: – 2014-10-08 15:12:12

This is another case: – chillax786 2014-10-08 15:20:02

This is also another case: University of Calgary – chillax786 2014-10-08 15:20:42