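Error extracting the middle of a string
2013-11-24

I wrote some code that searches HTML and finds the links in it. The lines of HTML contain some unnecessary characters, so I need to remove the start and the end of each line. Here is an example of one line of the HTML: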

{s:"Hate Being Sober", h:"../lyrics/chiefkeef/hatebeingsober.html", c:"", a:""} 

My code is posted below. It worked perfectly fine until I added the string bestUrl. Now it gives me this error:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(String.java:1904)
    at CussCount.main(CussCount.java:32)

Here is my code:

import java.io.*; 
import java.net.*; 
public class CussCount{ 
public static void main(String args[]){ 
    try{ 
     String artist=args[0]; 
     String first=artist.substring(0,1); 
     Boolean inSongs=false; 
     String beginIndex= "h:\".."; 
     String endIndex="\", c:"; 
     int one=1; 
     URL discography = new URL("http://www.azlyrics.com/"+first+"/"+artist+".html"); 
     URLConnection xx = discography.openConnection(); 
     BufferedReader xy = new BufferedReader(new InputStreamReader(
        xx.getInputStream())); 
     String words = xy.readLine(); 
     while(words!=null){ 
      if(words.equals("var songlist = [")){ 
       inSongs=true; 
      } 
      if(words.equals("var res = '<br />';")){ 
       inSongs=false; 
       break; 
      } 
      if(inSongs==true){ 
       System.out.println(words); 
       int startIndex= words.indexOf(beginIndex,one); 
       System.out.println(startIndex+6); 
       int finishIndex= words.indexOf(endIndex,one); 
       System.out.println(finishIndex); 

       String bestUrl=words.substring(startIndex, finishIndex); 
       System.out.println(bestUrl); 
      } 

      words = xy.readLine(); 
     } 
     xy.close(); 
    }catch(IOException ioe){ 
     System.out.println(ioe.getMessage()); 
    } 

} 
} 

Any ideas would be greatly appreciated. Thank you!

I think that in the main logic, 'String bestUrl = words.substring(startIndex, finishIndex);', you need to check that startIndex and finishIndex are not equal to -1.
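A minimal sketch of the check this comment suggests, assuming every song line has the same shape as the sample line in the question. The class and method names (GuardedExtract, extractUrl) are hypothetical. Adding BEGIN.length() to the start index also drops the h:".. prefix, which the question's substring call would otherwise keep in bestUrl.

```java
public class GuardedExtract {
    // Marker strings copied from the question's code.
    static final String BEGIN = "h:\"..";
    static final String END = "\", c:";

    // Returns the URL between BEGIN and END, or null when the line does not
    // contain both markers (for example the header line "var songlist = [").
    static String extractUrl(String line) {
        int startIndex = line.indexOf(BEGIN);
        if (startIndex == -1) {
            return null; // not a song entry
        }
        startIndex += BEGIN.length(); // skip the marker so it is not part of the result
        // Search for the end marker only after the start marker.
        int finishIndex = line.indexOf(END, startIndex);
        if (finishIndex == -1) {
            return null;
        }
        return line.substring(startIndex, finishIndex);
    }

    public static void main(String[] args) {
        String song = "{s:\"Hate Being Sober\", h:\"../lyrics/chiefkeef/hatebeingsober.html\", c:\"\", a:\"\"}";
        System.out.println(extractUrl(song));               // prints /lyrics/chiefkeef/hatebeingsober.html
        System.out.println(extractUrl("var songlist = [")); // prints null
    }
}
```

Returning null keeps the caller's loop simple: it can skip any line for which extractUrl returns null instead of letting substring throw.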

These regular expressions are wrong: 'String beginIndex = "h:\".."; String endIndex = "\", c:";'. They are not found, which makes 'words.indexOf()' return -1.
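Note that String.indexOf searches for a literal string, not a regular expression, so the dots in "h:\".." only match literal dots. If a regex really is wanted, here is a sketch using java.util.regex; this is a swapped-in technique, not what the question's code does, and the class name RegexExtract is hypothetical. It assumes the URL never contains a double quote.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtract {
    // Regex form of the markers: literal h:".. then capture everything up to the next double quote.
    private static final Pattern HREF = Pattern.compile("h:\"\\.\\.([^\"]*)\"");

    // Returns the captured path, or null when the line has no h:".." entry.
    static String extractUrl(String line) {
        Matcher m = HREF.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String song = "{s:\"Hate Being Sober\", h:\"../lyrics/chiefkeef/hatebeingsober.html\", c:\"\", a:\"\"}";
        System.out.println(extractUrl(song));               // prints /lyrics/chiefkeef/hatebeingsober.html
        System.out.println(extractUrl("var songlist = [")); // prints null
    }
}
```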

Can you show us some sample input and the expected output? Some edge cases would be nice too. – Bohemian

Answer

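Your index is out of range because you forgot to read the next line after setting inSongs = true. Without that, the same loop iteration goes on to process the header line "var songlist = [" itself. That line contains neither h:".. nor ", c:, so both indexOf calls return -1 and substring(-1, -1) throws. I added an extra readLine in that block, plus a null check in the block that prints out the song list. With eddievedder as the input argument to main, the modified code ran perfectly.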
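A small standalone sketch that reproduces this failure without any network access, by applying the question's indexOf/substring logic to the header line alone. The class and method names are hypothetical. The exact exception message differs between JDK versions, so the sketch only relies on the exception type.

```java
public class HeaderLineDemo {
    // Marker strings copied from the question's code.
    static final String BEGIN = "h:\"..";
    static final String END = "\", c:";

    // Same logic as the question: no check for -1 before calling substring.
    static String naiveExtract(String line) {
        int startIndex = line.indexOf(BEGIN, 1);
        int finishIndex = line.indexOf(END, 1);
        return line.substring(startIndex, finishIndex);
    }

    public static void main(String[] args) {
        // The header line that switches inSongs on contains neither marker.
        String header = "var songlist = [";
        System.out.println(header.indexOf(BEGIN)); // prints -1
        try {
            naiveExtract(header);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Caught " + e.getClass().getSimpleName());
        }
    }
}
```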

Improved code below:

import java.io.*;
import java.net.*;

public class CussCount {
    public static void main(String args[]) {
        try {
            String artist = args[0];
            String first = artist.substring(0, 1);
            boolean inSongs = false;
            String beginIndex = "h:\"..";
            String endIndex = "\", c:";
            int one = 1;
            URL discography = new URL("http://www.azlyrics.com/" + first + "/" + artist + ".html");
            URLConnection xx = discography.openConnection();
            BufferedReader xy = new BufferedReader(new InputStreamReader(
                    xx.getInputStream()));
            String words = xy.readLine();
            while (words != null) {
                if (words.equals("var songlist = [")) {
                    inSongs = true;
                    // Move past the header line; it contains no link.
                    words = xy.readLine();
                }
                // Constant-first equals is null-safe if the page ended right after the header.
                if ("var res = '<br />';".equals(words)) {
                    inSongs = false;
                    break;
                }
                if (inSongs && words != null) {
                    System.out.println(words);
                    int startIndex = words.indexOf(beginIndex, one);
                    System.out.println(startIndex + 6);
                    int finishIndex = words.indexOf(endIndex, one);
                    System.out.println(finishIndex);

                    // substring throws if a marker is missing (indexOf returned -1).
                    if (startIndex != -1 && finishIndex > startIndex) {
                        String bestUrl = words.substring(startIndex, finishIndex);
                        System.out.println(bestUrl);
                    }
                }

                words = xy.readLine();
            }
            xy.close();
        } catch (IOException ioe) {
            System.out.println(ioe.getMessage());
        }
    }
}
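An alternative sketch of the same loop, reading from an in-memory BufferedReader instead of the network so it can run on its own. Instead of calling readLine a second time inside the header check, it uses continue, so the only null check is the one in the while condition. The class name SongListSketch and the first line of the sample page are hypothetical; the song line and the two marker lines come from the question. It also strips the h:".. prefix, which the question asked for.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SongListSketch {
    // Marker strings copied from the question's code.
    static final String BEGIN = "h:\"..";
    static final String END = "\", c:";

    // Collects the URLs listed between "var songlist = [" and "var res = '<br />';".
    static List<String> collectUrls(BufferedReader reader) throws IOException {
        List<String> urls = new ArrayList<>();
        boolean inSongs = false;
        String line;
        // readLine returns null at end of input, so this is the only null check needed.
        while ((line = reader.readLine()) != null) {
            if (line.equals("var songlist = [")) {
                inSongs = true;
                continue; // the header line holds no URL; go straight to the next line
            }
            if (line.equals("var res = '<br />';")) {
                break; // end of the song list
            }
            if (inSongs) {
                int start = line.indexOf(BEGIN);
                int end = (start == -1) ? -1 : line.indexOf(END, start + BEGIN.length());
                if (end != -1) { // both markers present
                    urls.add(line.substring(start + BEGIN.length(), end));
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical page excerpt; lines before the header are ignored.
        String page = String.join("\n",
                "<!-- lines before the song list -->",
                "var songlist = [",
                "{s:\"Hate Being Sober\", h:\"../lyrics/chiefkeef/hatebeingsober.html\", c:\"\", a:\"\"}",
                "var res = '<br />';");
        try (BufferedReader reader = new BufferedReader(new StringReader(page))) {
            System.out.println(collectUrls(reader)); // prints [/lyrics/chiefkeef/hatebeingsober.html]
        }
    }
}
```

Lines inside the list that lack either marker are skipped rather than causing an exception, so the loop does not depend on every line between the two markers being a song entry.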