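IllegalStateException when using Java Matcher class

I want to load a web page into a StringBuilder using a BufferedReader, and then use a regular expression to find and retrieve words, or in this case groups of words (department names such as computer-science, electrical-engineering, etc.), that match the regex pattern. I'm using the Pattern and Matcher classes provided by Java, but I keep running into an IllegalStateException. I've been staring at this code for a long time and would appreciate a fresh perspective on the problem. I know it has something to do with the m.find() and m.group() methods. Any help would be greatly appreciated.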
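For reference, the pipeline described above (read everything into a StringBuilder through a BufferedReader, then pull out every regex match) can be sketched as follows. This is a minimal sketch, not the original code: the class name `RegexPipelineSketch`, its helper methods, and the HTML snippet are hypothetical, and a `StringReader` stands in for `theUrl.openStream()` so the example runs offline. The key point is that `group()` is only called after `find()` has returned `true`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: read a Reader into a StringBuilder, then collect
// every substring that matches a regex.
class RegexPipelineSketch {

    // Reads all lines, each prefixed with a space
    // (mirrors theWebPage.append(" ").append(str) in the question).
    static StringBuilder readAll(Reader source) throws IOException {
        StringBuilder page = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                page.append(' ').append(line);
            }
        }
        return page;
    }

    // Returns every match of regex in text, in order of appearance.
    static List<String> findAll(String regex, CharSequence text) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        // find() returns false once no further match exists; group() is
        // only called after a successful find(), so it never throws here.
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical HTML snippet standing in for the downloaded page.
        String html = "<a href=\"#computer-science\">CS</a>\n"
                    + "<a href=\"#electrical-engineering\">EE</a>\n";
        StringBuilder page = readAll(new StringReader(html));
        // prints [#computer-science, #electrical-engineering]
        System.out.println(findAll("#\\w+(-\\w+)+", page));
    }
}
```

To read the live page instead, pass `new InputStreamReader(theUrl.openStream())` to `readAll`.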
I can tell from the output that it recognizes the first word matching the regex, and starts throwing IllegalStateException after that.
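The symptom described above is consistent with the documented behavior of `Matcher`: `group()` and `end()` throw `IllegalStateException` when no match has been attempted or the previous match operation failed, and `matches()` succeeds only if the entire input matches the pattern. The hypothetical class below is a minimal reproduction of both facts on a short sample string; it is a sketch for illustration, not the asker's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical minimal reproduction of the IllegalStateException.
class GroupAfterFailedFind {

    // Returns true if group() throws IllegalStateException after find()
    // has run out of matches (i.e. the last find() returned false).
    static boolean groupThrowsAfterFailedFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // consume every match
        }
        // The last find() returned false, so there is no current match.
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String regex = "#\\w+(-\\w+)+";
        String text = " #computer-science and #electrical-engineering ";
        Matcher m = Pattern.compile(regex).matcher(text);
        // matches() tests the WHOLE input, so it is false here even though
        // the text contains two matches; a loop on !m.matches() never ends.
        System.out.println("matches(): " + m.matches());              // prints matches(): false
        System.out.println("group() throws after failed find(): "
                + groupThrowsAfterFailedFind(regex, text));           // prints ...: true
    }
}
```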
I have also posted my code below:
```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Parser {
    static StringBuilder theWebPage;
    ArrayList<String> courseNames;
    //ArrayList<parserObject> courseObjects;

    public static void main(String[] args) {
        Parser p = new Parser();
        theWebPage = new StringBuilder();
        try {
            URL theUrl = new URL("http://ocw.mit.edu/courses/");
            BufferedReader reader = new BufferedReader(new InputStreamReader(theUrl.openStream()));
            String str = null;
            // Append every line of the page to theWebPage, separated by a space
            while ((str = reader.readLine()) != null) {
                theWebPage.append(" ").append(str);
                //System.out.println(theWebPage);
            }
            //System.out.println(theWebPage);
            reader.close();
        } catch (MalformedURLException e) {
            System.out.println("MalformedURLException");
        } catch (IOException e) {
            System.out.println("IOException");
        }
        p.matchString();
    }

    public Parser() {
        //parserObject courseObject = new parserObject();
        //courseObjects = new ArrayList<parserObject>();
        courseNames = new ArrayList<String>();
        //theWebPage=" ";
    }

    public void matchString() {
        String matchRegex = "#\\w+(-\\w+)+";
        Pattern p = Pattern.compile(matchRegex);
        Matcher m = p.matcher(theWebPage);
        int i = 0;
        int x = 0;
        //m.reset();
        // matches() tests the whole page against the pattern, so the loop
        // keeps running as long as the entire page is not one match
        while (!(m.matches())) {
            System.out.println("inside matches method " + i);
            try {
                // The boolean returned by find() is not checked before
                // calling end() and group()
                m.find();
                x = m.end();
                System.out.println(m.group());
                // Opens (and truncates) output.txt and redirects System.out on every pass
                PrintStream out = new PrintStream(new FileOutputStream("/Users/xxxx/Desktop/output.txt"));
                System.setOut(out);
                //courseNames.add(i,m.group());
                i++;
            } catch (IllegalStateException e) {
                System.out.println("IllegalStateException");
            } catch (FileNotFoundException e) {
                System.out.println("FileNotFound Exception");
            }
        }
    }
}
```
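One possible corrected version of `matchString()` is sketched below, under these assumptions: the class name `CourseNameExtractor` and its methods are hypothetical, the regex is the one from the question, and a temporary file replaces `/Users/xxxx/Desktop/output.txt`. It loops on `m.find()` instead of `!m.matches()`, and opens the output file once with try-with-resources instead of creating (and truncating) it on every iteration and redirecting `System.out`.

```java
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a corrected matchString().
class CourseNameExtractor {

    private static final Pattern COURSE = Pattern.compile("#\\w+(-\\w+)+");

    // Collects every match; replaces the while(!(m.matches())) loop.
    static List<String> extract(CharSequence page) {
        List<String> names = new ArrayList<>();
        Matcher m = COURSE.matcher(page);
        while (m.find()) {          // false once no further match exists
            names.add(m.group());   // safe: the preceding find() succeeded
        }
        return names;
    }

    // Writes one name per line. The file is opened (and truncated) once,
    // and try-with-resources closes it even if an exception is thrown.
    static void write(List<String> names, String path) throws FileNotFoundException {
        try (PrintStream out = new PrintStream(new FileOutputStream(path))) {
            for (String name : names) {
                out.println(name);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for theWebPage.
        StringBuilder page = new StringBuilder(
                " <a href=\"#computer-science\"> <a href=\"#electrical-engineering\">");
        List<String> names = extract(page);
        System.out.println(names);

        Path outFile = Files.createTempFile("course-names", ".txt");
        write(names, outFile.toString());
        System.out.println(Files.readAllLines(outFile));
        Files.deleteIfExists(outFile);
    }
}
```

Inside `Parser`, the same idea would amount to `while (m.find()) { courseNames.add(m.group()); }`, followed by writing `courseNames` to the file once after the loop ends.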
You'd be better off parsing the web page content with http://jsoup.org/ – Reimeus 2012-08-11 14:41:30