Java website parser
I am trying to parse the following line from a website:
<div class="search-result__price">£2,995</div>
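I only want the 2995 part of it, but I am having difficulty getting it. Here is my code; it currently parses every line that contains a pound sign, so it prints every currency amount on the page rather than just the prices in these elements. Please help!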
    import java.io.IOException;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.Scanner;

    public class parser {
        private static String string1 = "£";
        private String testURL = "http://www.autotrader.co.uk/search/used/cars/bmw/1_series/postcode/tn126bg/radius/1500/onesearchad/used%2Cnearlynew%2Cnew/quicksearch/true/page/2";
        private ArrayList<String> list = new ArrayList<String>();
        private ArrayList<Integer> prices = new ArrayList<Integer>();
        private int averagePrice;
        private int start;
        private int finish;

        public parser() throws IOException {
            URL url = new URL(testURL);
            Scanner scan = new Scanner(url.openStream());
            while (scan.hasNextLine()) {
                String line = scan.nextLine();
                // Keep every line that contains a pound sign.
                if (line.contains(string1)) {
                    list.add(line);
                    start = line.indexOf(string1);
                    line = line.substring(start);
                    // The amount ends at the first space or '<'.
                    // Default to the end of the line so a stale value is never reused.
                    finish = line.length();
                    for (int i = 0; i < line.length(); i++) {
                        if (line.charAt(i) == ' ' || line.charAt(i) == '<') {
                            finish = i;
                            break;
                        }
                    }
                    line = line.substring(0, finish).trim();
                    line = line.replace("£", "");
                    line = line.replace(",", "");
                    try {
                        int price = Integer.parseInt(line);
                        prices.add(price);
                    } catch (NumberFormatException e) {
                        // Not a whole-number amount; skip this line.
                    }
                }
            }
            scan.close();
        }

        public static void main(String args[]) throws IOException {
            parser p = new parser();
            for (Integer x : p.prices) {
                System.out.println(x);
            }
        }
    }
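If it is currently able to parse all the lines on the website and display the currency amounts, what is the problem? Or did you mean "unable"? If so, what is it doing? – RealSkeptic 2014-10-29 21:17:20
*** [Do not use regex to parse XML/HTML.](http://stackoverflow.com/a/1732454/510036)*** – Qix 2014-10-29 21:22:35
+1 for what @Qix just said. Using regex to parse non-regular languages leads to madness. – 2014-10-29 21:23:45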
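
One possible way to keep only the wanted prices, sketched without regex: skip any line that does not carry the `search-result__price` class from the sample above, then read the text between the `>` that closes the opening tag and the next `<`. The class name `PriceExtractor` and the method `extractPrice` are made up for this illustration, and the sketch assumes the whole element sits on one line, as in the sample. A real HTML parser such as jsoup would likely be more robust if the markup changes.

```java
import java.util.OptionalInt;

class PriceExtractor {
    // Assumption: prices are marked with this class attribute, as in the sample line.
    private static final String MARKER = "class=\"search-result__price\"";

    /** Returns the price in whole pounds, or empty if the line is not a price element. */
    static OptionalInt extractPrice(String line) {
        int marker = line.indexOf(MARKER);
        if (marker < 0) {
            return OptionalInt.empty();          // some other line that merely contains a pound sign
        }
        int open = line.indexOf('>', marker);     // end of the opening <div ...> tag
        if (open < 0) {
            return OptionalInt.empty();
        }
        int close = line.indexOf('<', open + 1);  // start of the closing </div> tag
        if (close < 0) {
            close = line.length();
        }
        // Keep the digits only; stop at a decimal point so pence are dropped.
        StringBuilder digits = new StringBuilder();
        for (char c : line.substring(open + 1, close).toCharArray()) {
            if (c == '.') {
                break;
            }
            if (c >= '0' && c <= '9') {
                digits.append(c);
            }
        }
        if (digits.length() == 0) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(digits.toString()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();          // too many digits to fit in an int
        }
    }

    public static void main(String[] args) {
        String sample = "<div class=\"search-result__price\">£2,995</div>";
        System.out.println(extractPrice(sample).getAsInt()); // prints 2995
    }
}
```

In the constructor from the question, the body of the `if (line.contains(string1))` branch could then be reduced to calling `extractPrice(line)` and adding the value to `prices` when one is present.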