Searching within HTML code

</div><div class="tr"> 
    </div><div class="bl"> 
    </div><div class="br"> 
    </div> <img src="http://blablabla.com/medium/blablabla.jpg" /> 
</div></a> 
      </div><div class="meta"> 
<h3 class="action"> 
<span> 
    <a href="/abc">ABC</a> 
    </span> a picture 
</h3>

I save the HTML source code of the website at a specified link into a String as follows:

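import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Opens the page at the given URL and returns a reader over its HTML source.
public static BufferedReader read(String url) throws Exception {
    return new BufferedReader(
            new InputStreamReader(
                    new URL(url).openStream()));
}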
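Note that read only hands back a BufferedReader; to actually end up with a String, the reader still has to be drained. A minimal sketch of that step, assuming the lines are simply joined with \n. The class name PageSource and the method readAll are hypothetical, and a StringReader stands in for the network stream so the example runs offline:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.stream.Collectors;

class PageSource {
    // Drains the reader, closes it, and returns its content as one String
    // with the lines joined by '\n'. (Hypothetical helper, not from the question.)
    static String readAll(BufferedReader reader) {
        try (BufferedReader r = reader) {
            return r.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            // close() may throw; rethrow unchecked to keep the signature simple
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for read(url): a reader over a small HTML fragment
        BufferedReader reader = new BufferedReader(
                new StringReader("<h3 class=\"action\">\n<span>\n</span> a picture\n</h3>"));
        System.out.println(readAll(reader));
    }
}
```

With the real page, readAll(read(url)) would give the whole HTML source as a single String.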

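From this code I want to save the URLs of all images that have /medium/ in them into a new String, concatenated with \n, or otherwise separate all the image links in the string with \n so they are easier to work with. How should I go about it? Thanks in advance.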


2012-12-28 Mustafa

I would use a regular expression to find the URLs. – MrSmith42

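Rather than trying to parse the HTML content yourself, you could use jsoup to get the image tags and do a simple String.contains to pick out the ones you are looking for.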

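import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// connect() only returns a Connection; get() fetches and parses the page
Document doc = Jsoup.connect("http://www.blah.com/foo.html").get();
for (Element e : doc.select("img")) {
    String imageSrc = e.attr("src");
    if (imageSrc.contains("/medium/")) {
        ...
    }
}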

Also, avoid using regex to parse HTML.
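The elided loop body could, for example, collect the matching src values and join them with \n, which is what the question asks for. A minimal sketch using only the standard library, assuming the src values have already been extracted (for instance by the jsoup loop). The class name MediumImageLinks, the method joinMediumLinks, and the second sample URL are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MediumImageLinks {
    // Keeps only the URLs that contain "/medium/" and joins them with '\n'.
    // (Hypothetical helper; the input list is assumed to hold img src values.)
    static String joinMediumLinks(List<String> srcValues) {
        return srcValues.stream()
                .filter(src -> src.contains("/medium/"))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<String> srcValues = Arrays.asList(
                "http://blablabla.com/medium/blablabla.jpg", // from the question
                "http://blablabla.com/small/other.jpg");     // made-up non-matching URL
        System.out.println(joinMediumLinks(srcValues));
    }
}
```

Inside the jsoup loop the same effect could be had by appending each matching imageSrc to a list and calling String.join("\n", list) once the loop is done.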


2012-12-28 22:40:55 Reimeus

Which one should I import for "Document doc = ..": org.w3c.dom or org.jsoup.nodes? I think I have to cast Jsoup.conne.. to (Document) – Mustafa

[org.jsoup.nodes.Document](http://jsoup.org/apidocs/org/jsoup/nodes/Document.html)... – Reimeus
