How to filter URL text in Groovy

I want to filter the content of a URL in Groovy, but I could not find any guide or usage example on Google, so any help is much appreciated.

What I want to do is get the information inside the

<table class="smthng">

tag from the following URL:

def resultText = "http://weather.am".toURL().text

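What I have tried so far is to find the desired starting line and then process every line until the closing tag is reached, but I am fairly sure Groovy has something nicer for this. For example, in Groovy I can use a find closure to look for just the starting tag: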

// inside a closure that is called for each line of resultText
if (it =~ "desired starting tag") {
    // then keep the value in a list
}

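But I don't know how to do this for the whole table block in an elegant Groovy way. I have heard of third-party libraries such as dom4j, NekoHTML, etc., but I am sure Groovy itself can handle this. Thanks in advance.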

Source

2014-01-07 Edgar

You can do this, though you need a third-party library that can parse HTML as XML:

// The 3rd party bit 
@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1') 
import groovy.xml.* 

def parser = new org.ccil.cowan.tagsoup.Parser() 

// Needed for the serialize method below to avoid html:table prefixed names 
parser.setFeature("http://xml.org/sax/features/namespaces", false) 

def tab = new XmlSlurper(parser).parse('http://weather.am')   // Parse the HTML 
            .'**'         // Search the tree 
            .findAll { it.name() == 'table' }  // Find tables 
            .findAll { it.@class == 'pop_up_list' } // With this class 
            .find()         // Just return the first one 

// Then print out what we found 
println XmlUtil.serialize(tab)
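To see what the `'**'` search and the `it.@class` filter do without hitting the network, here is a minimal sketch that runs the same traversal on an inline snippet. The markup is a hypothetical, well-formed stand-in (the real page may look different), which is why plain `XmlSlurper` is enough here and TagSoup is not needed.

```groovy
import groovy.xml.*

// Hypothetical, well-formed stand-in for the page; the real site's markup may differ
def html = '''
<html>
  <body>
    <table class="other"><tr><td>skip me</td></tr></table>
    <table class="pop_up_list">
      <tr><td>Yerevan</td><td>+3</td></tr>
      <tr><td>Shirak</td><td>-2</td></tr>
    </table>
  </body>
</html>
'''

def root = new XmlSlurper().parseText(html)

// '**' walks every node depth-first; it.@class reads the class attribute
def table = root.'**'.find { it.name() == 'table' && it.@class == 'pop_up_list' }

// First cell of every row in the matched table
def firstCells = table.tr.collect { it.td[0].text() }
println firstCells
```

Real-world HTML is rarely well-formed, so for an actual page the TagSoup parser is still the safer choice.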

For example, if you then want the place names from that table, you can do this:

// For each `tr` in the table, collect the first `td` text 
def locations = tab.tr.collect { it.td[ 0 ].text() } 
// Print out the items 
locations.each { println it }

which prints:

Երևան 
Շիրակ 
Կոտայք 
Գեղարքունիք 
Լոռի 
Տավուշ 
Արագածոտնի լեռներ 
Արագածոտնի նախալեռներ 
Արարատ 
Արմավիր 
Վայոց ձորի լեռներ 
Վայոց ձորի նախալեռներ 
Սյունիքի հովիտներ 
Սյունիքի նախալեռն 
Արցախ 
Ջավախք
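If you really want to stay with core Groovy only, a rough sketch using nothing but regular expressions might look like the following. The inline `resultText` is a hypothetical stand-in for `"http://weather.am".toURL().text`, and the patterns assume the table tag is written exactly as `<table class="smthng">` with no nested tables and no attributes on `tr` or `td`. Regex matching on HTML is fragile, so treat this as an illustration rather than a replacement for a real parser.

```groovy
// Hypothetical stand-in for "http://weather.am".toURL().text
def resultText = '''
<div>
<table class="smthng">
  <tr><td>Yerevan</td><td>+3</td></tr>
  <tr><td>Shirak</td><td>-2</td></tr>
</table>
<table class="other"><tr><td>skip me</td></tr></table>
</div>
'''

// (?s) lets '.' span newlines; .*? stops at the first closing </table>
def tableMatcher = resultText =~ /(?s)<table class="smthng">(.*?)<\/table>/
def tableBody = tableMatcher ? tableMatcher[0][1] : ''

// Each match is a list [wholeMatch, group1]; group1 is the first <td> of a row
def names = (tableBody =~ /(?s)<tr>\s*<td>(.*?)<\/td>/).collect { it[1].trim() }
println names
```

Any change in attribute order, quoting, or nesting on the page will break these patterns, which is the main reason to prefer the parser-based approach above.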

Source

2014-01-07 14:15:55

Very nice, thanks. – Edgar
