2016-11-22

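How can I use jSoup to retrieve information from Yelp?

I'm new to programming and have been teaching myself Java as I go. What I'm trying to do right now is extract the names of all the businesses returned by a particular Yelp search and store the results in an array. Here is what I have so far: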

import java.util.ArrayList;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;

public class YelpScraper
{
    public static void main(String[] args) throws IOException
    {
        String url = "https://www.yelp.com/search?find_desc=&find_loc=new+jersey&ns=1";
        Document document = Jsoup.connect(url).get();

        Elements elements = document.getElementsByClass("biz-name js-analytics-click");

        for (Element element : elements)
        {
            System.out.println(elements.toString());
        }
    }
}

Now here is my problem. This is the output:

<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/darios-restaurant-newark" data-hovercard-id="resfu-JNLUKR3l82D5W7-A"><span>Dario’s Restaurant</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/sushi-house-21-newark-2" data-hovercard-id="vMpJRWxm71XSBnWL9XfYpQ"><span>Sushi House 21</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/burger-walla-newark" data-hovercard-id="JmPZ-AyewjQPIJkKbkU0dA"><span>Burger Walla</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/hobbys-delicatessen-and-restaurant-newark" data-hovercard-id="-dEkFa3N6SXLahAMBAM8EA"><span>Hobby’s Delicatessen &amp; Restaurant</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/krugs-tavern-newark" data-hovercard-id="YhiUGWjAB1y7reqoKLWCow"><span>Krug’s Tavern</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/mcwhorter-barbecue-newark" data-hovercard-id="6xf4H2rOCtUIhyMgazRsnA"><span>McWhorter Barbecue</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/spanish-tavern-newark" data-hovercard-id="muXH1f3nwoSgWB3KN-rAfA"><span>Spanish Tavern</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/casa-d-paco-newark" data-hovercard-id="iIJ-dWgYcZTewVGJyP6EfQ"><span>Casa d’Paco</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/hero-king-handcrafted-sandwiches-newark" data-hovercard-id="hzwE2ub1J7fTwJDjTJwksA"><span>Hero King Handcrafted Sandwiches</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/the-green-chicpea-newark-2" data-hovercard-id="bDWWtSm-8uoW9_urjMCzTA"><span>The Green Chicpea</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/darios-restaurant-newark" data-hovercard-id="resfu-JNLUKR3l82D5W7-A"><span>Dario’s Restaurant</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/sushi-house-21-newark-2" data-hovercard-id="vMpJRWxm71XSBnWL9XfYpQ"><span>Sushi House 21</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/burger-walla-newark" data-hovercard-id="JmPZ-AyewjQPIJkKbkU0dA"><span>Burger Walla</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/hobbys-delicatessen-and-restaurant-newark" data-hovercard-id="-dEkFa3N6SXLahAMBAM8EA"><span>Hobby’s Delicatessen &amp; Restaurant</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/krugs-tavern-newark" data-hovercard-id="YhiUGWjAB1y7reqoKLWCow"><span>Krug’s Tavern</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/mcwhorter-barbecue-newark" data-hovercard-id="6xf4H2rOCtUIhyMgazRsnA"><span>McWhorter Barbecue</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/spanish-tavern-newark" data-hovercard-id="muXH1f3nwoSgWB3KN-rAfA"><span>Spanish Tavern</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/casa-d-paco-newark" data-hovercard-id="iIJ-dWgYcZTewVGJyP6EfQ"><span>Casa d’Paco</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/hero-king-handcrafted-sandwiches-newark" data-hovercard-id="hzwE2ub1J7fTwJDjTJwksA"><span>Hero King Handcrafted Sandwiches</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/the-green-chicpea-newark-2" data-hovercard-id="bDWWtSm-8uoW9_urjMCzTA"><span>The Green Chicpea</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/darios-restaurant-newark" data-hovercard-id="resfu-JNLUKR3l82D5W7-A"><span>Dario’s Restaurant</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/sushi-house-21-newark-2" data-hovercard-id="vMpJRWxm71XSBnWL9XfYpQ"><span>Sushi House 21</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/burger-walla-newark" data-hovercard-id="JmPZ-AyewjQPIJkKbkU0dA"><span>Burger Walla</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/hobbys-delicatessen-and-restaurant-newark" data-hovercard-id="-dEkFa3N6SXLahAMBAM8EA"><span>Hobby’s Delicatessen &amp; Restaurant</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/krugs-tavern-newark" data-hovercard-id="YhiUGWjAB1y7reqoKLWCow"><span>Krug’s Tavern</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/mcwhorter-barbecue-newark" data-hovercard-id="6xf4H2rOCtUIhyMgazRsnA"><span>McWhorter Barbecue</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/spanish-tavern-newark" data-hovercard-id="muXH1f3nwoSgWB3KN-rAfA"><span>Spanish Tavern</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/casa-d-paco-newark" data-hovercard-id="iIJ-dWgYcZTewVGJyP6EfQ"><span>Casa d’Paco</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/hero-king-handcrafted-sandwiches-newark" data-hovercard-id="hzwE2ub1J7fTwJDjTJwksA"><span>Hero King Handcrafted Sandwiches</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/the-green-chicpea-newark-2" data-hovercard-id="bDWWtSm-8uoW9_urjMCzTA"><span>The Green Chicpea</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/darios-restaurant-newark" data-hovercard-id="resfu-JNLUKR3l82D5W7-A"><span>Dario’s Restaurant</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/sushi-house-21-newark-2" data-hovercard-id="vMpJRWxm71XSBnWL9XfYpQ"><span>Sushi House 21</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/burger-walla-newark" data-hovercard-id="JmPZ-AyewjQPIJkKbkU0dA"><span>Burger Walla</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/hobbys-delicatessen-and-restaurant-newark" data-hovercard-id="-dEkFa3N6SXLahAMBAM8EA"><span>Hobby’s Delicatessen &amp; Restaurant</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/krugs-tavern-newark" data-hovercard-id="YhiUGWjAB1y7reqoKLWCow"><span>Krug’s Tavern</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/mcwhorter-barbecue-newark" data-hovercard-id="6xf4H2rOCtUIhyMgazRsnA"><span>McWhorter Barbecue</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/spanish-tavern-newark" data-hovercard-id="muXH1f3nwoSgWB3KN-rAfA"><span>Spanish Tavern</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/casa-d-paco-newark" data-hovercard-id="iIJ-dWgYcZTewVGJyP6EfQ"><span>Casa d’Paco</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/hero-king-handcrafted-sandwiches-newark" data-hovercard-id="hzwE2ub1J7fTwJDjTJwksA"><span>Hero King Handcrafted Sandwiches</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/the-green-chicpea-newark-2" data-hovercard-id="bDWWtSm-8uoW9_urjMCzTA"><span>The Green Chicpea</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/darios-restaurant-newark" data-hovercard-id="resfu-JNLUKR3l82D5W7-A"><span>Dario’s Restaurant</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/sushi-house-21-newark-2" data-hovercard-id="vMpJRWxm71XSBnWL9XfYpQ"><span>Sushi House 21</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/burger-walla-newark" data-hovercard-id="JmPZ-AyewjQPIJkKbkU0dA"><span>Burger Walla</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/hobbys-delicatessen-and-restaurant-newark" data-hovercard-id="-dEkFa3N6SXLahAMBAM8EA"><span>Hobby’s Delicatessen &amp; Restaurant</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/krugs-tavern-newark" data-hovercard-id="YhiUGWjAB1y7reqoKLWCow"><span>Krug’s Tavern</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/mcwhorter-barbecue-newark" data-hovercard-id="6xf4H2rOCtUIhyMgazRsnA"><span>McWhorter Barbecue</span></a> 
 
<a class="biz-name js-analytics-click" data-analytics-label="biz-name" href="/biz/spanish-tavern-newark" data-hovercard-id="muXH1f3nwoSgWB3KN-rAfA"><span>Spanish Tavern</span></a>

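As you can see, it prints the raw HTML, and all I want is the names of the businesses. Any ideas on how I could do this differently? Clearly getElementsByClass() is not the method I should be using. Thanks in advance, guys!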

Answers


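You can either traverse the element's children or use a more fine-grained selector to begin with. I changed your selector so that it returns the span containing the title, and used the text() method to return the text inside the span tag.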

Elements elements = document.select(".indexed-biz-name span"); 
for (Element element : elements) 
{ 
    System.out.println(element.text()); 
} 
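
Since the original goal was to store the names rather than print them, here is a minimal sketch of that step. It parses a small hard-coded fragment (two anchors copied from the output in the question) instead of fetching Yelp, so it does not depend on the live page. The class name BizNameCollector, the helper collectNames, and the selector "a.biz-name span" are assumptions made for this sketch; the selector is based only on the markup shown in the question, and the real page may need ".indexed-biz-name span" as above.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BizNameCollector
{
    // Hypothetical helper: returns the visible text of every element matched by cssQuery.
    static List<String> collectNames(Document document, String cssQuery)
    {
        List<String> names = new ArrayList<>();
        for (Element element : document.select(cssQuery))
        {
            // text() drops the tags and keeps only the text inside the element.
            names.add(element.text());
        }
        return names;
    }

    public static void main(String[] args)
    {
        // Assumption: a static fragment stands in for the downloaded search page.
        String html = "<div>"
            + "<a class=\"biz-name js-analytics-click\" href=\"/biz/sushi-house-21-newark-2\"><span>Sushi House 21</span></a>"
            + "<a class=\"biz-name js-analytics-click\" href=\"/biz/burger-walla-newark\"><span>Burger Walla</span></a>"
            + "</div>";
        Document document = Jsoup.parse(html);

        List<String> names = collectNames(document, "a.biz-name span");
        for (String name : names)
        {
            System.out.println(name);
        }
    }
}
```

One more thing worth checking in the question's code: the loop prints elements.toString(), which is the whole collection, on every pass instead of the current element. That would explain why the same ten businesses repeat in the output.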

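Hey wow, thanks! Works like a charm! I actually just picked up jSoup yesterday. If you don't mind, could you explain the syntax of the select() method? From what I can see, you always start with a ".", then the class name, followed by the tag?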
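
On the select() question: as far as I know, select() takes a CSS-style selector, so the leading "." is only there when you match by class name. A bare word matches a tag, a space means "descendant of", ">" means "direct child of", and square brackets match an attribute. The sketch below tries a few of these against a hard-coded fragment. The class SelectorSyntaxDemo, the helper countMatches, and the wrapping div with class "results" are made up for illustration and are not taken from Yelp's markup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorSyntaxDemo
{
    // Hypothetical helper: number of elements matched by a selector.
    static int countMatches(Document document, String cssQuery)
    {
        return document.select(cssQuery).size();
    }

    public static void main(String[] args)
    {
        // Assumption: a made-up fragment; only the two anchors resemble the question's output.
        String html = "<div class=\"results\">"
            + "<a class=\"biz-name js-analytics-click\" href=\"/biz/burger-walla-newark\"><span>Burger Walla</span></a>"
            + "<a class=\"biz-name js-analytics-click\" href=\"/biz/spanish-tavern-newark\"><span>Spanish Tavern</span></a>"
            + "<span>not a business name</span>"
            + "</div>";
        Document document = Jsoup.parse(html);

        String[] queries = {
            "span",                           // by tag name
            ".biz-name",                      // by class name
            "a.biz-name",                     // tag with a class
            "a.biz-name.js-analytics-click",  // tag carrying both classes
            ".biz-name span",                 // span anywhere inside a .biz-name element
            ".results > span",                // span that is a direct child of .results
            "a[href]"                         // anchor that has an href attribute
        };
        for (String query : queries)
        {
            System.out.println(query + " -> " + countMatches(document, query));
        }
    }
}
```

So ".indexed-biz-name span" reads as "every span inside an element with class indexed-biz-name"; the class comes first only because the span is nested inside it.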