Using BeautifulSoup to extract hrefs from a bunch of distinct div classes

I am a beginner trying to scrape a hrefs that are each nested inside a bunch of div classes. When I inspect the element, it looks like this:

<div class="item hentry" itemscope="" itemtype="http://schema.org/BlogPosting" data-id="1252224732659290211"> 
<img class="thumbnail" src="//img.youtube.com/vi/fX_kx_drRsY/0.jpg" style="width: 30px; height: 30px;"> 
    <h3 class="title entry-title" itemprop="name"> 
    <a href="the link i want to extract"></a> 
    </h3> 
</div>

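I have been searching, but most examples assume the div class is fixed, whereas the divs on my page are not fixed: each one has a different data-id.

I tried the following, but I think it only works when the div class is fixed?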

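# Assumes `soup` is an existing BeautifulSoup object for the page.
# Text mode ("w") so that writing str also works on Python 3.
with open("list_of_urls.txt", "w") as f:
    for item in soup.find_all("div", attrs={"class": "item hentry"}):
        for link in item.find_all("a", href=True):
            f.write("%s\n" % link["href"])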

Source

2017-01-28 song0089

soup.select('div[class] a') # find all a tags under any div tag that has a class attribute

Use a CSS selector.
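
A minimal sketch of how that selector behaves, assuming markup shaped like the snippet in the question. The two example.com URLs, the second data-id value, and the variable names are made up for illustration. It also shows a narrower selector, div.item.hentry, which matches on the class names only, so a data-id that changes from div to div does not affect the match.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question: same classes, different data-id.
html = """
<div class="item hentry" data-id="1252224732659290211">
  <h3 class="title entry-title" itemprop="name">
    <a href="https://example.com/post-1">Post 1</a>
  </h3>
</div>
<div class="item hentry" data-id="998877">
  <h3 class="title entry-title" itemprop="name">
    <a href="https://example.com/post-2">Post 2</a>
  </h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Selector from the answer: every <a> below any <div> that has a class attribute.
links = [a["href"] for a in soup.select("div[class] a") if a.has_attr("href")]

# Narrower variant: only divs carrying both "item" and "hentry" classes,
# and only <a> tags that actually have an href attribute.
narrow = [a["href"] for a in soup.select("div.item.hentry a[href]")]

for url in links:
    print(url)
```

To save the result as in the question, the same list can be written one URL per line to a file opened in text mode.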

Source

2017-01-28 06:37:26
