Python 3, Beautiful Soup, get an <a> tag

I have the following HTML section, which repeats itself several times with other href links:

<div class="product-list-item margin-bottom"> 
<a title="titleexample" href="http://www.urlexample.com/example_1" data-style-id="sp_2866">

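Now I want to get all the href links in this document that come directly after the div tag with the class "product-list-item". I'm pretty new to BeautifulSoup, and nothing I came up with worked.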

Thanks for your ideas.

Edit: It doesn't have to be BeautifulSoup; if it can be done with regular expressions and Python's HTML parser, that's fine too.
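Since a non-BeautifulSoup solution is acceptable, here is a minimal sketch using only the standard library's html.parser.HTMLParser. The class name ProductLinkParser and the second product entry in the sample HTML are hypothetical, made up for illustration. It assumes "directly after" means "the first <a href> start tag that follows a <div> whose class list contains product-list-item":

```python
from html.parser import HTMLParser


class ProductLinkParser(HTMLParser):
    """Hypothetical helper: collect the href of the first <a href> that
    follows each <div class="product-list-item ..."> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._waiting = False  # True after a matching <div> until the next <a href>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "div":
            # class may hold several space-separated names; value can be None
            classes = (attrs.get("class") or "").split()
            if "product-list-item" in classes:
                self._waiting = True
        elif tag == "a" and self._waiting and attrs.get("href"):
            self.links.append(attrs["href"])
            self._waiting = False


# Sample input modelled on the question; the second item is made up
html = '''
<div class="product-list-item margin-bottom">
<a title="titleexample" href="http://www.urlexample.com/example_1" data-style-id="sp_2866">one</a>
</div>
<div class="product-list-item margin-bottom">
<a title="titleexample" href="http://www.urlexample.com/example_2" data-style-id="sp_2867">two</a>
</div>
'''

parser = ProductLinkParser()
parser.feed(html)
print(parser.links)
```

This avoids regular expressions entirely: the parser handles attribute quoting and ordering, which a regex would have to special-case.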

EDIT2: Here is what I tried (I'm very new to Python, so what I did may be completely stupid from an advanced point of view):

import bs4

soup = bs4.BeautifulSoup(htmlsource, "html.parser")
x = soup.find_all("div")
for i in range(len(x)):
    if x[i].get("class") and "product-list-item" in x[i].get("class"):
        print(x[i].get("class"))

This prints the class lists of all the "product-list-item" divs, but then I tried something like

print(x[i].get("class").next_element)

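because I thought next_element or next_sibling should give me the <a> tag, but it just leads to AttributeError: 'list' object has no attribute 'next_element'. So I tried with only the first list element: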

print(x[i][0].get("class").next_element)

which resulted in this error: return self.attrs[key] KeyError: 0. I also tried .find_all("href") and .get("href"), but everything led to the same errors.
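Both errors have the same root: x[i].get("class") returns the value of the class attribute, not the tag. BeautifulSoup treats class as a multi-valued attribute and returns a plain Python list, which has no next_element; and indexing a Tag with [...] looks up an attribute by name, so x[i][0] asks for an attribute called 0. A small sketch on a one-line sample document:

```python
import bs4

html = '<div class="product-list-item margin-bottom"><a href="http://www.urlexample.com/example_1">x</a></div>'
soup = bs4.BeautifulSoup(html, "html.parser")
div = soup.find("div")

# "class" is multi-valued, so BeautifulSoup gives back a list of strings
classes = div.get("class")
print(classes)  # ['product-list-item', 'margin-bottom']

# A plain list has no .next_element, hence the AttributeError
print(hasattr(classes, "next_element"))  # False

# tag[key] is an attribute lookup, so div[0] means "the attribute named 0"
try:
    div[0]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 0

# .next_element belongs to the Tag itself, not to its attribute values
print(div.next_element.get("href"))  # http://www.urlexample.com/example_1
```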

EDIT3: OK, it seems I found out how to solve it. Now I did:

x = soup.find_all("div")
for i in range(len(x)):
    if x[i].get("class") and "product-list-item" in x[i].get("class"):
        print(x[i].next_element.next_element.get("href"))

This can also be shortened by passing the class as another argument to find_all:

x = soup.find_all("div", "product-list-item")
for i in x:
    print(i.next_element.next_element.get("href"))

Regards
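Why does the EDIT3 loop need next_element twice? In the question's HTML a line break follows the opening <div ...> tag, and BeautifulSoup keeps that whitespace as a NavigableString, so the first next_element is the text node and the second is the <a>. The sketch below (sample HTML modelled on the question) shows this, plus find_next, which does not depend on how many whitespace nodes sit in between:

```python
import bs4

# Same layout as the question: a newline sits between <div ...> and <a ...>
html = '''<div class="product-list-item margin-bottom">
<a title="titleexample" href="http://www.urlexample.com/example_1" data-style-id="sp_2866">x</a>
</div>'''
soup = bs4.BeautifulSoup(html, "html.parser")
div = soup.find("div", "product-list-item")

first = div.next_element     # the whitespace text node after the opening tag
second = first.next_element  # the <a> tag itself
print(type(first).__name__)  # NavigableString
print(second.name, second.get("href"))

# Searching forward for the next <a href> skips any text nodes in between
a = div.find_next("a", href=True)
print(a["href"])
```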


2013-05-31 user136036

Can you show us what you've tried? Thanks – Drewdin

I want to get all the href links in this document that are directly after the div tag with the class "product-list-item"

To find the first <a href> element in each <div class="product-list-item">:

links = []
for div in soup.find_all('div', 'product-list-item'):
    a = div.find('a', href=True) # find <a> anywhere in <div>
    if a is not None:
        links.append(a['href'])

It assumes that the link is inside the <div>. Any elements in the <div> before the first <a href> are ignored.
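A self-contained run of the snippet above, on hypothetical sample HTML: one matching <div> with a link, one matching <div> without a link (skipped because find returns None), and one non-matching <div> whose link must not be picked up. Passing a plain string as the second argument of find_all matches it against the CSS class, and it matches even when the element has several classes:

```python
import bs4

# Hypothetical sample document
html = '''
<div class="product-list-item margin-bottom">
<a title="titleexample" href="http://www.urlexample.com/example_1" data-style-id="sp_2866">one</a>
</div>
<div class="product-list-item margin-bottom">
<span>no link here</span>
</div>
<div class="other">
<a href="http://www.urlexample.com/not_wanted">skip</a>
</div>
'''

soup = bs4.BeautifulSoup(html, "html.parser")

links = []
for div in soup.find_all("div", "product-list-item"):
    a = div.find("a", href=True)  # first <a href> anywhere inside this <div>
    if a is not None:             # a <div> without a link is simply skipped
        links.append(a["href"])

print(links)  # ['http://www.urlexample.com/example_1']
```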

If you want, you can be stricter about it, e.g., take the link only if it is the first child of the <div>:

a = div.contents[0] # take the very first child even if it is not a Tag 
if a.name == 'a' and a.has_attr('href'): 
    links.append(a['href'])
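Note that with HTML laid out as in the question, the strict check can reject every <div>: div.contents[0] is then the whitespace text node, not the <a>. The sketch below compares it with a hypothetical variant that takes the first child tag via find(True, recursive=False), which ignores text nodes. It uses an isinstance check so the strict branch never calls Tag methods on a string; the second sample <div> is made up:

```python
import bs4

html = '''<div class="product-list-item margin-bottom">
<a href="http://www.urlexample.com/example_1">one</a>
</div><div class="product-list-item"><a href="http://www.urlexample.com/example_2">two</a></div>'''
soup = bs4.BeautifulSoup(html, "html.parser")

strict, first_tag = [], []
for div in soup.find_all("div", "product-list-item"):
    # Strict check: the very first child, text nodes included
    child = div.contents[0] if div.contents else None
    if isinstance(child, bs4.Tag) and child.name == "a" and child.has_attr("href"):
        strict.append(child["href"])

    # Hypothetical variant: the first child *tag*, whitespace text skipped
    tag = div.find(True, recursive=False)
    if tag is not None and tag.name == "a" and tag.has_attr("href"):
        first_tag.append(tag["href"])

print(strict)     # ['http://www.urlexample.com/example_2']
print(first_tag)  # ['http://www.urlexample.com/example_1', 'http://www.urlexample.com/example_2']
```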

Or, if the <a> is not inside the <div>:

a = div.find_next('a', href=True) # find <a> that appears after <div> 
if a is not None: 
    links.append(a['href'])
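A runnable sketch of the find_next version, assuming a hypothetical layout where each link follows its (empty) <div> rather than sitting inside it. find_next walks forward in document order from the <div>, so each <div> picks up the nearest following <a href>:

```python
import bs4

# Hypothetical layout: the link follows the <div> instead of being inside it
html = '''
<div class="product-list-item margin-bottom"></div>
<a href="http://www.urlexample.com/example_1">one</a>
<div class="product-list-item margin-bottom"></div>
<a href="http://www.urlexample.com/example_2">two</a>
'''
soup = bs4.BeautifulSoup(html, "html.parser")

links = []
for div in soup.find_all("div", "product-list-item"):
    a = div.find_next("a", href=True)  # next <a href> anywhere later in the document
    if a is not None:
        links.append(a["href"])

print(links)
```

One caveat of this design: if a <div> has no link of its own, find_next keeps searching and returns the link belonging to a later <div>, so the same href can appear twice.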

There are many ways to search and navigate the tree in BeautifulSoup.
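One of those ways is CSS selectors through soup.select (available in BeautifulSoup 4.7+, which uses the soupsieve package). A minimal sketch on hypothetical sample HTML, assuming the link is a direct child of the <div>:

```python
import bs4

html = '''
<div class="product-list-item margin-bottom">
<a href="http://www.urlexample.com/example_1">one</a>
</div>
<div class="other"><a href="http://www.urlexample.com/not_wanted">skip</a></div>
'''
soup = bs4.BeautifulSoup(html, "html.parser")

# Every <a> with an href that is a direct child of a div.product-list-item
links = [a["href"] for a in soup.select("div.product-list-item > a[href]")]
print(links)  # ['http://www.urlexample.com/example_1']
```

Dropping the ">" turns it into a descendant selector, matching links nested anywhere inside the <div>.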

If you use lxml.html for searching, you can use XPath and CSS expressions, provided you are familiar with them.
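For example, a minimal XPath sketch with lxml.html on hypothetical sample HTML. The concat/normalize-space idiom matches "product-list-item" as one whole class name among several, and /a[@href][1] keeps only the first direct <a href> child of each matching <div> (CSS expressions via .cssselect() would additionally need the cssselect package):

```python
import lxml.html

html = '''
<div class="product-list-item margin-bottom">
<a href="http://www.urlexample.com/example_1">one</a>
</div>
<div class="other"><a href="http://www.urlexample.com/not_wanted">skip</a></div>
'''
tree = lxml.html.fromstring(html)

# href of the first <a href> child of every <div> whose class list contains product-list-item
links = tree.xpath(
    '//div[contains(concat(" ", normalize-space(@class), " "), " product-list-item ")]'
    '/a[@href][1]/@href'
)
print(links)
```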


2013-05-31 18:58:17 jfs
