2016-12-18

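Using BeautifulSoup to find a productID

I'm trying to use BeautifulSoup to extract the URL from this page by searching for p0662110597086 (its product ID). I've tried several different approaches with BeautifulSoup, including a different HTML parser, but none of them worked.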

  <a href="#media" class="movie" hpp="act_video">video</a>   <ul> 
      <li>identity:<span itemprop="productID">p0662110597086</span></li> 
     <li>soll numbers:75</li> 
     <li>solds:97</li> 
     </ul> 

Answer

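import bs4

html = '''   <a href="#media" class="movie" hpp="act_video">video</a>   <ul> 
      <li>identity:<span itemprop="productID">p0662110597086</span></li> 
     <li>soll numbers:75</li> 
     <li>solds:97</li> 
     </ul>'''
# 'lxml' must be installed separately; the built-in 'html.parser' also works here
soup = bs4.BeautifulSoup(html, 'lxml')

# Find the <span> whose text is exactly the product ID
id_tag = soup.find('span', string='p0662110597086')
# Walk backwards through the document to the nearest preceding <a class="movie">
a_tag = id_tag.find_previous('a', class_='movie')

print('id_tag:', id_tag)
print('a_tag:', a_tag)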

Out:

id_tag: <span itemprop="productID">p0662110597086</span> 
a_tag: <a class="movie" hpp="act_video" href="#media">video</a> 
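Since the goal was the URL itself, here is a minimal sketch that goes one step further and reads the `href` attribute. It is an illustration, not part of the original answer: the helper name `find_movie_href` is hypothetical, it assumes the link always precedes the product ID span as in the snippet above, and it matches the span on its `itemprop="productID"` attribute as well as its text. It uses the built-in `html.parser`, so lxml is not required.

```python
from typing import Optional

from bs4 import BeautifulSoup

html = '''<a href="#media" class="movie" hpp="act_video">video</a>
<ul>
  <li>identity:<span itemprop="productID">p0662110597086</span></li>
  <li>soll numbers:75</li>
  <li>solds:97</li>
</ul>'''


def find_movie_href(markup: str, product_id: str) -> Optional[str]:
    """Hypothetical helper: return the href of the <a class="movie"> before the ID."""
    soup = BeautifulSoup(markup, 'html.parser')
    # Match the span by its itemprop attribute and by its exact text
    span = soup.find('span', attrs={'itemprop': 'productID'}, string=product_id)
    if span is None:
        return None  # product ID not present on the page
    link = span.find_previous('a', class_='movie')
    # .get() returns None instead of raising if the attribute is missing
    return link.get('href') if link is not None else None


print(find_movie_href(html, 'p0662110597086'))  # → #media
print(find_movie_href(html, 'p0000000000000'))  # → None
```

Returning `None` rather than raising keeps the caller in control when the ID is absent, because the original `id_tag.find_previous(...)` call would fail with `AttributeError` if `find` returned `None`.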

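Signature: find_all_previous(name, attrs, string, limit, **kwargs)

Signature: find_previous(name, attrs, string, **kwargs)

These methods use .previous_elements to iterate over the tags and strings that came before the current element in the document. The find_all_previous() method returns all the matches, and find_previous() returns only the first match.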
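To make the difference concrete, here is a small sketch using a made-up page with two `<a class="movie">` links (the second link and the `#top` href are invented for illustration). Because `.previous_elements` walks backwards, the first match is the link closest to the span, and `find_all_previous()` returns its matches in reverse document order.

```python
from bs4 import BeautifulSoup

# Illustrative markup: two candidate links before the product ID
html = '''<a href="#top" class="movie">first</a>
<a href="#media" class="movie">second</a>
<ul><li><span itemprop="productID">p0662110597086</span></li></ul>'''

soup = BeautifulSoup(html, 'html.parser')
span = soup.find('span', attrs={'itemprop': 'productID'})

# Only the nearest preceding match
nearest = span.find_previous('a', class_='movie')
# Every preceding match, nearest first
every = span.find_all_previous('a', class_='movie')
# limit caps how many matches find_all_previous collects
capped = span.find_all_previous('a', class_='movie', limit=1)

print(nearest['href'])               # → #media
print([a['href'] for a in every])    # → ['#media', '#top']
print([a['href'] for a in capped])   # → ['#media']
```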