2016-01-03 96 views
I'm trying to extract the business title as plain text from the following HTML: <div class = "business-detail-text> <h1 class = "business-title" style="position:relative;" itemprop="name">H&H Construction Co.</h1> using Python HTML parsing with CSS selectors.

What is the best way to do this? The itemprop and style attributes are where I'm stuck. I know I can use soup.select, but so far I've had no luck.

Here is my code so far:

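import requests
from bs4 import BeautifulSoup

def bbb_profiles(profile_urls):
    # Download the profile page and parse the returned HTML
    sauce_code = requests.get(profile_urls)
    plain_text = sauce_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    # Print the text of every <h1 class="business-title"> element
    for profile_info in soup.find_all("h1", {"class": "business-title"}):
        print(profile_info.string)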

Answers

Is this what you need?

>>> from bs4 import BeautifulSoup 
>>> txt='''<div class = "business-detail-text"> 
      <h1 class = "business-title" style="position:relative;" itemprop="name">H&H Construction Co.</h1></div>''' 
>>> soup = BeautifulSoup(txt, "html.parser") 
>>> soup.find_all('h1', 'business-title') 
[<h1 class="business-title" itemprop="name" style="position:relative;">H&amp;H; Construction Co.</h1>] 
>>> soup.find_all('h1', 'business-title')[0].text 
u'H&H; Construction Co.' 

I noticed that your HTML is missing the closing quote (") after business-detail-text and the closing </div> at the end.
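Since the question asks about soup.select specifically, here is a minimal sketch of the CSS-selector route. It assumes the markup has been repaired (closing quote and </div> added) and that the ampersand is escaped as &amp; in the source; the variable names html, matches, titles and first are only illustrative.

```python
from bs4 import BeautifulSoup

# Repaired sample markup (assumption: quote and </div> restored, "&" escaped)
html = '''<div class="business-detail-text">
<h1 class="business-title" style="position:relative;" itemprop="name">H&amp;H Construction Co.</h1>
</div>'''

soup = BeautifulSoup(html, "html.parser")

# tag.class[attribute="value"]: match on the class and the itemprop attribute.
# The style attribute does not need to appear in the selector at all.
matches = soup.select('h1.business-title[itemprop="name"]')
titles = [h1.get_text(strip=True) for h1 in matches]
print(titles)

# select_one returns the first match, or None if nothing matches
first = soup.select_one('[itemprop="name"]')
if first is not None:
    print(first.get_text(strip=True))
```

Selecting on itemprop alone may be enough if the page has only one element marked as the name; adding the h1.business-title part just makes the match stricter.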

I'll give it a try. Thanks! – n0de