You can use get_text and pass a separator character to split the text, or pull the h1's own text with h1.find(text=True, recursive=False) and take the span's text directly:
In [1]: h ="""<h1 class="product-name elim-suites">Chantecaille<span itemprop="name" >Limited Edition Protect the Lion Eye Palette
...: </span></h1>"""
In [2]: from bs4 import BeautifulSoup
In [3]: soup = BeautifulSoup(h, "html.parser")
In [4]: h1 = soup.select_one("h1.product-name.elim-suites")
In [5]: print(h1.get_text("\n"))
Chantecaille
Limited Edition Protect the Lion Eye Palette
In [6]: prod, desc = h1.find(text=True, recursive=False), h1.span.text
In [7]: print(prod, desc)
(u'Chantecaille', u'Limited Edition Protect the Lion Eye Palette\n')
Or, if text may also appear after the span, use find_all:
In [8]: h ="""<h1 class="product-name elim-suites">Chantecaille
<span itemprop="name" >Limited Edition Protect the Lion Eye Palette</span>other text</h1>"""
In [9]: from bs4 import BeautifulSoup
In [10]: soup = BeautifulSoup(h, "html.parser")
In [11]: h1 = soup.select_one("h1.product-name.elim-suites")
In [12]: print(h1.get_text("\n"))
Chantecaille
Limited Edition Protect the Lion Eye Palette
other text
In [13]: prod, desc = " ".join(h1.find_all(text=True, recursive=False)), h1.span.text
In [14]: print(prod, desc)
(u'Chantecaille other text', u'Limited Edition Protect the Lion Eye Palette')
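For Python 3 and recent BeautifulSoup releases, the same idea can be sketched as a small helper. This is only an illustration: the name split_heading is hypothetical, it assumes the h1 wraps at most one span, and it uses the string= keyword, which newer bs4 versions prefer over text=.

```python
from bs4 import BeautifulSoup

html = """<h1 class="product-name elim-suites">Chantecaille
<span itemprop="name" >Limited Edition Protect the Lion Eye Palette</span>other text</h1>"""


def split_heading(h1):
    """Return (own_text, span_text) for an h1 that wraps a single span.

    Hypothetical helper, not part of BeautifulSoup.
    """
    # recursive=False limits the search to direct children, so text
    # inside the span is not picked up here.
    own = [s.strip() for s in h1.find_all(string=True, recursive=False)]
    own_text = " ".join(s for s in own if s)
    # Fall back to an empty string when the h1 has no span at all.
    span_text = h1.span.get_text(strip=True) if h1.span else ""
    return own_text, span_text


soup = BeautifulSoup(html, "html.parser")
h1 = soup.select_one("h1.product-name.elim-suites")
prod, desc = split_heading(h1)
print(prod)
print(desc)
```

Stripping each piece before joining avoids the stray newline that the raw text nodes carry when the markup is split across lines.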
Try using '.contents' or '.strings' instead of '.text' and then joining the strings, as demonstrated [here](http://stackoverflow.com/questions/16121001) – bunji
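A rough sketch of what that suggestion might look like, reusing the markup from the answer above. It uses .stripped_strings (the whitespace-trimmed variant of .strings) and filters .contents for NavigableString children; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup, NavigableString

html = """<h1 class="product-name elim-suites">Chantecaille
<span itemprop="name" >Limited Edition Protect the Lion Eye Palette</span>other text</h1>"""

soup = BeautifulSoup(html, "html.parser")
h1 = soup.select_one("h1.product-name.elim-suites")

# .stripped_strings walks every text node under h1, including the
# span's, trims whitespace and skips whitespace-only strings.
parts = list(h1.stripped_strings)

# .contents holds only direct children; keeping the NavigableString
# ones gives the h1's own text without the span's.
own = [c.strip() for c in h1.contents if isinstance(c, NavigableString)]

print(" | ".join(parts))
print(" ".join(own))
```

Note that .strings and .stripped_strings descend into nested tags, so they are handy for joining all text with a custom separator, while .contents is the one to filter when the h1's own text has to be kept apart from the span's.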