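HTML scraping with Python and BeautifulSoup

I have the HTML below and I am trying to scrape the complete sentence with BeautifulSoup, but I cannot get it. At the moment I only get the highlighted words. The output I want is:
Antenna booster has stopped sending signal files ,possible user network issue or BOOSTER issue.
Any solutions?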
</table>
<!--Record Header End-->
<span style="BACKGROUND-COLOR: #0000ff; color: #ffffff">
Antenna
</span>
<span style="BACKGROUND-COLOR: #0000ff; color: #ffffff">
booster
</span>
has stopped
<span style="BACKGROUND-COLOR: #0000ff; color: #ffffff">
sending
</span>
signal files ,possible user
<span style="BACKGROUND-COLOR: #0000ff; color: #ffffff">
network
</span>
<span style="BACKGROUND-COLOR: #ff0000">
issue
</span>
or BOOSTER
<span style="BACKGROUND-COLOR: #ff0000">
issue
</span>
.
<br>
<br>
<br>
Here is my attempt:
issue_field = soup.find_all('span', {'style':'BACKGROUND-COLOR: #0000ff; color: #ffffff'})
issue_str = str(issue_field)
Issue_corpora = [word.lower() for word in BeautifulSoup(issue_str,'html.parser').get_text().strip().split(',')]
print(Issue_corpora)
Please show the 'bs' code you have tried. –
issue_field = soup.find_all('span', {'style':'BACKGROUND-COLOR: #0000ff; color: #ffffff'}); issue_str = str(issue_field); Issue_corpora = [word.lower() for word in BeautifulSoup(issue_str,'html.parser').get_text().strip().split(',')]; print(Issue_corpora) –
Maybe a regular expression ('re') is enough for what you need here, for example: re.sub('.*>', '', t).replace('\n', '') –
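
The attempt only returns the highlighted words because find_all('span', ...) matches just the blue spans. The plain text between them ("has stopped", "or BOOSTER", the final ".") and the red spans are siblings of those spans, not children, so they are never selected. A minimal sketch of one way around this, assuming the sentence always starts right after the "Record Header End" comment and ends at the first <br>: walk the comment's siblings and collect every text node and span until the <br>. The function name extract_sentence and its marker_text parameter are made up for this illustration, and the HTML is copied from the question.

```python
import re
from bs4 import BeautifulSoup, Comment, Tag

# Fragment copied from the question.
HTML = """</table>
<!--Record Header End-->
<span style="BACKGROUND-COLOR: #0000ff; color: #ffffff">
Antenna
</span>
<span style="BACKGROUND-COLOR: #0000ff; color: #ffffff">
booster
</span>
has stopped
<span style="BACKGROUND-COLOR: #0000ff; color: #ffffff">
sending
</span>
signal files ,possible user
<span style="BACKGROUND-COLOR: #0000ff; color: #ffffff">
network
</span>
<span style="BACKGROUND-COLOR: #ff0000">
issue
</span>
or BOOSTER
<span style="BACKGROUND-COLOR: #ff0000">
issue
</span>
.
<br>
<br>
<br>
"""


def extract_sentence(html, marker_text="Record Header End"):
    """Hypothetical helper: join all text between the marker comment and the first <br>."""
    soup = BeautifulSoup(html, "html.parser")

    # Locate the comment that precedes the sentence (assumption: it is always present).
    marker = next(
        (node for node in soup.descendants
         if isinstance(node, Comment) and marker_text in node),
        None,
    )
    if marker is None:
        return ""

    parts = []
    for node in marker.next_siblings:
        # Stop at the first <br>, which is assumed to end the sentence.
        if isinstance(node, Tag) and node.name == "br":
            break
        # Spans contribute their inner text; bare text nodes are used as they are.
        parts.append(node.get_text() if isinstance(node, Tag) else str(node))

    # Collapse newlines and repeated spaces, then pull the final period back onto the last word.
    sentence = " ".join(" ".join(parts).split())
    return re.sub(r"\s+\.", ".", sentence)


print(extract_sentence(HTML))
```

The spacing around the comma is left exactly as it appears in the HTML. If the page holds several records, the same loop could be run once per marker comment, but that depends on page structure not shown in the question.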