BeautifulSoup alone can't extract a substring like this. You can use a regular expression instead.
from bs4 import BeautifulSoup
import re
html = r"""<p>
A
<span>die</span>
is thrown \(x = {-b \pm
<span>\sqrt</span>
{b^2-4ac} \over 2a}\) twice. What is the probability of getting a sum 7 from
both the throws?
</p>"""
soup = BeautifulSoup(html, 'html.parser')
# Find each substring from "\(" up to the next ")"; re.DOTALL lets "." match newlines
print(re.findall(r'\\\(.*?\)', soup.text, re.DOTALL))
Output:
[u'\\(x = {-b \\pm \n \\sqrt\n {b^2-4ac} \\over 2a}\\)']
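The re.DOTALL flag in the call above is what lets the match cross line breaks: without it, "." does not match a newline, so a formula split over several lines (as it is in soup.text here) is not found at all. A minimal check, using a hypothetical text string modeled on the example:

```python
import re

# Hypothetical text modeled on soup.text above: the formula spans three lines
text = r"""is thrown \(x = {-b \pm
\sqrt
{b^2-4ac} \over 2a}\) twice."""

pattern = r'\\\(.*?\)'

# Without DOTALL, '.' stops at each newline, so no ')' is reachable from '\('
without_dotall = re.findall(pattern, text)

# With DOTALL, '.' also matches '\n' and the whole formula is captured
with_dotall = re.findall(pattern, text, re.DOTALL)

print(without_dotall)
print(with_dotall)
```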
Regex:
\\\(.*?\) - Gets the substring from \( up to the first ) that follows it.
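Because .*? is lazy, the match stops at the first ")" after "\(", not specifically at the closing "\)". That works for this formula, but a formula that itself contains parentheses would be cut short. A sketch of a stricter variant, assuming the inline math is always closed by "\)" (the second pattern is an illustration, not part of the original answer):

```python
import re

text = r"Let \(f(x) = x^2\) and \(g(x)\) be given."

# Original pattern: stops at the first ')' after '\('
loose = re.findall(r'\\\(.*?\)', text, re.DOTALL)

# Stricter variant: requires the closing delimiter to be '\)'
strict = re.findall(r'\\\(.*?\\\)', text, re.DOTALL)

print(loose)   # ['\\(f(x)', '\\(g(x)']
print(strict)  # ['\\(f(x) = x^2\\)', '\\(g(x)\\)']
```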
If you want to remove the newlines and extra spaces, you can do this:
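res = re.findall(r'\\\(.*?\)', soup.text, re.DOTALL)[0]
# split() with no arguments splits on any run of whitespace, so joining with ' ' collapses newlines and repeated spaces
print(' '.join(res.split()))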
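Indexing with [0] keeps only the first formula and raises IndexError when there is none. If a paragraph may contain several inline formulas, the same regex and split/join can be applied to every match. A minimal sketch; extract_inline_math is a hypothetical helper name, and it works on plain text such as soup.text:

```python
import re

def extract_inline_math(text):
    # Hypothetical helper: find every "\( ... )" span with the answer's regex,
    # then collapse all whitespace runs (newlines included) to single spaces.
    spans = re.findall(r'\\\(.*?\)', text, re.DOTALL)
    return [' '.join(span.split()) for span in spans]

sample = r"""is thrown \(x = {-b \pm
\sqrt
{b^2-4ac} \over 2a}\) twice, or \(y =  2\) once."""

for formula in extract_inline_math(sample):
    print(formula)
```

Returning a list (possibly empty) instead of indexing means a paragraph without any formula simply yields no results rather than an exception.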
Output:
\(x = {-b \pm \sqrt {b^2-4ac} \over 2a}\)
To wrap the string in HTML:
# The lxml parser wraps bare text in <html><body><p>...</p></body></html>
print(BeautifulSoup(' '.join(res.split()), 'lxml'))
Output:
<html><body><p>\(x = {-b \pm \sqrt {b^2-4ac} \over 2a}\)</p></body></html>
Hi, I expected the output to be [u'\\(x = {-b \\pm \n \\sqrt\n {b^2-4ac} \\over 2a}\\)']. Can you suggest a change to the regex? – waranlogesh
@waranlogesh Sure. I added a backslash before the '(' and ')' as well and updated the solution. – MYGz
Is there a way to save the printed changes back to the HTML? – waranlogesh