Extracting strings from an HTML file using Python (BeautifulSoup?)

I have an HTML file saved on my hard drive. I need to extract strings that are displayed on the HTML page and save them to a text file using Python.

HTML representation with tags, etc.:
Bme:&nbsp;1&nbsp;Port:&nbsp;1<br /> 
Downstream&nbsp;line&nbsp;rate:&nbsp;6736&nbsp;kbps<br /> 
Upstream&nbsp;line&nbsp;rate:&nbsp;964&nbsp;kbps<br />

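What I need to extract from the above is the number that follows

Downstream&nbsp;line&nbsp;rate:&nbsp;

which in this case is 6736, and then write that number to a file. How can this be done?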


2013-03-24 user2203807

BeautifulSoup may be overkill. If all the "Downstream" lines are formatted like this, you can easily get the numbers with a regular expression.

>>> import re 
>>> regex = r'Downstream&nbsp;line&nbsp;rate:&nbsp;(\d+)&nbsp;kbps<br />'
>>> re.search(regex, "Downstream&nbsp;line&nbsp;rate:&nbsp;6736&nbsp;kbps<br />").group(1) 
'6736'

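If the lines aren't all formatted exactly like this, you may have to make the regular expression more general, perhaps something like r'Downstream.*?(\d+)'. Note the non-greedy .*? here: a greedy .* would swallow most of the number and leave only its last digit for the group.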
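To cover the rest of the question (reading the saved HTML file and writing the number to a text file), here is a minimal sketch. The function names and the file names page.html and rate.txt are made up for illustration, and UTF-8 encoding is an assumption about the saved file. It uses the non-greedy form of the general pattern, since a greedy .* before the group would capture only the last digit.

```python
import re
from pathlib import Path

# Non-greedy ".*?" so the group captures the whole number, not just its last digit.
DOWNSTREAM_RE = re.compile(r'Downstream.*?(\d+)')


def extract_downstream_rate(html_text):
    """Return the downstream line rate as a string of digits, or None if absent."""
    match = DOWNSTREAM_RE.search(html_text)
    return match.group(1) if match else None


def save_downstream_rate(html_path, out_path):
    """Read html_path, extract the rate, and write it to out_path (hypothetical helper)."""
    # Assumption: the saved page is UTF-8 encoded; adjust if yours is not.
    html_text = Path(html_path).read_text(encoding='utf-8')
    rate = extract_downstream_rate(html_text)
    if rate is None:
        raise ValueError('no Downstream line rate found in ' + str(html_path))
    Path(out_path).write_text(rate + '\n', encoding='utf-8')
    return rate
```

Calling save_downstream_rate('page.html', 'rate.txt') would then write the number on its own line to rate.txt, assuming page.html contains a line like the one in the question.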


2013-03-24 04:58:18

Thanks, this solved my problem. – user2203807 2013-03-26 17:37:41
