How to parse LD+JSON using Python

I've been trying to do some web scraping and came across some interesting data inside this tag:

<script type="application/ld+json">

I used Beautiful Soup and was able to isolate the tag:

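from urllib2 import urlopen  # Python 2.7 (on Python 3: from urllib.request import urlopen)
from bs4 import BeautifulSoup

html = urlopen(url)
soup = BeautifulSoup(html, "lxml")

p = soup.find('script', {'type': 'application/ld+json'})
print(p)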

But I haven't been able to process the data or extract anything from the tag.

If I try to pull something out of it with a regular expression, I get:

TypeError: expected string or buffer

How can I get the data out of the script tag and use it the way I would use a dictionary or a string? By the way, I'm using Python 2.7.
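Once the text between the opening and closing script tags is available as a plain string, the standard `json` module turns it into a dictionary. In BeautifulSoup that text typically comes from the tag's `.string` attribute, i.e. `json.loads(p.string)` with `p` from the code above. A minimal sketch; the JSON payload below is made up for illustration (this particular snippet runs on both Python 2.7 and 3):

```python
import json

# Hypothetical contents of a <script type="application/ld+json"> tag:
# only the text between the tags, not the <script> markup itself.
ld_json_text = """
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Widget",
  "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "USD"}
}
"""

# json.loads parses JSON text into ordinary Python dicts and lists.
data = json.loads(ld_json_text)

print(data["@type"])            # Product
print(data["offers"]["price"])  # 19.99
```

After parsing, nested values are reached with normal dictionary indexing, so no regular expressions are needed.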

2017-04-27 wessells

You should read() the HTML first and then parse it:

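from urllib2 import urlopen  # Python 2.7 (on Python 3: from urllib.request import urlopen)
from bs4 import BeautifulSoup

html = urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")
p = soup.find('script', {'type': 'application/ld+json'})
print(p.contents)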
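`p.contents` is a list of the tag's children, so the JSON text itself is `p.contents[0]`, which can then be passed to `json.loads`. If you'd rather not depend on a third-party parser, the same extraction can be sketched with the standard library's `html.parser.HTMLParser` instead of BeautifulSoup. This is a different parser from the one used in the answer, the sketch is Python 3 only, and the class name `LdJsonExtractor` and the sample HTML are hypothetical:

```python
import json
from html.parser import HTMLParser


class LdJsonExtractor(HTMLParser):
    """Collects and parses every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_ld_json = False  # True while inside a matching script tag
        self._buffer = []         # text chunks of the current script tag
        self.results = []         # parsed JSON objects, in document order

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_ld_json:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self._in_ld_json = False
            self.results.append(json.loads("".join(self._buffer)))


html = """
<html><head>
<script type="application/ld+json">{"@type": "Product", "name": "Example Widget"}</script>
<script>var notJson = 1;</script>
</head><body></body></html>
"""

parser = LdJsonExtractor()
parser.feed(html)
parser.close()
print(parser.results)  # [{'@type': 'Product', 'name': 'Example Widget'}]
```

The ordinary `<script>` block is skipped because only tags whose `type` attribute is exactly `application/ld+json` switch on buffering.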

2017-04-27 11:10:46

I get an error about "html.read()". It says:
Traceback (most recent call last):
  File "test.py", line 20, in <module>
    get_price()
  File "test.py", line 16, in get_price
    soup = BeautifulSoup(html, "html.read()")
  File "C:\Python27\lib\site-packages\bs4\__init__.py", line 165, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html.read(). Do you need to install a parser library?
– wessells

It's html.parser, not html.read(). My mistake. –

You can use lxml instead if you need to. –

The comments above didn't help (thanks, though).

In the end I used:

p = str(soup.find('script', {'type':'application/ld+json'}))

I forced it into a string. It isn't really pretty, but it gets the job done. I know there's probably a better way, but this works for me.
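The `TypeError: expected string or buffer` in the question is consistent with handing the `Tag` object returned by `find()` to the `re` module, which only accepts strings; wrapping it in `str()` is what makes regex work. Note that `str(tag)` includes the `<script ...>` and `</script>` markup, so one way to finish this approach is to strip the wrapper and parse the rest as JSON. A minimal sketch, assuming the string looks like the hypothetical example below:

```python
import json
import re

# Hypothetical result of str(soup.find(...)): the whole tag, markup included.
tag_as_string = (
    '<script type="application/ld+json">'
    '{"@type": "Product", "offers": {"price": "19.99"}}'
    '</script>'
)

# Capture everything between the opening <script ...> and the closing </script>.
# re.DOTALL lets "." match newlines, since real LD+JSON is often multi-line.
match = re.search(r"<script[^>]*>(.*?)</script>", tag_as_string, re.DOTALL)
if match is None:
    raise ValueError("no <script> tag found in the string")

data = json.loads(match.group(1))
print(data["offers"]["price"])  # 19.99
```

Taking the tag's text directly (for example `p.string` in BeautifulSoup) avoids the regex step entirely and is less fragile if the markup changes.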

2017-04-27 15:29:06 wessells
