Parsing a script tag with dicts in BeautifulSoup

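While working on a partial answer to this question, I came across a bs4.element.Tag that is a mess of nested dicts and lists (s, below).

Is there a way to return a list of the URLs contained in s without using re.find_all? Other comments on the structure of this tag would be helpful too.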

from bs4 import BeautifulSoup 
import requests 

link = 'https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab&sort=p' 
r = requests.get(link) 
soup = BeautifulSoup(r.text, 'html.parser') 

s = soup.find('script', type='application/ld+json') 

## the first bit of s: 
# s 
# Out[116]: 
# <script type="application/ld+json"> 
# {"@context":"http://schema.org","@type":"ItemList","numberOfItems":50,

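I have tried:

Randomly perusing the methods on s with tab completion.
Picking through the docs.

My problem is that s has only 1 attribute (type) and doesn't seem to have any child tags.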

Source

2017-07-07 Brad Solomon

You can use s.text to get the content of the script. That content is JSON, so you can parse it with json.loads. From there, it's simple dictionary access:

import json 

from bs4 import BeautifulSoup 
import requests 

link = 'https://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab&sort=p' 
r = requests.get(link) 

soup = BeautifulSoup(r.text, 'html.parser') 

s = soup.find('script', type='application/ld+json') 

urls = [el['url'] for el in json.loads(s.text)['itemListElement']] 

print(urls)
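The same approach can be tried without hitting the network. The following is a minimal sketch, not the original answer's code: SAMPLE_HTML, the example.com URLs, and the helper name extract_item_urls are all hypothetical stand-ins for the real page. It also swaps s.text for s.string, which returns the tag's single string child, and guards against a missing tag or missing keys.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: one JSON-LD block with two items.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "ItemList", "numberOfItems": 2,
 "itemListElement": [
   {"@type": "ListItem", "position": 1, "url": "https://example.com/jobs/1"},
   {"@type": "ListItem", "position": 2, "url": "https://example.com/jobs/2"}
 ]}
</script>
</head><body></body></html>
"""


def extract_item_urls(html):
    """Return the 'url' of each entry in the first JSON-LD block, or []."""
    soup = BeautifulSoup(html, 'html.parser')
    s = soup.find('script', type='application/ld+json')
    # No matching tag, or a tag with no single string child: nothing to parse.
    if s is None or not s.string:
        return []
    data = json.loads(s.string)
    # Skip entries that carry no 'url' key instead of raising KeyError.
    return [el['url'] for el in data.get('itemListElement', []) if 'url' in el]


urls = extract_item_urls(SAMPLE_HTML)
print(urls)
```

This also explains why s appears to have no child tags: the parser does not interpret the body of a script element as markup, so the whole JSON document is stored as one string node inside the tag. The nested dicts and lists only exist after json.loads has parsed that string.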

Source

2017-07-07 01:15:22 smarx
