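Loading all third-party scripts using requests or mechanize in Python

I am loading a web page into an iframe, and I want to make sure that all of the associated media is available. I currently download the page with requests and then do some find/replace, but that doesn't cover everything. Is there a way in Python to get a list of all the script, CSS, and image requests that a page makes when it is loaded in a browser?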

2013-10-22 James

BeautifulSoup

Use BeautifulSoup4 to get all of the <img>, <link>, and <script> tags, then pull out the corresponding attributes.

from bs4 import BeautifulSoup
import requests

resp = requests.get("http://www.yahoo.com")

# Name the parser explicitly; bs4 warns when it has to guess one
soup = BeautifulSoup(resp.text, "html.parser")

# Pull the linked images (note: will also grab base64-encoded data: URIs)
images = [img['src'] for img in soup.find_all('img', src=True)]

# Requiring src ensures that we don't grab the embedded (inline) scripts
scripts = [script['src'] for script in soup.find_all('script', src=True)]

# favicon.ico and css
links = [link['href'] for link in soup.find_all('link', href=True)]

Example output:

In [30]: images = [img['src'] for img in soup.find_all('img', src=True)]

In [31]: images[:5] 
Out[31]: 
['http://l.yimg.com/dh/ap/default/130925/My_Yahoo_Defatul_HP_ad_300x250.jpeg', 
'http://l.yimg.com/os/mit/media/m/base/images/transparent-95031.png', 
'http://l.yimg.com/os/mit/media/m/base/images/transparent-95031.png', 
'http://l.yimg.com/os/mit/media/m/base/images/transparent-95031.png', 
'http://l.yimg.com/os/mit/media/m/base/images/transparent-95031.png']
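
For the iframe use case, the extracted values often need one more step: `src` and `href` attributes may be relative or protocol-relative, so they have to be resolved against the page URL before they can be fetched. The following is a minimal sketch of that step, not part of the original answer. It uses only the standard library (`html.parser` and `urllib.parse.urljoin`) so that it runs without a network connection; the names `ResourceCollector` and `collect_resources` and the example.com URLs are hypothetical. It assumes the page has no `<base href>` tag, and it skips `data:` URIs since those are embedded rather than requested.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag name -> attribute that holds the resource URL.
RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href"}


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of images, scripts, and <link> targets."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = {"img": [], "script": [], "link": []}

    def handle_starttag(self, tag, attrs):
        attr_name = RESOURCE_ATTRS.get(tag)
        if attr_name is None:
            return
        value = dict(attrs).get(attr_name)
        # Skip tags without the attribute (e.g. inline scripts)
        # and embedded data: URIs, which trigger no request.
        if not value or value.startswith("data:"):
            return
        # Resolve relative and protocol-relative URLs against the page URL.
        self.resources[tag].append(urljoin(self.base_url, value))


def collect_resources(html, base_url):
    """Return a dict of absolute resource URLs found in an HTML string."""
    parser = ResourceCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.resources
```

The same `urljoin(page_url, value)` call can be applied to the `images`, `scripts`, and `links` lists built with BeautifulSoup above. Either way, this only sees resources named in the static HTML; anything requested later by JavaScript, or referenced from inside a stylesheet, will not appear in these lists.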

2013-10-22 18:48:28
