1
因此,可以說,我試圖讓鏈接到一個特定的形象,就像這樣:BeautifulSoup如何從.. img src獲取網址../ ..?
from bs4 import BeautfiulSoup
import urlparse
soup = BeautifulSoup("http://examplesite.com")
for image in soup.findAll("img"):
srcd = urlparse.urlparse(src)
path = srcd.path # gets the path
fn = os.path.basename(path) # gets filename
# lets say the webpage i was scraping had their images like this:
# <img src="../..someimage.jpg" />
有沒有簡單的方法來得到的完整網址?或者我將不得不使用正則表達式?
完整網址是依賴於基URI,這是依賴於上下文的(典型地,該網頁被從檢索,但要小心iframe也手冊[的' '的網址上標籤](http://www.w3.org/TR/html-markup/base.html)) –
Cameron