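Getting specific data from a website after scraping it with Python

This is my first Python project, written by following YouTube videos. I'm not very proficient, but I think I have a basic grasp of coding.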
# requests fetches web pages over HTTP
import requests
# BeautifulSoup parses the downloaded HTML so data can be pulled out of it
from bs4 import BeautifulSoup

# Loop over the sitemap pages, building a new URL on each iteration
def creator_spider(max_pages):
    for page in range(max_pages):
        url = 'https://www.patreon.com/sitemap/campaigns/' + str(page)
        source_code = requests.get(url, timeout=10)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        for link in soup.find_all('a', {'class': ''}):
            href = link.get('href')
            if href is None:  # skip <a> tags that have no href
                continue
            href = "https://www.patreon.com" + href
            #title = link.string
            print(href)
            #print(title)
            get_single_item_data(href)
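
# Download one creator page and print the text of every <h6> on it
def get_single_item_data(item_url):
    source_code = requests.get(item_url, timeout=10)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    print(soup)  # debug: dump the whole parsed page
    for item_name in soup.find_all('h6'):
        print(item_name.string)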
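Building each creator URL as "https://www.patreon.com" + href assumes every href is a site-relative path; an href that is already absolute produces a broken URL, and link.get('href') returns None for an <a> tag with no href. A minimal sketch using the standard library's urllib.parse.urljoin handles both cases. The helper name absolute_links is hypothetical, not part of the original code.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.patreon.com"


def absolute_links(hrefs, base=BASE_URL):
    """Turn raw href values into absolute URLs, skipping missing ones."""
    links = []
    for href in hrefs:
        if not href:  # None (no href attribute) or an empty string
            continue
        # urljoin resolves relative paths against base and leaves absolute URLs unchanged
        links.append(urljoin(base, href))
    return links


print(absolute_links(["/creator1", "https://www.patreon.com/creator2", None]))
# → ['https://www.patreon.com/creator1', 'https://www.patreon.com/creator2']
```

Inside creator_spider() this would be applied to the href values collected from soup.find_all('a', ...).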
From each page I crawl, I want the code to extract the information highlighted here: http://imgur.com/a/e59S9. The corresponding page source is here: http://imgur.com/a/8qv7k
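I suspect I need to change the arguments to soup.find_all() in the get_single_item_data() function, but all my attempts so far have failed. Any help would be greatly appreciated.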
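For reference, find_all() can filter by tag name, by class, by an arbitrary attrs dict, and select() accepts CSS selectors. The sketch below runs those filters against a small made-up HTML snippet; the markup and class names (stats, label, value) are hypothetical stand-ins, since the real structure is only visible in the linked screenshots. These filters only match markup present in the HTML the server returns; if the data is injected by JavaScript, no filter will find it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a creator page.
SAMPLE_HTML = """
<div class="stats">
  <h6 class="label">Patrons</h6>
  <span class="value">120</span>
  <h6 class="label">Per month</h6>
  <span class="value">$450</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# By tag name only, as get_single_item_data() does with 'h6'.
labels = [h.get_text(strip=True) for h in soup.find_all("h6")]

# By tag and class; class_ has a trailing underscore because class is a Python keyword.
values = [s.get_text(strip=True) for s in soup.find_all("span", class_="value")]

# The same lookup written as an attrs dict, the style used in creator_spider().
values_attrs = [s.get_text(strip=True) for s in soup.find_all("span", attrs={"class": "value"})]

# A CSS selector can also express nesting: span.value inside div.stats.
values_css = [s.get_text(strip=True) for s in soup.select("div.stats span.value")]

print(dict(zip(labels, values)))
# → {'Patrons': '120', 'Per month': '$450'}
```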
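This site loads its content with JavaScript, so the data you want is not in the HTML that requests retrieves. You need to simulate a real browser to scrape these pages. You could try Selenium or PhantomJS. – sailesh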
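Following the comment's suggestion, here is a minimal sketch of rendering a page in a real browser with Selenium and then parsing the result with BeautifulSoup. It assumes Selenium 4 and a local Chrome installation, and it swaps in headless Chrome for PhantomJS, whose development has been suspended. Waiting for an <h6> element is an assumption carried over from the question's code, not a confirmed detail of Patreon's markup, and the helper names fetch_rendered_html and extract_h6_text are hypothetical.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_seconds=10):
    """Load url in headless Chrome and return the HTML after JavaScript has run.

    Assumes Selenium 4 and Chrome are installed. Raises
    selenium.common.exceptions.TimeoutException if no <h6> appears in time.
    """
    # Imported here so extract_h6_text() still works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the page's JavaScript has inserted at least one <h6> (assumed target tag).
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.TAG_NAME, "h6"))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out


def extract_h6_text(html):
    """Return the stripped text of every <h6> element in html."""
    soup = BeautifulSoup(html, "html.parser")
    return [h6.get_text(strip=True) for h6 in soup.find_all("h6")]
```

In the question's code, extract_h6_text(fetch_rendered_html(href)) would take the place of get_single_item_data(href) inside the loop. Starting a browser per page is slow, so for many pages it may be worth reusing one driver instance.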