Cannot display the content between span tags
This is my code so far: http://pastebin.com/CdUiXpdf
import requests
from bs4 import BeautifulSoup


def web_crawler(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://www.kupindo.com/Knjige/artikli/1_strana_" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        print("PAGE: " + str(page))
        for link in soup.find_all("a", class_="item_link"):
            href = link.get("href")
            # title = link.string
            print(href)
            # print(title)
            extended_crawler(href)
        page += 1


def extended_crawler(item_url):
    source_code = requests.get(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    for view_counter in soup.find_all("span", id="BrojPregleda"):
        print("View Count: ", view_counter.text)


web_crawler(1)
The output is, for example:
PAGE: 1
https://www.kupindo.com/showcontent/2143/Beletristika/37875219_VUK-DRASKOVIC-Izabrana-dela-1-7-Srpska-rec
View Count:
So the view count is empty: even though the extended_crawler function looks for the span with the id BrojPregleda, nothing is displayed.
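To narrow this down, it may help to check whether the span is missing from the downloaded HTML or present but empty. The following is only a minimal sketch: span_status and SpanTextFinder are hypothetical helper names, the sample HTML strings are made up, and it uses the standard-library html.parser so that it runs without extra packages (the same check could be done with BeautifulSoup). An "empty" result would suggest, as a guess I have not confirmed for this site, that the browser fills the value in after the page loads, for example with JavaScript.

```python
from html.parser import HTMLParser


class SpanTextFinder(HTMLParser):
    """Collects the text inside the first <span> with a given id (hypothetical helper)."""

    def __init__(self, span_id):
        super().__init__()
        self.span_id = span_id
        self.found = False   # True once a span with the wanted id has been seen
        self.depth = 0       # > 0 while we are inside the target span
        self.chunks = []     # text fragments found inside the target span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            # Nested span inside the target span
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.span_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def span_status(html, span_id):
    """Return 'missing', 'empty' or 'filled' for the span with the given id."""
    parser = SpanTextFinder(span_id)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return "missing"
    return "filled" if "".join(parser.chunks).strip() else "empty"


# Made-up sample markup, not copied from the real page
print(span_status('<p>Views: <span id="BrojPregleda"></span></p>', "BrojPregleda"))
```

In extended_crawler, the same function could be called as span_status(plain_text, "BrojPregleda") to see which of the three cases applies to the real page.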
@Arman What do you mean by code in PDF format? The pastebin link just happens to end in "pdf"; it is plain text. – dovla