Can't display the text between span tags

This is my code so far: http://pastebin.com/CdUiXpdf

import requests
from bs4 import BeautifulSoup


def web_crawler(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://www.kupindo.com/Knjige/artikli/1_strana_" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        print("PAGE: " + str(page))
        for link in soup.find_all("a", class_="item_link"):
            href = link.get("href")
            # title = link.string
            print(href)
            # print(title)
            extended_crawler(href)
        page += 1


def extended_crawler(item_url):
    source_code = requests.get(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    for view_counter in soup.find_all("span", id="BrojPregleda"):
        print("View Count: ", view_counter.text)


web_crawler(1)

The output is, for example:

PAGE: 1 
https://www.kupindo.com/showcontent/2143/Beletristika/37875219_VUK-DRASKOVIC-Izabrana-dela-1-7-Srpska-rec 
View Count:

So the view count is empty: even though the extended_crawler function looks up the span with the id BrojPregleda, nothing is displayed.
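One way to narrow this down is to check whether the span is already empty in the HTML the server sends, before any script in the browser has run. The sketch below uses only the standard library and two hypothetical HTML snippets (STATIC_HTML and RENDERED_HTML are made up for illustration, not copied from kupindo.com); span_text and SpanTextCollector are illustrative names as well.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Collect the text inside the first-level <span> with a given id.

    Simplification: nested <span> elements are not handled.
    """

    def __init__(self, span_id):
        super().__init__()
        self.span_id = span_id
        self.inside = False
        self.found = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("id") == self.span_id:
            self.inside = True
            self.found = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data


def span_text(html, span_id):
    """Return the span's text, "" if the span is empty, None if it is absent."""
    collector = SpanTextCollector(span_id)
    collector.feed(html)
    collector.close()
    return collector.text.strip() if collector.found else None


# Hypothetical markup: what the server might send vs. what the browser shows
# after a script has filled the span in.
STATIC_HTML = '<div>Views: <span id="BrojPregleda"></span></div>'
RENDERED_HTML = '<div>Views: <span id="BrojPregleda">128</span></div>'

print(repr(span_text(STATIC_HTML, "BrojPregleda")))
print(repr(span_text(RENDERED_HTML, "BrojPregleda")))
```

If the same check on source_code.text returns an empty string rather than None, the span exists but its content is not part of the downloaded page, so no parser setting will recover it.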


2017-02-25 dovla

@Arman What do you mean by code in PDF format? The pastebin link just happens to end in "pdf"; it is plain text. – dovla

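That is because the span with the id BrojPregleda is populated by an Ajax call. Either use Selenium to get the value, or follow these steps:

1) Get the product ID from the URL

2) POST to http://www.kupindo.com/inc/ajx/Predmet/ajxGetBrojPregleda.php with form data whose key is IDPredmet and whose value is the ID from step 1

3) Read the view count from the response

Example: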

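def extended_crawler(item_url):
    # The item page is only needed for other fields; the view count comes from the Ajax endpoint.
    source_code = requests.get(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    # The product ID sits between the last "/" and the last "_" of the URL.
    item_id = item_url[item_url.rfind('/') + 1:item_url.rfind('_')]
    view_count = requests.post(
        'http://www.kupindo.com/inc/ajx/Predmet/ajxGetBrojPregleda.php',
        data={'IDPredmet': item_id})
    print(view_count.text)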
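The slicing in step 1 relies on the last "_" in the URL being the one right after the ID. A slightly more defensive sketch is shown below, assuming (based only on the example URL in the question) that the ID is the run of digits at the start of the last path segment. The names extract_item_id and build_view_count_payload are hypothetical helpers, not part of requests or of the site.

```python
from urllib.parse import urlparse


def extract_item_id(item_url):
    """Return the numeric ID that leads the last path segment, or None.

    Assumption: item URLs end in "<digits>_<title-slug>", as in the
    example from the question.
    """
    last_segment = urlparse(item_url).path.rstrip("/").rsplit("/", 1)[-1]
    item_id = last_segment.split("_", 1)[0]
    return item_id if item_id.isdigit() else None


def build_view_count_payload(item_url):
    """Build the form data for the view-count request, or None if no ID is found."""
    item_id = extract_item_id(item_url)
    return {"IDPredmet": item_id} if item_id is not None else None


example_url = (
    "https://www.kupindo.com/showcontent/2143/Beletristika/"
    "37875219_VUK-DRASKOVIC-Izabrana-dela-1-7-Srpska-rec"
)
print(build_view_count_payload(example_url))
```

The returned dictionary can be passed as data= to requests.post, as in the example above; returning None for URLs without a leading numeric ID lets the crawler skip those links instead of posting a malformed ID.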


2017-02-25 21:22:45 Zroq

This works, thank you very much. I never would have thought of that. – dovla
