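You can use the underlying web API to extract the grid item details. The page is rendered by the AngularJS JavaScript framework, so the HTML is not static.
One way to parse it is to fetch the data with Selenium, but it is quite simple to identify the web API with the browser's developer tools.
EDIT: I used the Firebug add-on with Firefox to look at the GET requests listed in the "Net" tab.
The GET request made by the page is: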
https://catalog.paytm.com/v1//g/electronics/mobile-accessories/mobiles/smart-phones?page_count=1&items_per_page=30&resolution=960x720&quality=high&sort_popular=1&cat_tree=1&callback=angular.callbacks._3&channel=web&version=2
It returns a callback JS script (a JSONP response) that is almost entirely JSON data.
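As a rough sketch of how such a callback wrapper can be removed, the helper below strips a `callbackName(...)` call and parses what is left as JSON. The name `strip_jsonp` and the sample string are hypothetical and only illustrate the idea; the real response is much larger.

```python
import json
import re

# Matches `callbackName( ... )` with an optional trailing semicolon.
# DOTALL lets the payload span several lines.
_JSONP_RE = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def strip_jsonp(text):
    """Parse the JSON payload wrapped in a JSONP callback call (hypothetical helper)."""
    match = _JSONP_RE.match(text)
    if match is None:
        raise ValueError("response does not look like JSONP")
    return json.loads(match.group(1))


# Made-up miniature response, shaped like the one the API returns.
sample = 'angular.callbacks._3({"grid_layout": [{"brand": "Samsung"}]});'
print(strip_jsonp(sample)["grid_layout"][0]["brand"])
```

Because the pattern is anchored on the last closing parenthesis, a `);` that happens to occur inside a JSON string does not cut the payload short.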
The JSON it returns contains the details of the grid items.
Each grid item is described by a JSON object like the one below:
{
"product_id": 23491960,
"complex_product_id": 7287171,
"name": "Samsung Galaxy Z1 (Black)",
"short_desc": "",
"bullet_points": {
"salient_feature": ["Screen: 10.16 cm (4\")", "Camera: 3.1 MP Rear/VGA Front", "RAM: 768 MB", "ROM: 4 GB", "Dual-core 1.2 GHz Cortex-A7", "Battery: 1500 mAh/Li-Ion"]
},
"url": "https://catalog.paytm.com/v1/p/samsung-z1-black-MOBSAMSUNG-Z1-BSMAR2320696B3C745",
"seourl": "https://catalog.paytm.com/v1/p/samsung-z1-black-MOBSAMSUNG-Z1-BSMAR2320696B3C745",
"url_type": "product",
"promo_text": null,
"image_url": "https://assetscdn.paytm.com/images/catalog/product/M/MO/MOBSAMSUNG-Z1-BSMAR2320696B3C745/2.jpg",
"vertical_id": 18,
"vertical_label": "Mobile",
"offer_price": 5090,
"actual_price": 5799,
"merchant_name": "SMARTBUY",
"authorised_merchant": false,
"stock": true,
"brand": "Samsung",
"tag": "+5% Cashback",
"product_tag": "+5% Cashback",
"shippable": true,
"created_at": "2015-09-17T08:28:25.000Z",
"updated_at": "2015-12-29T05:55:29.000Z",
"img_width": 400,
"img_height": 400,
"discount": "12"
}
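To illustrate how the fields of one grid item relate to each other, here is a small sketch that recomputes the discount from the two prices. It assumes that `discount` is the percentage saved relative to `actual_price`, rounded to a whole number. The sample values are consistent with that reading, but the API does not document it. `discount_percent` is a hypothetical helper.

```python
import json

# A subset of the sample grid item shown above.
item = json.loads('''{
  "name": "Samsung Galaxy Z1 (Black)",
  "offer_price": 5090,
  "actual_price": 5799,
  "discount": "12",
  "stock": true
}''')


def discount_percent(item):
    """Percentage saved relative to actual_price, rounded (assumed meaning of "discount")."""
    saved = item["actual_price"] - item["offer_price"]
    return round(100 * saved / item["actual_price"])


# Note that "discount" arrives as a string, so convert it before comparing.
print(item["name"], discount_percent(item), int(item["discount"]))
```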
So you can get the details without even using BeautifulSoup, in the following way:
import json
import requests

url = "https://catalog.paytm.com/v1//g/electronics/mobile-accessories/mobiles/smart-phones?page_count=1&items_per_page=30&resolution=960x720&quality=high&sort_popular=1&cat_tree=1&callback=angular.callbacks._3&channel=web&version=2"
response = requests.get(url)

# The body is JSONP: angular.callbacks._3({...});
# Keep only the text between the first "(" and the last ")".
body = response.text
json_response = body[body.index("(") + 1:body.rindex(")")]
data = json.loads(json_response)

# Each entry of "grid_layout" describes one grid item.
grid_data = data["grid_layout"]
for grid_item in grid_data:
    print("Brand:", grid_item["brand"])
    print("Product Name:", grid_item["name"])
    print("Current Price: Rs", grid_item["offer_price"])
    print("==================")
You will get output like the following:
Brand: Samsung
Product Name: Samsung Galaxy Z1 (Black)
Current Price: Rs 4990
==================
Brand: Samsung
Product Name: Samsung Galaxy A7 (Gold)
Current Price: Rs 22947
==================
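If you need more than the first 30 items, the query string can be built from a dictionary instead of being hard-coded. The sketch below only constructs the URL. It assumes that `page_count` selects the page, which the parameter name suggests but which I have not verified. `build_grid_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "https://catalog.paytm.com/v1//g/electronics/mobile-accessories/mobiles/smart-phones"


def build_grid_url(page=1, items_per_page=30, callback="angular.callbacks._3"):
    """Rebuild the request URL seen in the Net tab (hypothetical helper)."""
    params = {
        "page_count": page,  # ASSUMPTION: this parameter selects the page
        "items_per_page": items_per_page,
        "resolution": "960x720",
        "quality": "high",
        "sort_popular": 1,
        "cat_tree": 1,
        "callback": callback,
        "channel": "web",
        "version": 2,
    }
    return BASE_URL + "?" + urlencode(params)


print(build_grid_url(page=1))
```

With the default arguments this reproduces the URL captured above, so it can be passed to `requests.get` unchanged.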
Hope this helps.
Please share the code you have written so far. – JRodDynamite
r = requests.get(url) soup = BeautifulSoup(r.content, "html.parser") plink = soup.find_all("div", {"class": "f1"})[0].find_all("grid-item")[0] – Nain
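Inspect the HTML passed to 'BeautifulSoup' (i.e. 'r.content'). It may differ from the HTML shown by the developer toolbar. If it is missing the '' tags, JavaScript may have been used to insert the content into the page. If that is the case, you will need a [JavaScript-enabled browser (such as Selenium)](http://stackoverflow.com/q/17436014/190597) to get the content. – unutbu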
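A quick way to run the check suggested in the comment above is to count how often a tag occurs in the HTML that the server actually sent. The sketch below uses only the standard library. The tag name "grid-item" comes from the snippet in the earlier comment, and both HTML strings are made-up stand-ins for 'r.content' and for the DOM shown by the developer tools.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags with a given name (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag == self.tag:
            self.count += 1


def count_tag(html, tag):
    parser = TagCounter(tag)
    parser.feed(html)
    parser.close()
    return parser.count


# Made-up examples: what the server might send vs. what the browser shows.
static_html = '<div class="f1"><div ng-view></div></div>'
rendered_html = '<div class="f1"><grid-item></grid-item><grid-item></grid-item></div>'
print(count_tag(static_html, "grid-item"), count_tag(rendered_html, "grid-item"))
```

A count of zero for the fetched HTML, while the developer tools show the elements, suggests that the content is inserted by JavaScript after the page loads.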