2017-10-17 93 views

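All I want is to scrape every product. Why can't I use containers.div either? My tutorial's page only had <div></div>, so I'm confused now that the markup here is <div><\div><div>. And why can't I call container.findAll("h3", {"class": "name"})?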

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 

my_url = 'https://hbx.com/categories/sneakers' 

# open the connection and fetch the page
uClient = uReq(my_url) 
page_html = uClient.read() 
uClient.close() 

# html parsing 
page_soup = soup(page_html, "html.parser") 

# grab each product container
containers = page_soup.findAll("div",{"class":"product-wrapper col-xs-6 col-sm-4"}) 

filename = "kontol.csv" 
f = open(filename, "w") 

headers = "judul, brand, harga\n" 

f.write(headers) 

for container in containers: 
    title_container = container.findAll("h3", {"class":"name"}) 
    judul = title_container[0].text 

    brand_container = container.findAll("h4", {"class":"brand"}) 
    brand = brand_container[0].text 

    price_container = container.findAll("span", {"class":"regular-price"}) 
    harga = price_container[0].text 

    print("judul: " + judul) 
    print("brand: " + brand) 
    print("harga: " + harga) 

    f.write(judul + "," + brand + "," + harga + "\n") 

f.close() 

When I try to call container.findAll("h3", {"class": "name"}), I get this error:

Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "C:\Python36\lib\site-packages\bs4\element.py", line 1807, in __getattr__ 
    "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key 
AttributeError: ResultSet object has no attribute 'findAll'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()? 
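The error message above points at the cause: findAll (find_all) returns a ResultSet, which behaves like a list of tags, so search methods have to be called on each element rather than on the list itself. A minimal sketch of the difference, assuming a small hypothetical HTML snippet in place of the real page (the markup and product names below are illustrative, not the site's actual HTML):

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real product listing.
html = """
<div class="product-wrapper"><h3 class="name">Club C 85</h3></div>
<div class="product-wrapper"><h3 class="name">NMD R2 Runner</h3></div>
"""

page_soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet: a list-like collection of Tag objects.
containers = page_soup.find_all("div", class_="product-wrapper")

# Calling a search method on the whole ResultSet raises AttributeError.
try:
    containers.find_all("h3")
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)

# Calling it on each Tag inside the ResultSet works as expected.
names = [container.find("h3", class_="name").text for container in containers]
print(names)
```

In other words, container.findAll(...) inside the for loop is fine, while containers.findAll(...) (the plural name, outside the loop) triggers the traceback.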
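
After running this code on my own machine, it looks like you will have some trouble scraping this data from the site with urllib. Much of the content appears to be rendered with JavaScript, which would prevent you from scraping it with urllib. I would suggest looking into Selenium to get around this: http://selenium-python.readthedocs.io/.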

Answer


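Try the script below and let me know whether it solves the problem. I used conditional expressions to avoid the errors that would otherwise occur when an item is missing a field, as in the second result, where the price is None. It works well now.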

import requests
from bs4 import BeautifulSoup

url = "https://hbx.com/categories/sneakers" 
soup = BeautifulSoup(requests.get(url).text,"lxml") 
for item in soup.find_all(class_="product-box"): 
    name = item.find(class_="name").text if item.find(class_="name") else "" 
    brand = item.find(class_="brand").text if item.find(class_="brand") else "" 
    price = item.find(class_="regular-price").text if item.find(class_="regular-price") else "" 
    print(name,brand,price) 

Or use find_all if you prefer. The result is the same either way.

for item in soup.find_all(class_="product-box"): 
    name = item.find_all(class_="name")[0].text if item.find_all(class_="name") else "" 
    brand = item.find_all(class_="brand")[0].text if item.find_all(class_="brand") else "" 
    price = item.find_all(class_="regular-price")[0].text if item.find_all(class_="regular-price") else "" 
    print(name,brand,price) 

Partial results:

Club C 85 Reebok USD 75.00 
NMD R2 Runner Primeknit Adidas Originals 
NMD R2 Runner Adidas Originals USD 155.00 
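The script in the question builds each CSV line by joining fields with commas by hand, which breaks as soon as a product name contains a comma. A minimal sketch of writing the same three columns with the standard csv module instead; the function name write_products is hypothetical, and the rows are the partial results above split into name, brand, and price:

```python
import csv
import io


def write_products(rows, fileobj):
    """Write (name, brand, price) rows as CSV, quoting fields where needed."""
    writer = csv.writer(fileobj)
    writer.writerow(["judul", "brand", "harga"])  # header names from the question
    writer.writerows(rows)


# Rows taken from the partial results; a missing price is an empty string.
rows = [
    ("Club C 85", "Reebok", "USD 75.00"),
    ("NMD R2 Runner Primeknit", "Adidas Originals", ""),
    ("NMD R2 Runner", "Adidas Originals", "USD 155.00"),
]

# An in-memory buffer keeps the sketch self-contained.
buffer = io.StringIO()
write_products(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk, pass a real file instead of the buffer, for example one opened with open(filename, "w", newline="", encoding="utf-8"), so that the csv module controls the line endings.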

Hi, it works great! Thank you very much! – Filmar

Make sure to accept it as the answer. Thanks. – SIM
