1
Python的刮板帶來的只有1項...
大家好我是比較新的蟒蛇和幸福我做了一個腳本報廢我國的分類頁面之一。到目前爲止,劇本似乎只能抓住一件真正讓我瘋狂的東西,因爲我一直試圖修復它一個星期,而且我真的不知道任何人都可以提供幫助。我很感激,如果任何人都可以看看,並試圖解釋我在這裏所做的事情是什麼樣的。在此先感謝任何可以幫助的人!Python腳本只下腳料一個項目(分類頁)
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'http://www.clasificadosonline.com/UDMiscListingID.asp?MiscCat=75'
# opening ip connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
#HTML PARSER
page_soup = soup(page_html, "html5lib") #se cambio de "html.parser" a "html5lib por que jodia el closing form tag"
containers = page_soup.findAll("form",{"name":"listing"})
#testing variables
tags = containers[0].findAll("a", {"class":"Tahoma16Blacknounder"})
tagx = tags[0].text.strip()
filename = "products.csv"
f = open(filename, "w")
headers = "names, prices, city, product_condition\n"
f.write(headers)
for container in containers:
#holds the names of the classifieds
names_container = container.findAll("a", {"class":"Tahoma16Blacknounder"})
names = names_container[0].text.strip() # comment here later
#the span class"Tahoma14BrownNound" seems to hold the prices
#container.findAll("span", {"class":"Tahoma14BrownNound"})
#the span class
prices_container = container.findAll("span", {"class":"Tahoma14BrownNound"})
prices = prices_container[0].text # comment here later
#holds the city of use of the products
city_container = container.findAll("font", {"class":"tahoma14hbluenoUnder"})
city = city_container[0].text.strip() # comment here later
#holds the states of use of the products
product_condition_container = container.findAll("span", {"class":"style14 style15 style16"})
product_condition = product_condition_container[0].text # comment here later
print("names: " + names)
print("prices: " + prices)
print("city: " + city)
print("product_condition: " + product_condition)
f.write(names.replace(",", "|") + "," + prices + "," + city + "," + product_condition + "\n")
f.close()
你一直是最有幫助我謝謝你先生! –
太棒了!請將此標記爲已回答!謝謝! – chad