2017-08-30 36 views

Python scraper only returns 1 item...

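Hi everyone. I'm fairly new to Python, and I happily wrote a script to scrape one of my country's classifieds pages. So far the script seems to grab only one item, which is driving me crazy: I've been trying to fix it for a week and I don't know anyone who could help. I'd appreciate it if someone could take a look and explain what I'm doing wrong here. Thanks in advance to anyone who can help!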

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 

my_url = 'http://www.clasificadosonline.com/UDMiscListingID.asp?MiscCat=75' 

# opening ip connection, grabbing the page 
uClient = uReq(my_url) 
page_html = uClient.read() 
uClient.close() 

#HTML PARSER 
page_soup = soup(page_html, "html5lib")  # switched from "html.parser" to "html5lib" because it was mangling the closing form tag

containers = page_soup.findAll("form",{"name":"listing"}) 

#testing variables 
tags = containers[0].findAll("a", {"class":"Tahoma16Blacknounder"}) 
tagx = tags[0].text.strip() 

filename = "products.csv" 
f = open(filename, "w") 

headers = "names, prices, city, product_condition\n" 

f.write(headers) 

for container in containers:
    #holds the names of the classifieds
    names_container = container.findAll("a", {"class":"Tahoma16Blacknounder"})
    names = names_container[0].text.strip() # comment here later

    #the span class "Tahoma14BrownNound" seems to hold the prices
    prices_container = container.findAll("span", {"class":"Tahoma14BrownNound"})
    prices = prices_container[0].text # comment here later

    #holds the city of use of the products
    city_container = container.findAll("font", {"class":"tahoma14hbluenoUnder"})
    city = city_container[0].text.strip() # comment here later

    #holds the states of use of the products
    product_condition_container = container.findAll("span", {"class":"style14 style15 style16"})
    product_condition = product_condition_container[0].text # comment here later

    print("names: " + names)
    print("prices: " + prices)
    print("city: " + city)
    print("product_condition: " + product_condition)

    f.write(names.replace(",", "|") + "," + prices + "," + city + "," + product_condition + "\n")

f.close() 

Answers


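I looked at the site structure: you are missing the parsing of the table inside the form. There appears to be only one form named listing, so a loop over the forms runs just once. The individual listings are the table rows (tr) within that form, so iterate over those rows instead.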
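To see why a loop over the forms yields a single item, here is a minimal sketch on inline markup. The HTML is a hypothetical, simplified stand-in for the real page (only the tag and class names are taken from the script), and it uses "html.parser" because the sample is well-formed; the real page may still need "html5lib".

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: one form wrapping several listing rows.
SAMPLE_HTML = """
<form name="listing">
  <table>
    <tr valign="middle"><td><a class="Tahoma16Blacknounder">Bike</a></td></tr>
    <tr valign="middle"><td><a class="Tahoma16Blacknounder">Desk</a></td></tr>
    <tr valign="middle"><td>no link in this row</td></tr>
  </table>
</form>
"""

page_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Only one form matches, so "for container in forms" runs once.
forms = page_soup.find_all("form", {"name": "listing"})

# The listings are the rows inside that form.
rows = forms[0].find_all("tr", {"valign": "middle"})

# Keep only the rows that actually contain a listing link.
names = []
for row in rows:
    link = row.find("a", {"class": "Tahoma16Blacknounder"})
    if link is not None:
        names.append(link.text.strip())

print(len(forms), len(rows), names)
```

The full script that follows applies the same idea to the live page.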

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 

my_url = 'http://www.clasificadosonline.com/UDMiscListingID.asp?MiscCat=75' 

# opening ip connection, grabbing the page 
uClient = uReq(my_url) 
page_html = uClient.read() 
uClient.close() 

#HTML PARSER 
page_soup = soup(page_html, "html5lib")  # switched from "html.parser" to "html5lib" because it was mangling the closing form tag

containers = page_soup.findAll("form",{"name":"listing"}) 

#testing variables 
tags = containers[0].findAll("a", {"class":"Tahoma16Blacknounder"}) 
tagx = tags[0].text.strip() 

filename = "products.csv" 
f = open(filename, "w") 

headers = "names, prices, city, product_condition\n" 

f.write(headers) 

tr = containers[0].findAll('tr', {"valign":"middle"}) 

for container in tr:

    # skip rows that do not contain a listing link
    if len(container.findAll("a", {"class":"Tahoma16Blacknounder"})) > 0:
        #holds the names of the classifieds
        names_container = container.findAll("a", {"class":"Tahoma16Blacknounder"})
        names = names_container[0].text.strip() # comment here later

        #the span class "Tahoma14BrownNound" seems to hold the prices;
        #some rows have no price, so fall back to an empty string
        prices_container = container.findAll("span", {"class":"Tahoma14BrownNound"})
        prices = prices_container[0].text if len(prices_container) > 0 else ''

        #holds the city of use of the products
        city_container = container.findAll("font", {"class":"tahoma14hbluenoUnder"})
        city = city_container[0].text.strip() # comment here later

        #holds the states of use of the products
        product_condition_container = container.findAll("span", {"class":"style14 style15 style16"})
        product_condition = product_condition_container[0].text # comment here later

        print("names: " + names)
        print("prices: " + prices)
        print("city: " + city)
        print("product_condition: " + product_condition)

        # write one CSV line per listing row
        f.write(names.replace(",", "|") + "," + prices + "," + city + "," + product_condition + "\n")

f.close() 
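As a side note, the script replaces commas in names with "|" so they do not break the CSV columns. A possible alternative, sketched here with the standard csv module, is to let the writer quote such fields. The sample rows and the write_listings helper are hypothetical and only stand in for the scraped values.

```python
import csv
import io

# Hypothetical rows standing in for scraped listings.
rows = [
    ("Sofa, 3 seats", "$150", "San Juan", "Used"),
    ("Lamp", "", "Ponce", "New"),
]

def write_listings(out, listings):
    """Write a header plus one line per listing; csv quotes fields that contain commas."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["names", "prices", "city", "product_condition"])
    writer.writerows(listings)

# An in-memory buffer keeps the sketch self-contained.
buffer = io.StringIO()
write_listings(buffer, rows)
csv_text = buffer.getvalue()
print(csv_text)
```

With a real file, the same helper could be called inside `with open("products.csv", "w", newline="", encoding="utf-8") as f:`, which also closes the file if an exception is raised midway.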

You've been the most helpful, thank you sir! –


Great! Please mark this as answered! Thanks! – chad