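Scraping the href from a directory with Python

I recently posted a request for help scraping data from the Yellow Pages, and @alecxe helped a ton by showing me some new ways to extract the data. But I am stuck again: I would like to scrape the link for each business in the Yellow Pages, so that I can get to the Yellow Pages page that has more data. I want to add a variable named "url" and get the href of the business: not the actual business website, but the business's Yellow Pages page. I have tried all sorts of things, but nothing seems to work. The href is under "class=business-name".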
import csv
import requests
from bs4 import BeautifulSoup

# Read the city names; the with block closes the file on exit.
with open('cities_louisiana.csv', 'r') as cities:
    lines = cities.read().splitlines()

for city in lines:
    print(city)

# Search URL without the page number. The location is hard-coded,
# so the city names read above are not used in the query yet.
url = "http://www.yellowpages.com/search?search_terms=businesses&geo_location_terms=baton%rouge+LA&page="

for city in lines:
    for x in range(0, 50):
        print(url + str(x))
        page = requests.get(url + str(x))
        soup = BeautifulSoup(page.text, "html.parser")
        for result in soup.select(".search-results .result"):
            # select_one() returns None when nothing matches, so calling
            # .get_text() on a missing field raises AttributeError.
            try:
                name = result.select_one(".business-name").get_text(strip=True, separator=" ")
            except AttributeError:
                name = ""
            try:
                streetAddress = result.select_one(".street-address").get_text(strip=True, separator=" ")
            except AttributeError:
                streetAddress = ""
            # Reset per result so values from the previous listing do not carry over.
            # Renamed from city/zip to avoid shadowing the loop variable and the builtin.
            locality, zip_code, state = "", "", "LA"
            try:
                locality = result.select_one(".locality").get_text(strip=True, separator=" ")
                locality = locality.replace(",", "")
                zip_code = result.select_one('span[itemprop$="postalCode"]').get_text(strip=True, separator=" ")
            except AttributeError:
                pass
            try:
                telephone = result.select_one(".phones").get_text(strip=True, separator=" ")
            except AttributeError:
                telephone = "No Telephone"
            try:
                categories = result.select_one(".categories").get_text(strip=True, separator=" ")
            except AttributeError:
                categories = "No Categories"
            completeData = name, streetAddress, locality, state, zip_code, telephone, categories
            print(completeData)
            # Append one row per listing; the with block closes the file.
            with open("yellowpages_businesses_louisiana.csv", "a", newline="") as write:
                wrt = csv.writer(write)
                wrt.writerow(completeData)
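
One possible way to get each listing's Yellow Pages link is sketched below. It assumes that the `.business-name` element is itself an `<a>` tag carrying a site-relative `href`; the HTML snippet, the example path, and the helper name `extract_listing_url` are made up for illustration and are not taken from the live site.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical markup imitating two search results; the real page may differ.
SAMPLE_HTML = """
<div class="search-results">
  <div class="result">
    <a class="business-name" href="/baton-rouge-la/mip/example-business-123">Example Business</a>
  </div>
  <div class="result">
    <span class="business-name">No Link Business</span>
  </div>
</div>
"""

def extract_listing_url(result, page_url):
    """Return the absolute URL of the listing page, or "" if there is no href."""
    element = result.select_one(".business-name")
    if element is None or not element.has_attr("href"):
        return ""
    # urljoin resolves the relative href against the URL of the page it came from.
    return urljoin(page_url, element["href"])

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
page_url = "http://www.yellowpages.com/search?search_terms=businesses&page=1"
urls = [extract_listing_url(r, page_url) for r in soup.select(".search-results .result")]
print(urls)
```

Inside the scraping loop, `page.url` from the `requests` response could serve as `page_url`, and the returned value could be appended to `completeData` as an extra column.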
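Very nice! I am still pretty new to Python and to programming in general. Your solution worked great, though I made a small change by adding `business_name_element = result.select_one(".business-name")` and `link = urljoin(page.url, business_name_element["href"])`. As I read through your code I reverse-engineer it so that it makes sense to me. Thanks for your support!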
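
For readers unfamiliar with `urljoin`, here is a small sketch of how it treats the two kinds of `href` values one might meet. The path and the external address are invented examples, not values from the site.

```python
from urllib.parse import urljoin

page_url = "http://www.yellowpages.com/search?search_terms=businesses&page=2"

# A site-relative href is resolved against the scheme and host of the page.
relative = urljoin(page_url, "/baton-rouge-la/mip/example-business-123")

# An href that is already absolute is returned unchanged.
absolute = urljoin(page_url, "http://www.example.com/about")

print(relative)
print(absolute)
```

This is why joining against `page.url` is a reasonable default: it should give a usable link whether the markup stores relative or absolute addresses.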