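Extract the link text from an href on a website using Python and BeautifulSoup
I want to scrape data from a website, and I need the title text of each link.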
[<a href="http://www.thegolfcourses.net/golfcourses/TX/38468.htm" rel="bookmark">Feather Bay Golf Course and Resort</a>]
[<a href="http://www.thegolfcourses.net/golfcourses/AZ/174830.htm" rel="bookmark">Paradise Valley Country Club</a>]
[<a href="http://www.thegolfcourses.net/golfcourses/IL/129935.htm" rel="bookmark">The Golf Club at Waters Edge</a>]
[<a href="http://www.thegolfcourses.net/golfcourses/NY/10630.htm" rel="bookmark">1000 Acres Ranch Resort</a>]
[<a href="http://www.thegolfcourses.net/golfcourses/VA/995731.htm" rel="bookmark">1757 Golf Club, 1757 Golf Club Front 9 Golf Course</a>]
[<a href="http://www.thegolfcourses.net/golfcourses/WI/320815.htm" rel="bookmark">27 Pines Golf Course</a>]
[<a href="http://www.thegolfcourses.net/golfcourses/WY/823145.htm" rel="bookmark">3 Creek Ranch Golf Club</a>]
[<a href="http://www.thegolfcourses.net/golfcourses/CA/18431.htm" rel="bookmark">3 Par At Four Points</a>]
[<a href="http://www.thegolfcourses.net/golfcourses/AZ/470720.htm" rel="bookmark">3 Parks Fairways</a>]
[<a href="http://www.thegolfcourses.net/golfcourses/IA/074920.htm" rel="bookmark">3-30 Golf & Country Club</a>]
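I am using the code below to process the page, but I am having a hard time writing the part that extracts the text. Any good suggestions on how to do this?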
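import csv
import requests
from bs4 import BeautifulSoup

courses_list = []
for i in range(1):
    # Build the URL for result page i
    url = "http://www.thegolfcourses.net/page/{}?ls&location=California&orderby=title&radius=6750#038;location=California&orderby=title&radius=6750".format(i)
    r = requests.get(url)
    # Name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(r.content, "html.parser")
    g_data2 = soup.find_all("article")
    for item in g_data2:
        try:
            # Relies on the links sitting in the sixth child of each <article>
            name = item.contents[5].find_all("a")
            print(name)
        except (IndexError, AttributeError):
            name = ''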
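A minimal sketch of one way to get just the title text, assuming each course link is an `<a>` tag with `rel="bookmark"` as in the output shown above. The sample markup is copied from that output; the surrounding `<article>` wrapper and the helper name `extract_titles` are assumptions made for illustration. It selects the anchors by attribute instead of by child position and reads their text with `get_text()`.

```python
from bs4 import BeautifulSoup

# Sample markup based on the output in the question.
# The <article> wrapper is an assumption about the real page layout.
SAMPLE_HTML = """
<article>
  <a href="http://www.thegolfcourses.net/golfcourses/TX/38468.htm" rel="bookmark">Feather Bay Golf Course and Resort</a>
</article>
<article>
  <a href="http://www.thegolfcourses.net/golfcourses/AZ/174830.htm" rel="bookmark">Paradise Valley Country Club</a>
</article>
"""


def extract_titles(html):
    """Return (title, href) pairs for every <a rel="bookmark"> in the markup."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for link in soup.find_all("a", rel="bookmark"):
        # get_text() returns only the text inside the tag, without the markup
        titles.append((link.get_text(strip=True), link.get("href")))
    return titles


for title, href in extract_titles(SAMPLE_HTML):
    print(title, href)
```

Selecting by attribute avoids depending on `contents[5]`, which shifts whenever whitespace or extra nodes appear inside an `<article>`.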
I tried that, but I still get blank output when I execute the code. – Gonzalo68
@Gonzalo68 This is an inefficient way, but it might work. >>> x = " text>" >>> y = x.split(">")[1] >>> z = y.split("<")[0] – userFriendly
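The string-splitting idea in the comment above can be sketched as a small function. The value of `x` in the comment did not survive intact, so the sample string here is an assumption: one of the anchor tags from the question's output. The helper name `text_between_tags` is hypothetical.

```python
def text_between_tags(tag_string):
    """Return the text between the first '>' and the next '<'."""
    # Everything after the opening tag's closing '>'
    after_open = tag_string.split(">")[1]
    # Everything before the closing tag's opening '<'
    return after_open.split("<")[0]


# Assumed input: a single anchor tag rendered as a string
sample = '<a href="http://www.thegolfcourses.net/golfcourses/WI/320815.htm" rel="bookmark">27 Pines Golf Course</a>'
print(text_between_tags(sample))
```

This only holds for a single flat tag. It breaks if an attribute value contains `>` or if the link wraps nested tags, which is why a parser is usually the safer choice.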