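BeautifulSoup: get all links after a given tag
I am trying to use BeautifulSoup to scrape the following pages (e.g. 1, 2) to get a list of actions for traveling from one place in Bangkok to another.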
Basically, I can query the page and select the trip description as follows:
import requests
from bs4 import BeautifulSoup

url = 'http://www.transitbangkok.com/showBestRoute.php?from=Sutthawat+-+Arun+Amarin+Intersection&to=Sukhumvit&originSelected=true&destinationSelected=true&lang=en'
route_request = requests.get(url)
soup_route = BeautifulSoup(route_request.content, 'lxml')
descriptions = soup_route.find('div', attrs={'id': 'routeDescription'})
The HTML of descriptions looks like the following:
<div id="routeDescription">
...
<br/>
<img src="/images/walk_icon_small.PNG" style="vertical-align:middle;padding-right: 10px;margin-right: 0px;"/>Walk by foot to <b>Sanam Luang</b>
<br/>
<img src="/images/bus_icon_semi_small.gif" style="vertical-align:middle;padding-right: 10px;margin-right: 0px;"/>Travel to <b>Khok Wua</b> using the line(s): <b><a href="lines/bangkok-bus-line/2">2</a></b> or <a href="lines/bangkok-bus-line/15">15</a> or <a href="lines/bangkok-bus-line/44">44</a> or <a href="lines/bangkok-bus-line/47">47</a> or <a href="lines/bangkok-bus-line/59">59</a> or <a href="lines/bangkok-bus-line/201">201</a> or <a href="lines/bangkok-bus-line/203">203</a> or <a href="lines/bangkok-bus-line/512">512</a><br/>
...
</div>
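To experiment without requesting transitbangkok.com each time, the two route steps shown above can be loaded from a string. This is a sketch with a trimmed fixture (the "..." parts and inline style attributes are left out, and only three bus lines are kept), parsed with Python's built-in html.parser instead of lxml. Listing the tag names of the div's direct children shows the key detail: the icons, destinations, and line links are all flat siblings, and only the <br/> tags mark where one step ends and the next begins.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Trimmed copy of the routeDescription excerpt (assumption: the "..." parts
# and the inline style attributes are left out, only three bus lines kept).
HTML = """
<div id="routeDescription">
<br/>
<img src="/images/walk_icon_small.PNG"/>Walk by foot to <b>Sanam Luang</b>
<br/>
<img src="/images/bus_icon_semi_small.gif"/>Travel to <b>Khok Wua</b> using the line(s): <b><a href="lines/bangkok-bus-line/2">2</a></b> or <a href="lines/bangkok-bus-line/15">15</a> or <a href="lines/bangkok-bus-line/44">44</a><br/>
</div>
"""

# html.parser ships with Python, so lxml is not needed for this experiment.
soup = BeautifulSoup(HTML, "html.parser")
div = soup.find("div", attrs={"id": "routeDescription"})

# Keep only element children; the text nodes between them are skipped.
child_tags = [child.name for child in div.children if isinstance(child, Tag)]
print(child_tags)
# ['br', 'img', 'b', 'br', 'img', 'b', 'b', 'a', 'a', 'br']
```

Note that the first line link ("2") sits inside a <b>, so it is not a direct <a> child of the div, while the remaining links are.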
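Basically, I am trying to get the list of actions and the bus lines used to travel to the next location (I updated the question based on an answer, but the problem is still not solved).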
route_descriptions = []
for description in descriptions.find_all('img'):
    action = description.next_sibling      # text node such as 'Travel to '
    to_station = action.next_sibling       # <b> tag holding the destination
    n = action.find_next_siblings('a')     # all following sibling <a> tags
    if 'travel' in action.lower():
        lines = [to_station.find_next('b').text] + [a.contents[0] for a in n]
    else:
        lines = []
    desp = {'action': action,
            'to': to_station.text,
            'lines': lines}
    route_descriptions.append(desp)
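However, I don't know how to loop through the links that follow each action (the Travel to action) and append only those to my list. I have tried find_next('a') and find_next_siblings('a'), but neither accomplished the task.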
Output:
[{'action': 'Walk by foot to ', 'lines': [], 'to': 'Wang Lang (Siriraj)'},
{'action': 'Travel to ',
'lines': ['Chao Phraya Express Boat', '40', '48', '501', '508'],
'to': 'Si Phraya'},
{'action': 'Walk by foot to ', 'lines': [], 'to': 'Sheraton Royal Orchid'},
{'action': 'Travel to ',
'lines': ['16', '40', '48', '501', '508'],
'to': 'Siam'},
{'action': 'Travel to ',
'lines': ['BTS - Sukhumvit', '40', '48', '501', '508'],
'to': 'Asok'},
{'action': 'Walk by foot to ', 'lines': [], 'to': 'Sukhumvit'}]
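The repeated entries in the output above are consistent with how find_next_siblings('a') behaves: it returns every following sibling <a> up to the end of the parent element, and a <br/> does not stop the search. The small fixture below is hypothetical (the stop names A and B and the lines 1, 2, 3 are made up) and only illustrates that behavior.

```python
from bs4 import BeautifulSoup

# Hypothetical two-step fixture; each step's links end at its <br/>.
HTML = (
    '<div id="routeDescription">'
    '<img/>Travel to <b>A</b> using the line(s): <a>1</a> or <a>2</a><br/>'
    '<img/>Travel to <b>B</b> using the line(s): <a>3</a><br/>'
    '</div>'
)

soup = BeautifulSoup(HTML, "html.parser")
first_action = soup.find("img").next_sibling  # the text node "Travel to "
found = [a.get_text() for a in first_action.find_next_siblings("a")]
print(found)  # ['1', '2', '3']: link 3 belongs to the second step
```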
Desired output:
[{'action': 'Walk by foot to ', 'lines': [], 'to': 'Wang Lang (Siriraj)'},
{'action': 'Travel to ',
'lines': ['Chao Phraya Express Boat'],
...
Thanks Andrej! The solution works for me. Thanks also for the good explanation. I have accepted the answer (and upvoted it)! – titipata