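SVG charts like this one tend to be a bit tricky to scrape: the numbers you want only appear after you actually hover the mouse over the individual elements.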
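To get the data you need to:
- find all the dots
- for each dot in dots_list, click or hover over it (using an action chain)
- scrape the value from the tooltip that pops up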
This worked for me:
from __future__ import print_function
from pprint import pprint as pp

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By


def main():
    driver = webdriver.Chrome()
    try:
        driver.get("https://opensignal.com/reports/2016/02/state-of-lte-q4-2015/")
        dots_css = "div#network_download g g.dots_container circle"
        # find_elements(By..., ...) works on Selenium 3 and 4; the older
        # find_elements_by_* helpers were removed in Selenium 4.3
        dots_list = driver.find_elements(By.CSS_SELECTOR, dots_css)
        print("Found {0} data points".format(len(dots_list)))

        download_speeds = list()
        for index, _ in enumerate(dots_list, 1):
            # Because this is an SVG chart, and because we need to hover it,
            # it is very likely that the elements will go stale as we do this.
            # For that reason we re-acquire each dot element right before we click it
            single_dot_css = dots_css + ":nth-child({0})".format(index)
            dot = driver.find_element(By.CSS_SELECTOR, single_dot_css)
            dot.click()

            # Scrape the text from the popup
            popup_css = "div#network_download div.tooltip"
            popup_text = driver.find_element(By.CSS_SELECTOR, popup_css).text
            pp(popup_text)
            rank, comp_and_country, speed = popup_text.split("\n")
            company, country = comp_and_country.split(" in ")
            speed_dict = {
                "rank": rank.split(" Globally")[0].strip("#"),
                "company": company,
                "country": country,
                "speed": speed.split("Download speed: ")[1],
            }
            download_speeds.append(speed_dict)

            # Hover away from the tooltip so it clears. A fresh ActionChains
            # per iteration keeps earlier moves from piling up in one chain.
            hover_elem = driver.find_element(By.ID, "network_download")
            ActionChains(driver).move_to_element(hover_elem).perform()

        pp(download_speeds)
    finally:
        driver.quit()


if __name__ == "__main__":
    main()
Sample output:
(.venv35) ➜ stackoverflow python svg_charts.py
Found 182 data points
'#1 Globally\nSingTel in Singapore\nDownload speed: 40 Mbps'
'#2 Globally\nStarHub in Singapore\nDownload speed: 39 Mbps'
'#3 Globally\nSaskTel in Canada\nDownload speed: 35 Mbps'
'#4 Globally\nOrange in Israel\nDownload speed: 35 Mbps'
'#5 Globally\nolleh in South Korea\nDownload speed: 34 Mbps'
'#6 Globally\nVodafone in Romania\nDownload speed: 33 Mbps'
'#7 Globally\nVodafone in New Zealand\nDownload speed: 32 Mbps'
'#8 Globally\nTDC in Denmark\nDownload speed: 31 Mbps'
'#9 Globally\nT-Mobile in Hungary\nDownload speed: 30 Mbps'
'#10 Globally\nT-Mobile in Netherlands\nDownload speed: 30 Mbps'
'#11 Globally\nM1 in Singapore\nDownload speed: 29 Mbps'
'#12 Globally\nTelstra in Australia\nDownload speed: 29 Mbps'
'#13 Globally\nTelenor in Hungary\nDownload speed: 29 Mbps'
<...>
[{'company': 'SingTel',
'country': 'Singapore',
'rank': '1',
'speed': '40 Mbps'},
{'company': 'StarHub',
'country': 'Singapore',
'rank': '2',
'speed': '39 Mbps'},
{'company': 'SaskTel', 'country': 'Canada', 'rank': '3', 'speed': '35 Mbps'}
...
]
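The parsing step inside the loop can be pulled out into a pure function and checked against the tooltip strings from the sample output without launching a browser. This is a minimal sketch: the name `parse_tooltip` is hypothetical, and it assumes every tooltip follows the three-line layout shown above. It also swaps `split(" in ")` for `rsplit(" in ", 1)`, on the assumption that country names never contain " in ", so a company name that happens to contain it would still parse.

```python
def parse_tooltip(popup_text):
    """Split one tooltip string into rank, company, country and speed.

    Assumes (hypothetically) the layout seen in the sample output:
    '#<rank> Globally', '<company> in <country>', 'Download speed: <speed>'.
    """
    rank, comp_and_country, speed = popup_text.split("\n")
    # Split on the last " in " only, assuming country names never contain it
    company, country = comp_and_country.rsplit(" in ", 1)
    return {
        "rank": rank.split(" Globally")[0].strip("#"),
        "company": company,
        "country": country,
        "speed": speed.split("Download speed: ")[1],
    }


if __name__ == "__main__":
    samples = [
        "#1 Globally\nSingTel in Singapore\nDownload speed: 40 Mbps",
        "#5 Globally\nolleh in South Korea\nDownload speed: 34 Mbps",
    ]
    for text in samples:
        print(parse_tooltip(text))
```

If the tooltip layout changes, the tuple unpacking raises `ValueError` (or the speed lookup raises `IndexError`), which surfaces the change immediately instead of silently storing mismatched fields.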
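Note that the values you quoted in your question, the attributes on the circle elements, are not particularly useful, since they only describe how to draw the dots in the SVG chart.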
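To see what the circle elements themselves carry, the sketch below parses a small SVG fragment with the standard library's `xml.etree.ElementTree` and lists each circle's attributes. The markup and the helper name `circle_attributes` are hypothetical: the fragment is a simplified stand-in for a `g.dots_container` group, not the real page's source. It only illustrates that such attributes are drawing instructions (position and radius), not the rank, company or speed that the tooltip shows.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page's circles may differ.
SVG_SNIPPET = """
<svg xmlns="http://www.w3.org/2000/svg" width="400" height="200">
  <g class="dots_container">
    <circle cx="12" cy="40" r="4"/>
    <circle cx="24" cy="46" r="4"/>
  </g>
</svg>
"""

# ElementTree reports namespaced tags in {uri}localname form
SVG_NS = "{http://www.w3.org/2000/svg}"


def circle_attributes(svg_text):
    """Return the attribute dict of every <circle> element in the SVG text."""
    root = ET.fromstring(svg_text.strip())
    return [dict(circle.attrib) for circle in root.iter(SVG_NS + "circle")]


if __name__ == "__main__":
    for attrs in circle_attributes(SVG_SNIPPET):
        # Only geometry: pixel position (cx, cy) and radius (r)
        print(attrs)
```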