Scrape an author's h-index, i10-index and total citations from Google Scholar

I'm working on a project that pulls data from Google Scholar. I want to scrape an author's h-index, total citations and i10-index (the "All" values). For example, for Louisa Gilbert I want to scrape:

h-index = 36 
i10-index = 74 
citations = 4383

I wrote this:

from bs4 import BeautifulSoup 
import urllib.request 
url="https://scholar.google.ca/citations?user=OdQKi7wAAAAJ&hl=en" 
page = urllib.request.urlopen(url) 
soup = BeautifulSoup(page, 'html.parser')

But I don't know how to continue. (I know there are some libraries available, but none of them lets you scrape the h-index and i10-index.)


2016-12-25 user7340115

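You're almost there. You need to find the HTML elements that contain the data you want to extract. In this particular case, the values sit in <td class="gsc_rsb_std"> tags. Pull those tags out of the soup object, then use each tag's string attribute to get the text inside it. The cells alternate between the "All" column and the "Since" column, so the all-time values are at the even positions: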

indexes = soup.find_all("td", "gsc_rsb_std") 
h_index = indexes[2].string 
i10_index = indexes[4].string 
citations = indexes[0].string
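
As a rough sketch, the same lookup can be wrapped in a function that returns integers and fails loudly if the page layout changes. The function name extract_metrics and the SAMPLE_HTML snippet are hypothetical, made up for illustration. The snippet is a trimmed-down stand-in for the real statistics table: the "All" values come from the question, and the "Since" values are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified copy of the profile's statistics table.
# The "All" column uses the values from the question; the "Since"
# column holds placeholder numbers, not real data.
SAMPLE_HTML = """
<table id="gsc_rsb_st">
  <tr><th></th><th>All</th><th>Since</th></tr>
  <tr><td>Citations</td>
      <td class="gsc_rsb_std">4383</td><td class="gsc_rsb_std">1</td></tr>
  <tr><td>h-index</td>
      <td class="gsc_rsb_std">36</td><td class="gsc_rsb_std">2</td></tr>
  <tr><td>i10-index</td>
      <td class="gsc_rsb_std">74</td><td class="gsc_rsb_std">3</td></tr>
</table>
"""


def extract_metrics(html):
    """Return the all-time citations, h-index and i10-index as integers.

    Assumes the layout described in the answer: six cells with class
    gsc_rsb_std, alternating between the "All" and "Since" columns.
    """
    soup = BeautifulSoup(html, "html.parser")
    cells = soup.find_all("td", class_="gsc_rsb_std")
    if len(cells) < 6:
        # The markup changed or the request was blocked; do not guess.
        raise ValueError("expected 6 statistics cells, found %d" % len(cells))
    return {
        "citations": int(cells[0].get_text(strip=True)),
        "h_index": int(cells[2].get_text(strip=True)),
        "i10_index": int(cells[4].get_text(strip=True)),
    }


if __name__ == "__main__":
    print(extract_metrics(SAMPLE_HTML))
```

With the code from the question, the call would be extract_metrics(page.read()) instead of passing SAMPLE_HTML. The length check matters because class names such as gsc_rsb_std are not a stable interface and may change without notice.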


2016-12-27 15:10:31
