I am trying to use the function bucket() to request GitHub API user information with two access tokens in parallel, and then save the users' information to CSV files. The reason I am doing this is to get past the GitHub API rate limit. Please ignore whether GitHub would block me for it (I asked GitHub but got no reply). My approach is to use the Python multiprocessing library to run the same function with different arguments in parallel. However, the two processes I create actually run one after the other instead of in parallel. Here is my code (a small Process sketch I wrote for comparison follows after it):
import requests
import csv
import time
from multiprocessing import Process
# *************Construct url************
url1 = 'https://api.github.com/users'
url2 = 'https://api.github.com/users?since=1000000'
token1 = 'my_token1'
token2 = 'my_token2'
headers1 = {'Authorization': 'token %s' % token1}
headers2 = {'Authorization': 'token %s' % token2}
params = {'per_page': 100}
def bucket(url, header, params, file_path):
    count = 0
    cnt = 0
    csv_file = open(file_path, 'a', buffering=0)  # unbuffered append so rows are written out immediately
    writer = csv.writer(csv_file)
    while count < 1:  # only fetch one page (100 users) for now, to see results fast
        r = requests.get(url, headers=header, params=params)  # users' basic info, 100 users per request
        users = r.json()
        for user in users:
            user_profile = requests.get(user['url'], headers=header).json()  # detailed profile, 1 user per request
            field_names = user_profile.keys()
            line = []
            for field in field_names:
                if (field in user_profile) and user_profile[field]:
                    if isinstance(user_profile[field], basestring):
                        line.append(user_profile[field].encode('utf-8'))
                    else:
                        line.append(user_profile[field])
                else:
                    line.append('NULL')
            writer.writerow(line)
            cnt += 1
            print cnt
            time.sleep(0.75)
        try:
            url = r.links['next'].get('url')  # url of the next page (100 users/page); one page is one request
        except KeyError:
            break  # no next page
        print(r.headers['X-RateLimit-Remaining'])
        count += 1
if __name__ == '__main__':
    p1 = Process(target=bucket(url1, headers1, params, 'GitHub_users1.csv'))
    p1.start()
    p2 = Process(target=bucket(url2, headers2, params, 'GitHub_users2.csv'))
    p2.start()
    p1.join()
    p2.join()
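For comparison, here is a minimal standalone sketch (with a hypothetical worker do_work, not my real crawler) of the way I have seen multiprocessing.Process used elsewhere: the callable is passed as target and its arguments are passed separately via args, instead of calling the function inside Process(...). I am not sure whether this difference is what makes my two processes run sequentially:

import time
from multiprocessing import Process

def do_work(name, delay):
    # hypothetical worker: just sleeps to stand in for one API "bucket"
    print('%s started' % name)
    time.sleep(delay)
    print('%s finished' % name)

if __name__ == '__main__':
    # the function object goes in target, its arguments go in args
    p1 = Process(target=do_work, args=('bucket-1', 2))
    p2 = Process(target=do_work, args=('bucket-2', 2))
    p1.start()
    p2.start()
    p1.join()
    p2.join()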
Can someone help me achieve this? If you would like to share any other ideas for getting past the GitHub API rate limit, I would be happy to learn them. Thanks.