2015-04-24 38 views
2

我試圖獲得關注公司的跟隨者數量並隨着時間的推移對其進行跟蹤。我有超過20萬家公司,所以我現在的代碼將花費數年的時間來運行當前的API限制。如何使用tweepy獲取關注者數量

c = tweepy.Cursor(api.followers_ids, id = a) 
ids = [] 
for id in c.items(): 
    time.sleep(0.01) 
    ids.append(id) ' 

在這段代碼中,它的一個API針對每個跟隨者。我想知道是否有一個函數將一個跟隨者數作爲一個數字?還有什麼是Twitter的API限制?

+0

如果你只需要跟隨者數量,而不是id,你可以從每條推文的元數據中獲取這些信息。因此,假設您想跟蹤的公司有足夠的活動,使用流式API將是最快的方式,除了沒有速率限制。然而,200.000家公司太多了,你無法使用術語或screen_names來追蹤它們......最大值爲5000個id(請參閱[這裏](https://dev.twitter.com/streaming/reference/post/statuses/filter) )。即使有這個限制,這是你最好的機會...... 200.000與REST API速率限制將需要幾個月來檢索!希望它有幫助 – lrnzcig

回答

1

每個API請求一次返回至多5000個關注者ID,以檢索200 000家公司的所有關注者,這裏是該書中非常有用的腳本挖掘Matthew A. Russell的社交網絡以解決Twitter的API限制

,使強大的Twitter的請求,訪問Twitter的API 馬修定義這些方法:

import sys 
import time 
from urllib2 import URLError 
from httplib import BadStatusLine 
import json 
import twitter 

def oauth_login(): 
    CONSUMER_KEY = '' 
    CONSUMER_SECRET = '' 
    OAUTH_TOKEN = '' 
    OAUTH_TOKEN_SECRET = '' 
    auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET, 
           CONSUMER_KEY, CONSUMER_SECRET) 
    twitter_api = twitter.Twitter(auth=auth) 
    return twitter_api 

def make_twitter_request(twitter_api_func, max_errors=10, *args, **kw): 
# A nested helper function that handles common HTTPErrors. Return an updated 
# value for wait_period if the problem is a 500 level error. Block until the 
# rate limit is reset if it's a rate limiting issue (429 error). Returns None 
# for 401 and 404 errors, which requires special handling by the caller. 
    def handle_twitter_http_error(e, wait_period=2, sleep_when_rate_limited=True): 
     if wait_period > 3600: # Seconds 
      print >> sys.stderr, 'Too many retries. Quitting.' 
      raise e 
# See https://dev.twitter.com/docs/error-codes-responses for common codes 
     if e.e.code == 401: 

      print >> sys.stderr, 'Encountered 401 Error (Not Authorized)' 
      return None 
     elif e.e.code == 404: 
      print >> sys.stderr, 'Encountered 404 Error (Not Found)' 
      return None 
     elif e.e.code == 429: 
      print >> sys.stderr, 'Encountered 429 Error (Rate Limit Exceeded)' 
      if sleep_when_rate_limited: 
       print >> sys.stderr, "Retrying in 15 minutes...ZzZ..." 
       sys.stderr.flush() 
       time.sleep(60*15 + 5) 
       print >> sys.stderr, '...ZzZ...Awake now and trying again.' 
       return 2 
      else: 
       raise e # Caller must handle the rate limiting issue 
     elif e.e.code in (500, 502, 503, 504): 
      print >> sys.stderr, 'Encountered %iError. Retrying in %iseconds' %\ 
      (e.e.code, wait_period) 
      time.sleep(wait_period) 
      wait_period *= 1.5 
      return wait_period 
     else: 
      raise e 
# End of nested helper function 
    wait_period = 2 
    error_count = 0 
    while True: 
     try: 
      return twitter_api_func(*args, **kw) 
     except twitter.api.TwitterHTTPError, e: 
      error_count = 0 
      wait_period = handle_twitter_http_error(e, wait_period) 
      if wait_period is None: 
       return 
     except URLError, e: 
      error_count += 1 
      print >> sys.stderr, "URLError encountered. Continuing." 
      if error_count > max_errors: 
       print >> sys.stderr, "Too many consecutive errors...bailing out." 
       raise 
     except BadStatusLine, e: 
      error_count += 1 
      print >> sys.stderr, "BadStatusLine encountered. Continuing." 
      if error_count > max_errors: 
       print >> sys.stderr, "Too many consecutive errors...bailing out." 
       raise 

這裏是獲取朋友的方法和追隨者:

from functools import partial 
from sys import maxint 

def get_friends_followers_ids(twitter_api, screen_name=None, user_id=None, 
friends_limit=maxint, followers_limit=maxint): 
    # Must have either screen_name or user_id (logical xor) 
    assert (screen_name != None) != (user_id != None),\ 
    "Must have screen_name or user_id, but not both" 
    # See https://dev.twitter.com/docs/api/1.1/get/friends/ids and 
    # https://dev.twitter.com/docs/api/1.1/get/followers/ids for details 
    # on API parameters 
    get_friends_ids = partial(make_twitter_request, twitter_api.friends.ids, 
          count=5000) 
    get_followers_ids = partial(make_twitter_request,twitter_api.followers.ids, 
           count=5000) 
    friends_ids, followers_ids = [], [] 
    for twitter_api_func, limit, ids, label in [ 
        [get_friends_ids, friends_limit, friends_ids, "friends"], 
        [get_followers_ids, followers_limit, followers_ids, "followers"] 
       ]: 
     if limit == 0: continue 
     cursor = -1 
     while cursor != 0: 

      # Use make_twitter_request via the partially bound callable... 
      if screen_name: 
       response = twitter_api_func(screen_name=screen_name, cursor=cursor) 
      else: # user_id 
       response = twitter_api_func(user_id=user_id, cursor=cursor) 
      if response is not None: 
       ids += response['ids'] 
       cursor = response['next_cursor'] 
      print >> sys.stderr, 'Fetched {0} total {1} ids for{2}'.format(len(ids), 
                label, (user_id or screen_name)) 
      # XXX: You may want to store data during each iteration to provide 
      # an additional layer of protection from exceptional circumstances 
      if len(ids) >= limit or response is None: 
       break 
    # Do something useful with the IDs, like store them to disk... 
    return friends_ids[:friends_limit], followers_ids[:followers_limit] 

# Sample usage 
twitter_api = oauth_login() 
friends_ids, followers_ids =get_friends_followers_ids(twitter_api, 
                 screen_name="SocialWebMining", 
                 friends_limit=10, 
                 followers_limit=10) 
print friends_ids 
print followers_ids 
+0

你能否引導我通過代碼導致其不適合我的工作。 –

+0

對不起,我忘了定義2個函數:** make_twitter_request **和** oauth_login **我會編輯腳本,它對我來說非常合適! –

+0

你應該編輯你的消費者密鑰和祕密。 –

相關問題