
For some reason I am only getting 100 tweets back from this code. According to Twitter's API I believe I should be getting up to 1500, so my code seems to be returning an incorrect tweet count.

What am I doing wrong here?

The specific line in question is:

twiturl = "http://search.twitter.com/search.json?q=" + urlinfo + "&rpp=99&page=15" + "&since_id=" + str(tweetdate) 

for x in arg1:
    urlinfo = x[2]
    idnum = int(x[1])
    name = x[0]
    twiturl = "http://search.twitter.com/search.json?q=" + urlinfo + "&rpp=99&page=15" + "&since_id=" + str(tweetdate)
    response = urllib2.urlopen(twiturl)
    twitseek = simplejson.load(response)
    twitsearch = twitseek['results']
    tweets = [x['text'] for x in twitsearch]
    tweetlist = [tweets, name]
    namelist.append(tweetlist)

The item in x[2] should just be a word or phrase such as "I am" or "I feel", converted into a URL-friendly encoding, for example:
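(A minimal sketch of that encoding step, using the Python 2 standard library as in the rest of the code; phrase is a hypothetical example value. quote_plus percent-encodes the text so it is safe to interpolate into the q= parameter.)

import urllib

phrase = 'I feel'                    # hypothetical example phrase
urlinfo = urllib.quote_plus(phrase)  # spaces become '+', other characters are percent-encoded
twiturl = "http://search.twitter.com/search.json?q=" + urlinfo + "&rpp=99&page=15"
print twiturl
# http://search.twitter.com/search.json?q=I+feel&rpp=99&page=15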

Answers


A single page of results returns at most 100 results. To get everything, you need to "page" through them using the next_page URL included in each response (see here for the documentation). You then loop, requesting the next_page URL from each response, until that parameter is no longer present (meaning you have collected all the results).

import json 
import urllib 
import urllib2 


# General query stub 
url_stub = 'http://search.twitter.com/search.json' 

# Parameters to pass 
params = { 
    'q': 'tennis', 
    'rpp': 100, 
    'result_type': 'mixed' 
    } 

# Variable to store our results 
results = [] 

# Outside of our loop, we pull the first page of results 
# The '?' is included in the 'next_page' parameter we receive 
# later, so here we manually add it 
resp = urllib2.urlopen('{0}?{1}'.format(url_stub, urllib.urlencode(params))) 
contents = json.loads(resp.read()) 
results.extend(contents['results']) 

# Now we loop until there is either no longer a 'next_page' variable 
# or until we max out our number of results 
while 'next_page' in contents: 

    # Print some random information 
    print 'Page {0}: {1} results'.format(
        contents['page'], len(contents['results']))

    # Capture the HTTPError that will appear once the results have maxed 
    try:
        resp = urllib2.urlopen(url_stub + contents['next_page'])
    except urllib2.HTTPError:
        print 'No mas'
        break

    # Load our new contents 
    contents = json.loads(resp.read()) 

    # Extend our results 
    results.extend(contents['results']) 

# Print out how many results we received 
print len(results) 

Output:

Page 1: 103 results 
Page 2: 99 results 
Page 3: 100 results 
Page 4: 100 results 
Page 5: 100 results 
Page 6: 100 results 
Page 7: 100 results 
Page 8: 99 results 
Page 9: 98 results 
Page 10: 95 results 
Page 11: 100 results 
Page 12: 99 results 
Page 13: 99 results 
Page 14: 100 results 
Page 15: 100 results 
No mas 
1492 

The documentation for the Twitter Search API states:

rpp (optional): The number of tweets to return per page, up to a max of 100.

page (optional): The page number (starting at 1) to return, up to a max of roughly 1500 results (based on rpp * page).

So you should make multiple requests, each with a different page number, retrieving up to 100 tweets per request:

import urllib, json 

twiturl = "http://search.twitter.com/search.json?q=%s&rpp=99&page=%d" 

def getmanytweets(topic):
    'Return a list of up to 1500 tweets'
    results = []
    for page in range(1, 16):
        u = urllib.urlopen(twiturl % (topic, page))
        data = u.read()
        u.close()
        t = json.loads(data)
        results += t['results']
    return results

if __name__ == '__main__': 
    import pprint 
    pprint.pprint(getmanytweets('obama')) 
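If you also need the since_id filter from your original loop, it can be folded into the same pagination scheme. This is only a rough sketch, assuming tweetdate is a numeric tweet id as in your snippet and that the topic still needs URL encoding (getmanytweets_since is just an illustrative name):

import json
import urllib

twiturl = "http://search.twitter.com/search.json?q=%s&rpp=100&page=%d&since_id=%d"

def getmanytweets_since(topic, tweetdate):
    'Return up to 1500 tweets for topic posted after the tweet id tweetdate'
    results = []
    for page in range(1, 16):
        # quote_plus makes the topic safe to embed in the query string
        u = urllib.urlopen(twiturl % (urllib.quote_plus(topic), page, tweetdate))
        try:
            results += json.load(u)['results']
        finally:
            u.close()
    return results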