A single page returns at most 100 results. To get all of the results, you need to "page" through them using the next_page URL included in the response (see here for the documentation). You can then loop over the responses, requesting each next_page URL until that parameter no longer exists, which means you have collected all of the results.
import json
import urllib
import urllib2

# General query stub
url_stub = 'http://search.twitter.com/search.json'

# Parameters to pass
params = {
    'q': 'tennis',
    'rpp': 100,
    'result_type': 'mixed'
}

# Variable to store our results
results = []

# Outside of our loop, we pull the first page of results.
# The '?' is included in the 'next_page' parameter we receive
# later, so here we add it manually.
resp = urllib2.urlopen('{0}?{1}'.format(url_stub, urllib.urlencode(params)))
contents = json.loads(resp.read())
results.extend(contents['results'])

# Now we loop until there is no longer a 'next_page' key
# or until we max out our number of results
while 'next_page' in contents:
    # Print some progress information
    print 'Page {0}: {1} results'.format(
        contents['page'], len(contents['results']))
    # Catch the HTTPError that appears once the results are exhausted
    try:
        resp = urllib2.urlopen(url_stub + contents['next_page'])
    except urllib2.HTTPError:
        print 'No mas'
        break
    # Load our new contents
    contents = json.loads(resp.read())
    # Extend our results
    results.extend(contents['results'])

# Print out how many results we received
print len(results)
Output:
Page 1: 103 results
Page 2: 99 results
Page 3: 100 results
Page 4: 100 results
Page 5: 100 results
Page 6: 100 results
Page 7: 100 results
Page 8: 99 results
Page 9: 98 results
Page 10: 95 results
Page 11: 100 results
Page 12: 99 results
Page 13: 99 results
Page 14: 100 results
Page 15: 100 results
No mas
1492
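Note that the code above is Python 2 (urllib2 and print statements), and the search.twitter.com endpoint it queries has since been retired, so you cannot run it against the live API anymore. The pagination pattern itself is reusable, though. Below is a Python 3 sketch of the same loop with the HTTP call factored out into an injectable fetch function, so the paging logic can be exercised against canned responses; the example URLs and the pages dict are made up for illustration:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen
from urllib.error import HTTPError


def collect_all(url_stub, params, fetch=None):
    """Page through a search API until 'next_page' disappears or 404s."""
    if fetch is None:
        # Default fetcher does a real HTTP GET and decodes the JSON body
        def fetch(url):
            return json.loads(urlopen(url).read())
    results = []
    # First page: build the query string ourselves (adds the '?')
    contents = fetch('{0}?{1}'.format(url_stub, urlencode(params)))
    results.extend(contents['results'])
    # Subsequent pages: 'next_page' already starts with '?'
    while 'next_page' in contents:
        try:
            contents = fetch(url_stub + contents['next_page'])
        except HTTPError:
            break
        results.extend(contents['results'])
    return results


# Exercise the loop with a fake fetcher serving three canned pages
# (hypothetical URLs, not a real endpoint):
pages = {
    'http://example/search.json?q=tennis': {'results': [1, 2], 'next_page': '?page=2'},
    'http://example/search.json?page=2': {'results': [3], 'next_page': '?page=3'},
    'http://example/search.json?page=3': {'results': [4, 5]},
}
print(len(collect_all('http://example/search.json', {'q': 'tennis'},
                      fetch=pages.__getitem__)))
# prints 5
```

Passing the fetcher in as a parameter keeps the paging logic testable without network access, which matters here since the original endpoint no longer exists.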