2010-12-14 45 views
1

Get the first 10 Google results

I need to get the first 10 Google search results using googleapi.

For example:

import urllib
import simplejson

# Python 2: build the query string and fetch one page of results.
query = urllib.urlencode({'q': 'example'})
url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
search_results = urllib.urlopen(url)
json = simplejson.loads(search_results.read())
results = json['responseData']['results']

This gives me the first page of results, but I would like to get more Google results. Does anyone know how to do this?

Answers

3

I have done this in the past. Here is a complete example (I am not a Python user, but it works):

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 script: urllib.urlopen and urllib.urlencode live elsewhere in Python 3.

import sys, getopt
import urllib
import simplejson

OPTIONS = ("m:", ["min="])

def print_usage():
    s = "usage: " + sys.argv[0] + " "
    for o in OPTIONS[0]:
        if o != ":":
            s += "[-" + o + "] "
    print(s + "query_string\n")

def search(query, index, offset, min_count, quiet=False, rs=None):
    # Create the list per call; a mutable default would be shared between calls.
    if rs is None:
        rs = []
    url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&rsz=large&%s&start=%s" % (query, offset)
    result = urllib.urlopen(url)
    json = simplejson.loads(result.read())
    status = json["responseStatus"]
    if status == 200:
        results = json["responseData"]["results"]
        cursor = json["responseData"]["cursor"]
        pages = cursor["pages"]
        for n, r in enumerate(results):
            # Number results from the page offset so a short last page stays correct.
            i = int(offset) + n + 1
            u = r["unescapedUrl"]
            rs.append(u)
            if not quiet:
                print("%3d. %s" % (i, u))
        # Find the current page in the cursor and pick up the one after it.
        next_index = None
        next_offset = None
        for i, p in enumerate(pages):
            if p["label"] == index:
                if i < len(pages) - 1:
                    next_index = pages[i+1]["label"]
                    next_offset = pages[i+1]["start"]
                break
        if next_index is not None and next_offset is not None:
            if int(next_offset) < min_count:
                search(query, next_index, next_offset, min_count, quiet, rs)
    return rs

def main():
    min_count = 64
    try:
        opts, args = getopt.getopt(sys.argv[1:], *OPTIONS)
        for opt, arg in opts:
            if opt in ("-m", "--min"):
                min_count = int(arg)
        if not args:
            raise ValueError("missing query string")
    except (getopt.GetoptError, ValueError):
        print_usage()
        sys.exit(1)
    qs = " ".join(args)
    query = urllib.urlencode({"q" : qs})
    search(query, 1, "0", min_count)

if __name__ == "__main__":
    main()
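
The paging step inside search() can be looked at on its own, without touching the network. The following is a minimal sketch and not part of the original script: next_page is a hypothetical helper, and sample_pages is made-up data shaped like the cursor fields the script reads (label and start).

```python
def next_page(pages, current_label):
    """Return (label, start) of the page after current_label, or None."""
    for i, page in enumerate(pages):
        if page["label"] == current_label:
            if i < len(pages) - 1:
                nxt = pages[i + 1]
                return nxt["label"], nxt["start"]
            return None  # current page is the last one
    return None  # current page not found in the cursor


# Made-up cursor data (an assumption), mirroring the fields used by the script.
sample_pages = [
    {"label": 1, "start": "0"},
    {"label": 2, "start": "8"},
    {"label": 3, "start": "16"},
]

print(next_page(sample_pages, 1))
print(next_page(sample_pages, 3))
```

Returning None for both "last page" and "page not found" keeps the caller simple: it stops as soon as there is nothing further to request.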

Edit: I have fixed the obviously broken handling of command-line options; you can call the script as follows:

python gsearch.py --min=5 vanessa mae 

The --min switch means "at least 5 results". It is optional: if you do not specify it, you get the maximum allowed number of results (64).
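
To see how the option and the query words are separated, here is a small sketch of the parsing step alone. It reuses the script's option spec with the standard getopt module; parse_args and its default_min parameter are hypothetical names introduced only for this illustration.

```python
import getopt

OPTIONS = ("m:", ["min="])  # same option spec as the script


def parse_args(argv, default_min=64):
    """Return (min_count, query_string) for a list of command-line arguments."""
    opts, args = getopt.getopt(argv, *OPTIONS)
    min_count = default_min
    for opt, arg in opts:
        if opt in ("-m", "--min"):
            min_count = int(arg)
    # Remaining words form the query, joined with spaces as in main().
    return min_count, " ".join(args)


print(parse_args(["--min=5", "vanessa", "mae"]))
print(parse_args(["vanessa", "mae"]))
```

Passing the argument list in explicitly, rather than reading sys.argv inside the function, makes the parsing easy to try out in an interpreter.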

Also, error handling is omitted for brevity.
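
One possible shape for the omitted error handling is sketched below. This is an assumption about what such handling could look like, not the answerer's code: SearchError and extract_urls are hypothetical names, the sample response bodies are made up, and the standard json module is used in place of simplejson. Only the response fields that appear in the script (responseStatus, responseData, results, unescapedUrl) are relied on.

```python
import json


class SearchError(Exception):
    """Raised when a response body does not contain usable results."""


def extract_urls(body):
    """Parse a response body and return its result URLs, or raise SearchError."""
    try:
        data = json.loads(body)
    except ValueError as exc:  # invalid JSON
        raise SearchError("response is not valid JSON: %s" % exc)
    if not isinstance(data, dict):
        raise SearchError("response is not a JSON object")
    status = data.get("responseStatus")
    if status != 200:
        raise SearchError("unexpected responseStatus: %r" % (status,))
    response_data = data.get("responseData") or {}
    return [r["unescapedUrl"]
            for r in response_data.get("results", [])
            if "unescapedUrl" in r]


# Made-up response bodies for illustration only.
good_body = ('{"responseStatus": 200, "responseData": '
             '{"results": [{"unescapedUrl": "http://example.com/"}]}}')
bad_body = '{"responseStatus": 403, "responseData": null}'

print(extract_urls(good_body))
try:
    extract_urls(bad_body)
except SearchError as exc:
    print("search failed: %s" % exc)
```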
