Python: the correct URL for downloading images from Google Image Search

I'm trying to fetch the images for a specific query from Google Image Search. But the page I download contains no images; it redirects me to Google's original page. Here is my code:

AGENT_ID = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1" 

GOOGLE_URL = "https://www.google.com/images?source=hp&q={0}" 

_myGooglePage = "" 

def scrape(self, theQuery) : 
    self._myGooglePage = subprocess.check_output(["curl", "-L", "-A", self.AGENT_ID, self.GOOGLE_URL.format(urllib.quote(theQuery))], stderr=subprocess.STDOUT) 
    print self.GOOGLE_URL.format(urllib.quote(theQuery)) 
    print self._myGooglePage 
    f = open('./../../googleimages.html', 'w') 
    f.write(self._myGooglePage)

What am I doing wrong?

Thanks


2012-02-16 lorussian

At the very least, you have to close the file handle. – 2012-02-16 20:38:21

It works! Thanks – lorussian 2012-02-16 20:43:26
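For reference, the fix from the first comment can be sketched as below. This is only a minimal sketch assuming Python 3; the helper name `save_page` and the temporary file location are made up for illustration. A `with` block closes the handle when it exits, which also flushes any buffered output to disk.

```python
import os
import tempfile


def save_page(html, path):
    """Write html to path; the with block closes the file handle on exit."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    # Outside the block the handle is already closed, so the data is on disk.
    return f.closed


# Hypothetical target path, used only for this illustration.
target = os.path.join(tempfile.gettempdir(), "googleimages.html")
print(save_page("<html></html>", target))
```

In the question's code, the equivalent change would be wrapping the `open`/`write` pair in a `with` statement or calling `f.close()` after the write.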

@silviolor: I know it won't help with your problem, but why not use Python's built-in 'urllib2' module instead of 'curl'? – RanRag 2012-02-16 21:14:45
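A rough sketch of that suggestion, assuming Python 3, where `urllib2` became `urllib.request` and `urllib.quote` moved to `urllib.parse`. The `build_request` helper is a hypothetical name; `AGENT_ID` and `GOOGLE_URL` are copied from the question. The sketch only builds the request and does not send it, and it does not change what Google returns for this URL.

```python
from urllib.parse import quote
from urllib.request import Request

AGENT_ID = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"
GOOGLE_URL = "https://www.google.com/images?source=hp&q={0}"


def build_request(query):
    """Build (but do not send) a request with the same URL and User-Agent as the curl call."""
    url = GOOGLE_URL.format(quote(query))
    return Request(url, headers={"User-Agent": AGENT_ID})


request = build_request("julie newmar")
print(request.full_url)
# Sending it would look like: urllib.request.urlopen(request).read()
```

`urlopen` follows HTTP redirects by default, so no equivalent of curl's `-L` flag should be needed.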

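I'll give you a hint... start here:

https://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=JULIE%20NEWMAR

where JULIE and NEWMAR are your search terms.

That returns the JSON data you need. Parse it with json.load or simplejson.load to get back a dict, then dig into it: first find responseData, then the results list, which contains the individual items whose url you will want to download.

That said, I don't recommend automated scraping of Google in any way, since their (deprecated) API for this specifically says not to.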
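The parsing steps described above might look roughly like this. It is only a sketch, assuming Python 3: the payload is a made-up sample in the described shape with placeholder URLs, `extract_image_urls` is a hypothetical helper, and the per-item field name `unescapedUrl` is taken from another answer in this thread. No request is sent.

```python
import json

# Made-up payload in the shape described above; the URLs are placeholders.
SAMPLE_RESPONSE = """
{
  "responseData": {
    "results": [
      {"unescapedUrl": "http://example.com/a.jpg"},
      {"unescapedUrl": "http://example.com/b.jpg"}
    ]
  }
}
"""


def extract_image_urls(payload):
    """Return the image URLs from a decoded response, or [] if none are present."""
    # Assumption: a failed request may leave responseData missing or null.
    data = payload.get("responseData") or {}
    return [item["unescapedUrl"] for item in data.get("results", []) if "unescapedUrl" in item]


urls = extract_image_urls(json.loads(SAMPLE_RESPONSE))
print(urls)
```

With a live response you would pass the open response object to `json.load` instead of calling `json.loads` on a string.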


2012-02-17 00:06:24 michaelfilms

Thanks, this way looks easier. – lorussian 2012-02-17 00:37:10

Please note that this API is no longer available. – prooffreader 2016-02-29 17:45:39

Here's a short script I wrote that does the whole deed.


2012-05-27 23:29:36 crizCraig

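Hello, your script seems to use PIL. Unfortunately, I seem to have huge problems installing PIL on this machine. Since I just need the images, without altering them in any way, is there a way to leave it out? – 2012-07-08 10:18:36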

I'm not sure how to avoid PIL, but if you're on a Mac, I strongly recommend MacPorts to simplify package installation and install PIL for you. – crizCraig 2012-07-09 20:07:06

Or better, Homebrew: http://brew.sh/ – 2013-09-01 16:27:37

Here is the Python code I use to search for and download images from Google; I hope it helps:

import os 
import sys 
import time 
from urllib import FancyURLopener 
import urllib2 
import simplejson 

# Define search term 
searchTerm = "hello world" 

# Replace spaces ' ' in search term for '%20' in order to comply with request 
searchTerm = searchTerm.replace(' ','%20') 


# Start FancyURLopener with defined version 
class MyOpener(FancyURLopener): 
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11' 
myopener = MyOpener() 

# Set count to 0 
count= 0 

for i in range(0,10): 
    # Notice that the start changes for each iteration in order to request a new set of images for each loop 
    url = ('https://ajax.googleapis.com/ajax/services/search/images?' + 'v=1.0&q='+searchTerm+'&start='+str(i*4)+'&userip=MyIP') 
    print url 
    request = urllib2.Request(url, None, {'Referer': 'testing'}) 
    response = urllib2.urlopen(request) 

    # Get results using JSON 
    results = simplejson.load(response) 
    data = results['responseData'] 
    dataInfo = data['results'] 

    # Iterate for each result and get unescaped url 
    for myUrl in dataInfo:
        count = count + 1
        print myUrl['unescapedUrl']

        myopener.retrieve(myUrl['unescapedUrl'], str(count) + '.jpg')

    # Sleep for one second to prevent IP blocking from Google 
    time.sleep(1)

You can also find very useful information here.
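As a side note on the URL construction in the loop above: rather than replacing spaces by hand and concatenating strings, the query string can be assembled with the standard library. A minimal sketch, assuming Python 3; `page_urls` and its parameters are hypothetical names, the `userip` parameter is left out, and nothing is requested from the (no longer available) endpoint.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://ajax.googleapis.com/ajax/services/search/images"


def page_urls(search_term, pages=10, page_size=4):
    """Yield one request URL per page, advancing `start` by page_size each time."""
    for page in range(pages):
        params = {"v": "1.0", "q": search_term, "start": page * page_size}
        # quote_via=quote encodes spaces as %20, matching the manual replace above.
        yield BASE_URL + "?" + urlencode(params, quote_via=quote)


for url in page_urls("hello world", pages=3):
    print(url)
```

Letting `urlencode` do the escaping also covers characters other than spaces, such as `&` or `#` in a search term.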


2012-11-24 07:33:12

Is it possible to restrict the image type in the URL given to Google? – erogol 2014-08-09 09:11:47

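I haven't looked at this in a while, but check the latest Google API. I believe the answer is yes: you can refine the search results to ".png", ".jpg", or even the vector-based format ".svg". – 2014-08-09 17:41:29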

I'm answering this question even though it's quite old. There is a simpler way to do this.

That's it.


2013-09-11 19:26:54 riyoken

This is 3.x, so for 2.x you would obviously replace urllib.request with urllib2. – riyoken 2013-09-11 19:28:12
