2017-05-02

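I am trying to crawl images from Google Image Search with Python 3.6.

1. Open the Chrome driver with Selenium

2. Scroll down to the end of the page

3. Use BeautifulSoup to get the image URLs and save the images

But there is a problem: the images I get are too small.

Looking further, I found that the src of the original image (ending in ".jpg") is on the img element with class irc_mi, but I don't know how to pull it out.

I tried using find_all with the class name, but it failed.

What should I do?

Here is the source code: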

from time import sleep
import urllib.request

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By


def Remainder_All_ImagesURLs_Google(searchText):

    def scroll_page():
        # Scroll to the bottom several times so that more results get loaded.
        for i in range(7):
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            sleep(3)

    def click_button():
        # Click the element with id 'smb' (the "more images" button).
        more_imgs_button_xpath = "//*[@id='smb']"
        element = driver.find_element(By.XPATH, more_imgs_button_xpath)
        element.click()
        sleep(3)

    def create_soup():
        html_source = driver.page_source
        soup = BeautifulSoup(html_source, 'html.parser')
        return soup

    def find_imgs():
        soup = create_soup()
        imgs_urls = []
        for img in soup.find_all('img'):
            # Skip <img> tags with no src or with a src that is not an http(s) URL.
            src = img.get('src')
            if src and src.startswith('http'):
                imgs_urls.append(src)
        return imgs_urls

    # Selenium 3 style call; Selenium 4 expects a Service object instead of a path.
    driver = webdriver.Chrome('C:/chromedriver.exe')
    driver.maximize_window()
    sleep(2)

    searchUrl = "https://www.google.com/search?q={}&site=webhp&tbm=isch".format(searchText)
    driver.get(searchUrl)

    try:
        scroll_page()
        click_button()
        scroll_page()
    except Exception:
        click_button()
        scroll_page()

    imgs_urls = find_imgs()
    driver.close()
    return imgs_urls


def download_image(url, filename):
    full_name = str(filename) + ".jpg"
    urllib.request.urlretrieve(url, 'C:/Python/Picture' + full_name)
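
Two details in the code above may be worth tightening. The search text is put into the URL without escaping, and the download path is built by plain string concatenation, which gives names like `C:/Python/Picture1.jpg` rather than a file inside a `Picture` folder. Below is a minimal sketch using only the standard library, assuming the intent was to save into that folder. `build_search_url` and `build_target_path` are hypothetical helper names, not part of the original code.

```python
import posixpath
from urllib.parse import urlencode, urlsplit


def build_search_url(search_text):
    # urlencode escapes spaces, '&', non-ASCII characters and so on.
    query = urlencode({"q": search_text, "site": "webhp", "tbm": "isch"})
    return "https://www.google.com/search?" + query


def build_target_path(url, index, directory="C:/Python/Picture"):
    # Reuse the extension from the URL path when there is one, else fall back to .jpg.
    ext = posixpath.splitext(urlsplit(url).path)[1].lower() or ".jpg"
    # posixpath keeps forward slashes, which Windows also accepts.
    return posixpath.join(directory, "{}{}".format(index, ext))
```

With these, the download step could become `urllib.request.urlretrieve(url, build_target_path(url, i))`, provided the target folder already exists.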

Answer


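The problem is that Beautiful Soup will not find it, because the source (src) or href of the full-size image is filled in by a JavaScript function and is not present in the page until then. So my suggestion is to use Selenium to click the image tag, wait for the image src to appear, and then extract it. Use

element = driver.find_element_by_class_name("some_class")
element.click()

and then search for the image src.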