2016-06-26 74 views
-1

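Scraping the Instagram followers page using Selenium and Python

I have a question about scraping the Instagram followers page. I have some code, but it only shows 9 followers. Please help me.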

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


def login(driver):
    username = "[email protected]"  # <username here>
    password = "xxxx"  # <password here>

    # Load page
    driver.get("https://www.instagram.com/accounts/login/")

    # Login
    driver.find_element(By.XPATH, "//div/input[@name='username']").send_keys(username)
    driver.find_element(By.XPATH, "//div/input[@name='password']").send_keys(password)
    driver.find_element(By.XPATH, "//span/button").click()

    # Wait for the page shown after login to load
    WebDriverWait(driver, 15).until(
        EC.presence_of_element_located((By.LINK_TEXT, "See All")))


def scrape_followers(driver, account):
    # Load account page
    driver.get("https://www.instagram.com/{0}/".format(account))

    # Click the 'Follower(s)' link
    driver.find_element(By.PARTIAL_LINK_TEXT, "follower").click()

    # Wait for the followers modal to load
    xpath = "//div[@style='position: relative; z-index: 1;']/div/div[2]/div/div[1]"
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, xpath)))

    # You'll need to figure out some scrolling magic here. Something that can
    # scroll to the bottom of the followers modal, and know when it has reached
    # the bottom. This is pretty impractical for people with a lot of followers.

    # Finally, scrape the followers
    xpath = "//div[@style='position: relative; z-index: 1;']//ul/li/div/div/div/div/a"
    followers_elems = driver.find_elements(By.XPATH, xpath)

    return [e.text for e in followers_elems]


if __name__ == "__main__":
    driver = webdriver.Firefox()
    try:
        login(driver)
        followers = scrape_followers(driver, "instagram")
        print(followers)
    finally:
        driver.quit()

This code comes from another page. I don't understand how to scroll down the followers page.

+0

Here is what I would Google: selenium scroll – wgwz

Answers

1

You can easily scroll down with JavaScript by increasing scrollTop. Repeat the scroll until the number of users in the list no longer changes.

A change in the number of users can be detected with the following function:

count = 0


def check_difference_in_count(driver):
    """Return True when the number of listed users has changed since the last call."""
    global count

    new_count = len(driver.find_elements(By.XPATH, "//div[@role='dialog']//li"))

    if count != new_count:
        count = new_count
        return True
    return False

The following script then scrolls the user container down until the check shows that the bottom has been reached:

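from selenium.common.exceptions import TimeoutException

while True:
    # Scroll the followers container to the bottom
    driver.execute_script("document.querySelector('div[role=dialog] ul').parentNode.scrollTop=1e100")

    try:
        # Wait up to 5 seconds for more users to load
        WebDriverWait(driver, 5).until(check_difference_in_count)
    except TimeoutException:
        # No new users appeared, so the bottom has been reached
        break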
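
The two snippets above can also be folded into a single helper that does not rely on a global counter. This is only a minimal sketch, not the answerer's code: `scroll_until_stable`, `scroll_once`, `count_items`, `poll` and `max_rounds` are hypothetical names introduced here, and the helper takes plain callables so it does not depend on Selenium itself.

```python
import time


def scroll_until_stable(scroll_once, count_items, timeout=5.0, poll=0.5,
                        max_rounds=1000):
    """Scroll repeatedly until the item count stops growing.

    scroll_once: callable that scrolls the container one step.
    count_items: callable returning the number of items currently loaded.
    Returns the final item count.
    """
    count = count_items()
    for _ in range(max_rounds):
        scroll_once()
        deadline = time.monotonic() + timeout
        new_count = count_items()
        # Poll until new items show up or the timeout expires.
        while new_count == count and time.monotonic() < deadline:
            time.sleep(poll)
            new_count = count_items()
        if new_count == count:
            break  # nothing new was loaded: assume the bottom was reached
        count = new_count
    return count
```

With Selenium, `scroll_once` could wrap the `driver.execute_script(...)` call shown above and `count_items` could return `len(driver.find_elements(By.XPATH, "//div[@role='dialog']//li"))`. The `max_rounds` cap is an added safeguard against a list that never stops growing.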
1

You have to add a for loop so that you can scroll down the followers page. The for loop can look like this:

import random
import time

# Find the followers dialog
dialog = driver.find_element(By.XPATH, '/html/body/div[2]/div/div[2]/div/div[2]')
# Find the number of followers (assumes the counter text is a plain integer)
allfoll = int(driver.find_element(By.XPATH, "//li[2]/a/span").text)
# Scroll down the dialog, pausing 0.5 to 1 second between steps
rounds = allfoll // 2
for i in range(rounds):
    driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", dialog)
    time.sleep(random.randint(500, 1000) / 1000.0)
    print("Extracting followers: {:.2f}% of 100%".format(i / float(rounds) * 100))
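
The `int(...)` conversion above raises a `ValueError` whenever the counter text is not a plain integer. The following is a hedged sketch of a more tolerant parser. It assumes the profile may show the number with thousands separators or an abbreviated suffix (for example `1,234` or `12.5k`); those formats are an assumption made here, not something the answer states, and `parse_follower_count` is a hypothetical helper.

```python
def parse_follower_count(text):
    """Convert a displayed counter such as '1,234' or '12.5k' to an int."""
    multipliers = {"k": 1000, "m": 1000000}
    cleaned = text.strip().lower().replace(",", "")
    # An abbreviated value ends in a suffix letter; scale the numeric part.
    if cleaned and cleaned[-1] in multipliers:
        return int(round(float(cleaned[:-1]) * multipliers[cleaned[-1]]))
    return int(cleaned)
```

An abbreviated counter is only approximate, so a loop sized from it may stop slightly before or after the real end of the list.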