2017-07-03 54 views

Iterate over website table rows and extract data

How can I iterate over and scrape every existing row on this site: https://icostats.com/

Is it possible to do this with something like the code below?

# nth-child is 1-based, so iterate over 1..20 
for row in range(1, 21): 
    # format the selector string before passing it to get_css_sel 
    get_css_sel("#app > div > div.container-0-16 > div.table-0-20 > div.tbody-0-21 > div:nth-child({})".format(row)) 

Full code:

from selenium import webdriver 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support import expected_conditions as EC 
from selenium.webdriver.support.ui import WebDriverWait as wait 

def get_css_sel(selector): 
    # find_elements returns a list of every element matching the selector 
    posts = browser.find_elements(By.CSS_SELECTOR, selector) 
    for post in posts: 
        print(post.text) 

browser = webdriver.Chrome(executable_path=r'C:\Users\alph1\Scrapers\chromedriver.exe') 
browser.get("https://icostats.com") 
wait(browser, 40).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#app > div > div.container-0-16 > div.table-0-20 > div.tbody-0-21 > div:nth-child(2) > div:nth-child(8)"))) 

browser.execute_script(''' 
    var element = document.getElementsByClassName("buyNow-0-81"), index; 
    for (index = element.length - 1; index >= 0; index--) { 
        element[index].parentNode.removeChild(element[index]); 
    } 
''') 

get_css_sel("#app > div > div.container-0-16 > div.table-0-20 > div.tableheader-0-50")    # fetch the table header 

# nth-child is 1-based, so iterate over 1..20 
for row in range(1, 21): 
    # format the selector string before passing it to get_css_sel 
    get_css_sel("#app > div > div.container-0-16 > div.table-0-20 > div.tbody-0-21 > div:nth-child({})".format(row)) 

Answer


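Forget the loop and just do this. Without :nth-child, the selector matches every row div under the table body at once: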

get_css_sel("#app > div > div.container-0-16 > div.table-0-20 > div.tbody-0-21 > div") 
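
A minimal sketch of why no index loop is needed: one selector matches all rows, and find_elements returns them as a list that can be iterated directly. It assumes the class names from the question still exist on the page and a Selenium version that provides find_elements(by, value). The helper name get_row_texts and the constant ROW_SELECTOR are hypothetical; "css selector" is the string value of By.CSS_SELECTOR.

```python
# Selector copied from the question, without the :nth-child suffix.
ROW_SELECTOR = (
    "#app > div > div.container-0-16 > div.table-0-20 "
    "> div.tbody-0-21 > div"
)


def get_row_texts(driver, selector=ROW_SELECTOR):
    """Return the visible text of every element matched by selector."""
    # "css selector" is the value of Selenium's By.CSS_SELECTOR constant.
    elements = driver.find_elements("css selector", selector)
    # One list entry per matched row; no counter or nth-child needed.
    return [element.text for element in elements]
```

With the browser object from the question, iterating over get_row_texts(browser) and printing each item gives the same output as get_css_sel, but the texts are also available for later processing.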

Wow. I completely missed that. Python is really beautiful. – tklein


If I want to write the output of each get_css_sel() call to a row in a CSV file, then I do need a loop, right? – tklein


@tklein Yes, you still need "for post in posts", just not "for row in rows". –
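
The CSV follow-up discussed in the comments could look roughly like the sketch below. It is not from the original thread: rows_to_csv is a hypothetical helper, and it assumes each row's text (post.text) puts every cell on its own line, which may not hold for this page.

```python
import csv


def rows_to_csv(row_texts, path):
    """Write one CSV row per scraped table row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        # This is the "for post in posts" loop, applied to the row texts.
        for text in row_texts:
            # Assumption: cells within a row are separated by newlines.
            writer.writerow(text.splitlines())
```

Here row_texts would be the list of post.text values collected from the matched row elements, so the only loop left is the one over the matched elements.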