
Here is my code for fetching every ward's page and writing each one out to a CSV file:

wardName = ["DHANLAXMICOMPLEX", "POTALIYA", "ARJUN TOWER", "IIM"] 

def get_all_pages(): 

    global wardName 
    list = [] 
    url = 'https://recruitment.advarisk.com/tests/scraping' 
    client = requests.session() 
    tree = html.fromstring(client.get(url).content) 
    csrf = tree.xpath('//input[@name="csrf_token"]/@value')[0] 
    for i in wardName: 
     formData = dict(csrf_token=csrf, ward=i) 
     headers = {'referer': url, 'content-type': 'application/x-www-form-urlencoded', 'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'} 
     r = client.post(url, data=formData, headers=headers) 
     list.append(r.content) 
    return list 
def parse_and_write_to_csv(htmls): 
    global wardName 
    parse = html.fromstring(htmls) 
    th = parse.xpath("//table[@id='results']/thead//th//text()") 
    soup = BeautifulSoup(htmls, "html.parser") 
    table = soup.select_one("#results") 
    for i in wardName: 
     name = str(i) + '.csv' 
     with open(name, 'w') as fw: 
      writer = csv.writer(fw) 
      writer.writerow(th) 
      writer.writerows([[j.text for j in i.find_all("td")] for i in table.select("tr + tr")]) 
def main(): 
    for value in get_all_pages(): 
     parse_and_write_to_csv(value) 

if __name__ == '__main__': 
    main() 

But as you can see, every CSV file ends up with the same content as the last page, IIM. I want each CSV file to contain the data for the ward it is named after. What should I change to get the correct CSVs? Where am I going wrong?


"As you can see" ... no, we can't see it –


Bro, it creates them all with the same content, and since there is no option to upload CSV files here I can't show you my CSV files. – Trunks


CSV is plain text.. just copy and paste the contents –

Answer


Your writer.writerows usage inside the for i in wardName loop never changes from one iteration to the next, so every file gets the same rows.

If you want the CSVs to have different content, you would need to move these lines

th = parse.xpath("//table[@id='results']/thead//th//text()") 
soup = BeautifulSoup(htmls, "html.parser") 
table = soup.select_one("#results") 

into that loop and vary what happens between them. A better suggestion is to add the wardName to the result of get_all_pages:

list.append((i, r.content)) 

and loop over those pairs:

for ward, page in get_all_pages(): 
    write_to_csv(ward, page) 

and redefine your writing function so that it no longer loops over the wards:

def write_to_csv(ward, page):
    parse = html.fromstring(page)
    th = parse.xpath("//table[@id='results']/thead//th//text()")
    soup = BeautifulSoup(page, "html.parser")
    table = soup.select_one("#results")
    with open(ward + '.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerow(th)
        writer.writerows([[td.text for td in tr.find_all("td")] for tr in table.select("tr + tr")])
Another suggestion is to remove the global list altogether:

def get_page(ward): 
    pass 
def write_ward_csv(ward, ward_html): 
    pass 

for ward in [ ... ]: 
    write_ward_csv(ward, get_page(ward)) 
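
For completeness, here is a minimal sketch of what the whole script could look like with both suggestions applied. It reuses the URL, the #results table id and the ward list from the question; the per-request session/CSRF handling and the newline='' argument are my own assumptions, and I have not run this against the live site:

import csv

import requests
from bs4 import BeautifulSoup
from lxml import html

URL = 'https://recruitment.advarisk.com/tests/scraping'

def get_page(ward):
    # Fetch a fresh CSRF token and POST the search form for one ward.
    client = requests.session()
    tree = html.fromstring(client.get(URL).content)
    csrf = tree.xpath('//input[@name="csrf_token"]/@value')[0]
    formData = dict(csrf_token=csrf, ward=ward)
    headers = {'referer': URL,
               'content-type': 'application/x-www-form-urlencoded'}
    return client.post(URL, data=formData, headers=headers).content

def write_ward_csv(ward, ward_html):
    # Parse this ward's results table and write it to <ward>.csv.
    soup = BeautifulSoup(ward_html, "html.parser")
    table = soup.select_one("#results")
    header = [th.get_text(strip=True) for th in table.select("thead th")]
    rows = [[td.get_text(strip=True) for td in tr.find_all("td")]
            for tr in table.select("tr + tr")]
    with open(ward + '.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

if __name__ == '__main__':
    for ward in ["DHANLAXMICOMPLEX", "POTALIYA", "ARJUN TOWER", "IIM"]:
        write_ward_csv(ward, get_page(ward))

Carrying the ward name alongside its HTML is what keeps each output file tied to the right page.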

Thanks so muchhhh – Trunks


I have updated the post to show how I would do it –


Thank you very much – Trunks