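Pandas only writes the last row to the CSV file

I'm reading URLs from a txt file and exporting the scraped results to a csv file. But once the whole process has run, my code only writes the information for the last URL. My guess is that I'm missing a loop, but where? Here is my code: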

import requests 
from bs4 import BeautifulSoup 
import pandas as pd 
from urllib import urlopen 

file = open('urls.txt', 'r') 
filelines = (line.strip() for line in file) 
for code in filelines: 
    site = urlopen(code) 
    soup = BeautifulSoup(site, "html.parser") 
    final = soup.find_all("span", {"class": "bd js-title-main-info"}) 
    print final 

records = [] 
for pagetxt in final: 
    print pagetxt.text 
    records.append((pagetxt.text)) 
df = pd.DataFrame(records, columns=['product name']) 
df.to_csv('test.csv', index=False, encoding='utf-8') 

Thanks

Answer

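Each pass through the loop over the file overwrites the variable final, so once the loop ends it holds only the results for the last URL. Append the data to records earlier, inside that loop (I have marked the changes with ######):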

import requests 
from bs4 import BeautifulSoup 
import pandas as pd 
from urllib import urlopen 

file = open('urls.txt', 'r') 
filelines = (line.strip() for line in file) 
records = []       ###### 
for code in filelines: 
    site = urlopen(code) 
    soup = BeautifulSoup(site, "html.parser") 
    final = soup.find_all("span", {"class": "bd js-title-main-info"}) 
    print final 

    for pagetxt in final:    ######
        print pagetxt.text    ######
        records.append(pagetxt.text) ######

df = pd.DataFrame(records, columns=['product name']) 
df.to_csv('test.csv', index=False, encoding='utf-8') 
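
To see why only the last URL survived, here is a minimal offline sketch of the same pattern, written for Python 3. It is not the code above: find_titles, FAKE_PAGES and the example.com URLs are made-up stand-ins for urlopen and BeautifulSoup, and the standard csv module is used in place of pandas so that it runs without a network connection or third-party packages.

```python
import csv
import io

# Hypothetical stand-in for urlopen + BeautifulSoup: maps each URL to the
# product names found on that page. The URLs and names are made up.
FAKE_PAGES = {
    "http://example.com/a": ["Widget"],
    "http://example.com/b": ["Gadget", "Gizmo"],
}


def find_titles(url):
    """Return the product names for one URL (assumed helper)."""
    return FAKE_PAGES[url]


urls = list(FAKE_PAGES)

# Buggy pattern: `final` is rebound on every pass, so after the loop
# it holds only the titles from the last URL.
for url in urls:
    final = find_titles(url)
buggy_records = list(final)

# Fixed pattern: collect the titles inside the loop, once per URL.
records = []
for url in urls:
    records.extend(find_titles(url))

# Write a header plus one row per record (csv.writer in place of pandas).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["product name"])
writer.writerows([name] for name in records)
csv_text = buffer.getvalue()
```

With the first loop, buggy_records contains only the titles from the second page; with the second loop, records contains the titles from every page, which is the behaviour the marked changes are meant to produce.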

Yes, it works! Thank you! – Jodmoreira