My goal is to parse data from a website and store it in a text file formatted so it can be opened in Excel. Below is the code: the data is parsed with BeautifulSoup and stored with a pandas DataFrame via to_csv.
from bs4 import BeautifulSoup
import requests
import pprint
import re
import pyperclip
import json
import pandas as pd
import csv

pag = range(2, 126)
out_file = open('bestumbrellasoffi.txt', 'w', encoding='utf-8')
with open('bestumbrellasoffi.txt', 'w', encoding='utf-8') as file:
    for x in pag:
        # iterate pages
        url = 'https://www.paginegialle.it/ricerca/lidi%20balneari/italia/p-' + str(x) + '?mr=50'
        response = requests.get(url)
        soup = BeautifulSoup(response.content, "html.parser")
        # parse data
        for i, j, k, p, z in zip(soup.find_all('span', attrs={'itemprop': 'name'}),
                                 soup.find_all('span', attrs={'itemprop': 'longitude'}),
                                 soup.find_all('span', attrs={'itemprop': 'latitude'}),
                                 soup.find_all('span', attrs={'class': 'street-address'}),
                                 soup.find_all('div', attrs={'class': 'tel elementPhone'})):
            info = i.text, j.text, k.text, p.text, z.text
            # check if data is good
            print(url)
            print(info)
            # create dataframe
            raw_data = {'nome': [i], 'longitudine': [j], 'latitudine': [k],
                        'indirizzo': [p], 'telefono': [z]}
            print(raw_data)
            df = pd.DataFrame(raw_data, columns=['nome', 'longitudine', 'latitudine',
                                                 'indirizzo', 'telefono'])
            df.to_csv('bestumbrellasoffi.txt')
out_file.close()
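A likely pitfall in the code above: df.to_csv('bestumbrellasoffi.txt') is called with the same path on every loop iteration, and to_csv defaults to mode='w', so each call overwrites the previous file and only the last record survives. A minimal sketch of the difference (the file names here are illustrative, not from the original code):

```python
import pandas as pd

# Two small batches standing in for the per-page DataFrames built in the loop.
batch1 = pd.DataFrame({'nome': ['Lido A'], 'telefono': ['111']})
batch2 = pd.DataFrame({'nome': ['Lido B'], 'telefono': ['222']})

# Default mode='w': the second call overwrites the first.
batch1.to_csv('overwrite_demo.csv', index=False)
batch2.to_csv('overwrite_demo.csv', index=False)
print(len(pd.read_csv('overwrite_demo.csv')))  # 1 row left

# mode='a' (header suppressed after the first write) keeps both batches.
batch1.to_csv('append_demo.csv', index=False)
batch2.to_csv('append_demo.csv', index=False, mode='a', header=False)
print(len(pd.read_csv('append_demo.csv')))  # 2 rows
```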
All those modules are imported because I made many attempts. So the output of print(info) is

and the output of print(raw_data) is
EDIT
Here is the reviewed and fully working code.
Thanks everyone for your patience!
from bs4 import BeautifulSoup
import requests
import pprint
import re
import pyperclip
import json
import pandas as pd
import csv

pag = range(2, 126)
with open('bestumbrellasoffia.txt', 'a', encoding='utf-8') as file:
    for x in pag:
        # iterate pages
        url = 'https://www.paginegialle.it/ricerca/lidi%20balneari/italia/p-' + str(x) + '?mr=50'
        response = requests.get(url)
        soup = BeautifulSoup(response.content, "html.parser")
        raw_data = {'nome': [], 'longitudine': [], 'latitudine': [], 'indirizzo': [], 'telefono': []}
        df = pd.DataFrame(raw_data, columns=['nome', 'longitudine', 'latitudine', 'indirizzo', 'telefono'])
        # parse data
        for i, j, k, p, z in zip(soup.find_all('span', attrs={'itemprop': 'name'}),
                                 soup.find_all('span', attrs={'itemprop': 'longitude'}),
                                 soup.find_all('span', attrs={'itemprop': 'latitude'}),
                                 soup.find_all('span', attrs={'class': 'street-address'}),
                                 soup.find_all('div', attrs={'class': 'tel elementPhone'})):
            inno = i.text.lstrip()
            ye = inno.rstrip()
            info = ye, j.text, k.text, p.text, z.text
            # check if data is good
            print(info)
            # create dataframe
            raw_data = {'nome': [i], 'longitudine': [j], 'latitudine': [k], 'indirizzo': [p], 'telefono': [z]}
            # try dataframe
            # print(raw_data)
            file.write(str(info) + "\n")
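Instead of writing str(info) tuples line by line, the records could also be collected as a list of dicts and written once with a single to_csv call at the end. This is only a sketch of that pattern, not the code actually used above: scrape_rows and the dummy sample stand in for the soup loop and the live scraping results.

```python
import pandas as pd

def scrape_rows(entries):
    """Accumulate one dict per listing; `entries` stands in for the zip(...) over soup.find_all results."""
    rows = []
    for entry in entries:
        rows.append({
            'nome': entry['nome'].strip(),  # same whitespace cleanup as lstrip/rstrip above
            'longitudine': entry['longitudine'],
            'latitudine': entry['latitudine'],
            'indirizzo': entry['indirizzo'],
            'telefono': entry['telefono'],
        })
    return pd.DataFrame(rows, columns=['nome', 'longitudine', 'latitudine',
                                       'indirizzo', 'telefono'])

# Dummy data in place of live scraping results.
sample = [{'nome': '  Lido Azzurro ', 'longitudine': '12.49', 'latitudine': '41.90',
           'indirizzo': 'Via Roma 1', 'telefono': '06 123456'}]
df = scrape_rows(sample)
df.to_csv('bestumbrellas_clean.csv', index=False)  # one write, no per-iteration overwrite
print(df.loc[0, 'nome'])  # Lido Azzurro
```

Building the DataFrame once also produces a proper CSV (comma-separated, with a header row) that Excel can open directly.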
Welcome to SO. What is the question? Please take the time to read [ask] and the links it contains. – wwii
Thanks @wwii, sorry I wasn't clear –