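Unable to append base URL to create absolute links with BeautifulSoup, Python 3
I am getting a list of links in the output file, but I need all of them to appear as absolute links. Some are absolute and some are relative. How can I prepend the base URL to the relative ones so that the CSV output contains only absolute links?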
I get back all the links, but not all of them are absolute, e.g. /subpage instead of http://page.com/subpage.
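A minimal sketch of the usual approach, assuming the standard library's `urllib.parse.urljoin` is acceptable: it resolves a relative href such as `/subpage` against a base URL and returns an already absolute href unchanged, so every link can be passed through it without first checking which kind it is. The base URL and hrefs below are hypothetical, mirroring the example above.

```python
from urllib.parse import urljoin

# Hypothetical base URL and hrefs, mirroring the example in the question.
base_url = "http://page.com/"
hrefs = ["/subpage", "http://page.com/subpage", "about.html"]

# urljoin resolves relative hrefs against base_url and leaves absolute ones unchanged.
absolute = [urljoin(base_url, href) for href in hrefs]
print(absolute)
# ['http://page.com/subpage', 'http://page.com/subpage', 'http://page.com/about.html']
```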
from bs4 import BeautifulSoup
import requests
import csv

j = requests.get("http://cnn.com").content
soup = BeautifulSoup(j, "lxml")

# only collect <a> tags that have an href attribute
data = []
for url in soup.find_all('a', href=True):
    print(url['href'])
    data.append(url['href'])
print(data)
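
with open("file.csv", 'w', newline='') as csvfile:
    write = csv.writer(csvfile, delimiter=' ')
    # wrap each link in a list so it is written as one field instead of being split into characters
    write.writerows([link] for link in data)

# remove duplicate lines from the file
with open('file.csv', 'r') as f:
    content = f.readlines()
content_set = set(content)
with open('file.csv', 'w') as cleandata:
    for line in content_set:
        cleandata.write(line)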
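One way to combine the steps, sketched under assumptions: resolve every href with `urljoin`, drop duplicates in memory (keeping first-seen order) instead of re-reading the file, and write one link per CSV row. The helper names `to_absolute_links` and `write_links_csv` and the sample hrefs are hypothetical; in the question's script the hrefs would come from the `find_all` loop.

```python
import csv
from urllib.parse import urljoin


def to_absolute_links(hrefs, base_url):
    """Resolve each href against base_url and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for href in hrefs:
        absolute = urljoin(base_url, href.strip())
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


def write_links_csv(links, path):
    """Write one link per row; wrapping each link in a list keeps csv from splitting it into characters."""
    with open(path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerows([link] for link in links)


# Hypothetical hrefs, standing in for the url['href'] values collected in the question's loop.
hrefs = ["/subpage", "http://page.com/subpage", "/other", "/subpage"]
links = to_absolute_links(hrefs, "http://page.com/")
write_links_csv(links, "file.csv")
print(links)
```

With BeautifulSoup, the input would be `[a['href'] for a in soup.find_all('a', href=True)]`. Since a site like cnn.com may redirect, using the final URL of the `requests` response (`response.url`) as the base is likely safer than the hard-coded address.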