Want to return the <title> tag, but it returns <title>Bad Request</title> (Python 3)

In Python, I have a program that returns the <title> tag for each URL in a list.

Some of the URLs return Bad Request when they are put in a list together.

For example, I load two URLs into the text file:

http://www.scientific.net/MSF 
http://www.scientific.net/JMNM

It returns:

<title>Bad Request</title> 
<title>Journal of Metastable and Nanocrystalline Materials</title>

If I only have the first URL in the list, the code works properly. How do I get it to retrieve the title instead of Bad Request?

My code:

import requests
from bs4 import BeautifulSoup

url_list = []

f = open('test.txt', 'r')  # text file with one URL per line
for line in f:
    url_list.append(line)

for link in url_list:
    try:
        r = requests.get(link)
        soup = BeautifulSoup(r.content, "html.parser")
        title = soup.title
        title.string = title.get_text(strip=True)
        print(str(title))
    except:
        print("No Title Found ")
        continue


2017-02-20 Kay

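Your problem is caused by reading the URLs from the text file. In the for link in url_list loop, the first value of link will be http://www.scientific.net/MSF\n, and the trailing \n is what ends up causing the Bad Request error. Strip the \n from each line as you read it and your code will work. It looks like your last line has no \n, so simply using url_list.append(line[:-1]) will fail: it would cut off the last character of that URL instead.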
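One way to strip the newline safely is sketched below. It uses str.strip(), which removes a trailing \n only when one is present, so the last line is handled correctly whether or not it ends with a newline. The helper name read_urls is hypothetical and not part of the original code; skipping blank lines is an added assumption.

```python
def read_urls(path):
    """Return the non-empty lines of a text file, without surrounding whitespace."""
    urls = []
    with open(path, "r") as f:
        for line in f:
            url = line.strip()  # removes "\n", "\r\n", and stray spaces
            if url:             # assumption: blank lines should be ignored
                urls.append(url)
    return urls
```

With this helper, url_list = read_urls('test.txt') could replace the file-reading loop in the question, and the rest of the code stays unchanged.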


2017-02-20 04:23:57 VBB

r = requests.get(link)
soup = BeautifulSoup(r.content, "html.parser")
# title = soup.title
titles = soup.find_all('title')
for title in titles:
    title.string = title.get_text(strip=True)
    print(str(title))

Dot access, as in soup.title, is a shortcut for .find(); it returns only the first match. You should use find_all() to return all matches.
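The difference between the two calls can be seen on a small, made-up HTML string. The markup below is an assumption chosen only to contain two <title> elements; it is not taken from the pages in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two <title> elements (one in <head>, one inside an <svg>).
html = (
    "<html><head><title>First</title></head>"
    "<body><svg><title>Second</title></svg></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

first = soup.title                   # shortcut for soup.find("title")
all_titles = soup.find_all("title")  # list of every <title> element

print(first.get_text(strip=True))
print([t.get_text(strip=True) for t in all_titles])
```

Here soup.title yields only the first element, while find_all("title") yields both.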


2017-02-20 03:59:51
