'unicode' object has no attribute 'prettify'

I am using BeautifulSoup to parse HTML articles. I run a few cleanup steps on the HTML so that only the main article is left.

I also want to save the soup output to a file. The error I get is:

soup = soup.prettify("utf-8") 
AttributeError: 'unicode' object has no attribute 'prettify'

Source code:

#!/usr/bin/env python 
import urllib2 
from bs4 import BeautifulSoup 
import nltk 
import argparse 

def cleaner(): 
    url = "https://www.ceid.upatras.gr/en/announcements/job-offers/full-stack-web-developer-papergo" 
    ourUrl = urllib2.urlopen(url).read() 
    soup = BeautifulSoup(ourUrl) 

    #remove scripts 
    for script in soup.find_all('script'): 
        script.extract()
    soup = soup.find("div", class_="clearfix") 

    #below code will delete tags except /br 
    soup = soup.encode('utf-8') 
    soup = soup.replace('<br/>' , '^') 
    soup = BeautifulSoup(soup) 
    soup = (soup.get_text()) 
    soup=soup.replace('^' , '<br/>') 

    print soup 
    with open('out.txt','w',encoding='utf-8-sig') as f: 
        f.write(soup.prettify())

if __name__ == '__main__': 
    cleaner()


2016-12-26 Fotis455

This happens because, after the following lines, soup is no longer a BeautifulSoup or Tag instance:

soup = (soup.get_text()) 
soup = soup.replace('^' , '<br/>')

get_text() returns a unicode string, and the name soup is rebound to it. A unicode string has no .prettify() method.
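To see the type change directly, here is a minimal sketch. The one-line HTML string is made up for illustration, and the explicit "html.parser" argument is my choice rather than something from the question. On Python 3 the string type is named str instead of unicode, so the error message would mention 'str'.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line document, used only to show the types involved.
soup = BeautifulSoup("<div>one<br/>two</div>", "html.parser")
print(type(soup).__name__)   # a BeautifulSoup object, which has prettify()

# get_text() hands back a plain string, not a BeautifulSoup or Tag object.
text = soup.get_text()
print(type(text).__name__)   # unicode on Python 2, str on Python 3

# Only the parsed object offers prettify(); the plain string does not.
print(hasattr(soup, "prettify"), hasattr(text, "prettify"))
```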

Depending on the output you want, you should be able to clean up the HTML with BeautifulSoup methods such as .get_text(), .replace_with(), .unwrap() and .extract(), instead of handling it as a plain string.
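One possible way to do that is sketched below, assuming the goal is to keep the text and the <br/> tags of the div with class "clearfix" and to drop every other tag. The sample markup, the clean_article helper and the explicit "html.parser" argument are my own assumptions for illustration. The file is opened with io.open so that the encoding argument is accepted on both Python 2 and Python 3.

```python
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = u"""
<html><body>
<div class="clearfix">
  <script>var tracking = 1;</script>
  <h1>Job offer</h1>
  <p>First line<br/>Second <b>bold</b> line</p>
</div>
</body></html>
"""


def clean_article(html):
    """Return the main article as a Tag that contains only text and <br> tags."""
    soup = BeautifulSoup(html, "html.parser")

    # Remove every <script> element together with its contents.
    for script in soup.find_all("script"):
        script.extract()

    article = soup.find("div", class_="clearfix")

    # Unwrap every remaining tag except <br>: the tag itself goes away,
    # while its text and children stay in place.
    for tag in article.find_all(True):
        if tag.name != "br":
            tag.unwrap()

    return article


article = clean_article(SAMPLE_HTML)

# article is still a Tag, so prettify() is available here.
with io.open("out.txt", "w", encoding="utf-8") as f:
    f.write(article.prettify())
```

Because the result is never converted to a string before the final write, there is no need for the '^' placeholder trick, and prettify() is called on a Tag rather than on text.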


2016-12-26 00:48:18 alecxe
