BeautifulSoup: is not JSON serializable
I have this code, which someone else wrote for Python 2, and I am converting it to Python 3:
url = self.lodestone_url + '/topics/'
r = self.make_request(url)
news = []
soup = bs4.BeautifulSoup(r.content)
for tag in soup.select('.news__content__list__topics li'):
    entry = {}
    title_tag = tag.select('.ic_topics a')[0]
    script = str(tag.select('script')[0])
    entry['timestamp'] = int(re.findall(r"1[0-9]{9},", script)[0].rstrip(','))
    entry['link'] = '//' + self.lodestone_domain + title_tag['href']
    entry['id'] = entry['link'].split('/')[-1]
    entry['title'] = title_tag.string.strip()
    body = tag.select('.news__content__list__topics--body')[0]
    for a in body.findAll('a'):
        if a['href'].startswith('/'):
            a['href'] = '//' + self.lodestone_domain + a['href']
    print(type(body))
    entry['body'] = body.encode('utf-8').strip()
    #entry['body'] = ""
    entry['lang'] = 'en'
    news.append(entry)
The last piece I cannot figure out is this line from the code above:
entry['body'] = body.encode('utf-8').strip()
because it gives this error:
Traceback (most recent call last):
  File "lodestoner", line 48, in <module>
    print(json.dumps(ret, indent=4))
  File "/usr/local/lib/python3.5/json/__init__.py", line 237, in dumps
    **kw).encode(obj)
  File "/usr/local/lib/python3.5/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/usr/local/lib/python3.5/json/encoder.py", line 427, in _iterencode
    yield from _iterencode_list(o, _current_indent_level)
  File "/usr/local/lib/python3.5/json/encoder.py", line 324, in _iterencode_list
    yield from chunks
  File "/usr/local/lib/python3.5/json/encoder.py", line 403, in _iterencode_dict
    yield from chunks
  File "/usr/local/lib/python3.5/json/encoder.py", line 436, in _iterencode
    o = _default(o)
  File "/usr/local/lib/python3.5/json/encoder.py", line 180, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: b'<div class="news__content__list__topics--body"><a class="news__content__list__topics__link_banner" href="//na.finalfantasyxiv.com/lodestone/topics/detail/f05649918007c827f44000ef5462461cec1e8b38"><img alt="" height="149" src="http://img.finalfantasyxiv.com/t/f05649918007c827f44000ef5462461cec1e8b38.png?1473152734" width="570"/></a>FINAL FANTASY XIV will be attending Tokyo Game Show 2016 at Makuhari Messe in Chiba in full force, and we\xe2\x80\x99ll be a larger than Hydaelyn presence as we\xe2\x80\x99ll be occupying space at our own Square Enix booth as well as the Intel booth! Additionally, we\xe2\x80\x99ll be broadcasting the next Letter from the Producer LIVE straight from the show floor, so be sure to mark your calendars as this is the second part of the Patch 3.4 special which you won\xe2\x80\x99t want to miss!<br><br><a href="//na.finalfantasyxiv.com/lodestone/topics/detail/f05649918007c827f44000ef5462461cec1e8b38" rel="f05649918007c827f44000ef5462461cec1e8b38">Read on</a> for more details.</br></br></div>'
is not JSON serializable
In the code above, the body variable is of type <class 'bs4.element.Tag'>.
So when I remove the encode part, the line looks like this:
entry['body'] = body.strip()
Then I get this error:
TypeError: 'NoneType' object is not callable
What am I missing? In most cases like this, removing encode has worked.
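One way to see where the first error comes from, sketched with only the standard library: in Python 3, .encode('utf-8') produces a bytes object, and json.dumps accepts str values but not bytes. The byte string below is a shortened, made-up stand-in for the real markup, not actual output from the site.

```python
import json

# Shortened stand-in for what body.encode('utf-8') returns in Python 3.
raw = b'<div>we\xe2\x80\x99ll be there</div>'

try:
    json.dumps({'body': raw})
    error = None
except TypeError as exc:
    # The default JSON encoder rejects bytes values.
    error = exc

# Decoding back to str gives json.dumps a value it can serialize.
text = raw.decode('utf-8')
payload = json.dumps({'body': text})
```

In Python 2 the same call returned a str, which is likely why the original code worked there.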
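Do you just want entry['body'] to hold the text content of the news entry? I.e. "FINAL FANTASY XIV will be attending Tokyo Game Show..." – SuperShoot
@SuperShoot Yes, I think that was the original author's intent. – Zeno
As the script stands, you are calling '.encode('utf-8').strip()' on an instance of a bs4 'Tag' object, but those are string operations. Try 'unicode(body.string)' - according to the docs, that returns the unicode representation of any text in the tag. – SuperShoot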
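Building on the comments, here is a minimal Python 3 sketch of two ways to get a JSON-friendly value out of a Tag. The HTML snippet and its "body" class are made-up stand-ins for the Lodestone markup, and html.parser is chosen only to avoid an extra dependency. Note that unicode does not exist in Python 3 (str takes its place), and that body.string is None when a tag has several children, so get_text() is used here instead. This is one possible approach, not necessarily what the original author intended.

```python
import json

import bs4

# Made-up, shortened stand-in for one news entry's body markup.
html = ('<div class="body"><a href="/lodestone/topics/detail/abc">'
        'Read on</a> for more details.</div>')
soup = bs4.BeautifulSoup(html, 'html.parser')
body = soup.find('div', class_='body')

# A Tag treats unknown attribute names as child-tag lookups, so
# body.strip searches for a <strip> tag and yields None. Calling it
# is what raises "'NoneType' object is not callable".
strip_attr = body.strip

# Option 1: keep the markup. str() on a Tag returns its HTML as text.
body_html = str(body).strip()

# Option 2: keep only the visible text of the tag and its children.
body_text = body.get_text().strip()

print(json.dumps({'html': body_html, 'text': body_text}, indent=4))
```

Either value could be assigned to entry['body']; which one fits depends on whether the consumer of the JSON expects markup or plain text.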