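Neatly printing comments via Python's Reddit API

I want to do some text analysis on Reddit comments. The script I currently have prints the body and upvote count of every comment with more than 5 upvotes on the "hot" posts of a given subreddit: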
import praw

# ID, SECRET, PWORD, UAGENT and UNAME hold the account's API credentials
reddit = praw.Reddit(client_id=ID,
                     client_secret=SECRET, password=PWORD,
                     user_agent=UAGENT, username=UNAME)
subreddit = reddit.subreddit('cryptocurrency')
for submission in subreddit.hot(limit=10):
    # The sort order has to be set before the comments are first fetched
    submission.comment_sort = 'top'
    submission.comments.replace_more(limit=10)
    for comment in submission.comments.list():
        if comment.ups > 5:
            print(comment.body, comment.ups)
However, the output looks like this:
(u'Just hodl and let the plebs lose money on scamcoin ICO\'s that don\'t even have a working product. I don\'t understand some of these "traders" and "investors".', 9)
(u"Good idea imho but it's gonna be abused af. Think about it. It will be the sexual go to app real soon. If they will 'ban' nudity on it, then you will simply get the instagram chicks on there with all the horny guys liking their photos and giving them free money. 'if this gets 1000 likes I will post a pic of me in bikini' ", 7)
(u"But but but, I just sold a kidney and bought in at the top, now I can't afford to get the stitches removed!\n\n/s just in case.", 7)
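Two questions:
- Is there a way to convert the output to JSON using Python?
- If not, how can I get rid of all the extra characters so that only the body and the upvote count remain?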
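For reference, the parentheses and `u'...'` prefixes in the output are what Python 2 shows when `print` is handed a tuple, so they are not part of the comment text itself. A minimal sketch of the JSON route is below. It assumes each comment exposes `body` and `ups` attributes, as in the script above; the `comments_to_records` helper and the `SimpleNamespace` stand-ins are hypothetical and only there so the example runs without Reddit credentials.

```python
import json
from types import SimpleNamespace


def comments_to_records(comments, min_ups=5):
    """Turn comment-like objects into plain dicts, keeping those above min_ups."""
    return [
        {"body": c.body, "ups": c.ups}
        for c in comments
        if c.ups > min_ups
    ]


# Stand-ins for PRAW comment objects (hypothetical sample data).
sample = [
    SimpleNamespace(body="Just hodl", ups=9),
    SimpleNamespace(body="Low score comment", ups=2),
]

records = comments_to_records(sample)

# json.dumps serializes the list of dicts to a JSON string.
json_text = json.dumps(records, indent=2)
print(json_text)
```

In the real script, the list returned by `submission.comments.list()` could be passed in place of `sample`, and `json_text` could be written to a file instead of printed.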
My end goal is to have this output neatly organized so that I can analyze keywords against upvote count (which keywords get the most upvotes, and so on).
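One possible shape for that analysis, as a rough sketch only: sum the upvotes of every comment in which a word appears. It assumes the comments are already available as dicts with `body` and `ups` keys; the `upvotes_by_keyword` helper, the simple regex tokenizer, and the sample records are all illustrative assumptions rather than anything provided by PRAW.

```python
import re
from collections import defaultdict


def upvotes_by_keyword(records):
    """Sum the upvotes of every comment in which each word appears."""
    totals = defaultdict(int)
    for record in records:
        # set() so a word repeated within one comment is only counted once.
        words = set(re.findall(r"[a-z]+(?:'[a-z]+)?", record["body"].lower()))
        for word in words:
            totals[word] += record["ups"]
    return dict(totals)


# Hypothetical sample data in the {"body": ..., "ups": ...} shape.
sample_records = [
    {"body": "Just hodl and wait", "ups": 9},
    {"body": "hodl hodl hodl", "ups": 7},
]

totals = upvotes_by_keyword(sample_records)

# Words ranked by total upvotes, highest first.
top = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top)
```

Common words such as "and" will dominate a count like this, so filtering out stop words before ranking is probably worthwhile.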
Thanks!