2017-07-15 72 views

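python reddit praw psraw: ValueError when decoding JSON

I want to fetch the posts and comments of a subreddit and write them to txt files. There will be one file per post holding that post's comments, and another file listing the details of every post. However, I get the error below after 7250 results, and I need to fetch 36k+ results.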

I am also using praw 4.6, because psraw stopped working properly after I updated to 5.0.

Error message:

Traceback (most recent call last):
  File "C:/Users/PycharmProjects/untitled/subreddit psraw.py", line 12, in <module>
    for submission in psraw.submission_search(reddit, subreddit='R', limit=40000):
  File "C:\Python27\lib\site-packages\psraw\base.py", line 71, in endpoint_func
    data = requests.get(url).json()['data']
  File "C:\Python27\lib\site-packages\requests\models.py", line 894, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\Python27\lib\json\__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python27\lib\json\decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

My code:

import praw, datetime, os, psraw

reddit = praw.Reddit('bot1')

subreddit = reddit.subreddit('R')

count = 0
try:
    for submission in psraw.submission_search(reddit, subreddit='R', limit=40000):
        count_coment = 0

        #get comments
        for comment in submission.comments:
            subid = submission.id
            comid = comment.id
            comauthor = comment.author
            com_body = comment.body.encode('utf-8').replace("\n", " ")
            comscore = comment.score
            com_date = datetime.datetime.utcfromtimestamp(comment.created_utc)
            string_com = '"{0}", "{1}", "{2}", "{3}", "{4}"\n'
            formatted_string_com = string_com.format(comid, comauthor, com_body, com_date, comscore)
            indexFile_comment = open('C:/Users/PycharmProjects/untitled/reddit_output_diabetes/' + subid + '.txt', 'a+')
            indexFile_comment.write(formatted_string_com)
            count_coment += 1
        print 'comment count: ', count_coment

        #get index

        date = datetime.datetime.utcfromtimestamp(submission.created_utc)
        _id = submission.id
        title = submission.title.encode('utf-8')
        text = submission.selftext.encode('utf-8').replace("\n", " ")
        author = submission.author
        score = submission.score
        string = '"{0}", "{1}", "{2}", "{3}", "{4}", "{5}"\n'

        formatted_string = string.format(_id, title, text, author, date, score)
        count += 1
        indexFile = open('C:/Users/PycharmProjects/untitled/reddit_output/' + 'index.txt', 'a+')
        indexFile.write(formatted_string)

        print ("Successfuly writing in file")
        print count
        indexFile.close()
        print count
except ValueError:
    pass

Answers

0

It should be:

try: 

.......put code here... 

except ValueError: 
    pass 
    continue 

If you use continue, you don't need pass. Sorry for the late reply; I had trouble connecting to the internet.
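
One detail worth spelling out: `continue` is only legal inside a loop, so the try/except has to sit inside the `for` body rather than around it. Here is a minimal sketch of that placement. The names `process_all` and `parse_score` are made up for illustration and are not part of praw or psraw, and the code is written to run on both Python 2.7 and Python 3.

```python
def process_all(items, handle):
    """Apply handle to each item, skipping items whose handling raises ValueError."""
    done, skipped = [], []
    for item in items:
        try:
            done.append(handle(item))
        except ValueError:
            skipped.append(item)
            continue  # legal here because we are inside the for loop
    return done, skipped


def parse_score(text):
    """Hypothetical per-item work; int() raises ValueError on bad input."""
    return int(text)


done, skipped = process_all(["1", "x", "3"], parse_score)
```

With this placement a bad item is recorded and the loop moves on to the next one. The `continue` is the last statement of the loop body here, so `pass` would behave the same way.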

0

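This is probably an error raised while parsing one particular comment. You can skip that comment and move on to the next one by handling the error with try/except.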

Put the code inside:

try: 

.......put code here... 

except ValueError: 
    pass 

Put in the code starting from the first for loop.
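
A caveat on wrapping everything from the first for loop: in the question's traceback the ValueError is raised on the `for submission in psraw.submission_search(...)` line itself, which suggests it comes from the generator fetching more results rather than from the loop body. The sketch below uses a stand-in generator (`flaky_results` is hypothetical, not psraw) to show what happens in that case. It runs on Python 2.7 and Python 3.

```python
def flaky_results():
    """Stand-in for a paginated search generator (hypothetical, not psraw)."""
    yield "post-1"
    yield "post-2"
    # Simulates a page fetch that fails to decode, as in the traceback.
    raise ValueError("No JSON object could be decoded")


def collect(results):
    """Collect items until the generator itself raises ValueError."""
    seen = []
    try:
        for item in results:
            try:
                seen.append(item.upper())  # per-item work
            except ValueError:
                continue  # only catches errors raised by the per-item work
    except ValueError:
        # Raised by the generator: it is finished now, later items are lost.
        pass
    return seen


gen = flaky_results()
collected = collect(gen)
leftover = list(gen)  # empty: a generator that raised cannot be resumed
```

So an outer try/except with `pass` hides the error but still stops at the same point, and an inner try/except never sees an error that originates in the generator.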


I added the try/except, but the code just gets stuck at the start and doesn't run.


Let me rephrase: it runs, but it doesn't return any results.
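
If the underlying problem is an occasional response that is not JSON (an assumption; the thread never shows what the server actually returned), one option is to retry the request instead of swallowing the error. The sketch below shows only the retry idea. `load_json_with_retry` and the `fetch_text` callable are hypothetical names, and a helper like this would have to be wired into whatever code performs the HTTP request, which in the traceback is inside psraw. It runs on Python 2.7 and Python 3.

```python
import json
import time


def load_json_with_retry(fetch_text, retries=3, delay=1.0):
    """Call fetch_text() until its result parses as JSON, or re-raise the last error."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return json.loads(fetch_text())
        except ValueError as err:  # json decode failures are ValueErrors
            last_error = err
            if attempt < retries - 1:
                time.sleep(delay * (2 ** attempt))  # simple exponential backoff
    raise last_error


# Fake responses for illustration: an HTML error page first, then valid JSON.
responses = iter(["<html>Too Many Requests</html>", '{"data": [1, 2]}'])
result = load_json_with_retry(lambda: next(responses), retries=3, delay=0)
```

Here the first fake response fails to decode, the helper tries again, and the second response is returned as a dict. If every attempt fails, the last ValueError is raised so the failure stays visible.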