2012-07-11

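Storing JSON dictionaries from the Twitter Streaming API with PyMongo

I currently have a lot of tweets, and I'm going to store them on a server in our lab. However, there is one question I need to settle about how I plan to do it.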

For example, a tweet has the following format:

{ 
    "contributors": null, 
    "coordinates": null, 
    "created_at": "Tue Jul 10 17:09:12 +0000 2012", 
    "entities": { 
     "hashtags": [{ 
      "indices": [62, 78], 
      "text": "thestrongnation" 
     }], 
     "urls": [], 
     "user_mentions": [{ 
      "id": 376483630, 
      "id_str": "376483630", 
      "indices": [0, 8], 
      "name": "SherryHonig", 
      "screen_name": "sahonig" 
     }] 
    }, 
    "favorited": false, 
    "geo": null, 
    "id": 222739261219282945, 
    "id_str": "222739261219282945", 
    "in_reply_to_screen_name": "sahonig", 
    "in_reply_to_status_id": 222695060528037889, 
    "in_reply_to_status_id_str": "222695060528037889", 
    "in_reply_to_user_id": 376483630, 
    "in_reply_to_user_id_str": "376483630", 
    "place": { 
     "attributes": {}, 
     "bounding_box": { 
      "coordinates": [ 
       [ 
        [-106.645646, 25.837164000000001], 
        [-93.508038999999997, 25.837164000000001], 
        [-93.508038999999997, 36.500703999999999], 
        [-106.645646, 36.500703999999999] 
       ] 
      ], 
      "type": "Polygon" 
     }, 
     "country": "United States", 
     "country_code": "US", 
     "full_name": "Texas, US", 
     "id": "e0060cda70f5f341", 
     "name": "Texas", 
     "place_type": "admin", 
     "url": "http://api.twitter.com/1/geo/id/e0060cda70f5f341.json" 
    }, 
    "retweet_count": 0, 
    "retweeted": false, 
    "source": "web", 
    "text": "@sahonig BOOM !!!! I feel a 1 coming on!!! Awesome! #thestrongnation", 
    "truncated": false, 
    "user": { 
     "contributors_enabled": false, 
     "created_at": "Wed Feb 15 14:40:48 +0000 2012", 
     "default_profile": false, 
     "default_profile_image": false, 
     "description": "Living life on 30A & doing it my way. My mind is Stronger than physical challenge. Runner, Crosfit, Fitness Challenges. Proud member of #thestrongnation. ", 
     "favourites_count": 17, 
     "follow_request_sent": null, 
     "followers_count": 215, 
     "following": null, 
     "friends_count": 184, 
     "geo_enabled": true, 
     "id": 493181025, 
     "id_str": "493181025", 
     "is_translator": false, 
     "lang": "en", 
     "listed_count": 4, 
     "location": "Seagrove Beach, FL", 
     "name": "30A My Way \u2600", 
     "notifications": null, 
     "profile_background_color": "c0deed", 
     "profile_background_image_url": "http://a0.twimg.com/profile_background_images/590670431/aj7p0c6j2oevdj240jz2.jpeg", 
     "profile_background_image_url_https": "https://si0.twimg.com/profile_background_images/590670431/aj7p0c6j2oevdj240jz2.jpeg", 
     "profile_background_tile": true, 
     "profile_image_url": "http://a0.twimg.com/profile_images/2381704869/b7bizspexjgmyspqesg0_normal.jpeg", 
     "profile_image_url_https": "https://si0.twimg.com/profile_images/2381704869/b7bizspexjgmyspqesg0_normal.jpeg", 
     "profile_link_color": "0084B4", 
     "profile_sidebar_border_color": "C0DEED", 
     "profile_sidebar_fill_color": "DDEEF6", 
     "profile_text_color": "333333", 
     "profile_use_background_image": true, 
     "protected": false, 
     "screen_name": "30A_MyWay", 
     "show_all_inline_media": false, 
     "statuses_count": 1731, 
     "time_zone": "Central Time (US & Canada)", 
     "url": null, 
     "utc_offset": -21600, 
     "verified": false 
    } 
} 

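This is, of course, a Python dictionary that happens to follow JSON format. MongoDB conveniently accepts documents in this form, but the thing is, I don't want all of the information provided. The Streaming API gives me about 20 fields, when I really only want to work with the userid, text and location. I originally planned to parse through this and extract the text I wanted, but I couldn't find a reliable parser, and given the conditions this is being developed under, I felt writing one would be a waste of time.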
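Once a tweet has been decoded with the standard json module, no separate parser is needed: the wanted fields can be picked straight out of the nested dict. A minimal sketch, assuming "location" means the free-text user.location plus the attached place's full_name (place is often null); the function name slim_tweet and the output keys are hypothetical. It uses id_str, the string form of the id that Twitter supplies alongside the numeric one.

```python
import json

def slim_tweet(tweet):
    """Keep only the user id, text and location fields of a decoded tweet.

    Assumption: "location" = user.location plus place.full_name when present.
    """
    place = tweet.get("place") or {}  # "place" may be null (None)
    return {
        "user_id": tweet["user"]["id_str"],
        "text": tweet["text"],
        "user_location": tweet["user"].get("location"),
        "place": place.get("full_name"),
    }

# A trimmed-down copy of the tweet shown above.
raw = '''{"id_str": "222739261219282945",
          "text": "@sahonig BOOM !!!! I feel a 1 coming on!!! Awesome! #thestrongnation",
          "place": {"full_name": "Texas, US"},
          "user": {"id_str": "493181025", "location": "Seagrove Beach, FL"}}'''

slim = slim_tweet(json.loads(raw))
print(slim)
```

The resulting small dict is what would be handed to MongoDB instead of the full tweet.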

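However, another solution I'm considering is that, since these are being read into MongoDB anyway, I could store only what I want from each dictionary and discard the rest. The only problem is that the file Twitter delivers puts all of the dictionaries on the same line, so I feel I would have to do some kind of extraction either way.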
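If several JSON objects really are concatenated on one line, the standard library can still split them without a hand-written parser: json.JSONDecoder.raw_decode decodes one value starting at a given index and returns the index where it stopped. A minimal sketch; iter_json_objects and the WANTED field tuple are hypothetical names chosen for illustration.

```python
import json

_decoder = json.JSONDecoder()

def iter_json_objects(text):
    """Yield each JSON value from a string holding several of them back to back.

    Works whether the values are separated by newlines, spaces, or nothing.
    """
    idx, length = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while idx < length and text[idx].isspace():
            idx += 1
        if idx >= length:
            return
        obj, idx = _decoder.raw_decode(text, idx)
        yield obj

# Hypothetical choice of top-level fields to keep.
WANTED = ("id_str", "text")

line = ('{"id_str": "1", "text": "first", "lang": "en"}'
        '{"id_str": "2", "text": "second", "geo": null}')
slim = [{k: t[k] for k in WANTED if k in t} for t in iter_json_objects(line)]
print(slim)
```

Each trimmed dict can then be inserted on its own, so the unwanted fields never reach the database.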

Does anyone have other suggestions?


The sample Python code using pymongo here, http://stackoverflow.com/questions/10855518/optimization-dumping-json-from-a-streaming-api-to-mongo/10865813#10865813, should help a lot. – 2012-07-11 22:43:09

Answer


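If the result isn't already a Python structure, you can use json.loads to convert it (this returns dicts and lists in the format above) so that it can be manipulated. (Usually, though, people use one of the Python Twitter libraries, which does this transparently.)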
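For reference, this is roughly what json.loads gives back for a cut-down tweet: JSON objects become dicts, arrays become lists, null becomes None and false becomes False, so ordinary indexing reaches any nested field. The sample string is a shortened, hypothetical tweet.

```python
import json

raw = ('{"text": "Awesome! #thestrongnation", "geo": null, "favorited": false,'
       ' "entities": {"hashtags": [{"indices": [9, 25], "text": "thestrongnation"}]}}')
tweet = json.loads(raw)

# Nested dicts and lists can be walked with normal Python indexing.
hashtags = [h["text"] for h in tweet["entities"]["hashtags"]]
print(type(tweet).__name__, hashtags, tweet["geo"], tweet["favorited"])
# prints: dict ['thestrongnation'] None False
```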

Just create a new dict containing the data you want and insert that into MongoDB, e.g.:

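Assuming ret is the tweet response shown above:

mydata = {
    'name': ret['user']['screen_name'],
    'text': ret['text'],
}

print(mydata['name'], 'wrote', mydata['text'])  # or something

# insert mydata into the appropriate MongoDB database/collection here, e.g. with
# PyMongo 3+ (the database and collection names are placeholders):
# from pymongo import MongoClient
# MongoClient()['twitter']['tweets'].insert_one(mydata)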
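Applied to a whole capture file, the same approach might look like the sketch below. It assumes one JSON message per line (the streaming endpoint normally delimits messages with a newline) and that non-tweet messages such as deletion notices lack a "text" key. load_slim_tweets and ListCollection are hypothetical names; insert_many is the PyMongo (3.0+) Collection method, and ListCollection only stands in for a real collection so the sketch runs without a MongoDB server.

```python
import json

def load_slim_tweets(lines, collection, batch_size=1000):
    """Decode one message per line, keep a few fields, and bulk-insert them.

    `collection` is expected to be a pymongo Collection; anything with an
    insert_many(list_of_dicts) method works. Returns the number inserted.
    """
    batch, total = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive lines carry no data
        msg = json.loads(line)
        if "text" not in msg:
            continue  # skip non-tweet messages, e.g. {"delete": ...}
        batch.append({"name": msg["user"]["screen_name"], "text": msg["text"]})
        if len(batch) >= batch_size:
            collection.insert_many(batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        collection.insert_many(batch)
        total += len(batch)
    return total

class ListCollection:
    """Minimal in-memory stand-in for a pymongo Collection (offline testing)."""
    def __init__(self):
        self.docs = []

    def insert_many(self, docs):
        self.docs.extend(docs)

# Real usage (needs pymongo and a running server; names are placeholders):
# from pymongo import MongoClient
# coll = MongoClient()["twitter"]["tweets"]
# with open("tweets.json") as f:
#     load_slim_tweets(f, coll)
```

Batching with insert_many keeps the number of round trips to the server low compared with one insert per tweet.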