2016-08-30 54 views
0

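Python: how do I search for tweets and store them in a database?

I already have a working Python script that prints the last 200 tweets from a given username.

However, I would like to modify it so that it collects the last 200 tweets containing a specific hashtag (from any username), and then store those results in a database.

Can anyone suggest how to modify the code below?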

import sys 
import operator 
import requests 
import json 
import twitter 

twitter_consumer_key = 'XXXX' 
twitter_consumer_secret = 'XXXX' 
twitter_access_token = 'XXXX' 
twitter_access_secret = 'XXXX' 

twitter_api = twitter.Api(consumer_key=twitter_consumer_key, consumer_secret=twitter_consumer_secret, access_token_key=twitter_access_token, access_token_secret=twitter_access_secret) 

handle = 'XXXX'  # placeholder: screen name of the account to read 
statuses = twitter_api.GetUserTimeline(screen_name=handle, count=200, include_rts=False) 

for status in statuses: 
    if status.lang == 'en': 
        print(status) 

Possible duplicate of [Twitter API - Display all tweets with a certain hashtag?](http://stackoverflow.com/questions/2714471/twitter-api-display-all-tweets-with-a-certain-hashtag)


[It does not seem to be possible](https://twittercommunity.com/t/get-user-timeline-tag-filtering/17508) to filter by hashtag with the [GetUserTimeline](https://dev.twitter.com/rest/reference/get/statuses/user_timeline) function. As Xander suggested, the [GetSearch](https://pythonism.wordpress.com/2013/10/12/using-the-twitter-api-with-python-twitter/) method may help. Otherwise, you can download tweets in batches of 200 and filter them yourself (I believe Twitter limits you to a user's most recent 3200 tweets). – Boa
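The "download a batch and filter it yourself" idea from the comment above can be sketched roughly as follows. This is only an illustration: it assumes each tweet is available as a plain dict in Twitter's JSON layout (`entities` → `hashtags` → `text`), and the helper names `has_hashtag` and `filter_by_hashtag` as well as the sample data are made up for the example.

```python
def has_hashtag(tweet, wanted):
    """Return True if the tweet dict carries any hashtag in `wanted` (case-insensitive)."""
    wanted_lower = {tag.lstrip("#").lower() for tag in wanted}
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    return any(h.get("text", "").lower() in wanted_lower for h in hashtags)


def filter_by_hashtag(tweets, wanted):
    """Keep only the tweets from one downloaded batch that match a wanted hashtag."""
    return [t for t in tweets if has_hashtag(t, wanted)]


# Hand-written sample batch, not real API output.
sample_batch = [
    {"id": 1, "text": "Merry #Xmas", "entities": {"hashtags": [{"text": "Xmas"}]}},
    {"id": 2, "text": "no tags here", "entities": {"hashtags": []}},
    {"id": 3, "text": "#summer vibes", "entities": {"hashtags": [{"text": "summer"}]}},
]

matches = filter_by_hashtag(sample_batch, ["#xmas", "Summer"])
print([t["id"] for t in matches])
```

Matching on the parsed hashtag entities rather than on the raw text avoids false hits such as "#XmasSale" when only "#Xmas" is wanted.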


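As for storing the results in a database: unless you are working within a framework that provides its own database abstraction layer (e.g. Django, web2py), take a look at http://www.sqlalchemy.org/. – Boa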

Answers

0

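I'm not familiar with the twitter package, but here is a suggestion you can build on. Depending on how you want to save the tweets, replace the print call with whatever you need. Note, however, that this only filters the 200 tweets that were fetched; it does not fetch 200 tweets that contain a specific hashtag.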

import sys 
import operator 
import requests 
import json 
import twitter 

twitter_consumer_key = 'XXXX' 
twitter_consumer_secret = 'XXXX' 
twitter_access_token = 'XXXX' 
twitter_access_secret = 'XXXX' 

twitter_api = twitter.Api(consumer_key=twitter_consumer_key, consumer_secret=twitter_consumer_secret, access_token_key=twitter_access_token, access_token_secret=twitter_access_secret) 

handle = 'XXXX'  # placeholder: screen name of the account to read 
statuses = twitter_api.GetUserTimeline(screen_name=handle, count=200, include_rts=False) 

tag_list = ["Xmas", "Summer"] 
for status in statuses: 
    if status.lang == 'en': 
        # python-twitter exposes a tweet's parsed hashtags as status.hashtags 
        for hashtag in status.hashtags or []: 
            if hashtag.text in tag_list: 
                print(status) 

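Thanks for the suggestion, but I really need to scan for the hashtag across all users (rather than filter a single user's tweets). I can't find any documentation for the `twitter` library I've been using so far, so I may switch to some other, more useful approach.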


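@MattBrown Ah, you just want a simple search function. I just found this on Twitter's official site: "The Twitter Search API searches against a sampling of recent Tweets published in the past 7 days." If you need complete results, consider using the Streaming API instead. – Young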

0

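I'm attaching Java code that prints the last 100 tweets containing the hashtag "#engineeringproblems" (from any user). You need to add the Twitter API library 'twitter4J' to your project's libraries.

API download link: http://twitter4j.org/en/index.html#download

Java source code: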

import java.util.ArrayList;
import java.util.List;

import twitter4j.Query;
import twitter4j.QueryResult;
import twitter4j.Status;
import twitter4j.Twitter;
import twitter4j.TwitterException;
import twitter4j.TwitterFactory;
import twitter4j.conf.ConfigurationBuilder;

public class HashtagSearch { // class name is arbitrary

    public static void main(String[] args) {

        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.setDebugEnabled(true)
            .setOAuthConsumerKey("xxxx")
            .setOAuthConsumerSecret("xxxx")
            .setOAuthAccessToken("xxxx")
            .setOAuthAccessTokenSecret("xxxx");

        Twitter twitter = new TwitterFactory(cb.build()).getInstance();
        Query query = new Query("#engineeringproblems");
        int numberOfTweets = 100;
        long lastID = Long.MAX_VALUE;
        List<Status> tweets = new ArrayList<Status>();

        while (tweets.size() < numberOfTweets) {
            // Request at most 100 tweets per call, and never more than are still needed.
            query.setCount(Math.min(100, numberOfTweets - tweets.size()));
            try {
                QueryResult result = twitter.search(query);
                if (result.getTweets().isEmpty()) {
                    break; // no older tweets are available
                }
                tweets.addAll(result.getTweets());
                System.out.println("Gathered " + tweets.size() + " tweets" + "\n");
                // Track the smallest ID seen so the next page starts below it.
                for (Status t : result.getTweets()) {
                    lastID = Math.min(lastID, t.getId());
                }
            } catch (TwitterException te) {
                System.out.println("Couldn't connect: " + te);
                break; // stop instead of retrying forever
            }
            query.setMaxId(lastID - 1);
        }

        for (int i = 0; i < tweets.size(); i++) {
            Status t = tweets.get(i);
            String user = t.getUser().getScreenName();
            String msg = t.getText();

            System.out.println(i + " USER: " + user + " wrote: " + msg + "\n");
        }
    }
}
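Since the question asks for Python, the paging idea used in the Java code above (request a page, remember the smallest tweet ID, then ask for tweets below it via `max_id`) might look roughly like this. It is a sketch under assumptions, not a real client: `search_page` is a hypothetical callable standing in for whatever search call your library provides, and `make_fake_search` is an in-memory stand-in so the example runs without credentials.

```python
def collect_tweets(search_page, limit, page_size=100):
    """Collect up to `limit` tweets, newest first, by walking backwards with max_id.

    `search_page(count, max_id)` is a hypothetical callable that returns a list of
    tweet dicts (each with an "id" key), newest first, all with id <= max_id
    (or unbounded when max_id is None).
    """
    tweets = []
    max_id = None
    while len(tweets) < limit:
        count = min(page_size, limit - len(tweets))
        page = search_page(count, max_id)
        if not page:
            break  # nothing older is available, so stop instead of looping forever
        tweets.extend(page)
        # The next page must start strictly below the oldest tweet seen so far.
        max_id = min(t["id"] for t in page) - 1
    return tweets


def make_fake_search(total):
    """Build an in-memory stand-in for a search endpoint holding ids 1..total."""
    store = [{"id": i, "text": "tweet %d #demo" % i} for i in range(total, 0, -1)]

    def search_page(count, max_id):
        eligible = [t for t in store if max_id is None or t["id"] <= max_id]
        return eligible[:count]

    return search_page


collected = collect_tweets(make_fake_search(250), limit=200)
print(len(collected), collected[0]["id"], collected[-1]["id"])
```

Passing the page-fetching function in as an argument keeps the paging logic independent of any particular Twitter library, so the same loop can be exercised against fake data.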
0

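Sorry, but I was really looking for a Python solution. I believe I have finally found one, and I have tested it successfully; the code is below. I'm still looking for a way to modify the script so that each row is inserted into an SQL database, but I hope I can find that elsewhere.

pip install TwitterSearch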

from TwitterSearch import * 
try: 
    tso = TwitterSearchOrder() # create a TwitterSearchOrder object 
    tso.set_keywords(['Guttenberg', 'Doktorarbeit']) # let's define all words we would like to have a look for 
    tso.set_language('de') # we want to see German tweets only 
    tso.set_include_entities(False) # and don't give us all those entity information 

    # it's about time to create a TwitterSearch object with our secret tokens 
    ts = TwitterSearch(
        consumer_key = 'aaabbb', 
        consumer_secret = 'cccddd', 
        access_token = '111222', 
        access_token_secret = '333444' 
    ) 

    # this is where the fun actually starts :) 
    for tweet in ts.search_tweets_iterable(tso): 
        print('@%s tweeted: %s' % (tweet['user']['screen_name'], tweet['text'])) 

except TwitterSearchException as e: # take care of all those ugly errors if there are some 
    print(e) 
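For the remaining step, writing each tweet to an SQL database, one possible approach is Python's built-in `sqlite3` module. The sketch below is only an illustration: the table layout and the helper names `open_store` and `save_tweets` are assumptions, and the sample dicts are hand-written. It relies only on the `id`, `user.screen_name` and `text` fields of each tweet dict, the latter two being the ones the script above already reads.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite database with a simple tweets table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        " id INTEGER PRIMARY KEY,"
        " screen_name TEXT NOT NULL,"
        " text TEXT NOT NULL)"
    )
    return conn


def save_tweets(conn, tweets):
    """Insert tweet dicts, skipping any tweet ID that is already stored."""
    rows = [(t["id"], t["user"]["screen_name"], t["text"]) for t in tweets]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO tweets (id, screen_name, text) VALUES (?, ?, ?)",
            rows,
        )


# Hand-written sample data shaped like the dicts printed by the script above.
sample = [
    {"id": 101, "user": {"screen_name": "alice"}, "text": "first #demo"},
    {"id": 102, "user": {"screen_name": "bob"}, "text": "second #demo"},
    {"id": 101, "user": {"screen_name": "alice"}, "text": "first #demo"},  # duplicate
]

conn = open_store()
save_tweets(conn, sample)
stored = conn.execute("SELECT id, screen_name FROM tweets ORDER BY id").fetchall()
print(stored)
```

In the script above, the print call inside the loop could presumably be replaced by collecting the tweets and passing them to `save_tweets`, with a file path given to `open_store` instead of the in-memory default. Using the tweet ID as the primary key together with `INSERT OR IGNORE` keeps repeated runs from storing the same tweet twice.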