2014-11-02

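Retrieving Twitter data with Tweepy

I'm using Python code with the Tweepy library to retrieve Twitter data for a specific hashtag. The problem is that I need to retrieve a specific time period, for example from June 30, 2013 to December 30, 2013. How can I do this?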

#imports 
from tweepy import Stream 
from tweepy import OAuthHandler 
from tweepy.streaming import StreamListener 

#setting up the keys 
consumer_key = '……………….' 
consumer_secret = '……………..' 
access_token = '……………….' 
access_secret = '……………..' 

class TweetListener(StreamListener):
    # A listener handles tweets received from the stream.
    # This is a basic listener that just prints received tweets to standard output.

    def on_data(self, data):
        print(data)
        return True

    def on_error(self, status):
        print(status)


# Authenticate, then print all matching tweets to standard output
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

stream = Stream(auth, TweetListener())

t = u"#سوريا"
stream.filter(track=[t])
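on_data receives each tweet as a raw JSON string, so the tweet's creation time has to be parsed out of it before it can be compared with a date range. A minimal sketch, assuming the Twitter API v1.1 tweet layout (a top-level created_at string such as "Wed Oct 10 20:19:24 +0000 2018", a text field, and a nested user object with id_str). parse_tweet is a hypothetical helper, not part of Tweepy.

```python
import json
from datetime import datetime

# Twitter API v1.1 timestamp format, e.g. "Wed Oct 10 20:19:24 +0000 2018"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S +0000 %Y"


def parse_tweet(data):
    """Return (created_at, user_id, text) for one raw JSON message, or None.

    Hypothetical helper: assumes the v1.1 tweet layout. Stream messages
    that are not tweets (e.g. delete or limit notices) have no
    'created_at' field and are skipped.
    """
    tweet = json.loads(data)
    if "created_at" not in tweet:
        return None
    created_at = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
    return created_at, tweet["user"]["id_str"], tweet["text"]


sample = ('{"created_at": "Sun Jun 30 12:00:00 +0000 2013", '
          '"text": "example", "user": {"id_str": "12345"}}')
print(parse_tweet(sample))
```

Inside the listener, on_data could call parse_tweet(data) and skip the message when it returns None.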

You can't get that data; see e.g. http://stackoverflow.com/a/1733360/3001761 – jonrsharpe 2014-11-02 16:07:44


But I've been running the code continuously for two days, retrieving data. Is all of this metadata only available for three weeks? – Hana 2014-11-02 16:29:59


@Hana were you able to solve this problem? – user3378649 2014-11-02 23:32:05

Answer


I'm still investigating why I can't get the same results using tweepy.Cursor(api.search, geocode=.., q=query, until=date); maybe this is the reason. However, I can retrieve Twitter data between two dates using Tweepy.
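For the tweepy.Cursor(api.search, ..., until=date) route, the standard Search API expects until as a date formatted YYYY-MM-DD, and its index only covers roughly the last 7 days. That would explain why a 2013 date returns nothing. A minimal sketch with hypothetical helper names (search_params, within_search_window) that builds those keyword arguments and checks the assumed 7-day window before calling the API. Premium or enterprise search tiers may differ.

```python
from datetime import datetime, timedelta

# Assumption: the standard search index covers roughly the last 7 days.
SEARCH_WINDOW = timedelta(days=7)


def search_params(query, until):
    """Build keyword arguments for a standard search call (hypothetical helper).

    'until' is sent as YYYY-MM-DD, the format the Search API expects.
    """
    return {"q": query, "until": until.strftime("%Y-%m-%d")}


def within_search_window(until, now):
    """True if 'until' falls inside the assumed 7-day search index."""
    return now - SEARCH_WINDOW <= until <= now


now = datetime(2014, 11, 2)
print(search_params("#Syria", datetime(2014, 10, 30)))
print(within_search_window(datetime(2013, 12, 30), now))
```

With Tweepy 3.x these would be passed as tweepy.Cursor(api.search, **search_params(query, until)). Tweepy 4 renamed that method to api.search_tweets.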

First, I created a generator that yields the dates between a start date and an end date.

import datetime

def date_range(start, end):
    current = start
    while (end - current).days >= 0:
        yield current
        current = current + datetime.timedelta(seconds=1)  # Depends on your needs; you could also step per day/minute/hour
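The generator's comment notes that the step could be a day, minute, or hour instead of a second. A small variant that takes the step as a parameter (the step argument is an addition, not part of the original). The condition current <= end is equivalent to the original (end - current).days >= 0, because a negative timedelta always has a negative .days.

```python
import datetime


def date_range(start, end, step=datetime.timedelta(days=1)):
    """Yield datetimes from start to end inclusive, advancing by step."""
    current = start
    while current <= end:
        yield current
        current += step


days = list(date_range(datetime.datetime(2013, 6, 30), datetime.datetime(2013, 7, 2)))
print(days)
```

Stepping by one second from 2013-06-30 to 2013-10-30 produces about 10.5 million values, so a coarser step is usually the more practical choice.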

Then I created a listener so that I can tell which day a tweet was created on by accessing status.created_at.

Your code should look like this:

import tweepy 
from tweepy import Stream 
from tweepy import OAuthHandler 
from tweepy.streaming import StreamListener 
import json 
import datetime 


#Use your keys 
consumer_key = '...' 
consumer_secret = '...' 
access_token = '...' 
access_secret = '...' 


auth = OAuthHandler(consumer_key, consumer_secret) 
auth.set_access_token(access_token, access_secret) 

def date_range(start, end):
    current = start
    while (end - current).days >= 0:
        yield current
        current = current + datetime.timedelta(seconds=1)

class TweetListener(StreamListener):
    def on_status(self, status):
        startDate = datetime.datetime(2013, 6, 30)
        stopDate = datetime.datetime(2013, 10, 30)
        for date in date_range(startDate, stopDate):
            # Overwrites created_at with each generated timestamp in turn;
            # the tweet's own creation time is not read here.
            status.created_at = date
            print("tweet " + str(status.created_at) + "\n")
            print(status.text + "\n")
            # You can dump your tweets into a JSON file, or load them into your database

stream = Stream(auth, TweetListener(), secure=True)
t = u"#Syria"  # You can use different hashtags
stream.filter(track=[t])
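The idea of telling which day a tweet was created by reading status.created_at can be written as a filter that keeps each tweet's real timestamp. Matching tweets are then written as JSON lines, which covers the "dump your tweets into a JSON file" comment. This is a minimal sketch. It assumes only that each status has created_at (a datetime) and text, as Tweepy Status objects do, and that created_at and the range bounds are both naive or both timezone-aware. keep_in_range and dump_statuses are hypothetical names, and SimpleNamespace stands in for real statuses.

```python
import io
import json
from datetime import datetime
from types import SimpleNamespace


def keep_in_range(statuses, start, end):
    """Yield only statuses whose own created_at lies in [start, end]."""
    for status in statuses:
        if start <= status.created_at <= end:
            yield status


def dump_statuses(statuses, fp):
    """Write one JSON object per line with the tweet's time and text."""
    for status in statuses:
        record = {"created_at": status.created_at.isoformat(), "text": status.text}
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")


def fake_status(when, text):
    # Stand-in for a tweepy Status object (only the attributes used above).
    return SimpleNamespace(created_at=when, text=text)


statuses = [fake_status(datetime(2013, 6, 29, 23, 59), "too early"),
            fake_status(datetime(2013, 7, 15, 8, 0), "in range"),
            fake_status(datetime(2013, 11, 1), "too late")]
out = io.StringIO()
dump_statuses(keep_in_range(statuses, datetime(2013, 6, 30), datetime(2013, 10, 30)), out)
print(out.getvalue())
```

Inside a listener, the same check would go in on_status as "if start <= status.created_at <= end:" before printing or saving. Only tweets that the stream actually delivers can pass it.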

Output:

I only printed the dates to check (I don't want to spam StackOverflow with political tweets).

tweet 2013-06-30 00:00:01 

------------------- 

tweet 2013-06-30 00:00:02 

------------------- 

tweet 2013-06-30 00:00:03 

------------------- 

tweet 2013-06-30 00:00:04 

------------------- 

tweet 2013-06-30 00:00:05 

------------------- 

tweet 2013-06-30 00:00:06 

------------------- 

tweet 2013-06-30 00:00:07 

------------------- 

tweet 2013-06-30 00:00:08 

------------------- 

tweet 2013-06-30 00:00:09 

------------------- 

Thanks Taha, I'll try that code once the system finishes retrieving data. – Hana 2014-11-02 18:51:03


Sure, thank you very much. – Hana 2014-11-02 19:38:59


I've tried your code and it works, but I only get the tweets and tweet times, without user IDs! – Hana 2014-11-19 00:02:32
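Each Tweepy Status carries its author as a nested user object, so the user ID is available alongside the text. A minimal sketch, assuming Tweepy's status.user.id_str and status.user.screen_name attributes, which mirror the user object in Twitter's v1.1 tweet JSON. format_status is a hypothetical helper, and SimpleNamespace stands in for a real status.

```python
from datetime import datetime
from types import SimpleNamespace


def format_status(status):
    """One output entry per tweet, including the author's handle and ID.

    Assumes a tweepy-style Status: created_at, text, and a nested user
    object exposing id_str and screen_name.
    """
    return "tweet {} by @{} (user id {})\n{}".format(
        status.created_at, status.user.screen_name, status.user.id_str, status.text)


status = SimpleNamespace(
    created_at=datetime(2013, 7, 15, 8, 0),
    text="example",
    user=SimpleNamespace(id_str="12345", screen_name="example_user"),
)
print(format_status(status))
```

In the listener's on_status, print(format_status(status)) would replace the two separate print calls.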