Python API Streaming, write to a new file after a certain size

I have a Python script that keeps an open connection to the Twitter Streaming API and writes the incoming data to a JSON file. Once the current file reaches a certain size, is it possible to start writing to a new file without interrupting the connection? For example, I have streamed just over a week of data, but it has all gone into a single file (~2 GB), which makes parsing slow. If I could switch to a new file after 500 MB, I would have four smaller files to parse (e.g. dump1.json, dump2.json, and so on) instead of one large one.
import tweepy
from tweepy import OAuthHandler
from tweepy import Stream
from tweepy.streaming import StreamListener

# Add consumer/access tokens for Twitter API
consumer_key = '-----'
consumer_secret = '-----'
access_token = '-----'
access_secret = '-----'

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

# Define StreamListener class to open a connection to Twitter and begin consuming data
class MyListener(StreamListener):
    def on_data(self, data):
        try:
            # Append each incoming Tweet (raw JSON) to a single dump file
            # (raw string so the backslashes in the Windows path are not treated as escapes)
            with open(r'G:\xxxx\Raw_tweets.json', 'a') as f:
                f.write(data)
            return True
        except BaseException as e:
            print("Error on_data: %s" % str(e))
            return True

    def on_error(self, status):
        print(status)
        return True

bounding_box = [-77.2157, 38.2036, -76.5215, 39.3365]  # filtering by location
keyword_list = ['']  # filtering by keyword

twitter_stream = Stream(auth, MyListener())
twitter_stream.filter(locations=bounding_box)  # Filter Tweets in stream by location bounding box
#twitter_stream.filter(track=keyword_list)  # Filter Tweets in stream by keyword
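
One way to get the rollover the question asks for (added here only as a sketch, not code from the original post) is to have the listener work out the target filename on every write and bump an index once the current file passes a size threshold; the stream connection itself is never touched. The RotatingListener name, the dump file prefix, and the 500 MB limit below are illustrative assumptions, and the sketch assumes the same old-style tweepy Stream(auth, listener) API used above.

import os

class RotatingListener(StreamListener):
    # Sketch only: writes to dump1.json, dump2.json, ... and starts a new
    # file once the current one exceeds max_bytes, without reconnecting.
    def __init__(self, base_path='dump', max_bytes=500 * 1024 * 1024):
        super(RotatingListener, self).__init__()
        self.base_path = base_path
        self.max_bytes = max_bytes
        self.index = 1

    def current_path(self):
        path = '%s%d.json' % (self.base_path, self.index)
        if os.path.exists(path) and os.path.getsize(path) >= self.max_bytes:
            self.index += 1  # roll over to the next numbered file
            path = '%s%d.json' % (self.base_path, self.index)
        return path

    def on_data(self, data):
        try:
            with open(self.current_path(), 'a') as f:
                f.write(data)
            return True
        except BaseException as e:
            print("Error on_data: %s" % str(e))
            return True

    def on_error(self, status):
        print(status)
        return True

# Used exactly like MyListener above, e.g.:
# twitter_stream = Stream(auth, RotatingListener())
# twitter_stream.filter(locations=bounding_box)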
Is 'os.stat(tweet_file).st_size > 2 ** 10' how you set the file size? –
@AndrewR, that is how you check the file size. I check for existence first to avoid an exception - you could use _try ... except_ instead. You could also wrap this code in a _getter_ method - whatever suits your style – volcano
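
To make the exchange in the comments concrete, a size check of the kind volcano describes could look like the sketch below; file_size is a hypothetical helper name and the ~500 MB threshold is only an example.

import os

def file_size(path):
    # Return the size of path in bytes, or 0 if the file does not exist yet;
    # checking existence first avoids the exception a bare os.stat would raise.
    return os.stat(path).st_size if os.path.exists(path) else 0

if file_size('dump1.json') > 500 * 2 ** 20:  # roughly 500 MB
    pass  # time to start writing to dump2.json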