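Writing a large amount of data to Cassandra in one query

I have created a program that queries data in my Cassandra table and queries the Twitter API to get the followers and friends of a user. I save all the ids, and once I have all the followers/friends I write them to Cassandra.
The problem is that one of the users has about 1.24 million (1M24) followers, and when I execute this code the size of the set seems to cause a write error in Cassandra.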
import tweepy

def get_data(tweepy_function, author_id, author_username, session):
    # API is assumed to be an authenticated tweepy.API instance defined elsewhere.
    if tweepy_function == "followers":
        followers = set()
        for follower_id in tweepy.Cursor(API.followers_ids, id=author_id, count=5000).items():
            # Report progress every 5000 collected ids.
            if len(followers) % 5000 == 0 and len(followers) != 0:
                print("Collected followers: ", len(followers))
            followers.add(follower_id)
        # The whole set is written in a single INSERT once collection is done.
        query = "INSERT INTO {0} (node_id, screen_name, centrality, follower_ids) VALUES ({1}, {2}, {3}, {4})"\
            .format("network", author_id, author_username, 0.0, followers)
        session.execute(query)
    if tweepy_function == "friends":
        friends = set()
        for friend_id in tweepy.Cursor(API.friends_ids, id=author_id, count=5000).items():
            # Report progress every 5000 collected ids.
            if len(friends) % 5000 == 0 and len(friends) != 0:
                print("Collected friends: ", len(friends))
            friends.add(friend_id)
        # The whole set is written in a single INSERT once collection is done.
        query = "INSERT INTO {0} (node_id, screen_name, centrality, friend_ids) VALUES ({1}, {2}, {3}, {4})"\
            .format("network", author_id, author_username, 0.0, friends)
        session.execute(query)
As requested, here is my schema:
table = """CREATE TABLE IF NOT EXISTS
    {0} (
        node_id bigint,
        screen_name text,
        last_tweets set<text>,
        follower_ids set<bigint>,
        friend_ids set<bigint>,
        centrality float,
        PRIMARY KEY (node_id))
    """.format(table_name)
Why am I getting a write error? How can I prevent it? Is this a good way to safely store the data in Cassandra?
What is your schema? –
@AshrafulIslam Added it. – mel
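One possible way to avoid a single very large write, offered only as a sketch: append the ids to the set column in smaller chunks with `UPDATE ... SET col = col + ?` instead of sending the whole set in one INSERT. The helper names `chunked` and `write_ids_in_chunks`, the `ALLOWED_COLUMNS` whitelist, and the default chunk size of 5000 are assumptions made for illustration and do not come from the question. The sketch also assumes `session` is a cassandra-driver `Session`, which accepts `%s` placeholders with a parameter tuple. It has not been confirmed that the size of the single write is the actual cause of the error, and a set holding more than a million elements may still be a poor fit for a Cassandra collection.

```python
from itertools import islice

# Hypothetical whitelist: column names cannot be bound as query parameters,
# so only known set<bigint> columns from the schema are accepted.
ALLOWED_COLUMNS = {"follower_ids", "friend_ids"}


def chunked(ids, size):
    """Yield successive sets containing at most `size` ids."""
    iterator = iter(ids)
    while True:
        chunk = set(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def write_ids_in_chunks(session, node_id, column, ids, chunk_size=5000):
    """Append ids to a set column of the `network` table, one chunk per write.

    Returns the number of UPDATE statements that were executed.
    """
    if column not in ALLOWED_COLUMNS:
        raise ValueError("unexpected column: {0}".format(column))
    # UPDATE behaves as an upsert in Cassandra, so the row is created if missing.
    query = "UPDATE network SET {0} = {0} + %s WHERE node_id = %s".format(column)
    writes = 0
    for chunk in chunked(ids, chunk_size):
        session.execute(query, (chunk, node_id))
        writes += 1
    return writes
```

With something like this, `get_data` could call `write_ids_in_chunks(session, author_id, "follower_ids", followers)` after the collection loop, and write `screen_name` and `centrality` in a separate small statement. Passing values as parameters also avoids building the CQL text with `str.format`.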