2017-04-26 50 views
1

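How to create a loop for an LDA model in Python

I need some help creating a for loop in Python. I am a complete coding novice. Please point me in the right direction.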

Here is what I have done so far. I pulled 1000 tweets on a topic using the Twitter API. Then I used an LDA model to find the top 3 topics.

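Now I need to loop over the documents (tweets) with the call below, where x is the document number (0 to 999), to get the topic distribution for each document: ldamodel.get_document_topics(corpus[x]). Can someone point me in the right direction on how to write my loop?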

Here is my guess so far.

The tweets are pulled using this code (incomplete):

def get_tweets(input_query): 
    consumer_key = "x" 
    consumer_secret = "x" 
    access_token = "x" 
    access_token_secret = "x" 
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret) 
    auth.set_access_token(access_token, access_token_secret) 
    api = tweepy.API(auth) 
    return tweepy.Cursor(api.search, q=input_query, lang="en").items() 

input_queries = ['Tornado'] 
tweets = {} 
dataset = defaultdict(list) 
for input_query in input_queries: 
    tweets = get_tweets(input_query) 
    download_tweet_count = 1000 
    print(input_query) 
    counter = 0 
    .... 

    .... 
ldamodel = models.ldamodel.LdaModel(corpus, num_topics=3, id2word=dictionary, passes=20)

counter = 0 
for x in download_tweet_count: 
while counter < x: 
    try: 
     ldamodel.get_document_topics(corpus[x]) 

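I need to run ldamodel.get_document_topics(corpus[x]) on each document (tweet) and then assign that tweet to the topic with the highest probability. I believe I can use a dataframe or a separate list to store the assignments. I don't know what "dataframe" means.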
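A minimal sketch of the loop described above, assuming `ldamodel` is a trained gensim `LdaModel` and `corpus` is the list of bag-of-words documents it was trained on. gensim's `get_document_topics` returns a list of `(topic_id, probability)` pairs, so the best topic is the pair with the largest probability. The helper name `assign_topics` and the class `FakeLda` are hypothetical: `FakeLda` only stands in for a real model so the example runs without gensim or Twitter data.

```python
def assign_topics(ldamodel, corpus):
    """Return one (doc_index, topic_id, probability) row per document."""
    rows = []
    # enumerate() yields the document number x (0, 1, 2, ...) and the document.
    for x, bow in enumerate(corpus):
        # List of (topic_id, probability) pairs for this document.
        doc_topics = ldamodel.get_document_topics(bow)
        # Keep the pair whose probability (second element) is largest.
        topic_id, prob = max(doc_topics, key=lambda pair: pair[1])
        rows.append((x, topic_id, prob))
    return rows


class FakeLda:
    """Hypothetical stand-in for a trained gensim LdaModel (illustration only)."""

    def __init__(self, distributions):
        self._distributions = distributions

    def get_document_topics(self, bow):
        # Return a canned distribution keyed by the document's bag-of-words.
        return self._distributions[tuple(bow)]


# Two toy documents in gensim's bag-of-words format: (word_id, count) pairs.
corpus = [[(0, 1), (1, 2)], [(2, 1)]]
fake = FakeLda({
    ((0, 1), (1, 2)): [(0, 0.1), (1, 0.7), (2, 0.2)],
    ((2, 1),): [(0, 0.6), (2, 0.4)],
})

assignments = assign_topics(fake, corpus)
print(assignments)  # [(0, 1, 0.7), (1, 0, 0.6)]
```

With the real model you would call `assign_topics(ldamodel, corpus)` directly. A "dataframe" is a table with named columns from the pandas library; if pandas is installed, `pandas.DataFrame(assignments, columns=["doc", "topic", "prob"])` turns the list of rows into one. Note that gensim by default leaves out topics whose probability falls below the model's `minimum_probability`, so a document's list can be shorter than the number of topics.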

+0

Take a look at the documentation, try to write a loop, and come back with your code. We'd be happy to help. Link to the docs: https://docs.python.org/3/tutorial/controlflow.html – lordingtar

Answer

0

Here is a snippet of my code showing how I usually build the matrix used to run LDA.

import numpy as np
import lda

# Loop through the features and build the document-term matrix.
# `features`, `TweetFeatures`, and `map_id_2_index` are assumed to be defined elsewhere.
features_size = len(features)

X = []  # becomes an np.ndarray; this is what we pass to the LDA module

# TweetFeatures maps tweet ID -> list of words that appeared in that tweet
# (this depends on how you define them).
for tweet_id, words in TweetFeatures.items():
    current_vector = np.zeros(features_size, dtype=int)
    for word in words:
        if word in map_id_2_index:
            current_vector[map_id_2_index[word]] = 1
    X.append(current_vector)
X = np.array(X)  # document-term matrix
X = X[~np.all(X == 0, axis=1)]  # remove all-zero rows
print("type(X): {}".format(type(X)))
print("shape: {}\n".format(X.shape))

#################################### 
#### LDA MODELLING ################# 
#################################### 
model = lda.LDA(n_topics=5, n_iter=1000, random_state=1) 
model.fit(X)
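Once the model is fitted, the `lda` package exposes the per-document topic distribution as `model.doc_topic_`, an array with one row per document and one column per topic. The sketch below shows one way to pick the most probable topic for each row. The small `doc_topic` array is made-up data standing in for `model.doc_topic_`, so the example runs without a fitted model.

```python
import numpy as np

# Made-up stand-in for model.doc_topic_ (3 documents, 3 topics).
doc_topic = np.array([
    [0.10, 0.70, 0.20],
    [0.60, 0.05, 0.35],
    [0.25, 0.25, 0.50],
])

# Index of the most probable topic in each row, and that probability.
best_topic = doc_topic.argmax(axis=1)
best_prob = doc_topic.max(axis=1)

for i, (topic, prob) in enumerate(zip(best_topic, best_prob)):
    print("document {}: topic {} (p={:.2f})".format(i, topic, prob))
```

One caveat: the snippet above drops all-zero rows from `X` before fitting, so row `i` of `doc_topic_` may no longer correspond to the `i`-th tweet. If you need to map topics back to tweets, keep track of which rows were kept.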