N-gram analysis in Python

Here is what my sample data looks like:

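I need to run a 1- and 2-gram analysis on the queries and compute the sum and the average of the impressions associated with each n-gram. So far I have worked out how to aggregate the impressions using the code below.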

import pandas as pd

def n_grams(txt):
    # Collect every contiguous word sequence in txt (all n-gram lengths)
    grams = list()
    words = txt.split(' ')
    for i in range(len(words)):
        for k in range(1, len(words) - i + 1):
            grams.append(" ".join(words[i:i+k]))
    return pd.Series(grams)


counts = df['query'].apply(n_grams).join(df)
# Reshape to one row per (impression, n-gram) pair, then sum impressions per n-gram
result = (counts.drop("query", axis=1).set_index("impression").unstack()
                .rename("ngram").dropna().reset_index()
                .drop("level_0", axis=1)
                .groupby("ngram")["impression"].sum())
result = result.to_frame()
result['query'] = result.index
# Clear the index name so 'ngram' is not both an index level and a column
result.index.name = None
result['ngram'] = result['query'].str.split().apply(len)
result = result.groupby(['ngram', 'query'])['impression'].sum()
result = result.reset_index()
result = result.sort_values(['ngram', 'impression'], ascending=[True, False])

The result returned looks like this:

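Here I need one more column showing the average impressions associated with each of these queries. For example, the word "nutrition" appears four times, so its average impression should be 100/4 = 25. I would also like another column showing how many times each query appears. The final result should look like this: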


2017-06-07 Ran Tao

This code will help you count how many times each unigram, such as 'nutrition', occurs within the bigrams.

# Keep only the bigram rows (a name cannot start with a digit, so use 'bigrams')
bigrams = result[result['ngram'] == 2]
bigrams = bigrams.reset_index(drop=True)
# create an empty dictionary to store the count of each word in the bigrams
words = dict()
for i in range(len(bigrams)):
    query_words = bigrams.loc[i, 'query'].split()
    for item in query_words:
        if item not in words:
            words[item] = 1
        else:
            words[item] += 1
# to get the count of the word 'nutrition'
nut_ct = words['nutrition']
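
For the sum, count, and average that the question asks for, one possible approach is to compute all three in a single grouped aggregation. The sketch below is only an illustration under some assumptions: the sample data is made up (the original table is not shown), the helper `one_two_grams` and the column name `ngram_text` are hypothetical, and `DataFrame.explode` with named aggregation requires pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical sample data; the table from the question is not available.
df = pd.DataFrame({
    "query": ["nutrition", "nutrition facts", "best nutrition",
              "nutrition plan", "yoga"],
    "impression": [40, 30, 20, 10, 5],
})

def one_two_grams(txt):
    """Return the unigrams and bigrams of a whitespace-separated string."""
    words = txt.split()
    grams = []
    for n in (1, 2):
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams

# One row per (query, n-gram) pair; each row carries its query's impressions.
exploded = (
    df.assign(ngram_text=df["query"].apply(one_two_grams))
      .explode("ngram_text")
)

# Sum, occurrence count, and mean of impressions for every n-gram.
summary = (
    exploded.groupby("ngram_text")["impression"]
            .agg(impression="sum", count="count", average="mean")
            .reset_index()
            .rename(columns={"ngram_text": "query"})
)

# n-gram length (1 or 2), then the same ordering as in the question.
summary["ngram"] = summary["query"].str.split().str.len()
summary = summary.sort_values(["ngram", "impression"],
                              ascending=[True, False])
print(summary)
```

With this made-up data, 'nutrition' occurs in four queries with 100 impressions in total, so its average is 25, matching the arithmetic in the question. Note that a word repeated inside a single query is counted once per occurrence.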


2017-06-07 20:37:21
