2017-04-14

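Unable to predict results for new sentences with tflearn

I am using this IGN dataset to find positive and negative feedback about games. It basically has two columns: sentiment, which contains 0 or 1 (bad or good), and title, which could be something like 'The Lost World'.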

I'm mainly using tflearn to do the heavy lifting. Everything goes smoothly; the only problem is predicting new examples.

# Load the IGN dataset
dataframe = pd.read_csv('ign.csv') 

# Convert score_phrase to binary sentiments and add a new column called sentiment 
bad_phrases = ['Bad', 'Awful', 'Painful', 'Unbearable', 'Disaster'] 
dataframe['sentiment'] = dataframe.score_phrase.isin(bad_phrases).map({True: 0, False: 1}) 

# Drop every column except title and the new sentiment column
dataframe = dataframe.drop(["score_phrase","Unnamed: 0","url","platform", "score", "genre", "editors_choice", "release_year", "release_month","release_day"], axis=1) 

# Replace any missing values with empty strings
dataframe.fillna(value='', inplace=True) 

#preprocessing 
word_processor = VocabularyProcessor(100) 
# Convert all titles into the input features trainX (one row of word ids per title)
trainX = np.array(list(word_processor.fit_transform(dataframe["title"]))) 
# Use the sentiment column as the labels trainY (shape: n_samples x 1)
trainY = dataframe.loc[:, ["sentiment"]].values

# Network building 
def build_model(): 
    # Reset the default graph (clears all parameters and variables)
    tf.reset_default_graph() 
    net = tflearn.input_data([None, 100])  # Input 

    net = tflearn.fully_connected(net, 200, activation='ReLU')  # Hidden 
    net = tflearn.fully_connected(net, 200, activation='ReLU') 

    net = tflearn.fully_connected(net, 1, activation='softmax') # Output 
    net = tflearn.regression(net, optimizer='sgd', learning_rate=0.01, loss='categorical_crossentropy') 

    model = tflearn.DNN(net) 
    return model 
model = build_model() 
# Training 
model.fit(trainX, trainY, validation_set=0.1, show_metric=True, batch_size=128, n_epoch=10) 
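The preprocessing above turns each score_phrase into a 0/1 label and each title into a fixed-length row of integer word ids. As a rough illustration of what that produces, here is a minimal pure-Python sketch. It is not tflearn's implementation: the class name `MiniVocab`, the whitespace tokenizer and the sample titles and phrases are hypothetical, and it assumes that unknown words and padding both map to id 0 (padding with 0 is visible in the arrays printed later in this thread).

```python
class MiniVocab:
    """Hypothetical stand-in for VocabularyProcessor: word -> id, padded rows."""

    def __init__(self, max_document_length):
        self.max_document_length = max_document_length
        self.mapping = {}  # id 0 is reserved for padding / unknown words

    def fit(self, documents):
        for doc in documents:
            for word in doc.split():
                if word not in self.mapping:
                    self.mapping[word] = len(self.mapping) + 1
        return self  # scikit-learn style: fit() returns the object itself

    def transform(self, documents):
        rows = []
        for doc in documents:
            ids = [self.mapping.get(w, 0) for w in doc.split()][: self.max_document_length]
            rows.append(ids + [0] * (self.max_document_length - len(ids)))
        return rows

    def fit_transform(self, documents):
        return self.fit(documents).transform(documents)


# Same rule as the question: any phrase in bad_phrases -> 0, everything else -> 1
bad_phrases = {"Bad", "Awful", "Painful", "Unbearable", "Disaster"}
score_phrases = ["Amazing", "Bad", "Great"]  # hypothetical sample rows
labels = [0 if p in bad_phrases else 1 for p in score_phrases]

titles = ["Little Big Planet", "The Lost World", "Little Big Adventure"]
vocab = MiniVocab(max_document_length=5)
train_x = vocab.fit_transform(titles)
print(labels)
print(train_x)
```

Each title becomes a row of the same length, which is what the `input_data([None, 100])` layer is shaped for.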

This is where everything goes to hell, when predicting:

example = 'Little Big Planet' 
text = np.array(word_processor.fit(example)) 
pred_class = np.argmax(model.predict([text])) 

The error I get is:

--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
<ipython-input-59-14b2aecc0f49> in <module>() 
     1 example = 'Little Big Planet' 
     2 text = np.array(word_processor.fit(example)) 
----> 3 pred_class = np.argmax(model.predict([text])) 
     4 pred_class 

/anaconda/envs/MLHardCore/lib/python3.5/site-packages/tflearn/models/dnn.py in predict(self, X) 
    229   """ 
    230   feed_dict = feed_dict_builder(X, None, self.inputs, None) 
--> 231   return self.predictor.predict(feed_dict) 
    232 
    233  def predict_label(self, X): 

/anaconda/envs/MLHardCore/lib/python3.5/site-packages/tflearn/helpers/evaluator.py in predict(self, feed_dict) 
    67    prediction = [] 
    68    for output in self.tensors: 
---> 69     o_pred = self.session.run(output, feed_dict=feed_dict).tolist() 
    70     for i, val in enumerate(o_pred): # Reshape pred per sample 
    71      if len(self.tensors) > 1: 

/anaconda/envs/MLHardCore/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata) 
    765  try: 
    766  result = self._run(None, fetches, feed_dict, options_ptr, 
--> 767       run_metadata_ptr) 
    768  if run_metadata: 
    769   proto_data = tf_session.TF_GetBuffer(run_metadata_ptr) 

/anaconda/envs/MLHardCore/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata) 
    936     ' to a larger type (e.g. int64).') 
    937 
--> 938   np_val = np.asarray(subfeed_val, dtype=subfeed_dtype) 
    939 
    940   if not subfeed_t.get_shape().is_compatible_with(np_val.shape): 

/anaconda/envs/MLHardCore/lib/python3.5/site-packages/numpy/core/numeric.py in asarray(a, dtype, order) 
    529 
    530  """ 
--> 531  return array(a, dtype, copy=False, order=order) 
    532 
    533 

ValueError: setting an array element with a sequence. 

Answer

I don't know if this will help at all, but I was able to get a prediction.

These are my imports (missing from your code above):

import pandas as pd 
from tflearn.data_utils import VocabularyProcessor 
import numpy as np 
import tensorflow as tf 
import tflearn 

Running model.predict([text]) reproduces your error.
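The traceback is consistent with what `fit()` returns. In tflearn's `VocabularyProcessor` (a wrapper around TensorFlow's contrib `VocabularyProcessor`), `fit()` learns the vocabulary and returns the processor object itself, while `transform()` yields the encoded rows. So `np.array(word_processor.fit(example))` is not the numeric (1, 100) batch that `input_data([None, 100])` expects. The stdlib-only sketch below mimics that contract with a hypothetical `MiniVocab` class; it illustrates the idea and is not tflearn's code.

```python
class MiniVocab:
    """Hypothetical, minimal stand-in for a fit/transform vocabulary processor."""

    def __init__(self, max_document_length):
        self.max_document_length = max_document_length
        self.mapping = {}

    def fit(self, documents):
        for doc in documents:
            for word in doc.split():
                self.mapping.setdefault(word, len(self.mapping) + 1)
        return self  # fit() hands back the processor, NOT the encoded ids

    def transform(self, documents):
        # A generator, so callers wrap it in list(...) as the answer does
        for doc in documents:
            ids = [self.mapping.get(w, 0) for w in doc.split()][: self.max_document_length]
            yield ids + [0] * (self.max_document_length - len(ids))


vocab = MiniVocab(max_document_length=100).fit(["Little Big Planet", "The Lost World"])

wrong = vocab.fit(["Little Big Planet"])               # an object, not word ids
right = list(vocab.transform(["Little Big Planet"]))   # a batch with one row

print(type(wrong).__name__)
print(len(right), len(right[0]))
print(right[0][:4])
```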

A potential solution:

example = 'Little Big Planet' 
text = list(word_processor.transform([example]))[0].reshape(1, 100) 
model.predict(text) 

Output:

[[1.0]] 
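The `[example]` wrapper in this fix matters: `transform()` expects an iterable of documents, and a bare Python string iterates character by character, so passing `example` directly would encode each character as its own document. Here is a stdlib-only sketch of that difference; `encode` and its `mapping` are hypothetical stand-ins, and the list holding a single row plays the role of the `.reshape(1, 100)` batch.

```python
def encode(documents, mapping, max_len=100):
    """Hypothetical encoder: one padded row of word ids per document."""
    rows = []
    for doc in documents:
        ids = [mapping.get(word, 0) for word in doc.split()][:max_len]
        rows.append(ids + [0] * (max_len - len(ids)))
    return rows


mapping = {"Little": 1, "Big": 2, "Planet": 3}
example = "Little Big Planet"

as_string = encode(example, mapping)   # iterates characters: one "document" per char
as_list = encode([example], mapping)   # one document -> one row of 100 ids

print(len(as_string), len(as_list))
print(as_list[0][:4])
```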

Edit:

To view the word mapping:

>>> vocab_dict = word_processor.vocabulary_._mapping 
>>> vocab_dict['Little'] 
1 
>>> vocab_dict['Big'] 
2 
>>> vocab_dict['Planet'] 
3 
>>> vocab_dict['World'] 
75 
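The mapping is a plain word-to-id lookup learned from the training titles, and it is case-sensitive ('Little', not 'little'). Assuming, as in the vocabulary class tflearn builds on, that words never seen during fitting fall back to id 0 (the same value used for padding), a title made only of unseen words encodes to all zeros. A minimal sketch using a hypothetical subset of the mapping shown above:

```python
# Hypothetical subset of word_processor.vocabulary_._mapping from the answer
vocab_dict = {"Little": 1, "Big": 2, "Planet": 3, "World": 75}


def encode_title(title, vocab, max_len=100):
    # Unseen words (including different capitalisation) fall back to 0, like padding
    ids = [vocab.get(word, 0) for word in title.split()][:max_len]
    return ids + [0] * (max_len - len(ids))


print(encode_title("Little Big World", vocab_dict)[:4])
print(encode_title("little big world", vocab_dict)[:4])
print(sum(encode_title("Zelda Odyssey", vocab_dict)))  # neither word is in this subset
```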

To check that these numbers make sense, we inspect:

>>> example = 'Little Big Planet' 
>>> text = list(word_processor.transform([example]))[0].reshape(1, 100) 
>>> text 
array([[1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int64) 
>>> example = 'Little Big World' 
>>> text = list(word_processor.transform([example]))[0].reshape(1, 100) 
>>> text 
array([[ 1, 2, 75, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int64) 

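Jarad, this definitely works, but every output is [1.0] :( I changed the game name and the result still doesn't change :( If you want, I can send you the IGN link :D –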


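I found the IGN link to the data. I edited my answer and inspected the 'word_processor' object, and the 'text' array we predict on seems to work as expected. I'm not sure why all the outputs are 1.0. – Jarad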


I mean, there are both positive and negative reviews, so some of the outputs should differ. –
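One likely explanation for the constant [[1.0]]: the output layer in the question is `fully_connected(net, 1, activation='softmax')`. Softmax normalises across the units of a layer, so with a single unit it computes exp(z) / exp(z) = 1.0 for every input, whatever the title. The pure-Python sketch below shows this with made-up logits; it also shows that a sigmoid on that single unit does vary, and that a two-unit softmax gives the same probability as a one-unit sigmoid.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Made-up single-unit outputs for three different titles
for z in (-3.2, 0.0, 4.7):
    one_unit = softmax([z])          # always [1.0], regardless of z
    two_units = softmax([z, 0.0])[0]  # equals sigmoid(z)
    print(one_unit, round(two_units, 3), round(sigmoid(z), 3))
```

In tflearn terms this suggests either a single output unit with `activation='sigmoid'` and `loss='binary_crossentropy'`, or two softmax units trained on one-hot labels of shape (n_samples, 2) with `categorical_crossentropy`; treat both as suggestions to verify against your tflearn version.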