PredictionIO evaluation fails with the text classification template

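I am trying to predict a text field based on other text fields using PredictionIO, with this guide as my reference. I created a new app with

pio app new MyTextApp

and followed the guide up to the evaluation step, using the data source provided in the template. Everything worked until evaluation. When I evaluate the data source, I get the error pasted below.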

[INFO] [CoreWorkflow$] runEvaluation started 
[WARN] [Utils] Your hostname, my-ThinkCentre-Edge72 resolves to a loopback address: 127.0.0.1; using 192.168.65.27 instead (on interface eth0) 
[WARN] [Utils] Set SPARK_LOCAL_IP if you need to bind to another address 
[INFO] [Remoting] Starting remoting 
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://sparkDr[email protected]:59649] 
[INFO] [CoreWorkflow$] Starting evaluation instance ID: AU29p8j3Fkwdnkfum_ke 
[INFO] [Engine$] DataSource: [email protected] 
[INFO] [Engine$] Preparator: [email protected] 
[INFO] [Engine$] AlgorithmList: List([email protected]) 
[INFO] [Engine$] Serving: [email protected] 
Exception in thread "main" java.lang.UnsupportedOperationException: empty.maxBy 
at scala.collection.TraversableOnce$class.maxBy(TraversableOnce.scala:223) 
at scala.collection.AbstractTraversable.maxBy(Traversable.scala:105) 
at org.template.textclassification.PreparedData.<init>(Preparator.scala:152) 
at org.template.textclassification.Preparator.prepare(Preparator.scala:38) 
at org.template.textclassification.Preparator.prepare(Preparator.scala:34)

Do I have to edit any configuration files to make this work? I have already tested successfully with the MovieLens data.


2015-06-04 cutteeth

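This particular error message occurs when the data is not read in correctly by the DataSource class. If you are using a different text data set, make sure that any changes to the eventNames, the entityType, and the respective property field names are correctly reflected in the readEventData method.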
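A rough way to see why a mismatch leaves the engine with no data: the sketch below imitates the filtering step with a hypothetical `Event` case class and a hypothetical `select` helper. Neither is part of the PredictionIO API; they only stand in for the entityType and eventNames filter described above.

```scala
object EventFilterDemo {
  // Hypothetical, simplified stand-in for a stored event (not the PredictionIO Event class).
  case class Event(entityType: String, event: String)

  // Keeps only events whose entity type and event name match the requested filter,
  // which is the kind of selection the entityType/eventNames settings express.
  def select(events: Seq[Event], entityType: String, eventNames: List[String]): Seq[Event] =
    events.filter(e => e.entityType == entityType && eventNames.contains(e.event))
}
```

With a single stored event `Event("content", "e-mail")`, selecting on `"source"` and `List("documents")` returns nothing, while selecting on `"content"` and `List("e-mail")` returns the event.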

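The maxBy method is used to pull out the class with the largest number of observations. If the Map of category labels is empty, no classes were recorded, which essentially tells you that no data was fed in.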
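The failure mode can be reproduced in plain Scala. This is a minimal sketch, not the template's code: `majorityLabel` is a hypothetical helper that counts labels and guards the `maxBy` call, so an empty input yields `None` instead of throwing.

```scala
object MaxByDemo {
  // Hypothetical helper: returns the most frequent label,
  // or None when no observations were read at all.
  def majorityLabel(labels: Seq[String]): Option[String] = {
    // Count how many times each label occurs.
    val counts: Map[String, Int] =
      labels.groupBy(identity).map { case (label, group) => label -> group.size }
    // maxBy on an empty collection throws UnsupportedOperationException,
    // so check for emptiness first.
    if (counts.isEmpty) None else Some(counts.maxBy(_._2)._1)
  }
}
```

Calling `maxBy` directly on an empty Map, as in `Map.empty[String, Int].maxBy(_._2)`, throws the same `java.lang.UnsupportedOperationException` that appears in the stack trace.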

For example, I just used this engine to build a spam detector. My e-mail data has the form:

{"entityType": "content", "eventTime": "2015-06-04T00:22:39.064+0000", "entityId": 1, "event": "e-mail", "properties": {"label": "spam", "text": "content"}}

To use the engine with this data, I made the following changes in the DataSource class:

entityType = Some("source"), // specify data entity type
eventNames = Some(List("documents")) // specify data event name

changed to:

entityType = Some("content"), // specify data entity type
eventNames = Some(List("e-mail")) // specify data event name

and

)(sc).map(e => Observation(
    e.properties.get[Double]("label"), 
    e.properties.get[String]("text"), 
    e.properties.get[String]("category") 
)).cache

changed to:

)(sc).map(e => {
    val label = e.properties.get[String]("label")
    Observation(
        if (label == "spam") 1.0 else 0.0,
        e.properties.get[String]("text"),
        label
    )
}).cache

After this, I was able to get through build, train, and deploy, as well as evaluation.
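The label conversion can also be tried outside the engine. The sketch below is an assumption-laden stand-in: `Observation` here is a local case class mirroring the three fields used above, and `toObservation` is a hypothetical helper that reads from a plain Map rather than from PredictionIO's event properties. It returns `None` when a field is missing instead of throwing.

```scala
object SpamLabelDemo {
  // Hypothetical stand-in for the template's Observation class.
  case class Observation(label: Double, text: String, category: String)

  // Builds an Observation from a property map shaped like the JSON event's
  // "properties" object. Yields None if "label" or "text" is absent.
  def toObservation(props: Map[String, String]): Option[Observation] =
    for {
      label <- props.get("label")
      text  <- props.get("text")
    } yield Observation(if (label == "spam") 1.0 else 0.0, text, label)
}
```

Records that come back as `None` are the ones that would otherwise contribute nothing, or fail, when the data is read in.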


2015-06-04 17:29:29

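Thanks for the information. I had been using the same app for a different data set. I deleted the existing app and its data, created a new app, and then ran pio build, train, and deploy. Now it works fine. :) – cutteeth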

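Awesome, I'm glad the response helped! I just released a new version of the engine that includes a sanity check to make sure training data is actually being fed in. PreparedClass has also been modified so that text vectorization runs faster. –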

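I downloaded the latest text classification template (2.0), and the same problem occurs with the latest update. Evaluation fails with the error 'java.lang.UnsupportedOperationException: empty.maxBy', and training fails with 'io.prediction.data.storage.DataMapException: The field label is required.' pio says the Spark address is bound to loopback. Do I have to change it to a public IP? Could you also please explain text vectorization? – cutteeth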
