2015-11-19

Spark ALS implicit-feedback exception

We are using ALS on Azure Spark to build our recommendation system.

Because we do not have the computational capacity to produce a separate recommendation list for every individual user, we partition the users into clusters and use ALS to produce one recommendation list per cluster centroid.

Before clustering the users, we preprocess the data on Spark with StandardScaler and Normalizer to get better clustering results. However, calling ALS.trainImplicit then fails with the following exception:

15/11/16 15:43:11 INFO TaskSetManager: Lost task 30.0 in stage 15.0 (TID 197) on executor localhost: java.lang.AssertionError (assertion failed: lapack.dppsv returned 4.) [duplicate 9]
15/11/16 15:43:11 INFO TaskSetManager: Lost task 25.0 in stage 15.0 (TID 192) on executor localhost: java.lang.AssertionError (assertion failed: lapack.dppsv returned 4.) [duplicate 10]
15/11/16 15:43:11 INFO TaskSetManager: Lost task 16.0 in stage 15.0 (TID 183) on executor localhost: java.lang.AssertionError (assertion failed: lapack.dppsv returned 4.) [duplicate 11]

Traceback (most recent call last):
  File "/home/rogeesjir_huasqngfda/woradofkapkspace/jigsusLaudfadfecher/scripts/RecommendationBackend/AzureSpark/src/collaborativeFiltering/spark_als.py", line 92, in <module>
    main()
  File "/home/rogeesjir_huasqngfda/rogeesjir_huasqngfda/jigsusLaudfadfecher/scripts/RecommendationBackend/AzureSpark/src/collaborativeFiltering/spark_als.py", line 39, in main
    model = ALS.trainImplicit(ratings, rank, numIter, alpha=0.01)
  File "/home/jigsusLaudfadfecher/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/recommendation.py", line 147, in trainImplicit
    iterations, lambda_, blocks, alpha, nonnegative, seed)
  File "/home/jigsusLaudfadfecher/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/common.py", line 120, in callMLlibFunc
    return callJavaFunc(sc, api, *args)
  File "/home/jigsusLaudfadfecher/spark-1.3.1-bin-hadoop2.6/python/pyspark/mllib/common.py", line 113, in callJavaFunc
    return _java2py(sc, func(*args))
  File "/home/jigsusLaudfadfecher/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/home/jigsusLaudfadfecher/spark-1.3.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o39.trainImplicitALSModel.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 8 in stage 15.0 failed 1 times, most recent failure: Lost task 8.0 in stage 15.0 (TID 175, localhost): java.lang.AssertionError: assertion failed: lapack.dppsv returned 4.
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.ml.recommendation.ALS$CholeskySolver.solve(ALS.scala:355)
    at org.apache.spark.ml.recommendation.ALS$$anonfun$org$apache$spark$ml$recommendation$ALS$$computeFactors$1.apply(ALS.scala:1131)
    at org.apache.spark.ml.recommendation.ALS$$anonfun$org$apache$spark$ml$recommendation$ALS$$computeFactors$1.apply(ALS.scala:1092)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$15.apply(PairRDDFunctions.scala:674)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$15.apply(PairRDDFunctions.scala:674)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:172)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:79)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

The exception above occurs only when we include the normalization step; when we remove it (i.e., skip StandardScaler and Normalizer), everything works fine. Incidentally, ALS.train() with explicit ratings works fine even when we scale the data before training the model.

Has anyone run into a problem like this? We are still new to Spark, so any help would be appreciated. Thanks.


Can you provide your code and a sample dataset? The error reads as if code is running on workers where it should not be. Configuration details for the Spark cluster would also help. –

Answers


I got a similar error with ALS.train():

java.lang.AssertionError: assertion failed: lapack.dpotrs returned 6. 
... 

Google led me to the Spark JIRA https://issues.apache.org/jira/browse/SPARK-11918, which reports the same error occurring when doing linear regression with weighted least squares (WLS). Apparently LAPACK throws this error when you ask it to solve an ill-conditioned system of linear equations.

In my case, increasing the ALS.train() parameter lambda_ from its default of 0.01 to a higher value (>= 0.02) seemed to help, although I am not sure whether this resolves my problem for good...
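To see why raising lambda_ can help, here is a plain-Python sketch (no Spark required; the cholesky_ok helper below is an illustration written for this answer, not Spark's actual solver): ALS solves normal equations of the form (YᵀY + lambda·I)x = b, and adding lambda·I to a singular Gramian shifts its eigenvalues away from zero, so the Cholesky factorization that LAPACK performs can succeed again.

```python
import math

def cholesky_ok(G):
    """Return True if a naive Cholesky factorization of G succeeds,
    i.e. G is positive definite; False mirrors LAPACK's dppsv/dpotrs
    failure on a singular or indefinite matrix."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = G[i][i] - s
                if d <= 0:          # not positive definite: factorization fails
                    return False
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (G[i][j] - s) / L[j][j]
    return True

G = [[1.0, 1.0], [1.0, 1.0]]        # singular Gramian (rank 1)
lam = 0.02                          # regularization, as in lambda_ >= 0.02
G_reg = [[G[i][j] + (lam if i == j else 0.0) for j in range(2)]
         for i in range(2)]

print(cholesky_ok(G))      # False: Cholesky fails on the singular matrix
print(cholesky_ok(G_reg))  # True: lambda * I restores positive definiteness
```

This is only the linear-algebra intuition behind the workaround; whether a given lambda_ is large enough depends on how ill-conditioned your data is.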


For future readers:

Several columns in the given dataset contain only zeros. In this case, the data matrix is not full rank, so the Gramian matrix is singular and hence not invertible, and the Cholesky decomposition will fail. The same thing happens if the standard deviation of more than one column is zero (even if the values themselves are not zero). I think we should catch this error in the code and exit with a warning message, or drop the columns with zero variance and continue with the algorithm.
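The quoted explanation can be checked with a few lines of plain Python (no Spark needed; gramian and det2 are small helpers written for this illustration, not Spark APIs): a data matrix with an all-zero column has a singular Gramian AᵀA, which is exactly the matrix the Cholesky solver has to factor.

```python
def gramian(A):
    """Compute A^T A for a matrix given as a list of rows."""
    cols = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(cols)]
            for i in range(cols)]

def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# Second feature column is all zeros (zero variance).
A = [[1.0, 0.0],
     [2.0, 0.0],
     [3.0, 0.0]]
G = gramian(A)
print(G)        # [[14.0, 0.0], [0.0, 0.0]]
print(det2(G))  # 0.0 -> singular, not invertible, so Cholesky must fail
```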

Taken from a comment on the JIRA issue.

Just make sure most of the ratings are non-zero, and it will work.
