columnSimilarities() on a RowMatrix fails with "ERROR Schema: Failed initialising database"

On Spark 2.2.0, I get an error when I call columnSimilarities() on a RowMatrix.

Here is the code to reproduce it.

from pyspark.mllib.linalg.distributed import RowMatrix 
rdd = sc.parallelize([[1.0,2.0,1.0],[1.0,5.0,1.0],[1.0,2.0,1.0],[4.0,2.0,4.0]]) 
mat = RowMatrix(rdd) 
sim = mat.columnSimilarities(0.1) 
sim.entries.collect()

The error looks like this (truncated because it is too long; the full log is here).

17/08/13 10:15:19 ERROR Schema: Failed initialising database. 
Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------ 
java.sql.SQLException: Failed to start database 'metastore_db' with class loader [email protected]34df5e, see the next exception for details. 
    at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) 
    at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source) 
    at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source) 
    at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source)

This code, on the other hand, works fine.

from pyspark.mllib.linalg.distributed import IndexedRow, IndexedRowMatrix 
rdd = sc.parallelize([IndexedRow(0, [1.0,2.0,1.0]), 
         IndexedRow(1, [1.0,5.0,1.0]), 
         IndexedRow(2, [1.0,2.0,1.0]), 
         IndexedRow(3, [4.0,2.0,4.0])]) 
mat = IndexedRowMatrix(rdd).toRowMatrix() 
sim = mat.columnSimilarities(0.1) 
sim.entries.collect()

Is this a bug in Spark?


2017-08-13 M.Kiuchi

This is a JDBC connection problem, not a problem with columnSimilarities() or with MLlib in general.

You will probably need to do some work to get the Derby connection running properly. Here is a starting point: https://stackoverflow.com/a/40547664/1056563
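As a first diagnostic, it may help to check whether something else is holding the embedded Derby database. Embedded Derby lets only one JVM open a database at a time, and it marks an open database with `*.lck` files inside the database directory. The sketch below only lists such files. It assumes the metastore lives in `./metastore_db` relative to the directory Spark was started from (this matches the `databaseName=metastore_db` in the JDBC URL above, but the actual location depends on your setup), and the function name `find_derby_lock_files` is made up for illustration. Whether a stale lock is really the cause here is an assumption; the truncated log does not confirm it.

```python
import glob
import os


def find_derby_lock_files(metastore_dir="metastore_db"):
    """Return the Derby lock files (*.lck) found directly inside metastore_dir.

    Hypothetical helper: the default path is an assumption based on the
    JDBC URL in the error message, not something Spark reports directly.
    """
    pattern = os.path.join(metastore_dir, "*.lck")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    locks = find_derby_lock_files()
    if locks:
        # A lock file can mean another Spark shell or notebook is still
        # running against this directory, or that one exited uncleanly.
        print("Derby lock files present:", locks)
    else:
        print("No Derby lock files found in ./metastore_db")
```

If lock files show up, stop any other Spark shell or notebook started from the same directory before retrying. Delete leftover lock files only when you are sure no other process is using the metastore.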


2017-08-14 02:23:41 javadba
