I am trying some simple Spark SQL programming in Java. In the program, I fetch data from a Cassandra table, convert the RDD into a Dataset, and display the data. When I run the spark-submit command, I get the error: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging — the Spark Logging class is not found when using Spark SQL.

My program is:
SparkConf sparkConf = new SparkConf().setAppName("DataFrameTest")
        .set("spark.cassandra.connection.host", "abc")
        .set("spark.cassandra.auth.username", "def")
        .set("spark.cassandra.auth.password", "ghi");
SparkContext sparkContext = new SparkContext(sparkConf);

// javaFunctions and mapRowTo are statically imported from
// com.datastax.spark.connector.japi.CassandraJavaUtil
JavaRDD<Log> logsRDD = javaFunctions(sparkContext)
        .cassandraTable("test", "log", mapRowTo(Log.class));

SparkSession sparkSession = SparkSession.builder().appName("Java Spark SQL").getOrCreate();
Dataset<Row> logsDF = sparkSession.createDataFrame(logsRDD, Log.class);
logsDF.show();
My POM dependencies are:
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.2</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.0.2</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.11</artifactId>
    <version>1.6.3</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.0.2</version>
  </dependency>
</dependencies>
My spark-submit command is: /home/ubuntu/spark-2.0.2-bin-hadoop2.7/bin/spark-submit --class "com.jtv.spark.dataframes.App" --master local[4] spark.dataframes-0.1-jar-with-dependencies.jar
How do I solve this problem? Downgrading to 1.5.2 does not work, because 1.5.2 has neither org.apache.spark.sql.Dataset nor org.apache.spark.sql.SparkSession.
@T.Gawęda The solution there does not work for me, because it means downgrading to 1.5.2, and 1.5.2 has neither `org.apache.spark.sql.Dataset` nor `org.apache.spark.sql.SparkSession`. – khateeb
Please check connector version 2.0 — see https://github.com/datastax/spark-cassandra-connector –
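The suggestion above amounts to aligning the Cassandra connector with the Spark 2.0.2 artifacts already declared in the POM: the 1.6.x connector is built against Spark 1.6, and Spark 2.0 relocated its Logging trait to org.apache.spark.internal.Logging, so mixing the two produces exactly this kind of ClassNotFoundException. A minimal sketch of the changed dependency (the exact 2.0.x version string is an assumption — check Maven Central for the current release):

```xml
<!-- Replace the 1.6.3 connector with a 2.0.x build that targets Spark 2.0.
     The version below is an assumed milestone, not confirmed by the question. -->
<dependency>
  <groupId>com.datastax.spark</groupId>
  <artifactId>spark-cassandra-connector_2.11</artifactId>
  <version>2.0.0-M3</version>
</dependency>
```

Since the application is submitted as a jar-with-dependencies, rebuilding the fat jar after this change matters: the old connector otherwise keeps bundling Spark 1.6 classes that shadow the 2.0.2 runtime.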
@T.Gawęda Connector 2.0 is still in beta. When I use it, I get this error:
NullPointerException
	at org.spark_project.guava.reflect.TypeToken.method(TypeToken.java:465)
	at org.apache.spark.sql.SparkSession.getSchema(SparkSession.scala:673)
	at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:340)
	at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:359)
	at com.jtv.spark.dataframes.App.main(App.java:25)
– khateeb