1
我試圖讀取壓縮的csv文件(.bz2)作爲DataFrame。我的代碼如下Spark 2.1.0:讀取壓縮的csv文件
// read the data
Dataset<Row> rData = spark.read().option("header", true).csv(input);
這在我在IDE中嘗試時有效。我可以讀取數據並進行處理,但是當我嘗試使用Maven來構建它,並在命令行中,我得到以下錯誤
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: csv. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:569)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:415)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:352)
at com.cs6240.Driver.main(Driver.java:28)
Caused by: java.lang.ClassNotFoundException: csv.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:554)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:554)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:554)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:554)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:554)
... 7 more
我不知道如果我失去了一些東西運行。讀取CSV文件是否有一些依賴性?根據文檔,Spark 2.x.x支持這個功能。