
The development section of the Shark/Spark wiki is really brief, so I tried to put together code to query a table programmatically. Here it is, and the Shark/Spark engine throws an NPE when the table is queried:

object Test extends App { 
    val master = "spark://localhost.localdomain:8084" 
    val jobName = "scratch" 

    val sparkHome = "/home/shengc/Downloads/software/spark-0.6.1" 
    val executorEnvVars = Map[String, String](
        "SPARK_MEM" -> "1g",
        "SPARK_CLASSPATH" -> "",
        "HADOOP_HOME" -> "/home/shengc/Downloads/software/hadoop-0.20.205.0",
        "JAVA_HOME" -> "/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64",
        "HIVE_HOME" -> "/home/shengc/Downloads/software/hive-0.9.0-bin"
    )

    val sc = new shark.SharkContext(master, jobName, sparkHome, Nil, executorEnvVars) 
    sc.sql2console("create table src") 
    sc.sql2console("load data local inpath '/home/shengc/Downloads/software/hive-0.9.0-bin/examples/files/kv1.txt' into table src") 
    sc.sql2console("select count(1) from src") 
} 

I could create the table src and load data into it fine, but the last query threw an NPE and failed. Here is the output:

13/01/06 17:33:20 INFO execution.SparkTask: Executing shark.execution.SparkTask 
13/01/06 17:33:20 INFO shark.SharkEnv: Initializing SharkEnv 
13/01/06 17:33:20 INFO execution.SparkTask: Adding jar file:///home/shengc/workspace/shark/hive/lib/hive-builtins-0.9.0.jar 
java.lang.NullPointerException 
    at shark.execution.SparkTask$$anonfun$execute$5.apply(SparkTask.scala:58) 
    at shark.execution.SparkTask$$anonfun$execute$5.apply(SparkTask.scala:55) 
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34) 
    at scala.collection.mutable.ArrayOps.foreach(ArrayOps.scala:38) 
    at shark.execution.SparkTask.execute(SparkTask.scala:55) 
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134) 
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) 
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1326) 
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1118) 
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951) 
    at shark.SharkContext.sql(SharkContext.scala:58) 
    at shark.SharkContext.sql2console(SharkContext.scala:84) 
    at Test$delayedInit$body.apply(Test.scala:20) 
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34) 
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12) 
    at scala.App$$anonfun$main$1.apply(App.scala:60) 
    at scala.App$$anonfun$main$1.apply(App.scala:60) 
    at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59) 
    at scala.collection.immutable.List.foreach(List.scala:76) 
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:30) 
    at scala.App$class.main(App.scala:60) 
    at Test$.main(Test.scala:4) 
    at Test.main(Test.scala) 
FAILED: Execution Error, return code -101 from shark.execution.SparkTask 
13/01/06 17:33:20 ERROR ql.Driver: FAILED: Execution Error, return code -101 from shark.execution.SparkTask 
13/01/06 17:33:20 INFO ql.Driver: </PERFLOG method=Driver.execute start=1357511600030 end=1357511600054 duration=24> 
13/01/06 17:33:20 INFO ql.Driver: <PERFLOG method=releaseLocks> 
13/01/06 17:33:20 INFO ql.Driver: </PERFLOG method=releaseLocks start=1357511600054 end=1357511600054 duration=0> 

However, I can query the src table by typing select * from src in the shell invoked by bin/shark-withinfo.

You may ask how that SQL behaves in the shell fired up by "bin/shark-shell". Well, I cannot get into that shell at all. Here is the error I ran into:

https://groups.google.com/forum/?fromgroups=#!topic/shark-users/glZzrUfabGc

[EDIT 1]: this NPE seems to result from SharkEnv.sc not yet being initialized, so I added

shark.SharkEnv.sc = sc 

right before any sql2console operations are executed. The code then complained about a ClassNotFoundException for scala.tools.nsc, so I manually put the scala-compiler jar on the classpath. After that, it complained about yet another ClassNotFoundException, which I cannot figure out how to fix, since I did put the shark jar on my classpath.

13/01/06 18:09:34 INFO cluster.TaskSetManager: Lost TID 1 (task 1.0:1) 
13/01/06 18:09:34 INFO cluster.TaskSetManager: Loss was due to java.lang.ClassNotFoundException: shark.execution.TableScanOperator$$anonfun$preprocessRdd$3 
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:321) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:266) 
    at java.lang.Class.forName0(Native Method) 
    at java.lang.Class.forName(Class.java:264) 
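
To recap EDIT 1 in context, here is a minimal sketch (untested) of where that assignment belongs inside the Test object above, with all names as defined there:

val sc = new shark.SharkContext(master, jobName, sparkHome, Nil, executorEnvVars) 
// Set the global context before running any queries: SparkTask appears to 
// read SharkEnv.sc during execute, and leaving it unset produced the NPE above. 
shark.SharkEnv.sc = sc 
sc.sql2console("select count(1) from src") 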

[EDIT 2]: OK, I figured out another piece of code that achieves what I want, by following exactly how Shark's source code initializes the interactive REPL:

System.setProperty("MASTER", "spark://localhost.localdomain:8084") 
System.setProperty("SPARK_MEM", "1g") 
System.setProperty("SPARK_CLASSPATH", "") 
System.setProperty("HADOOP_HOME", "/home/shengc/Downloads/software/hadoop-0.20.205.0") 
System.setProperty("JAVA_HOME", "/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64") 
System.setProperty("HIVE_HOME", "/home/shengc/Downloads/software/hive-0.9.0-bin") 
System.setProperty("SCALA_HOME", "/home/shengc/Downloads/software/scala-2.9.2") 

shark.SharkEnv.initWithSharkContext("scratch") 
val sc = shark.SharkEnv.sc.asInstanceOf[shark.SharkContext] 

sc.sql2console("select * from src") 

It is ugly, but at least it works. Any advice on how to write a more robust piece of code is welcome!
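
One possibility, offered as an untested sketch rather than a definitive answer: wrap the EDIT 2 steps in a small helper so the System.setProperty calls live in one place and the cast happens once. SharkBootstrap is a hypothetical name; the property map below just reuses the values from EDIT 2:

object SharkBootstrap { 
    // Applies the given system properties, then initializes SharkEnv exactly 
    // the way EDIT 2 (and Shark's own REPL startup) does. 
    def init(jobName: String, props: Map[String, String]): shark.SharkContext = { 
        props.foreach { case (k, v) => System.setProperty(k, v) } 
        shark.SharkEnv.initWithSharkContext(jobName) 
        shark.SharkEnv.sc.asInstanceOf[shark.SharkContext] 
    } 
} 

val sc = SharkBootstrap.init("scratch", Map( 
    "MASTER"          -> "spark://localhost.localdomain:8084", 
    "SPARK_MEM"       -> "1g", 
    "SPARK_CLASSPATH" -> "", 
    "HADOOP_HOME"     -> "/home/shengc/Downloads/software/hadoop-0.20.205.0", 
    "JAVA_HOME"       -> "/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64", 
    "HIVE_HOME"       -> "/home/shengc/Downloads/software/hive-0.9.0-bin", 
    "SCALA_HOME"      -> "/home/shengc/Downloads/software/scala-2.9.2" 
)) 
sc.sql2console("select * from src") 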

For anyone who wishes to operate on Shark programmatically, note that all hive and shark jars must be on the CLASSPATH, and the scala compiler must also be on your classpath. Another important thing is that hadoop's conf directory should be on the classpath as well.
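
To make those classpath requirements fail fast instead of surfacing as ClassNotFoundExceptions mid-query, one could run a sanity check like the following before creating any context. This is only an illustrative sketch; each class name stands in for "is this jar visible?":

object ClasspathCheck extends App { 
    def need(className: String, hint: String) { 
        try { Class.forName(className) } 
        catch { case _: ClassNotFoundException => 
            sys.error(hint + " missing from classpath: " + className) } 
    } 

    need("shark.SharkContext", "shark jar")                // shark itself 
    need("org.apache.hadoop.hive.ql.Driver", "hive jars")  // hive 
    need("scala.tools.nsc.Global", "scala-compiler jar")   // scala compiler 
    // hadoop's conf dir must also be on the classpath so core-site.xml resolves 
    if (getClass.getResource("/core-site.xml") == null) 
        sys.error("hadoop conf dir (core-site.xml) not found on classpath") 
    println("classpath looks OK") 
} 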

Answer


I believe the issue is that your SharkEnv is not initialized. I am using shark 0.9.0 (but I believe you have to initialize SharkEnv in 0.6.1 too), and my SharkEnv is initialized the following way:

// SharkContext (master, jobName, and executorEnvVars as in the question) 
val sc = new SharkContext(master, 
    jobName, 
    System.getenv("SPARK_HOME"), 
    Nil, 
    executorEnvVars) 

// Initialize SharkEnv 
SharkEnv.sc = sc 

// create and populate table 
sc.runSql("CREATE TABLE src(key INT, value STRING)") 
sc.runSql("LOAD DATA LOCAL INPATH '${env:HIVE_HOME}/examples/files/kv1.txt' INTO TABLE src") 

// print result to stdout 
println(sc.runSql("select * from src")) 
println(sc.runSql("select count(*) from src")) 

Also, try querying data from the src table without aggregate functions (comment out the line with "select count(*) ..."). I had a similar problem where plain data queries were OK but count(*) threw an exception; in my case it was fixed by adding mysql-connector-java.jar to yarn.application.classpath.
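
In other words, a quick way to localize the problem (a small sketch, reusing the sc from the answer above):

// If the plain scan works but the aggregate fails, suspect a jar missing from 
// the executor/application classpath rather than the table itself. 
println(sc.runSql("select * from src limit 10")) 
println(sc.runSql("select count(*) from src")) 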