
Geomesa + SparkSQL integration issue

My setup is a 3-node cluster running in AWS. I have already ingested my data (30 rows), and queries run without any problem from a Jupyter notebook. Now, however, I am trying to run a query using Spark and Java, as shown in the snippet below.

import java.util.HashMap;
import java.util.Map;

import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.geotools.data.DataStoreFinder;

public class SparkSqlTest {

    private static final Logger log = Logger.getLogger(SparkSqlTest.class);

    public static void main(String[] args) {
        // GeoMesa Accumulo data store connection parameters
        Map<String, String> dsParams = new HashMap<>();
        dsParams.put("instanceId", "gis");
        dsParams.put("zookeepers", "server ip");
        dsParams.put("user", "root");
        dsParams.put("password", "secret");
        dsParams.put("tableName", "posiciones");

        try {
            DataStoreFinder.getDataStore(dsParams);

            SparkConf conf = new SparkConf();
            conf.setAppName("testSpark");
            conf.setMaster("yarn");
            SparkContext sc = SparkContext.getOrCreate(conf);
            SparkSession ss = SparkSession.builder().config(conf).getOrCreate();

            // Load the GeoMesa feature type as a Spark DataFrame
            Dataset<Row> df = ss.read()
                .format("geomesa")
                .options(dsParams)
                .option("geomesa.feature", "posicion")
                .load();
            df.createOrReplaceTempView("posiciones");

            long t1 = System.currentTimeMillis();
            Dataset<Row> rows = ss.sql("select count(*) from posiciones where id_equipo = 148 and fecha_hora >= '2015-04-01' and fecha_hora <= '2015-04-30'");
            long t2 = System.currentTimeMillis();
            rows.show();

            log.info("Tiempo de la consulta: " + ((t2 - t1) / 1000) + " segundos.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

I uploaded the code to my master EC2 box (inside the jupyter notebook image) and ran it with the following commands:

docker cp myjar-0.1.0.jar jupyter:myjar-0.1.0.jar 
docker exec jupyter sh -c '$SPARK_HOME/bin/spark-submit --master yarn --class mypackage.SparkSqlTest file:///myjar-0.1.0.jar --jars $GEOMESA_SPARK_JARS' 

But I got the following error:

17/09/15 19:45:01 INFO HSQLDB4AD417742A.ENGINE: dataFileCache open start 
17/09/15 19:45:02 INFO execution.SparkSqlParser: Parsing command: posiciones 
17/09/15 19:45:02 INFO execution.SparkSqlParser: Parsing command: select count(*) from posiciones where id_equipo = 148 and fecha_hora >= '2015-04-01' and fecha_hora <= '2015-04-30' 
java.lang.RuntimeException: Could not find a SpatialRDDProvider 
at org.locationtech.geomesa.spark.GeoMesaSpark$$anonfun$apply$2.apply(GeoMesaSpark.scala:33) 
at org.locationtech.geomesa.spark.GeoMesaSpark$$anonfun$apply$2.apply(GeoMesaSpark.scala:33) 

Any idea why this is happening?


What jars does $GEOMESA_SPARK_JARS include? If it does not contain geomesa-accumulo-spark-runtime_2.11-${version}.jar, that could explain the problem. – GeoMesaJim


Oh, another suggestion/question: check the return value of "DataStoreFinder.getDataStore(dsParams);". If the GeoMesa AccumuloDataStore is not on the classpath, that line will happily return null. – GeoMesaJim
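
A minimal sketch of that check (assuming the GeoTools DataStore API; the exception message here is just illustrative):

    import org.geotools.data.DataStore;
    import org.geotools.data.DataStoreFinder;

    // Sketch: fail fast instead of silently ignoring a null DataStore.
    // getDataStore returns null when no factory (e.g. the GeoMesa
    // AccumuloDataStore) on the classpath can handle the parameters.
    DataStore ds = DataStoreFinder.getDataStore(dsParams);
    if (ds == null) {
        throw new IllegalStateException(
            "No DataStore found for the given parameters; is the GeoMesa Accumulo data store on the classpath?");
    }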


This is the value of $GEOMESA_SPARK_JARS: file:///opt/geomesa/dist/spark/geomesa-accumulo-spark-runtime_2.11-1.3.2.jar,file:///opt/geomesa/dist/spark/geomesa-spark-converter_2.11-1.3.2.jar,file:///opt/geomesa/dist/spark/geomesa-spark-geotools_2.11-1.3.2.jar – jramirez

Answer


I finally sorted it out. My problem was that I had not included the following in my pom.xml:

<dependency>
    <groupId>org.locationtech.geomesa</groupId>
    <artifactId>geomesa-accumulo-spark_2.11</artifactId>
    <version>${geomesa.version}</version>
</dependency>

<dependency>
    <groupId>org.locationtech.geomesa</groupId>
    <artifactId>geomesa-spark-converter_2.11</artifactId>
    <version>${geomesa.version}</version>
</dependency>
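
For context, GeoMesa discovers its Spark integration at runtime, and the "Could not find a SpatialRDDProvider" error above is raised when no provider implementation is visible on the classpath. Below is a hedged sketch of a classpath sanity check, assuming the provider interface is exposed through Java's ServiceLoader (the class name is taken from the stack trace; verify the details against your GeoMesa version):

    import java.util.ServiceLoader;

    import org.locationtech.geomesa.spark.SpatialRDDProvider;

    // Sketch: list the SpatialRDDProvider implementations visible via SPI.
    // If nothing is printed, the geomesa-accumulo-spark(-runtime) jar is
    // not on the classpath and the SparkSQL query above will fail.
    public class ProviderCheck {
        public static void main(String[] args) {
            for (SpatialRDDProvider provider : ServiceLoader.load(SpatialRDDProvider.class)) {
                System.out.println(provider.getClass().getName());
            }
        }
    }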