2015-11-11 31 views
1

我在hbase中有一個名爲「sample」的表。我需要使用Apache spark-sql查詢來查詢表。 有什麼方法可以使用Apache spark-sql查詢來讀取hbase數據嗎?使用spark-sql的Hbase

回答

6

星火SQL是在內存中的查詢引擎,利用星火SQL對HBase的桌子上執行一些查詢操作,你需要

  1. 使用星火從HBase的獲取數據,並創建星火RDD

    SparkConf sparkConf = new SparkConf(); 
    sparkConf.setAppName("SparkApp"); 
    sparkConf.setMaster("local[*]"); 
    
    JavaSparkContext javaSparkContext = new JavaSparkContext(sparkConf); 
    
    Configuration config = HBaseConfiguration.create(); 
    config.addResource(new Path("/etc/hbase/hbase-site.xml")); 
    config.addResource(new Path("/etc/hadoop/core-site.xml")); 
    config.set(TableInputFormat.INPUT_TABLE, "sample"); 
    
    JavaPairRDD<ImmutableBytesWritable, Result> hbaseRDD = javaSparkContext.newAPIHadoopRDD(hbaseConfig, TableInputFormat.class, ImmutableBytesWritable.class, Result.class); 
    
    JavaRDD<StudentBean> sampleRDD = hbaseRDD.map(new Function<Tuple2<ImmutableBytesWritable,Result>, StudentBean >() { 
        private static final long serialVersionUID = -2021713021648730786L; 
        public StudentBean call(Tuple2<ImmutableBytesWritable, Result> tuple) { 
         StudentBean bean = new StudentBean (); 
         Result result = tuple._2; 
         bean.setRowKey(rowKey); 
         bean.setFirstName(Bytes.toString(result.getValue(Bytes.toBytes("details"), Bytes.toBytes("firstName")))); 
         bean.setLastName(Bytes.toString(result.getValue(Bytes.toBytes("details"), Bytes.toBytes("lastName")))); 
         bean.setBranch(Bytes.toString(result.getValue(Bytes.toBytes("details"), Bytes.toBytes("branch")))); 
         bean.setEmailId(Bytes.toString(result.getValue(Bytes.toBytes("details"), Bytes.toBytes("emailId")))); 
         return bean; 
        } 
    }); 
    
  2. 通過使用此RDD創建數據框對象,並使用一些臨時表的名字註冊這一點,那麼你就可以執行查詢

    DataFrame schema = sqlContext.createDataFrame(sampleRDD, StudentBean.class); 
    schema.registerTempTable("spark_sql_temp_table"); 
    
    DataFrame schemaRDD = sqlContext.sql("YOUR_QUERY_GOES_HERE"); 
    
    JavaRDD<StudentBean> result = schemaRDD.toJavaRDD().map(new Function<Row, StudentBean>() { 
    
        private static final long serialVersionUID = -2558736294883522519L; 
    
        public StudentBean call(Row row) throws Exception { 
         StudentBean bean = new StudentBean(); 
         // Do the mapping stuff here 
         return bean; 
        } 
    });