2017-04-15 26 views
0

請看下面的代碼。 當我傳遞分區數值時,下面的代碼出現錯誤。在我的火花應用程序中對未完全指定的錯誤進行分區

 def loadDataFromPostgress(sqlContext: SQLContext, tableName: String, 
     columnName: String, dbURL: String, userName: String, pwd: String, 
     partitions: String): DataFrame = { 
     println("the no of partitions are : "+partitions) 
     var dataDF = sqlContext.read.format("jdbc").options(
     scala.collection.Map("url" -> dbURL, 
          "dbtable" -> tableName, 
         "driver" -> "org.postgresql.Driver", 
        "user" -> userName, 
       "password" -> pwd, 
        "partitionColumn" -> columnName, 
       "numPartitions" -> "1000")).load() 
       return dataDF 
         } 

錯誤:

   java.lang.RuntimeException: Partitioning incompletely specified 
       App > at scala.sys.package$.error(package.scala:27) 
       App > at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:38) 
       App > at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:315) 
       App > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149) 
    App > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122) 
       App > at Test$.loadDataFromGreenPlum(script.scala:28) 
       App > at Test$.loadDataFrame(script.scala:15) 
       App > at Test$.main(script.scala:59) 
       App > at Test.main(script.scala) 
       App > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
        Method) 
       App > at 

回答

3

你可以檢查下面的代碼你可以如何使用。

def loadDataFromPostgress(sqlContext: SQLContext, tableName: String, 
          columnName: String, dbURL: String, userName: String, 
          pwd: String, partitions: String): DataFrame = { 
    println("the no of partitions are : " + partitions) 
    var dataDF = sqlContext.read.format("jdbc").options(
     scala.collection.Map("url" -> dbURL, 
     "dbtable" -> "(select mod(tmp.empid,10) as hash_code,tmp.* from employee as tmp) as t", 
     "driver" -> "org.postgresql.Driver", 
     "user" -> userName, 
     "password" -> pwd, 
     "partitionColumn" -> hash_code, 
     "lowerBound" -> 0, 
     "upperBound" -> 10 
    "numPartitions" -> "10" 
    )).load() 
    return dataDF 
    } 

上面的代碼將創建10個任務,包含10個查詢,如下所示。 在此之前的工作將找出

offset = (upperBound-lowerBound)/numPartitions

這裏offset = (10-0)/10 = 1

select mod(tmp.empid,10) as hash_code,tmp.* from employee as tmp where hash_code between 0 between 1 
select mod(tmp.empid,10) as hash_code,tmp.* from employee as tmp where hash_code between 1 between 2 
. 
. 
select mod(tmp.empid,10) as hash_code,tmp.* from employee as tmp where hash_code between 9 between 10 

這將創建10個分區和

EMPID 0將去一個分區作爲MOD(EMPID,10)總是結束等於0

empid以1結尾將作爲mod(empid,10)將一個分區轉爲等於1

像這樣,所有員工行將被分成10個分區。

您必須根據您的要求更改partitionColumn,upperBound,lowerBound,numPartitions值。

希望我的回答能幫助你。

0

分區需要:

  • 分區列(整數)。
  • 列數
  • 的下界列
  • 上限列

最後兩人失蹤,這就是爲什麼你的錯誤。

相關問題