How to implement a trait with a generic case class that creates a Dataset in Scala

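I want to create a Scala trait that should be implemented with a case class T. The trait simply loads data and converts it into a Spark Dataset of type T. I get the error that no encoder can be found, which I think is because Scala does not know that T should be a case class. How can I tell the compiler that? I have seen somewhere that I should mention Product, but no such class is defined. Feel free to suggest other ways to do this!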

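I have the following code, but it does not compile and fails with this error: 42: error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._ [INFO] .as[T]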

I am using Spark 1.6.1.

Supporting code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Dataset, SQLContext}

/**
 * A trait that moves data on Hadoop with Spark based on the location and the granularity of the data.
 */
trait Agent[T] {
  /**
   * Load a Dataframe from the location and convert into a Dataset
   * @return Dataset[T]
   */
  protected def load(): Dataset[T] = {
    // Read in the data
    SparkContextKeeper.sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", header) // Use first line of all files as header
      .option("inferSchema", "true") // Automatically infer data types
      .option("delimiter", "|") // Deloitte always expects pipe as a delimiter
      .option("dateFormat", "yyyy-MM-dd") // Deloitte always expects this kind of Date format
      .load("/iacc/eandis/landing/raw/" + location + "/2016/10/01/")
      .as[T] // this is the line the compiler rejects: no Encoder[T] is in scope
  }
}


2016-11-10 Sparky

http://stackoverflow.com/questions/34715611/why-is-the-error-unable-to-find-encoder-for-type-stored-in-a-dataset-when-enco and http://stackoverflow.com/questions/38664972/why-is-unable-to-find-encoder-for-type-stored-in-a-dataset-when-creating-a-dat – Shankar

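Your code is missing three things:

1. Indeed, you must let the compiler know that T is a subclass of Product (the superclass of all Scala case classes and tuples).
2. The compiler also needs the TypeTag and ClassTag of the actual case class. Spark uses these implicitly to overcome type erasure.
3. An import of sqlContext.implicits._

Unfortunately, you cannot add type parameters with context bounds to a trait, so the simplest workaround is to use an abstract class instead: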

import scala.reflect.runtime.universe.TypeTag
import scala.reflect.ClassTag

abstract class Agent[T <: Product : ClassTag : TypeTag] {
  protected def load(): Dataset[T] = {
    val sqlContext: SQLContext = SparkContextKeeper.sqlContext
    import sqlContext.implicits._
    sqlContext.read.// same...
  }
}
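
To see why the bounds fit on an abstract class but not on a trait, it may help to look at what a context bound stands for. The following is a minimal sketch in plain Scala 2 (the Scala line Spark 1.6 is built against), without Spark; the names Bounded, Desugared, Reading and ContextBoundDemo are made up for illustration and are not part of the code above.

```scala
import scala.reflect.ClassTag

// A context bound on a class type parameter...
abstract class Bounded[T: ClassTag] {
  def runtimeClassOfT: Class[_] = implicitly[ClassTag[T]].runtimeClass
}

// ...is shorthand for an implicit constructor parameter.
// Scala 2 traits have no constructor parameters, so they cannot carry such a bound.
abstract class Desugared[T](implicit tag: ClassTag[T]) {
  def runtimeClassOfT: Class[_] = tag.runtimeClass
}

// Hypothetical case class and concrete subclasses, for illustration only.
case class Reading(id: String, value: Double)
class BoundedReadings extends Bounded[Reading]
class DesugaredReadings extends Desugared[Reading]

object ContextBoundDemo {
  def main(args: Array[String]): Unit = {
    // The compiler supplies ClassTag[Reading] where each subclass is defined,
    // so the erased type T can still be recovered at runtime.
    println(new BoundedReadings().runtimeClassOfT == classOf[Reading])
    println(new DesugaredReadings().runtimeClassOfT == classOf[Reading])
  }
}
```

Both forms behave the same way here; the tag is captured once, at the point where the concrete subclass names its case class.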

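Obviously, this is not equivalent to using a trait, and it might suggest that this design is not the most suitable one for the job. Another alternative is to place load in an object and move the type parameter to the method: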

object Agent {
  protected def load[T <: Product : ClassTag : TypeTag](): Dataset[T] = {
    // same...
  }
}

Which one is best mostly depends on where and how you are going to call load, and on what you plan to do with the result.
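
As a rough illustration of the method-level variant, here is a sketch of how such a generic method might be called. It makes several assumptions: Reading, AgentSketch and toDataset are hypothetical names, the SQLContext is passed in rather than taken from SparkContextKeeper, and the method converts an in-memory DataFrame instead of reading CSV so that it does not depend on spark-csv or on any path from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, Dataset, SQLContext}
import scala.reflect.ClassTag
import scala.reflect.runtime.universe.TypeTag

// Hypothetical case class, defined at the top level so the compiler can produce a TypeTag for it.
case class Reading(id: String, value: Double)

object AgentSketch {
  // The bounds make the compiler pass the tags for T implicitly at every call site.
  def toDataset[T <: Product : ClassTag : TypeTag](sqlContext: SQLContext, df: DataFrame): Dataset[T] = {
    import sqlContext.implicits._ // brings the encoder for Product types into scope
    df.as[T]
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val sc = new SparkContext(new SparkConf().setAppName("agent-sketch").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    try {
      val df = sqlContext.createDataFrame(Seq(Reading("a", 1.0), Reading("b", 2.5)))
      // The concrete case class is named only here, at the call site.
      val readings: Dataset[Reading] = toDataset[Reading](sqlContext, df)
      readings.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

With this shape, callers choose T per call, whereas the abstract class fixes T once per subclass.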


2016-11-10 16:03:31

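Thanks for your solution, and for the useful comments on the design as well! I had already tried your first and third points, but the second one was what did the trick :) By the way, I think an abstract class is fine too, because I will extend Agent for every Dataset I create. – Sparky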

You need to take two actions:

1. Add import sparkSession.implicits._ to your imports
2. Make your trait trait Agent[T <: Product]


2016-11-10 16:01:28 C4stor
