How to test non-serializable code in Spark

I have Spark code that sends requests to DynamoDB. The AmazonDynamoDBClient used to connect to the database is not serializable.

So, in Scala, I create an instance of this class inside the mapPartitions call, like this:

recordsToWrite.mapPartitions { iter =>
  // Built on the executor, once per partition, so the client is never serialized
  val credentials = new BasicAWSCredentials(awsAccess, awsSecret)
  val client = new AmazonDynamoDBClient(credentials)
  val dynamoDB = new DynamoDB(client)
  val optTable = dynamoDB.getTable(tableName)
  iter.map { x =>
    // some code....
    optTable.updateItem(x)
  }
}

The problem is that I want to unit test this code against local Spark (spark-testing-base) and DynamoDB.

I can't move the AmazonDynamoDBClient out of mapPartitions because it is not serializable (Spark throws an exception).

2017-09-26 cmbendre

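You can create a DynamoDBFactory trait that is Serializable, with two implementations, a "real" one and a "test" one (I'm assuming the question is how to "inject" the test client):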

import com.amazonaws.auth.BasicAWSCredentials
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient
import com.amazonaws.services.dynamodbv2.document.DynamoDB

// The factory is what gets serialized and shipped to executors, not the client
trait DynamoDBFactory extends Serializable {
  def createClient(awsAccess: String, awsSecret: String): DynamoDB
}

class RealDynamoDBFactory extends DynamoDBFactory {
  def createClient(awsAccess: String, awsSecret: String): DynamoDB = {
    val credentials = new BasicAWSCredentials(awsAccess, awsSecret)
    val client = new AmazonDynamoDBClient(credentials)
    new DynamoDB(client)
  }
}

class TestDynamoDBFactory extends DynamoDBFactory {
  def createClient(awsAccess: String, awsSecret: String): DynamoDB = {
    // return your test stub/mock/whatever you need
    ???
  }
}
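To see why the factory helps, it may be useful to check serializability directly. The following is a minimal sketch that uses only the standard library: `FakeClient`, `ClientFactory`, `LocalClientFactory` and `SerializationCheck` are hypothetical names introduced for illustration, with `FakeClient` standing in for a non-serializable client such as AmazonDynamoDBClient. It round-trips values through Java serialization, which is the mechanism Spark uses for closures.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, NotSerializableException, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a non-serializable client (does not extend Serializable).
class FakeClient(val endpoint: String)

// Hypothetical factory: serializable itself, builds the client only when asked.
trait ClientFactory extends Serializable {
  def createClient(): FakeClient
}

class LocalClientFactory(endpoint: String) extends ClientFactory {
  def createClient(): FakeClient = new FakeClient(endpoint)
}

object SerializationCheck {
  // Writes the value with Java serialization and reads it back.
  def roundTrip[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // True if Java serialization accepts the value, false if it rejects it.
  def isSerializable(value: AnyRef): Boolean =
    try { roundTrip(value); true }
    catch { case _: NotSerializableException => false }
}
```

Under these assumptions, `SerializationCheck.isSerializable(new FakeClient("x"))` is false while the same check on `new LocalClientFactory("x")` is true, and a factory that has been round-tripped can still build a working client on the other side.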

Then have the code under test expect an instance of DynamoDBFactory, and pass in the appropriate instance from test or production code:

val dynamoDBFactory: DynamoDBFactory = ??? // ...get it from caller
recordsToWrite.mapPartitions { iter =>
  // Only the serializable factory is captured by the closure
  val dynamoDB = dynamoDBFactory.createClient(awsAccess, awsSecret)
  val optTable = dynamoDB.getTable(tableName)
  iter.map { x =>
    // some code....
    optTable.updateItem(x)
  }
}
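On the test side, one thing to keep in mind is that the factory itself is serialized, so a stub that records calls in its own fields would be copied rather than shared. A rough sketch of one way around that follows, using hypothetical stand-ins (`ItemWriter`, `ItemWriterFactory`, `RecordedItems`, `RecordingWriterFactory`, `PartitionLogic`) rather than the real DynamoDB types. It assumes a local master, where tasks run in the same JVM as the test and can therefore share a singleton object.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical minimal interface standing in for the part of the table API the job uses.
trait ItemWriter {
  def updateItem(item: String): Unit
}

// Hypothetical factory, mirroring the role of DynamoDBFactory.
trait ItemWriterFactory extends Serializable {
  def createWriter(): ItemWriter
}

// JVM-wide sink that the test can inspect after the job has run.
object RecordedItems {
  private val items = ArrayBuffer.empty[String]
  def add(item: String): Unit = synchronized { items += item; () }
  def snapshot(): List[String] = synchronized { items.toList }
  def clear(): Unit = synchronized { items.clear() }
}

// Test factory: holds no state itself, so serializing it loses nothing.
class RecordingWriterFactory extends ItemWriterFactory {
  def createWriter(): ItemWriter = new ItemWriter {
    def updateItem(item: String): Unit = RecordedItems.add(item)
  }
}

object PartitionLogic {
  // Per-partition logic, extracted so it can also be exercised with a plain Iterator.
  def writePartition(factory: ItemWriterFactory)(iter: Iterator[String]): Iterator[String] = {
    val writer = factory.createWriter()
    iter.map { x => writer.updateItem(x); x }
  }
}
```

In a Spark test this could be wired in as `rdd.mapPartitions(PartitionLogic.writePartition(new RecordingWriterFactory))`. Because `iter.map` is lazy, nothing is written until the resulting iterator is consumed, so the test should trigger an action such as `count()` before inspecting `RecordedItems.snapshot()`.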

2017-09-26 14:07:37

Had to make a few small changes, but it works. Thanks :) – cmbendre
