2017-09-26 24 views

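I have Spark code that sends requests to DynamoDB. The AmazonDynamoDBClient used to connect to the database is not serializable. How can I test non-serializable code in Spark?

So, in Scala, I create an instance of this class inside the mapPartitions method, like this: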

recordsToWrite.mapPartitions { iter =>
    // The client is built inside the partition, so it is never serialized
    val credentials = new BasicAWSCredentials(awsAccess, awsSecret)
    val client = new AmazonDynamoDBClient(credentials)
    val dynamoDB = new DynamoDB(client)
    val optTable = dynamoDB.getTable(tableName)
    iter.map { x =>
        //some code....
        optTable.updateItem(x)
    }
}

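The problem is that I want to unit test this code against local Spark (spark-testing-base) and a local DynamoDB.

I can't move the AmazonDynamoDBClient out of mapPartitions, because it is not serializable (Spark throws an exception).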
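To illustrate why the client cannot simply be created outside the closure, here is a minimal sketch that uses only the JDK's Java serialization, which Spark relies on for closures by default. `FakeClient`, `TaskWithCapturedClient`, `TaskWithSettings` and `SerializationDemo` are hypothetical names invented for this illustration; `FakeClient` merely stands in for a non-serializable class such as AmazonDynamoDBClient.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a client class that does not implement Serializable.
class FakeClient(val endpoint: String)

// A task that holds a client built beforehand: the client becomes part of its state.
class TaskWithCapturedClient(val client: FakeClient) extends Serializable

// A task that holds only plain settings and builds the client where it runs.
class TaskWithSettings(val endpoint: String) extends Serializable {
  def client(): FakeClient = new FakeClient(endpoint)
}

object SerializationDemo {
  // Returns true if Java serialization accepts the whole object graph.
  def canSerialize(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }
}
```

In this sketch, `SerializationDemo.canSerialize(new TaskWithCapturedClient(new FakeClient("x")))` is false, while `SerializationDemo.canSerialize(new TaskWithSettings("x"))` is true. That is the same distinction as capturing the client in the closure versus constructing it inside mapPartitions.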

Answers


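You can create a DynamoDBFactory trait that is serializable, with two implementations: a "real" one and a "test" one (I'm assuming the question is how to "inject" the test client):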

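import com.amazonaws.auth.BasicAWSCredentials
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient
import com.amazonaws.services.dynamodbv2.document.DynamoDB

// Serializable, so Spark can ship the factory to the executors inside the closure
trait DynamoDBFactory extends Serializable {
    def createClient(awsAccess: String, awsSecret: String): DynamoDB
}

class RealDynamoDBFactory extends DynamoDBFactory {
    def createClient(awsAccess: String, awsSecret: String): DynamoDB = {
        val credentials = new BasicAWSCredentials(awsAccess, awsSecret)
        val client = new AmazonDynamoDBClient(credentials)
        new DynamoDB(client)
    }
}

class TestDynamoDBFactory extends DynamoDBFactory {
    def createClient(awsAccess: String, awsSecret: String): DynamoDB = {
        // return your test stub/mock/whatever you need
        ??? // placeholder so the class compiles until the stub is filled in
    }
}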

Then have the code under test expect an instance of DynamoDBFactory, and pass in the appropriate instance from the test or production code:

val dynamoDBFactory: DynamoDBFactory = // ...get it from caller 
recordsToWrite.mapPartitions { iter =>
    // Only the serializable factory is captured; the client is created per partition
    val dynamoDB = dynamoDBFactory.createClient(awsAccess, awsSecret)
    val optTable = dynamoDB.getTable(tableName)
    iter.map { x =>
        //some code....
        optTable.updateItem(x)
    }
}
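As a rough, dependency-free sketch of how such an injected factory might be exercised in a test, the example below mimics the mapPartitions body on a plain Iterator and round-trips the factory through Java serialization, roughly as Spark would when shipping the closure. All names here (`TableLike`, `TableFactory`, `RecordedWrites`, `RecordingTableFactory`, `FactoryInjectionDemo`) are hypothetical stand-ins, not AWS SDK or Spark APIs. The writes are recorded in a singleton object on the assumption that the test runs Spark in local mode, where tasks execute in the same JVM as the test.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the table handle returned by the real client.
trait TableLike { def updateItem(item: String): Unit }

// Serializable factory, mirroring the DynamoDBFactory idea from the answer.
trait TableFactory extends Serializable { def createTable(name: String): TableLike }

// JVM-wide buffer: it is not a field of the factory, so it survives the factory being copied.
object RecordedWrites {
  val items: ArrayBuffer[String] = ArrayBuffer.empty[String]
}

// Test implementation: records every write instead of calling a database.
class RecordingTableFactory extends TableFactory {
  def createTable(name: String): TableLike = new TableLike {
    def updateItem(item: String): Unit =
      RecordedWrites.synchronized { RecordedWrites.items += s"$name:$item" }
  }
}

object FactoryInjectionDemo {
  // Same shape as the mapPartitions body: build the table once, then map lazily.
  def writePartition(iter: Iterator[String], factory: TableFactory, tableName: String): Iterator[String] = {
    val table = factory.createTable(tableName)
    iter.map { x => table.updateItem(x); x }
  }

  // Serialize and deserialize a value, yielding an independent copy.
  def roundTrip[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def run(): List[String] = {
    RecordedWrites.items.clear()
    val shipped: TableFactory = roundTrip(new RecordingTableFactory)
    // iter.map is lazy, so the iterator must be drained before anything is written.
    writePartition(Iterator("a", "b"), shipped, "t").foreach(_ => ())
    RecordedWrites.items.toList
  }
}
```

Because `iter.map` is lazy, nothing is written until the iterator is consumed; in a Spark test, an action such as `count()` on the resulting RDD plays the role of the `foreach` above. Keeping the recorded writes outside the factory matters because the factory instance that runs in the task is a deserialized copy, not the object the test created.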

I had to make minor modifications, but it works. Thanks :) – cmbendre
