2012-11-21 32 views
1

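Spring Batch - JdbcCursorItemReader throws OutOfMemoryError with large MySQL table

I am writing a program that uses Spring Batch to process 7,637,064 rows from a MySQL database table. I have had success with smaller tables, but the large number of rows in this table causes an OutOfMemoryError when the JdbcCursorItemReader tries to open the cursor.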

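I could work around this by throwing a larger Xmx at it, but it seems to me that Spring Batch should have a way to handle this, and I am probably just missing a key piece of configuration.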

Spring Batch configuration:

<job id="reportJob" xmlns="http://www.springframework.org/schema/batch">
    <step id="largeTableTransfer">
        <tasklet>
            <chunk reader="largeTableReader" processor="largeTableTransformer" writer="largeTableWriter"
                   commit-interval="10" />
        </tasklet>
    </step>
</job>

<bean id="largeTableReader" class="org.springframework.batch.item.database.JdbcCursorItemReader">
    <property name="dataSource" ref="inputDataSource" />
    <property name="sql" value="select * from largeTable" />
    <property name="rowMapper">
        <bean class="myproject.reader.largeTableRowMapper" />
    </property>
</bean>
<bean id="largeTableTransformer" class="myproject.transformer.LargeTableTransformer" />
<bean id="largeTableWriter" class="myproject.writer.JdbcLargeTableWriter">
    <property name="dataSource" ref="outputDataSource" />
</bean>

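Setting fetchSize on the JdbcCursorItemReader does not seem to have any effect. The only thing that lets the job run to completion is setting maxRows to a small number, but then only that many rows are processed.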

Relevant stack trace:

2012-11-21 11:25:29,931 DEBUG [org.springframework.batch.core.repository.dao.JdbcStepExecutionDao] - <Truncating long message before update of StepExecution, original message is: java.lang.OutOfMemoryError: Java heap space 
    at java.util.Arrays.copyOf(Arrays.java:2734) 
    at java.util.ArrayList.ensureCapacity(ArrayList.java:167) 
    at java.util.ArrayList.add(ArrayList.java:351) 
    at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2821) 
    at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:467) 
    at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2510) 
    at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1746) 
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2135) 
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2542) 
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1734) 
    at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1885) 
    at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:93) 
    at org.springframework.batch.item.database.JdbcCursorItemReader.openCursor(JdbcCursorItemReader.java:125) 
    at org.springframework.batch.item.database.AbstractCursorItemReader.doOpen(AbstractCursorItemReader.java:401) 
    at org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader.open(AbstractItemCountingItemStreamItemReader.java:134) 
    at org.springframework.batch.item.support.CompositeItemStream.open(CompositeItemStream.java:93) 
    at org.springframework.batch.core.step.tasklet.TaskletStep.open(TaskletStep.java:301) 
    at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:192) 
    at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:135) 
    at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:61) 
    at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:60) 
    at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:144) 
    at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:124) 
    at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:135) 
    at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:293) 
    at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:120) 
    at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:48) 
    at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:114) 
    at ifpress.ams2amx.ExampleJobConfigurationTests.testLaunchJob(ExampleJobConfigurationTests.java:32) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 

Answers

2

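There is an issue in the MySQL JDBC driver that causes the entire result set to be loaded into memory, regardless of the parameters passed to the statement creation method. See http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html for how to open a cursor properly.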

This is how I do it:

<bean class="org.springframework.batch.item.database.JdbcCursorItemReader">
    <property name="verifyCursorPosition" value="false" />
    <property name="dataSource" ref="remoteDataSource" />
    <property name="rowMapper">
        <bean class="org.springframework.jdbc.core.SingleColumnRowMapper" />
    </property>
    <property name="fetchSize">
        <util:constant static-field="java.lang.Integer.MIN_VALUE" />
    </property>
    <property name="sql">
        <value>SELECT foo, bar FROM baz</value>
    </property>
</bean>
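
For comparison, here is a rough sketch of what this configuration amounts to in plain JDBC. The Connector/J implementation notes linked above describe a forward-only, read-only statement with a fetch size of Integer.MIN_VALUE as the signal to stream rows one at a time. The class name, the method name createStreamingStatement, and the command-line arguments are made up for illustration, and the MySQL driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch, not part of Spring Batch or Connector/J.
final class StreamingQuerySketch {

    // Configures a statement so that MySQL Connector/J streams rows one at a
    // time instead of buffering the whole result set in memory.
    static Statement createStreamingStatement(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                              ResultSet.CONCUR_READ_ONLY);
        // Integer.MIN_VALUE is the value Connector/J treats as "stream row by row".
        stmt.setFetchSize(Integer.MIN_VALUE);
        return stmt;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.err.println("usage: StreamingQuerySketch <jdbc-url> <user> <password>");
            return;
        }
        long rows = 0;
        // try-with-resources closes the result set before the connection is reused.
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2]);
             Statement stmt = createStreamingStatement(conn);
             ResultSet rs = stmt.executeQuery("SELECT foo, bar FROM baz")) {
            while (rs.next()) {
                rows++;
            }
        }
        System.out.println("rows read: " + rows);
    }
}
```

While a streaming result set is open, the same connection cannot issue any other statement, so the result set has to be read to the end or closed first.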
+0

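Thanks, this helped me understand the problem, but with these settings I now get the following error: "Streaming result set com.mysql.jdbc.RowDataDynamic@451710be is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries." Have you ever seen this?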

+0

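I have seen it a few times, but I do not remember why it happens. If you can provide more information, such as the source context or the full log output, I may be able to recall the root of the problem.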

+0

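I had to move on with this process, so I solved my particular case by breaking it up into multiple jobs, each processing a set number of rows. It is more cumbersome from a configuration standpoint, but it has the added benefit of allowing multiple jobs to run in parallel (if available memory permits). Thanks again for your help.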
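
The comment does not say how the rows were divided between jobs. One possible way, sketched below, is to cut a numeric key into fixed-size ranges and give each job its own WHERE clause. This assumes largeTable has a contiguous numeric key column named id running from 1 to 7,637,064, which the original post does not state. The class RowRangeSketch and its split method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of dividing a key range across several jobs.
final class RowRangeSketch {

    // Inclusive [from, to] range of key values handled by one job.
    static final class Range {
        final long from;
        final long to;

        Range(long from, long to) {
            this.from = from;
            this.to = to;
        }
    }

    // Splits [minId, maxId] into consecutive ranges of at most rowsPerJob keys each.
    static List<Range> split(long minId, long maxId, long rowsPerJob) {
        if (rowsPerJob <= 0) {
            throw new IllegalArgumentException("rowsPerJob must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (long from = minId; from <= maxId; from += rowsPerJob) {
            long to = Math.min(from + rowsPerJob - 1, maxId);
            ranges.add(new Range(from, to));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // One query per job; the id column is an assumption, see above.
        for (Range r : split(1L, 7_637_064L, 1_000_000L)) {
            System.out.println("select * from largeTable where id between "
                    + r.from + " and " + r.to);
        }
    }
}
```

With one million keys per job this yields eight ranges, the last one covering ids 7,000,001 through 7,637,064. Each query could then be set as the sql property of a separate reader.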
