Scaling Batch Processing with Spring Batch: A Practical Guide to Big Data & Thread Management
Processing millions of records is a common challenge in enterprise systems—from financial settlements to ETL pipelines and analytics workloads. Spring Batch provides a powerful, production-ready framework for building scalable and fault-tolerant batch jobs.
In this article, I’ll explain how Spring Batch handles large data sets, how to manage threads efficiently, and show a practical real-world example.
🔹 Spring Batch Architecture in Brief
Spring Batch’s core structure is built around a few key components:
- Job – the overall batch process, composed of one or more Steps
- Step – a unit of work, typically chunk-oriented (read → process → write)
- ItemReader / ItemProcessor / ItemWriter – stream, transform, and persist records chunk by chunk
- JobRepository – stores execution metadata, which is what enables restarts after failure
This design makes large-scale data processing stable, memory-efficient, and fault-tolerant.
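As a concrete sketch of that structure, here is how a Job could be wired to the step built later in this article. This assumes a Spring Batch 4-style `JobBuilderFactory`; the bean names (`transactionJob`, `processStep`) are illustrative:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class BatchConfig {

    // Wires one Step into a Job. Behind the scenes the JobRepository
    // records every execution, which is what makes restarts possible.
    @Bean
    public Job transactionJob(JobBuilderFactory jobs, Step processStep) {
        return jobs.get("transactionJob")
                .incrementer(new RunIdIncrementer()) // allow re-running with fresh parameters
                .start(processStep)
                .build();
    }
}
```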
🔹 Real Example: Processing 10 Million Records with Multi-Threading
Imagine you have a table with 10 million transactions that you need to read without exhausting memory, transform record by record, and write back efficiently in parallel. Below is a practical Spring Batch example.
Step 1 – Thread Pool Configuration
@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(4);      // threads kept alive at all times
    executor.setMaxPoolSize(8);       // upper bound, used once the queue is full
    executor.setQueueCapacity(2000);  // tasks buffered before extra threads spawn
    executor.setThreadNamePrefix("batch-worker-");
    executor.initialize();
    return executor;
}
This enables Spring Batch to process records in parallel.
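Spring’s `ThreadPoolTaskExecutor` is a thin wrapper around the JDK’s `ThreadPoolExecutor`, so the core/max/queue interplay can be demonstrated with plain `java.util.concurrent` (a standalone sketch, independent of Spring):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolDemo {
    public static void main(String[] args) throws InterruptedException {
        // Same sizing semantics as the Spring bean above: 4 core threads,
        // growing to 8 only after the 2000-slot queue fills up.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 8, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(2000));

        AtomicInteger processed = new AtomicInteger();
        for (int i = 0; i < 100; i++) {
            pool.submit(processed::incrementAndGet); // stand-in for chunk work
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(processed.get()); // prints 100
    }
}
```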
Step 2 – Streaming Reader (Paging for Big Data)
@Bean
public JdbcPagingItemReader<Transaction> reader(DataSource dataSource) {
    JdbcPagingItemReader<Transaction> reader = new JdbcPagingItemReader<>();
    reader.setDataSource(dataSource);
    reader.setPageSize(1000); // rows fetched per query, not per job

    MySqlPagingQueryProvider queryProvider = new MySqlPagingQueryProvider();
    queryProvider.setSelectClause("SELECT id, amount, created_at");
    queryProvider.setFromClause("FROM transaction");
    // A unique, ordered sort key is required so pages never overlap or skip rows
    queryProvider.setSortKeys(Collections.singletonMap("id", Order.ASCENDING));

    reader.setQueryProvider(queryProvider);
    reader.setRowMapper(new TransactionRowMapper());
    return reader;
}
Paging ensures we never load the full table into memory—only one page of 1,000 rows is held at a time, regardless of table size.
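The `Transaction` type and `TransactionRowMapper` referenced above are not shown in the snippet; a minimal sketch, assuming the domain type simply mirrors the three selected columns:

```java
import java.math.BigDecimal;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.LocalDateTime;
import org.springframework.jdbc.core.RowMapper;

// Hypothetical domain type matching the SELECT clause (id, amount, created_at)
public record Transaction(long id, BigDecimal amount, LocalDateTime createdAt) {}

public class TransactionRowMapper implements RowMapper<Transaction> {
    @Override
    public Transaction mapRow(ResultSet rs, int rowNum) throws SQLException {
        return new Transaction(
                rs.getLong("id"),
                rs.getBigDecimal("amount"),
                rs.getTimestamp("created_at").toLocalDateTime());
    }
}
```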
Step 3 – Define the Multi-Threaded Step
@Bean
public Step processStep(JdbcPagingItemReader<Transaction> reader,
                        TaskExecutor taskExecutor) {
    return stepBuilderFactory.get("processStep")
            .<Transaction, ProcessedTransaction>chunk(1000)
            .reader(reader)                       // injected, rather than reader(null)
            .processor(new TransactionProcessor())
            .writer(new TransactionWriter())
            .taskExecutor(taskExecutor)
            .throttleLimit(8)                     // cap on concurrent chunk threads
            .build();
}
A chunk size of 1,000 is a solid starting point for multi-million-record workloads, balancing throughput against memory and transaction size—but tune it for your record size and database rather than treating it as universal.
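The `TransactionProcessor` and `TransactionWriter` used in the step are not shown either; a minimal sketch, where the `ProcessedTransaction` fields and the enrichment logic are illustrative assumptions (signatures follow the Spring Batch 4 interfaces used throughout this article):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.List;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;

// Hypothetical output type for the processed records
record ProcessedTransaction(long id, BigDecimal normalizedAmount) {}

class TransactionProcessor implements ItemProcessor<Transaction, ProcessedTransaction> {
    @Override
    public ProcessedTransaction process(Transaction tx) {
        // Returning null here would filter the record out of the chunk
        return new ProcessedTransaction(
                tx.id(),
                tx.amount().setScale(2, RoundingMode.HALF_UP));
    }
}

class TransactionWriter implements ItemWriter<ProcessedTransaction> {
    @Override
    public void write(List<? extends ProcessedTransaction> chunk) {
        // In production, delegate to a JdbcBatchItemWriter so the whole
        // chunk is flushed as one batched INSERT inside one transaction.
        chunk.forEach(item -> { /* persist item */ });
    }
}
```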
What This Achieves
Using the example above:
- rows are streamed in 1,000-row pages, so memory use stays flat no matter how large the table grows,
- up to 8 worker threads process chunks concurrently instead of one record at a time,
- a failed execution can be restarted from the last committed chunk instead of from scratch.
Why This Matters
Many teams run batch jobs that take hours—often because they are processed sequentially or load too much data into memory.
With Spring Batch, paged reads and parallel chunk processing can cut those multi-hour jobs down to minutes.
Final Thoughts
Spring Batch continues to be a reliable choice for large-scale enterprise workloads. With proper tuning—like multi-threaded steps, partitioning, and the right chunk size—you can safely process millions of records with excellent performance.
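When a single multi-threaded step is not enough, partitioning splits the key range across independent worker step executions. A sketch in the same Batch 4 style as above—note that `ColumnRangePartitioner` is a commonly hand-rolled class (it appears in the Spring Batch samples, not the framework itself), and its constructor arguments here are assumptions:

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.core.task.TaskExecutor;

public class PartitionConfig {

    private final StepBuilderFactory stepBuilderFactory;

    public PartitionConfig(StepBuilderFactory stepBuilderFactory) {
        this.stepBuilderFactory = stepBuilderFactory;
    }

    // Splits the transaction table into 8 id ranges and runs the worker
    // step once per range, each on its own thread.
    @Bean
    public Step partitionedStep(Step processStep, TaskExecutor taskExecutor) {
        return stepBuilderFactory.get("partitionedStep")
                .partitioner("processStep",
                        new ColumnRangePartitioner("transaction", "id")) // hypothetical helper
                .step(processStep)
                .gridSize(8) // number of id-range partitions
                .taskExecutor(taskExecutor)
                .build();
    }
}
```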