Scaling Batch Processing with Spring Batch: A Practical Guide to Big Data & Thread Management

Processing millions of records is a common challenge in enterprise systems—from financial settlements to ETL pipelines and analytics workloads. Spring Batch provides a powerful, production-ready framework for building scalable and fault-tolerant batch jobs.

In this article, I’ll explain how Spring Batch handles large data sets and manages threads efficiently, and walk through a practical real-world example.


🔹 Spring Batch Architecture in Brief

Spring Batch’s core structure:

  • Job → top-level container
  • Step → reader → processor → writer
  • Chunk-Oriented Processing → processes small groups of items per transaction

This makes large-scale data processing stable, memory-efficient, and fault-tolerant.
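
In code, these pieces hang off a single configuration class. Here is a minimal sketch of that shape (class and bean names like TransactionBatchConfig are illustrative; it uses the same Spring Batch 4 factory style as the steps below, whereas Spring Batch 5 would use JobBuilder/StepBuilder with an explicit JobRepository):

@Configuration
@EnableBatchProcessing
public class TransactionBatchConfig {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Bean
    public Job transactionJob(Step processStep) {
        // Job = top-level container: it simply runs its Steps in order
        return jobBuilderFactory.get("transactionJob")
            .incrementer(new RunIdIncrementer()) // new run id on each launch
            .start(processStep)
            .build();
    }

    // taskExecutor / reader / processStep beans follow in Steps 1-3
}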


🔹 Real Example: Processing 10 Million Records with Multi-Threading

Imagine you have a table with 10 million transactions, and you want to:

  • read the data in a streaming manner
  • process them
  • write results
  • run all of this using multiple threads to speed up execution

Below is a real Spring Batch example.


Step 1 – Thread Pool Configuration

@Bean
public TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(4);      // threads kept alive even when idle
    executor.setMaxPoolSize(8);       // upper bound under load
    executor.setQueueCapacity(2000);  // tasks queue here when all core threads are busy;
                                      // extra threads (up to max) start only if the queue fills
    executor.setThreadNamePrefix("batch-worker-");
    executor.initialize();
    return executor;
}

Attaching this executor to the step (Step 3) lets Spring Batch run whole chunks in parallel, one chunk per worker thread.

Step 2 – Streaming Reader (Paging for Big Data)

@Bean
public JdbcPagingItemReader<Transaction> reader(DataSource dataSource) {
    JdbcPagingItemReader<Transaction> reader = new JdbcPagingItemReader<>();

    reader.setDataSource(dataSource);
    reader.setPageSize(1000);   // rows fetched per paging query
    reader.setSaveState(false); // restart state can't be tracked reliably
                                // when the step is multi-threaded

    // Paging query provider: each page is a separate query, ordered by a unique key
    MySqlPagingQueryProvider queryProvider = new MySqlPagingQueryProvider();
    queryProvider.setSelectClause("SELECT id, amount, created_at");
    queryProvider.setFromClause("FROM transaction");
    queryProvider.setSortKeys(Collections.singletonMap("id", Order.ASCENDING));

    reader.setQueryProvider(queryProvider);
    reader.setRowMapper(new TransactionRowMapper());
    return reader;
}

Paging ensures we never load all rows into memory. Just as important for Step 3: JdbcPagingItemReader is thread-safe, unlike the cursor-based JdbcCursorItemReader, so it can safely be shared by a multi-threaded step.
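
The TransactionRowMapper referenced above isn't shown in the original snippet; a minimal sketch might look like this (assuming a plain Transaction POJO with id, amount, and createdAt fields, since the domain class is also an assumption here):

public class TransactionRowMapper implements RowMapper<Transaction> {

    @Override
    public Transaction mapRow(ResultSet rs, int rowNum) throws SQLException {
        // Map one row of the current page to the domain object
        Transaction tx = new Transaction();
        tx.setId(rs.getLong("id"));
        tx.setAmount(rs.getBigDecimal("amount"));
        tx.setCreatedAt(rs.getTimestamp("created_at").toLocalDateTime());
        return tx;
    }
}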

Step 3 – Define the Multi-Threaded Step

@Bean
public Step processStep(JdbcPagingItemReader<Transaction> reader) {
    return stepBuilderFactory.get("processStep")
        .<Transaction, ProcessedTransaction>chunk(1000) // 1000 items per transaction
        .reader(reader)                                 // injected bean, instead of the reader(null) trick
        .processor(new TransactionProcessor())
        .writer(new TransactionWriter())
        .taskExecutor(taskExecutor())                   // makes this a multi-threaded step
        .throttleLimit(8)                               // maximum parallel threads
        .build();
}

A chunk size of 1000 is a good starting point for multi-million-record workloads: large enough to amortize per-transaction overhead, small enough to bound memory use and rollback cost. The best value depends on row size and writer cost, so benchmark rather than assume.
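
TransactionProcessor and TransactionWriter are referenced but not shown here; one plausible shape for them is sketched below, using Spring Batch 4 signatures (ProcessedTransaction and its constructor are assumptions):

public class TransactionProcessor
        implements ItemProcessor<Transaction, ProcessedTransaction> {

    @Override
    public ProcessedTransaction process(Transaction tx) {
        // Runs once per item; returning null filters the item out of the chunk
        return new ProcessedTransaction(tx.getId(), tx.getAmount());
    }
}

public class TransactionWriter implements ItemWriter<ProcessedTransaction> {

    @Override
    public void write(List<? extends ProcessedTransaction> items) {
        // Called once per chunk (up to 1000 items here) inside one transaction;
        // a real writer would batch-insert, e.g. via JdbcTemplate.batchUpdate(...)
    }
}

Note that in a multi-threaded step these components are shared across worker threads, so they must be stateless or otherwise thread-safe.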

What This Achieves

Using the example above:

  • 4–8 parallel threads increase throughput
  • database reads are paged, not loaded into memory
  • each chunk is fully transactional
  • throughput on 10M rows can improve 3–5x depending on hardware
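
For completeness, a job wired this way can be launched on demand, for example from a scheduler (a sketch; transactionJob refers to the illustrative job bean from the configuration skeleton above):

@Autowired
private JobLauncher jobLauncher;

@Autowired
private Job transactionJob;

public void runBatch() throws Exception {
    // A unique parameter value lets the same job be launched repeatedly
    JobParameters params = new JobParametersBuilder()
        .addLong("startedAt", System.currentTimeMillis())
        .toJobParameters();
    jobLauncher.run(transactionJob, params);
}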


Why This Matters

Many teams run batch jobs that take hours—often because they are processed sequentially or load too much data into memory.

With Spring Batch:

  • multi-threading
  • chunk processing
  • database paging
  • fault tolerance

you can often turn multi-hour jobs into minutes.

Final Thoughts

Spring Batch continues to be a reliable choice for large-scale enterprise workloads. With proper tuning, such as multi-threaded steps, partitioning, and the right chunk size, you can safely process millions of records with excellent performance.
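
Partitioning deserves an article of its own, but as a taste, here is a minimal sketch of a locally partitioned step. The ColumnRangePartitioner is hypothetical (modeled on the 10-million-row example, with the id range hard-coded for brevity); each worker step would read only its own minId..maxId slice:

public class ColumnRangePartitioner implements Partitioner {

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        // Split the id space into gridSize contiguous ranges
        long min = 1, max = 10_000_000;
        long rangeSize = (max - min + 1) / gridSize;
        Map<String, ExecutionContext> partitions = new HashMap<>();
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext ctx = new ExecutionContext();
            ctx.putLong("minId", min + i * rangeSize);
            ctx.putLong("maxId", i == gridSize - 1 ? max : min + (i + 1) * rangeSize - 1);
            partitions.put("partition" + i, ctx);
        }
        return partitions;
    }
}

@Bean
public Step partitionedStep(Step workerStep) {
    return stepBuilderFactory.get("partitionedStep")
        .partitioner("workerStep", new ColumnRangePartitioner())
        .step(workerStep)             // each partition runs its own copy of the worker step
        .gridSize(8)                  // number of partitions
        .taskExecutor(taskExecutor()) // partitions execute in parallel
        .build();
}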
