Introduction

In this tutorial we will see how Spring Batch works through an example.

What is Spring Batch

Spring Batch is a lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems.

Features of Spring Batch

  • Spring Batch is a lightweight, comprehensive batch framework
  • It is designed to enable the development of robust batch applications
  • It builds on the productivity and POJO-based development approach of the Spring Framework
  • Spring Batch is not a scheduling framework
  • It is intended to work in conjunction with a scheduler, not to replace one.

Usages of Spring Batch

  • used to perform business operations in mission-critical environments
  • used to automate the complex processing of large volumes of data without user interaction
  • used to process time-based events and periodic, repetitive, complex processing of large data sets
  • used to integrate internal/external information that requires formatting, validation and processing in a transactional manner
  • used to process parallel or concurrent jobs
  • provides the functionality for manual or scheduled restart after failure
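The last usage above, restart after failure, works because Spring Batch records progress in its job repository (for example, a reader such as FlatFileItemReader saves how many items it has read in the step's execution context), so a restarted job can resume from the last committed chunk instead of starting over. The plain-Java sketch below only illustrates that idea; the class CheckpointDemo, its in-memory checkpoint field and the simulated failure are hypothetical and are not Spring Batch APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of checkpoint/restart; not a Spring Batch API.
final class CheckpointDemo {
    // Number of items committed so far. Spring Batch keeps comparable state in
    // its job repository; here it is just an in-memory field.
    private int checkpoint = 0;
    private final List<String> written = new ArrayList<>();

    // Processes items in chunks and commits the checkpoint after each chunk.
    // If failAt >= 0, throws when that item index is reached (simulated crash).
    void run(List<String> items, int chunkSize, int failAt) {
        int i = checkpoint; // resume from the last committed position
        while (i < items.size()) {
            int end = Math.min(i + chunkSize, items.size());
            List<String> chunk = new ArrayList<>();
            for (int j = i; j < end; j++) {
                if (j == failAt) {
                    throw new IllegalStateException("simulated failure at item " + j);
                }
                chunk.add(items.get(j).toUpperCase());
            }
            written.addAll(chunk); // "write" the whole chunk
            checkpoint = end;      // commit progress only after the write succeeds
            i = end;
        }
    }

    int checkpoint() { return checkpoint; }

    List<String> written() { return written; }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        CheckpointDemo job = new CheckpointDemo();
        try {
            job.run(items, 3, 4); // fails inside the second chunk
        } catch (IllegalStateException e) {
            // prints: simulated failure at item 4, checkpoint=3
            System.out.println(e.getMessage() + ", checkpoint=" + job.checkpoint());
        }
        job.run(items, 3, -1); // restart resumes from item 3, not item 0
        System.out.println(job.written()); // [A, B, C, D, E, F, G]
    }
}
```

The half-finished second chunk is discarded on failure, and the restart repeats it from the committed checkpoint, so no item is written twice.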

Guidelines to use Spring Batch

  • avoid building complex logical structures in a single batch application
  • keep your data close to where the batch processing occurs
  • minimize system resource use, such as I/O, by performing operations in memory wherever possible
  • cache data after the first read from the database for each transaction, and read from the cache from then on
  • avoid unnecessary table or index scans in the database
  • be specific when retrieving data from the database, i.e., retrieve only the required fields, specify a WHERE clause in the SQL statement, etc.
  • avoid performing the same operation multiple times in a batch process
  • allocate enough memory before the batch process starts, because reallocating memory during the process is time-consuming
  • check and validate data consistently to maintain data integrity
  • implement checksums for internal validation wherever possible
  • execute stress tests at an early stage in production-like environments
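The caching guideline can be sketched as a read-through cache: the first lookup for a key goes to the database, and later lookups for the same key are served from memory. In this plain-Java sketch the loader function stands in for a database query; ReadThroughCache is a hypothetical class, not part of Spring Batch, and it uses a plain HashMap, so it assumes a single-threaded step.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical read-through cache; the loader stands in for a database query.
final class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new HashMap<>(); // not thread-safe
    private final Function<K, V> loader;
    private int loads = 0; // how many times the (expensive) loader was called

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // Call the loader only on a cache miss; later calls are served from the map.
        return cache.computeIfAbsent(key, k -> {
            loads++;
            return loader.apply(k);
        });
    }

    int loads() { return loads; }

    public static void main(String[] args) {
        // Hypothetical lookup of a country name by code.
        ReadThroughCache<String, String> countries =
                new ReadThroughCache<>(code -> code.equals("IN") ? "India" : "Unknown");
        for (int i = 0; i < 1000; i++) {
            countries.get("IN");
        }
        System.out.println(countries.loads()); // 1: the loader ran only once
    }
}
```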

For more information on the theoretical parts, please refer to http://docs.spring.io/spring-batch/trunk/reference/html/spring-batch-intro.html and http://spring.io/guides/gs/batch-processing/

Prerequisites

Java 8, Spring Boot 2.1.4, Gradle 4.10.2

Example with Source Code

Now let's see how it works with an example.

We’ll build a service that imports data from a CSV spreadsheet, transforms it with custom code, and stores the final results in another CSV spreadsheet. You can also store the data in a database or any other persistent storage.

Creating Project

Create a Gradle-based project in Eclipse IDE; the required project structure will be created for you.

Updating Build Script

Modify the build.gradle file to add the required dependencies so that it looks like below. Gradle downloads all the jars from the Maven repositories listed in the script.

buildscript {
	ext {
		springBootVersion = '2.1.4.RELEASE'
	}
    repositories {
    	mavenLocal()
    	mavenCentral()
    }
    dependencies {
    	classpath("org.springframework.boot:spring-boot-gradle-plugin:${springBootVersion}")
    }
}
apply plugin: 'java'
apply plugin: 'org.springframework.boot'
sourceCompatibility = 1.8
targetCompatibility = 1.8
repositories {
	mavenLocal()
    mavenCentral()
}
dependencies {
	compile("org.springframework.boot:spring-boot-starter-batch:${springBootVersion}")
	runtime("com.h2database:h2:1.4.197")
}

In the above build script, we have added the H2 database as a runtime dependency because Spring Batch needs a database for its job repository, where it stores metadata about job and step executions. You can use any other database, such as MySQL, Oracle or Derby.


Creating VO Class

Create a class User.java to represent one row of data in both the input and the output. You can instantiate the User class either with name and email through the constructor, or by setting the properties.

package com.roytuts.spring.batch.vo;
public class User {
	private String name;
	private String email;
	public User() {
	}
	public User(String name, String email) {
		this.name = name;
		this.email = email;
	}
	public String getName() {
		return name;
	}
	public void setName(String name) {
		this.name = name;
	}
	public String getEmail() {
		return email;
	}
	public void setEmail(String email) {
		this.email = email;
	}
	@Override
	public String toString() {
		return "name: " + name + ", email:" + email;
	}
}

Creating ItemProcessor

Create an intermediate processor. A common paradigm in batch processing is to ingest data, transform it, and then pipe it out somewhere else.

Here we write a simple transformer that converts the names to uppercase and changes the email domain.

You can implement your own business logic as your application requires.

package com.roytuts.spring.batch.itemprocessor;
import org.springframework.batch.item.ItemProcessor;
import com.roytuts.spring.batch.vo.User;
public class UserItemProcessor implements ItemProcessor<User, User> {
	@Override
	public User process(final User user) throws Exception {
		final String domain = "roytuts.com";
		final String name = user.getName().toUpperCase();
		final String email = user.getEmail().substring(0, user.getEmail().indexOf("@") + 1) + domain;
		final User transformedUser = new User(name, email);
		System.out.println("Converting [" + user + "] => [" + transformedUser + "]");
		return transformedUser;
	}
}

UserItemProcessor implements Spring Batch’s ItemProcessor interface. This makes it easy to wire the code into a batch job that we define further down in this guide.

As the interface defines, we receive an incoming User object, convert its name to upper case, and replace its email domain with roytuts.com in a new User object.
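The transformation itself does not depend on Spring Batch, which makes it easy to unit test in isolation. The sketch below repeats the same name/email logic in plain Java with one assumed change: the code above produces just roytuts.com when an email has no @ (indexOf returns -1, so the substring is empty), while this version returns null instead. In Spring Batch, an ItemProcessor that returns null filters the item out, so such records would not be written. UserTransform and transform are hypothetical names, and Locale.ROOT is used so that upper-casing does not depend on the default locale.

```java
import java.util.Locale;

// Hypothetical plain-Java version of the UserItemProcessor logic.
final class UserTransform {
    static final String DOMAIN = "roytuts.com";

    // Returns {UPPERCASED_NAME, localPart@roytuts.com}, or null when the email
    // has no '@' (an ItemProcessor returning null filters the item).
    static String[] transform(String name, String email) {
        int at = email.indexOf('@');
        if (at < 0) {
            return null;
        }
        return new String[] { name.toUpperCase(Locale.ROOT), email.substring(0, at + 1) + DOMAIN };
    }

    public static void main(String[] args) {
        String[] out = transform("soumitra", "soumitra@example.com");
        System.out.println(out[0] + "," + out[1]); // SOUMITRA,soumitra@roytuts.com
        System.out.println(transform("john", "not-an-email")); // null
    }
}
```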

Creating FieldSetMapper Class

A FieldSetMapper maps the fields of a tokenized line (a FieldSet) to a domain object, here a User.

package com.roytuts.spring.batch.fieldset.mapper;
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.validation.BindException;
import com.roytuts.spring.batch.vo.User;
public class UserFieldSetMapper implements FieldSetMapper<User> {
	@Override
	public User mapFieldSet(FieldSet fieldSet) throws BindException {
		User user = new User();
		user.setName(fieldSet.readString(0));
		user.setEmail(fieldSet.readString(1));
		return user;
	}
}
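Together, DelimitedLineTokenizer and UserFieldSetMapper turn one CSV line into one User: the tokenizer splits the line on commas into a FieldSet, and the mapper reads fields 0 and 1 by index. The plain-Java sketch below mimics that two-step flow with String.split. It is a simplification that ignores quoted fields, which the real tokenizer handles, and LineMappingSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of tokenizing + mapping; not the Spring Batch classes.
final class LineMappingSketch {
    // Step 1 (tokenizer): split a line on commas; -1 keeps trailing empty fields.
    static List<String> tokenize(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    // Step 2 (field set mapper): read fields by index, as UserFieldSetMapper does.
    // Values are trimmed, like FieldSet.readString.
    static String[] mapFields(List<String> fields) {
        String name = fields.get(0).trim();
        String email = fields.get(1).trim();
        return new String[] { name, email };
    }

    public static void main(String[] args) {
        String[] user = mapFields(tokenize("liton,liton@example.com"));
        System.out.println("name: " + user[0] + ", email:" + user[1]);
    }
}
```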

Creating Spring Batch Configuration

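Now we will write a batch job. The @EnableBatchProcessing annotation enables Spring Batch features and provides a base configuration, including a JobRepository, a JobLauncher, a JobBuilderFactory and a StepBuilderFactory. Because H2 runs here as an in-memory database, the job metadata is lost when the application stops.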

I have added comments to each bean and statement to explain what it does.

package com.roytuts.spring.batch.config;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor;
import org.springframework.batch.item.file.transform.DelimitedLineAggregator;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.core.io.FileSystemResource;
import com.roytuts.spring.batch.fieldset.mapper.UserFieldSetMapper;
import com.roytuts.spring.batch.itemprocessor.UserItemProcessor;
import com.roytuts.spring.batch.vo.User;
@Configuration
@EnableBatchProcessing
public class SpringBatchConfig {
	@Bean
	// creates an item reader
	public ItemReader<User> reader() {
		FlatFileItemReader<User> reader = new FlatFileItemReader<User>();
		// look for file user.csv
		reader.setResource(new ClassPathResource("user.csv"));
		// line mapper
		DefaultLineMapper<User> lineMapper = new DefaultLineMapper<User>();
		// each line with comma separated
		lineMapper.setLineTokenizer(new DelimitedLineTokenizer());
		// map file's field with object
		lineMapper.setFieldSetMapper(new UserFieldSetMapper());
		reader.setLineMapper(lineMapper);
		return reader;
	}
	@Bean
	// creates an instance of our UserItemProcessor for transformation
	public ItemProcessor<User, User> processor() {
		return new UserItemProcessor();
	}
	@Bean
	// creates item writer
	public ItemWriter<User> writer() {
		FlatFileItemWriter<User> writer = new FlatFileItemWriter<User>();
		// output file path
		writer.setResource(new FileSystemResource("C:/workspace/transformed_user.csv"));
		// delete if the file already exists
		writer.setShouldDeleteIfExists(true);
		// create lines for writing to file
		DelimitedLineAggregator<User> lineAggregator = new DelimitedLineAggregator<User>();
		// delimit field by comma
		lineAggregator.setDelimiter(",");
		// extract the "name" and "email" properties from each User item
		BeanWrapperFieldExtractor<User> fieldExtractor = new BeanWrapperFieldExtractor<User>();
		// use User object's properties
		fieldExtractor.setNames(new String[] { "name", "email" });
		lineAggregator.setFieldExtractor(fieldExtractor);
		// turn each User into one comma-delimited line
		writer.setLineAggregator(lineAggregator);
		return writer;
	}
	@Bean
	// define job which is built from step
	public Job importUserJob(JobBuilderFactory jobs, Step step) {
		// RunIdIncrementer adds an incremented run.id parameter so that each launch starts a new job instance
		return jobs.get("importUserJob").incrementer(new RunIdIncrementer()).flow(step).end().build();
	}
	@Bean
	// define step
	public Step step1(StepBuilderFactory stepBuilderFactory, ItemReader<User> reader, ItemWriter<User> writer,
			ItemProcessor<User, User> processor) {
		// chunk(5) sets the commit interval: items are read and processed one by one,
		// and up to five of them are written together in one transaction.
		// Then we configure the reader, processor, and writer.
		return stepBuilderFactory.get("step1").<User, User>chunk(5).reader(reader).processor(processor).writer(writer)
				.build();
	}
}
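With chunk(5), the step reads and processes items one at a time, collects the results of up to five items, and hands that list to the writer in a single transaction. The plain-Java sketch below shows only that control flow; transactions are omitted, and ChunkLoop and its run method are hypothetical names, not Spring Batch code. An item for which the processor returns null is filtered and never reaches the writer. The console output later in this tutorial shows 12 converted records; with chunk(5) that corresponds to writes of 5, 5 and 2 items.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch of chunk-oriented processing (read -> process -> write).
final class ChunkLoop {
    // Reads up to chunkSize items per chunk, processes each one, and passes the
    // non-null results to the writer as one list. Returns the number of chunks.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        int chunks = 0;
        while (reader.hasNext()) {
            List<O> outputs = new ArrayList<>();
            for (int read = 0; read < chunkSize && reader.hasNext(); read++) {
                O out = processor.apply(reader.next());
                if (out != null) { // null means "filter this item"
                    outputs.add(out);
                }
            }
            writer.accept(outputs); // Spring Batch wraps this in a transaction
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 1; i <= 12; i++) {
            input.add(i);
        }
        List<Integer> sizes = new ArrayList<>();
        int chunks = run(input.iterator(), n -> n * 10, list -> sizes.add(list.size()), 5);
        System.out.println(chunks + " chunks, sizes " + sizes); // 3 chunks, sizes [5, 5, 2]
    }
}
```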

Creating Main Class

This batch processing can also be embedded in web applications, but in this Spring Boot example we will create a main class to run it. You can also build an executable jar from it.

package com.roytuts.spring.batch;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication(scanBasePackages = "com.roytuts.spring.batch")
public class SpringBatch {
	public static void main(String[] args) {
		SpringApplication.run(SpringBatch.class, args);
	}
}

Testing the Application

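The reader loads user.csv with ClassPathResource, so put the input file on the classpath, for example in src/main/resources.

Run the above main class and you will see the following output in the console. The transformed records are also written to the output file C:/workspace/transformed_user.csv.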

Converting [name: soumitra, email:[email protected]] => [name: SOUMITRA, email:[email protected]]
Converting [name: soumitra, email:[email protected]] => [name: SOUMITRA, email:[email protected]]
Converting [name: liton, email:[email protected]] => [name: LITON, email:[email protected]]
Converting [name: john, email:[email protected]] => [name: JOHN, email:[email protected]]
Converting [name: sumit, email:[email protected]] => [name: SUMIT, email:[email protected]]
Converting [name: souvik, email:[email protected]] => [name: SOUVIK, email:[email protected]]
Converting [name: debabrata, email:[email protected]] => [name: DEBABRATA, email:[email protected]]
Converting [name: debina, email:[email protected]] => [name: DEBINA, email:[email protected]]
Converting [name: sushil, email:[email protected]] => [name: SUSHIL, email:[email protected]]
Converting [name: francois, email:[email protected]] => [name: FRANCOIS, email:[email protected]]
Converting [name: kanimozi, email:[email protected]] => [name: KANIMOZI, email:[email protected]]
Converting [name: subodh, email:[email protected]] => [name: SUBODH, email:[email protected]]
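To check the output file programmatically rather than by eye, a small plain-Java check can read it and confirm that each line has two comma-separated fields, an upper-cased name, and an email in the roytuts.com domain. OutputCheck is a hypothetical helper; pass it the path you configured in FileSystemResource (C:/workspace/transformed_user.csv in this tutorial). It assumes the file is UTF-8 or plain ASCII, which Files.readAllLines(Path) expects.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;

// Hypothetical helper that validates the file written by the job.
final class OutputCheck {
    // Returns the number of lines; throws if any line is malformed.
    static int validate(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file); // decodes as UTF-8
        for (String line : lines) {
            String[] fields = line.split(",", -1);
            if (fields.length != 2) {
                throw new IllegalStateException("expected 2 fields: " + line);
            }
            if (!fields[0].equals(fields[0].toUpperCase(Locale.ROOT))) {
                throw new IllegalStateException("name not upper-cased: " + line);
            }
            if (!fields[1].endsWith("@roytuts.com")) {
                throw new IllegalStateException("unexpected email domain: " + line);
            }
        }
        return lines.size();
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get(args.length > 0 ? args[0] : "C:/workspace/transformed_user.csv");
        System.out.println(validate(out) + " lines OK");
    }
}
```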

Source Code

You can download the source code.

That’s all. Thanks for reading.
