Getting Started with Spring Batch




A good way to design a system that handles bulk workloads is to make it a batch system. If a project already uses Spring, adding Spring Batch to it is straightforward. Spring Batch provides much of the boilerplate that batch processing needs, such as chunk-based processing, transaction management and declarative input/output operations. It also provides job control for starting, stopping and restarting jobs, as well as retry and skip handling.

Spring Batch 4.3.x was released recently and is licensed under the Apache License 2.0. It is covered by the Spring OSS support policy, and commercial support is available until the product reaches end of life. Many examples are available in the spring-batch-samples project on GitHub.

It is a lightweight, comprehensive solution designed to enable the development of robust batch applications, which are often found in modern enterprise systems. Spring Batch builds upon the POJO-based development approach of the Spring Framework.

Batch Job Basics:

Spring Batch follows the traditional batch architecture. Each process workflow is called a batch job, and a job consists of one or more steps. A step can include any or all of the phases of reading, processing and writing data. The framework does much of the heavy lifting for you. It launches jobs, executes the steps inside each job, and provides ready-made readers and writers for databases (via JDBC), JMS, files and so on.
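The job/step relationship can be made concrete with a small plain-Java model. MiniJob and its step method are hypothetical names invented for this sketch and are not Spring Batch classes. The model captures only two ideas: a job runs its steps in order, and a simple sequential job stops at the first step that fails.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical model of "a job is an ordered sequence of steps".
// MiniJob is invented for illustration and is not part of the Spring Batch API.
class MiniJob {
    private final String name;
    // LinkedHashMap keeps insertion order, so steps run in the order they were added
    private final Map<String, BooleanSupplier> steps = new LinkedHashMap<>();

    MiniJob(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Adds a step; its work returns true on success and false on failure
    MiniJob step(String stepName, BooleanSupplier work) {
        steps.put(stepName, work);
        return this;
    }

    // Runs the steps in order and returns the names of the steps that were executed.
    // Like a simple sequential job, it stops at the first step that fails.
    List<String> run() {
        List<String> executed = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> entry : steps.entrySet()) {
            executed.add(entry.getKey());
            if (!entry.getValue().getAsBoolean()) {
                break; // remaining steps are not executed
            }
        }
        return executed;
    }
}
```

Suppose step1 succeeds, step2 fails and step3 comes after them. In that case run() returns [step1, step2], and step3 never executes.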

Writing batch code is a little different from traditional web development. A web request typically reads one or a few records. A batch job, by contrast, usually reads, processes and writes data in bulk, and Spring Batch's I/O readers and writers are built for that. For a large volume of writes, writing records one by one is slow, while grouping them into bulk writes is much faster.
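The difference between one-by-one and bulk writing can be sketched without any framework. ChunkSketch.process below is a minimal, hypothetical helper and is not part of Spring Batch. It reads and processes items one at a time, but the writer receives them in lists of chunkSize. That is roughly the shape of Spring Batch's chunk-oriented processing.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Simplified, hypothetical sketch of chunk-oriented processing (not Spring Batch code):
// items are read and processed one at a time, but written in batches of `chunkSize`.
class ChunkSketch {
    static <I, O> int process(Iterator<I> reader,
                              Function<I, O> processor,
                              Consumer<List<O>> writer,
                              int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        int writes = 0;
        List<O> chunk = new ArrayList<>(chunkSize);
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next())); // read and process a single item
            if (chunk.size() == chunkSize) {           // chunk is full:
                writer.accept(new ArrayList<>(chunk)); // one bulk write for the whole chunk
                writes++;
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {                        // flush the last, partial chunk
            writer.accept(new ArrayList<>(chunk));
            writes++;
        }
        return writes;                                 // number of bulk write calls
    }
}
```

With seven items and a chunk size of three, the writer is called three times (3 + 3 + 1 items) instead of seven.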

In this post, we will create a simple tasklet that repeatedly performs an ad-hoc task within a batch job for a fixed period of time. This shows how to do work in a non-conventional way, without readers and writers. The job and step execution statistics are persisted in the configured data source. Here we use an embedded H2 database stored in a file. The data source is configured in application.properties with the driver, dialect and database location:

spring.datasource.url=jdbc:h2:file:/opt/projects/springbatchexamples/batch db
spring.datasource.driver-class-name=org.h2.Driver
spring.batch.jdbc.initialize-schema=always
spring.datasource.username=sa
spring.datasource.password=
spring.jpa.database-platform=org.hibernate.dialect.H2Dialect

@EnableBatchProcessing Annotation:

In the configuration class, add the @EnableBatchProcessing annotation. It bootstraps the batch infrastructure and registers ready-made beans such as JobBuilderFactory, StepBuilderFactory, JobRepository, JobLauncher and a transaction manager (a DataSourceTransactionManager when a data source is available). This saves a lot of configuration work. The snippets in the following sections assume that the two builder factories are injected into this class with @Autowired, as fields named jobBuilderFactory and steps. With Spring Boot, all jobs in the application context are executed on startup by default. You can narrow this down to a specific job or jobs with spring.batch.job.names, which takes a comma-separated list of job name patterns.

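@Configuration
@EnableBatchProcessing
public class BatchConfiguration extends DefaultBatchConfigurer {
    // the job, step and tasklet @Bean methods from the following sections go inside this class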
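The sketch below is a rough illustration of how a comma-separated value like spring.batch.job.names can select jobs. It treats each entry as a pattern in which * matches any characters. JobNameFilter is a hypothetical class. Its trimming and wildcard rules are assumptions of this sketch and do not describe Spring Boot's exact matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical filter for a comma-separated list of job name patterns.
// Assumptions of this sketch: entries are trimmed and '*' is the only wildcard.
class JobNameFilter {
    static List<String> select(List<String> jobNames, String property) {
        List<String> selected = new ArrayList<>();
        if (property == null || property.trim().isEmpty()) {
            selected.addAll(jobNames); // nothing configured: every job is selected
            return selected;
        }
        List<Pattern> patterns = new ArrayList<>();
        for (String entry : property.split(",")) {
            // Quote the literal parts and turn each '*' into ".*"
            String[] parts = entry.trim().split("\\*", -1);
            StringBuilder regex = new StringBuilder();
            for (int i = 0; i < parts.length; i++) {
                if (i > 0) {
                    regex.append(".*");
                }
                regex.append(Pattern.quote(parts[i]));
            }
            patterns.add(Pattern.compile(regex.toString()));
        }
        for (String name : jobNames) {
            for (Pattern pattern : patterns) {
                if (pattern.matcher(name).matches()) {
                    selected.add(name);
                    break; // one matching pattern is enough
                }
            }
        }
        return selected;
    }
}
```

For jobs named awsbatch, importUserJob, importOrders and cleanup, the value "import*, cleanup" selects importUserJob, importOrders and cleanup, while "awsbatch" selects only the job defined in this post.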

JobBuilderFactory

JobBuilderFactory creates a job builder for a given job name. An incrementer such as RunIdIncrementer gives each launch a unique set of job parameters, so every run creates a new job instance. The builder must be told which step to begin with, either through start or, as in the example below, through flow. Then build() creates the Job bean, which is picked up for execution when the batch application starts.

    @Bean
    public Job importUserJob(JobCompletionNotificationListener listener, Step step1) {
        return jobBuilderFactory.get("awsbatch")        // job name
                .incrementer(new RunIdIncrementer())   // new run.id parameter on every launch
                .listener(listener)                    // before/after job callbacks
                .flow(step1)                           // first (and only) step
                .end()
                .build();
    }
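Conceptually, an incrementer such as RunIdIncrementer derives the next job parameters from the previous ones by bumping a run.id counter. Each launch therefore gets a parameter set that has not been used before. The sketch below models this with a plain Map in place of Spring Batch's JobParameters. RunIdSketch is a hypothetical name, and the starting value of 1 is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of a run-id incrementer; plain maps stand in for JobParameters.
class RunIdSketch {
    static final String KEY = "run.id";

    // Copies the previous parameters and increments run.id (starting at 1 if absent)
    static Map<String, Object> next(Map<String, Object> previous) {
        Map<String, Object> next = new LinkedHashMap<>();
        if (previous != null) {
            next.putAll(previous); // other parameters are carried over unchanged
        }
        Object current = next.get(KEY);
        long id = (current instanceof Long) ? (Long) current + 1 : 1L;
        next.put(KEY, id);
        return next;
    }
}
```

Because run.id differs on every launch, two launches never share the same parameters. That is why each run becomes a new job instance rather than a re-run of a completed one.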

StepBuilderFactory

StepBuilderFactory creates a step builder for a given step name. Here the current time is appended to the step name so that it is unique for each run. Work that needs no reader or writer can be done with a Tasklet. The throttle limit is set to one. It caps the number of concurrent tasklet executions and only takes effect when the step is given an asynchronous TaskExecutor.

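    @Bean
    protected Step step1() throws Exception {
        // Append the epoch time so that the step name is unique for each run
        String epochStr = String.valueOf(new Date().getTime());
        return this.steps.get("step1v" + epochStr)
                .tasklet(tasklet())  // ad-hoc work, no reader or writer
                .throttleLimit(1)    // at most one concurrent tasklet execution
                .build();
    }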
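The effect of a throttle limit can be illustrated with a semaphore. Even if a thread pool offers several threads, only `limit` executions may run at the same time. ThrottleSketch below is a hypothetical, framework-free model of the idea behind throttleLimit(1) and is not Spring Batch's implementation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a throttle limit: a semaphore with `limit` permits caps how many
// submitted executions run at once, whatever the size of the thread pool.
class ThrottleSketch {
    // Runs `tasks` short executions on `threads` threads and returns the highest
    // number of executions that were observed running at the same moment.
    static int maxObservedConcurrency(int threads, int tasks, int limit) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Semaphore throttle = new Semaphore(limit);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxRunning = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    throttle.acquire(); // wait until a permit is free
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    maxRunning.accumulateAndGet(running.incrementAndGet(), Math::max);
                    Thread.sleep(5); // simulated unit of work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    throttle.release();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return maxRunning.get();
    }
}
```

With a limit of one, the observed maximum concurrency is always one, regardless of the pool size.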

Listeners

Listeners are available at both the job and the step level, with callbacks before and after each job or step execution. In our example, the beforeJob callback records the start time, because the tasklet runs for a fixed period of time and the job is therefore time-bounded. The start time is stored in the job execution context, which holds key/value pairs that stay available for the whole job execution.

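import java.util.Date;

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.stereotype.Component;

// Named to match the listener parameter of importUserJob and to avoid clashing
// with Spring Batch's own JobExecutionListener interface
@Component
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {

    @Override
    public void beforeJob(JobExecution jobExecution) {
        System.out.println("before job time " + new Date());
        // Record the start time so the tasklet can bound its own run time
        jobExecution.getExecutionContext().put("startTime", new Date());
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        System.out.println("after job time " + new Date());
    }
}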
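The listener contract can be modelled in a few lines of plain Java. In the hypothetical ListenerSketch below, beforeJob stores a value in a shared key/value context, the work runs, and afterJob is called whether the work succeeds or fails. This matches how Spring Batch calls afterJob after both successful and failed executions. Keeping the status in the same map is a simplification of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, stripped-down model of job listeners (not the Spring Batch API).
class ListenerSketch {
    interface Listener {
        void beforeJob(Map<String, Object> context);
        void afterJob(Map<String, Object> context);
    }

    // Runs `work` between the two callbacks and returns the shared key/value context.
    static Map<String, Object> run(Listener listener, Runnable work) {
        Map<String, Object> context = new HashMap<>();
        listener.beforeJob(context);
        try {
            work.run();
            context.put("status", "COMPLETED");
        } catch (RuntimeException e) {
            // the failure is recorded rather than rethrown, to keep the sketch small
            context.put("status", "FAILED");
        } finally {
            listener.afterJob(context); // invoked after both successful and failed runs
        }
        return context;
    }
}
```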

Tasklet Configuration

The tasklet is declared as its own bean. Its execute method holds the work to be run, much like the run method of a Runnable, and it receives a StepContribution and a ChunkContext:

ChunkContext - the context of the current chunk execution. It can hold user-defined attributes and gives access to the step context, including the job parameters and the job execution context.
StepContribution - the tasklet's contribution to the current step execution, holding statistics such as read, write, filter and skip counts.

A counter named times is kept in the chunk context and incremented on every call. As long as the tasklet returns RepeatStatus.CONTINUABLE, the step executes it again. The tasklet keeps returning CONTINUABLE until more than 5 seconds have passed since the job started, and then returns RepeatStatus.FINISHED. The elapsed time is computed by comparing the current time with the start time stored in the job execution context.

 

@Bean
protected Tasklet tasklet() {
    return new Tasklet() {
        @Override
        public RepeatStatus execute(StepContribution contribution,
                                    ChunkContext context) {
            // Start time recorded by the job listener in beforeJob
            Date startTime = (Date) context.getStepContext()
                                           .getJobExecutionContext()
                                           .get("startTime");
            Date currentTime = new Date();

            // Count how many times the tasklet has run, using a chunk context attribute
            Integer value = (Integer) context.getAttribute("times");
            if (value == null) {
                value = 0;
            }
            context.setAttribute("times", ++value);
            System.out.println("This is tasklet execution for " + value + " times");

            System.out.println(context.toString());
            System.out.println(contribution.toString());

            // Finish after 5 seconds; otherwise ask the step to call the tasklet again.
            // If no start time was recorded (listener not registered), finish instead of failing.
            if (startTime == null || currentTime.getTime() - startTime.getTime() > 5000) {
                return RepeatStatus.FINISHED;
            }
            return RepeatStatus.CONTINUABLE;
        }
    };
}
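The repeat behaviour of the tasklet can be reproduced without Spring. In the sketch below, a hypothetical SimpleTasklet is called until it returns FINISHED. A counter is kept in a map that survives between calls, standing in for the chunk context attribute, and the time limit mirrors the 5-second check above. The clock is passed in as a LongSupplier so that the loop can be tested without waiting; this is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Plain-Java sketch of the repeat loop around a tasklet (not Spring Batch's TaskletStep).
class RepeatSketch {
    enum Status { CONTINUABLE, FINISHED }

    // Hypothetical stand-in for a tasklet; the map plays the role of chunk context attributes
    interface SimpleTasklet {
        Status execute(Map<String, Object> attributes);
    }

    // Calls the tasklet until it returns FINISHED and reports how many calls were made
    static int runUntilFinished(SimpleTasklet tasklet) {
        Map<String, Object> attributes = new HashMap<>(); // survives between calls
        int calls = 0;
        Status status;
        do {
            status = tasklet.execute(attributes);
            calls++;
        } while (status == Status.CONTINUABLE);
        return calls;
    }

    // Time-bounded tasklet mirroring the article's logic: count each call in "times"
    // and finish once more than `limitMillis` have passed since `startMillis`
    static SimpleTasklet timeBounded(long startMillis, long limitMillis, LongSupplier clock) {
        return attributes -> {
            Integer times = (Integer) attributes.get("times");
            times = (times == null) ? 1 : times + 1;
            attributes.put("times", times);
            return clock.getAsLong() - startMillis > limitMillis
                    ? Status.FINISHED
                    : Status.CONTINUABLE;
        };
    }
}
```

Consider a fake clock that advances one second per call, with a 5000 ms limit. The tasklet is then called six times, because the sixth call sees 6000 ms elapsed, the first value greater than 5000.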

A Spring Batch project can be generated from a Maven archetype by running the command below. It runs in interactive mode. Choose the Spring Batch archetype from the list (number 2803 at the time of writing), then provide the groupId and artifactId to generate the project with a sample batch job.

mvn archetype:generate

Then update the parent to a recent Spring Boot version to pick up the latest Spring Batch version:

<parent>
        <!-- Your own application should inherit from spring-boot-starter-parent -->
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.6.3</version>
</parent>

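A few dependency changes are needed as well. spring-boot-test is added for the tests, and H2 gets runtime scope so that it is only used at execution time. Likewise, the import for the output capture class has to change: with Spring Boot 2.2 and later, the JUnit 4 rule is OutputCaptureRule in the org.springframework.boot.test.system package. For more information, please refer to the GitHub branch.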

		<dependency>
			<groupId>com.h2database</groupId>
			<artifactId>h2</artifactId>
			<scope>runtime</scope>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-test</artifactId>
			<scope>test</scope>
		</dependency>
		<!-- https://mvnrepository.com/artifact/junit/junit -->
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>4.13.2</version>
			<scope>test</scope>
		</dependency>

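Now, when we build the project with "sudo mvn clean install", the tests run the Spring Batch job. This confirms that the job runs for a period of time and then stops. (sudo is only needed because the H2 file lives under /opt; a user-writable database path avoids it.) The packaged application can then be started with: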

sudo java -jar target/springbatchexamples-1.0-SNAPSHOT.jar

To inspect the job and step repository, run the H2 jar, which is in the local Maven repository under ~/.m2:

cd ~/.m2/repository/com/h2database/h2/1.4.200
java -jar h2-1.4.200.jar

This opens the H2 console login page in the browser. Enter the location of the embedded H2 file there (the same JDBC URL as spring.datasource.url) to open it.

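The console then lists the tables. These include the Spring Batch metadata tables such as BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION and BATCH_STEP_EXECUTION, where we can run SQL queries to see the job details, as shown below: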

 


   

Nagappan is a techie-geek and a full-stack senior developer with 10+ years of experience in both front-end and back-end development. He has experience with front-end web technologies like HTML, CSS, JavaScript and Angular, and is an expert in Java and related frameworks like Spring, Struts, EJB and RESTEasy. He holds a bachelor's degree in computer science and is very passionate about learning new technologies.










