Data dumping through REST API using Spring Batch

Most cloud services provide an API to fetch their data. However, the data is returned as paginated results, because returning the complete data set in a single response would exceed the response payload limit. To retrieve the complete list of books, e-courses or cloud machine details, we need to call the API page by page until we reach the end. In this scenario, we can use Spring Batch to fetch the data page by page and dump it into a file.
In this blog, we will use one of the free-to-use APIs from <a href="https://build.coursera.org/app-platform/catalog/" target="_blank" rel="noopener">Coursera</a> to take a dump of its e-courses. <a href="https://www.coursera.org/" target="_blank" rel="noopener">Coursera</a> is one of the popular MOOC sites, and it exposes its e-courses through a REST API. For a basic introduction to Spring Batch and the getting-started docs, please refer to the <a href="https://www.findbestopensource.com/article-detail/getting-started-with-spring-batch" target="_blank" rel="noopener">previous blog</a>.
In <a href="https://spring.io/projects/spring-batch" target="_blank" rel="noopener">Spring Batch</a>, we can use a tasklet, which gives us a free hand to start the task and repeat it according to our own logic. A tasklet is a single task executed inside a step. A traditional chunk-oriented step has a reader, a processor and a writer, which works well for file transformation or loading, but fitting our paginated GET-and-dump scenario into that model would be cumbersome. A tasklet lets us place the GET API request inside its execute() method and repeat it until we reach the end of the data.
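<pre class="language-java"><code>import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.StepExecutionListener;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class CourseGetTasklet implements Tasklet, StepExecutionListener {
    @Override
    public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext)
            throws Exception {
        // The task logic goes here. Returning RepeatStatus.CONTINUABLE makes Spring Batch
        // call execute() again; returning RepeatStatus.FINISHED ends the step.
        return RepeatStatus.FINISHED;
    }

    @Override
    public void beforeStep(StepExecution stepExecution) {
        // Runs once, before the tasklet is executed for the first time.
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        // Runs once, after the tasklet has finished.
        // Returning null leaves the step's exit status unchanged.
        return null;
    }
}</code></pre>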
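To make the repeat contract concrete, here is a plain-Java sketch with no Spring dependency. It only mimics the behaviour described above: a driver keeps calling the task while it answers CONTINUABLE and stops at FINISHED. The RepeatStatus enum and the runUntilFinished helper are hypothetical stand-ins written for this illustration, not Spring Batch classes; in a real application the TaskletStep does this looping for you.

```java
import java.util.function.Supplier;

class RepeatContractDemo {

    // Hypothetical stand-in for org.springframework.batch.repeat.RepeatStatus.
    enum RepeatStatus { CONTINUABLE, FINISHED }

    // Calls the task until it reports FINISHED and returns how many calls were made.
    static int runUntilFinished(Supplier<RepeatStatus> task) {
        int calls = 0;
        RepeatStatus status;
        do {
            status = task.get();
            calls++;
        } while (status == RepeatStatus.CONTINUABLE);
        return calls;
    }

    public static void main(String[] args) {
        int[] pagesLeft = {3}; // pretend three pages remain to be fetched
        int calls = runUntilFinished(() -> {
            pagesLeft[0]--;
            return pagesLeft[0] > 0 ? RepeatStatus.CONTINUABLE : RepeatStatus.FINISHED;
        });
        System.out.println("execute() was called " + calls + " times"); // prints 3 times
    }
}
```

The task decides on its own when to stop, which is exactly the freedom that makes a tasklet a good fit for "fetch until there is no next page".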
Let's set up the Spring Batch application using annotations. Create a class whose main method calls SpringApplication.run. The class is annotated with @EnableBatchProcessing, which enables Spring Batch features and provides a base configuration for setting up batch jobs in an @Configuration class, roughly equivalent to using the &lt;batch:*&gt; XML namespace. @Configuration marks the class as a Spring configuration class, and @EnableAutoConfiguration lets Spring Boot configure beans automatically based on the libraries found on the classpath.
JobBuilderFactory creates the job and attaches a RunIdIncrementer, which gives every launch a new run.id job parameter. StepBuilderFactory creates the step that runs the tasklet (CourseGetTasklet).
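Why the RunIdIncrementer? Spring Batch identifies a job instance by the job name plus its identifying parameters, so relaunching a completed job with identical parameters does not start a new instance. RunIdIncrementer avoids this by adding a run.id parameter that is one higher than the previous run's. The stdlib-only sketch below imitates that behaviour with a Map; nextParameters is a hypothetical helper, not the Spring Batch API.

```java
import java.util.HashMap;
import java.util.Map;

class RunIdDemo {

    static final String RUN_ID_KEY = "run.id"; // the default key used by RunIdIncrementer

    // Returns a copy of the previous parameters with run.id incremented (1 if absent),
    // imitating what RunIdIncrementer does when the job is launched again.
    static Map<String, Long> nextParameters(Map<String, Long> previous) {
        Map<String, Long> next = new HashMap<>(previous);
        long id = previous.getOrDefault(RUN_ID_KEY, 0L) + 1;
        next.put(RUN_ID_KEY, id);
        return next;
    }

    public static void main(String[] args) {
        Map<String, Long> first = nextParameters(new HashMap<>());
        Map<String, Long> second = nextParameters(first);
        System.out.println(first + " -> " + second); // {run.id=1} -> {run.id=2}
    }
}
```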
SampleBatchApplication.java
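<pre class="language-java"><code>import java.util.Date;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableAutoConfiguration
@EnableBatchProcessing
public class SampleBatchApplication {

    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory steps;

    @Bean
    public Job job() throws Exception {
        // JobExecutionListener is assumed to be a project-specific class (not shown here);
        // Spring Batch's own JobExecutionListener is an interface and cannot be instantiated.
        return this.jobs.get("job")
                .incrementer(new RunIdIncrementer())
                .listener(new JobExecutionListener())
                .start(step1())
                .build();
    }

    @Bean
    protected Step step1() throws Exception {
        // Appending the current epoch time gives the step a unique name on every start.
        String epochStr = String.valueOf(new Date().getTime());
        return this.steps.get("step1v" + epochStr)
                .tasklet(new CourseGetTasklet())
                .throttleLimit(1)
                .build();
    }

    public static void main(String[] args) throws Exception {
        // System.exit is common for batch applications, since the exit code can be used
        // to drive a workflow.
        System.exit(SpringApplication.exit(SpringApplication.run(SampleBatchApplication.class, args)));
    }
}</code></pre>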
Provide the datasource properties in application.properties; Spring Boot uses them to auto-configure the data source for the job repository. The spring.batch.jdbc.initialize-schema property initializes the database with the tables Spring Batch needs to store job and step metadata.
application.properties:
<pre class="language-javascript"><code>## Spring DATASOURCE (DataSourceAutoConfiguration &amp; DataSourceProperties)
spring.datasource.url = jdbc:mysql://localhost:3306/courseradump?createDatabaseIfNotExist=true&amp;allowPublicKeyRetrieval=true&amp;useSSL=false
spring.datasource.username = root
spring.datasource.password = ****
spring.datasource.platform=mysql
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.initialization-mode=always
spring.batch.jdbc.initialize-schema=always
spring.jpa.hibernate.ddl-auto=create</code></pre>
The Coursera API documentation can be found at this <a href="https://build.coursera.org/app-platform/catalog/" target="_blank" rel="noopener">location</a>. The list-courses GET API lets us fetch all the courses, and we request them with a page size of 100 using Apache HttpClient's CloseableHttpClient. Before each request, we read the current offset from the step execution context, and after receiving the response, we put the next offset back into it. The number of invocations (noOfTimes) is also kept in the step execution context and is incremented on every call.
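The bookkeeping described above can be tried out without Spring or network access. In this sketch a plain HashMap stands in for the step's ExecutionContext, getInt mirrors the typed getter with a default value, and pageUrl builds the request URL from the offset and the page size. The start and limit query parameters follow the list-courses API used in this article; the class and method names are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

class OffsetBookkeepingDemo {

    static final int PAGE_LIMIT = 100;

    // A HashMap standing in for the step's ExecutionContext (illustration only).
    private final Map<String, Integer> context = new HashMap<>();

    // Mirrors ExecutionContext.getInt(key, defaultValue): the default is used for a missing key.
    int getInt(String key, int defaultValue) {
        return context.getOrDefault(key, defaultValue);
    }

    void putInt(String key, int value) {
        context.put(key, value);
    }

    // Builds the list-courses URL for the page that starts at the given offset.
    static String pageUrl(int offset) {
        return "https://api.coursera.org/api/courses.v1?start=" + offset + "&limit=" + PAGE_LIMIT;
    }

    public static void main(String[] args) {
        OffsetBookkeepingDemo step = new OffsetBookkeepingDemo();
        for (int call = 0; call < 3; call++) {
            int noOfTimes = step.getInt("noOfTimes", 0);
            int offset = step.getInt("offset", 0);
            System.out.println(pageUrl(offset));
            // Pretend the API reported the next page at offset + PAGE_LIMIT.
            step.putInt("offset", offset + PAGE_LIMIT);
            step.putInt("noOfTimes", noOfTimes + 1);
        }
        System.out.println("noOfTimes = " + step.getInt("noOfTimes", 0)); // noOfTimes = 3
    }
}
```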
We read the response entity as a String and use Jackson to deserialize it into the CourseResponse model, which holds the course elements and the paging parameters. The FileWriter is opened in a try-with-resources block so that it is closed automatically; once we have the response, we serialize the course elements with Jackson and write them out with fileWriter.write.
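Because execute() runs once per page, the way the writer is opened matters: new FileWriter(path) truncates the file every time it is opened, so only the last page would survive, whereas new FileWriter(path, true) appends. The sketch below writes two made-up pages to a temporary file in append mode, one JSON array per line; the appendPage helper and the page contents are hypothetical and exist only for this demonstration.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class AppendPagesDemo {

    // Appends one page (already serialized as a JSON array) to the dump file, one page per line.
    static void appendPage(Path file, String pageJson) throws IOException {
        try (FileWriter writer = new FileWriter(file.toFile(), true)) { // true = append mode
            writer.write(pageJson);
            writer.write(System.lineSeparator());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("courses", ".json");
        appendPage(file, "[{\"id\":\"a\"}]");
        appendPage(file, "[{\"id\":\"b\"}]");
        List<String> lines = Files.readAllLines(file);
        System.out.println(lines.size() + " pages written"); // 2 pages written
        Files.delete(file);
    }
}
```

Writing one array per line keeps every page parseable on its own, instead of producing back-to-back arrays that no JSON parser accepts as a single document.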
Next, we check the paging values. If next is null, or the next value is greater than the total, we have reached the last page, so we return RepeatStatus.FINISHED. Otherwise, we save the next offset in the step execution context and return RepeatStatus.CONTINUABLE, which makes Spring Batch call execute() again.
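The stop condition can be checked in isolation. This sketch applies the same rule (stop when next is missing or greater than total) against a hypothetical in-memory pager, fakePage, that imitates a paged API: it returns the next offset as a string, or null once the last page has been served. It is not the Coursera API, only a stand-in for exercising the loop.

```java
import java.util.ArrayList;
import java.util.List;

class PagingLoopDemo {

    // Hypothetical paging block: next is null on the last page, like PageModel.isNextNull().
    static final class Paging {
        final String next;
        final int total;

        Paging(String next, int total) {
            this.next = next;
            this.total = total;
        }
    }

    // Stand-in for one API call: describes the page that starts at 'start'.
    static Paging fakePage(int start, int limit, int total) {
        int next = start + limit;
        return new Paging(next < total ? String.valueOf(next) : null, total);
    }

    // Same stop rule as the tasklet: no next value, or a next value beyond the total.
    static boolean reachedEnd(Paging paging) {
        return paging.next == null || Integer.parseInt(paging.next) > paging.total;
    }

    // Drives the loop the way repeated execute() calls do and records every offset requested.
    static List<Integer> offsetsVisited(int limit, int total) {
        List<Integer> offsets = new ArrayList<>();
        int offset = 0;
        while (true) {
            offsets.add(offset);
            Paging paging = fakePage(offset, limit, total);
            if (reachedEnd(paging)) {
                return offsets; // corresponds to RepeatStatus.FINISHED
            }
            offset = Integer.parseInt(paging.next); // corresponds to RepeatStatus.CONTINUABLE
        }
    }

    public static void main(String[] args) {
        System.out.println(offsetsVisited(100, 250)); // [0, 100, 200]
    }
}
```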
<pre class="language-java"><code>@Override
 public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext) 
 throws Exception {

 int noOfTimes =
 stepContribution.getStepExecution().getExecutionContext().getInt(
 "noOfTimes", 0);
 int offset = stepContribution.getStepExecution().getExecutionContext().getInt(
 "offset", 0);

 StringBuilder courseraUrl =
 new StringBuilder("https://api.coursera.org/api/courses.v1? start=")
 .append(String.valueOf(offset)).append("&amp;limit=")
 .append(String.valueOf(PAGE_LIMIT));

 CourseResponse courseResponse = null;
 CloseableHttpClient httpClient = HttpClients.createDefault();
 logger.info("Get the courseurl {}", courseraUrl.toString());

 try(FileWriter fileWriter = new FileWriter("output.json")) {

 HttpGet request = new HttpGet(courseraUrl.toString());

 // add request headers
 request.addHeader("Accept", "application/json");

 CloseableHttpResponse response = httpClient.execute(request);
 HttpEntity entity = response.getEntity();
 if (entity != null) {
 // return it as a String
 String result = EntityUtils.toString(entity);
 courseResponse = objectMapper.readValue(result, CourseResponse.class);
 fileWriter.write(objectMapper
 .writeValueAsString(courseResponse.getElements()));
 logger.info("elements {} paging {}", 
 courseResponse.getElements(), 
 courseResponse.getPaging());
 }

 } catch (IOException exception) {
 logger.error("io exception happened {}", exception);
 } catch (Exception exception) {
 logger.error("general exception happened {}", exception);
 }
 CourseResponse.PageModel paging = courseResponse.getPaging();
 noOfTimes++;

 if (paging.isNextNull() || paging.getNextValue() &gt; paging.getTotalValue())
 return RepeatStatus.FINISHED;
 else {
 stepContribution.getStepExecution()
					 .getExecutionContext()
						.putInt("noOfTimes", noOfTimes);
 
		 stepContribution.getStepExecution()
 .getExecutionContext()
			 		 .putInt("offset", paging.getNextValue());
										 
 return RepeatStatus.CONTINUABLE;
 }
 }</code></pre>
The CourseResponse model mirrors the response of the list-courses GET API, which contains the elements and the paging information. Each element is mapped to the CourseMetaData model.
CourseResponse.java
<pre class="language-java"><code>package com.springbatch.tutorials.batch.model;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonInclude;

import java.util.List;


@JsonInclude(JsonInclude.Include.NON_NULL)
@JsonIgnoreProperties(ignoreUnknown = true)
public class CourseResponse {

 private List&lt;CourseMetaData&gt; elements;

 private PageModel paging;


 public List&lt;CourseMetaData&gt; getElements() {
 return elements;
 }

 public void setElements(List&lt;CourseMetaData&gt; elements) {
 this.elements = elements;
 }

 public PageModel getPaging() {
 return paging;
 }

 public void setPaging(PageModel paging) {
 this.paging = paging;
 }

 public static class PageModel {
 private String next;
 private String total;

 public String getNext() {
 return next;
 }

 public Integer getNextValue() {
 return Integer.parseInt(next);
 }

 public void setNext(String next) {
 this.next = next;
 }

 public String getTotal() {
 return total;
 }

 public Integer getTotalValue() {
 return Integer.parseInt(total);
 }

 public boolean isNextNull() {
 return this.next == null;
 }

 public void setTotal(String total) {
 this.total = total;
 }

 @Override
 public String toString() {
 return "PageModel{" +
 "next='" + next + '\'' +
 ", total='" + total + '\'' +
 '}';
 }
 }

}</code></pre>
CourseMetaData.java
<pre class="language-java"><code>package com.springbatch.tutorials.batch.model;

public class CourseMetaData {
 private String courseType;
 private String id;
 private String slug;
 private String name;

 public String getCourseType() {
 return courseType;
 }

 public void setCourseType(String courseType) {
 this.courseType = courseType;
 }

 public String getId() {
 return id;
 }

 public void setId(String id) {
 this.id = id;
 }

 public String getSlug() {
 return slug;
 }

 public void setSlug(String slug) {
 this.slug = slug;
 }

 public String getName() {
 return name;
 }

 public void setName(String name) {
 this.name = name;
 }

 @Override
 public String toString() {
 return "CourseMetaData{" +
 "courseType='" + courseType + '\'' +
 ", id='" + id + '\'' +
 ", slug='" + slug + '\'' +
 ", name='" + name + '\'' +
 '}';
 }
}</code></pre>
In afterStep(), we can access the step execution context. It provides typed getters such as getInt(key, defaultValue), which we use to read the number of invocations and the last offset and write them to the log.
<pre class="language-java"><code>@Override
public ExitStatus afterStep(StepExecution stepExecution) {
    try {
        // Typed getters return the given default when the key was never written.
        int noOfTimes = stepExecution.getExecutionContext().getInt("noOfTimes", 0);
        int offset = stepExecution.getExecutionContext().getInt("offset", 0);

        logger.info("Step started at {} completed after running {} times, last offset {}",
                stepExecution.getStartTime(), noOfTimes, offset);
    } catch (Exception exception) {
        logger.error("Could not read the step execution context", exception);
    }
    // Returning null leaves the step's exit status unchanged.
    return null;
}</code></pre>
Screenshots of the batch job while it runs, and of the final output file, are given below.
<img src="https://assets.blackslate.io/posts/01/N185ShYlQno7y8H.png" alt="" width="700" height="394">
<img src="https://assets.blackslate.io/posts/01/GHuPjIdxUp8zGyF.png" alt="" width="700" height="394">
<img src="https://assets.blackslate.io/posts/01/r73I7BEy9tx49u5.png" alt="" width="700" height="394">
The complete source code, including the output JSON file, is available in the <a href="https://github.com/devgroves/courserabatchdump" target="_blank" rel="noopener">GitHub repo</a>.
