Most cloud services provide an API to fetch their data. Because returning the complete data set in one response would make the payload too large, the data is returned as paginated results. To get the complete list of books, e-courses or cloud machine details, we need to call the API page by page until the end. In this scenario, we can use Spring Batch to fetch the data page by page and dump it into a file.
In this blog, we will use one of the free-to-use APIs from Coursera to take a dump of its e-courses. Coursera is one of the popular MOOC sites, and it exposes its e-courses through a REST API. For a basic introduction to Spring Batch and getting-started docs, please refer to the previous blog.
In Spring Batch, we can use a tasklet, which gives us a free hand to kick-start a task and repeat it as per our own logic. A tasklet is a single task executed inside a step. The traditional step has a reader, processor and writer, which works well for file transformation or loading, but fitting our paginated get-and-dump scenario into it would be a bit cumbersome. A tasklet lets us place the GET API request inside the execute method and repeat it until we reach the end of the data.
public class CourseGetTasklet implements Tasklet, StepExecutionListener {

    public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext)
            throws Exception {
        // task logic happens here..
        // returning RepeatStatus.CONTINUABLE executes the tasklet again;
        // returning RepeatStatus.FINISHED ends the step
        return RepeatStatus.FINISHED;
    }

    public void beforeStep(StepExecution stepExecution) {
        // executed before the tasklet starts..
    }

    public ExitStatus afterStep(StepExecution stepExecution) {
        // executed after the tasklet completes..
        // returning null leaves the step's exit status unchanged
        return null;
    }
}
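To make the repeat contract concrete, here is a minimal plain-Java sketch of the idea. It is not Spring Batch code: the Status enum, runUntilFinished and pagedTask are hypothetical names that only mimic how a step keeps calling a tasklet while it reports CONTINUABLE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Plain-Java stand-in for the repeat contract; this is NOT Spring Batch code.
final class RepeatLoopSketch {

    // Hypothetical mirror of the two outcomes a tasklet can report.
    enum Status { CONTINUABLE, FINISHED }

    // Calls the task until it reports FINISHED and returns the number of calls.
    static int runUntilFinished(Supplier<Status> task) {
        int calls = 0;
        Status status;
        do {
            status = task.get();
            calls++;
        } while (status == Status.CONTINUABLE);
        return calls;
    }

    // A task that pretends to fetch pages of pageSize items out of total items,
    // recording the offset it was asked for on each call.
    static Supplier<Status> pagedTask(int total, int pageSize, List<Integer> seenOffsets) {
        int[] offset = {0};
        return () -> {
            seenOffsets.add(offset[0]);
            offset[0] += pageSize;
            return offset[0] >= total ? Status.FINISHED : Status.CONTINUABLE;
        };
    }

    public static void main(String[] args) {
        List<Integer> offsets = new ArrayList<>();
        int calls = runUntilFinished(pagedTask(250, 100, offsets));
        System.out.println(calls + " calls, offsets " + offsets);
    }
}
```

With 250 items and a page size of 100, the task is called three times, for offsets 0, 100 and 200, and the third call reports FINISHED.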
Let's set up the Spring Batch application through annotations. Create a class with a main method that calls SpringApplication.run. The class carries the @EnableBatchProcessing annotation, which enables Spring Batch features and provides a base configuration for setting up batch jobs in an @Configuration class, roughly equivalent to using the <batch:*> XML namespace. @Configuration marks this class as a Spring configuration class. @EnableAutoConfiguration configures beans automatically based on the dependencies available on the classpath.
JobBuilderFactory is used to create the job, with a RunIdIncrementer that supplies a new run id for each launch. StepBuilderFactory is used to create the step that runs the tasklet (CourseGetTasklet).
SampleBatchApplication.java
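import java.util.Date;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableAutoConfiguration
@EnableBatchProcessing
public class SampleBatchApplication {

    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory steps;

    @Bean
    public Job job() throws Exception {
        // JobExecutionListener here must be a concrete listener class from this project;
        // Spring Batch's own JobExecutionListener is an interface and cannot be instantiated
        return this.jobs.get("job").incrementer(new RunIdIncrementer())
                .listener(new JobExecutionListener()).start(step1()).build();
    }

    @Bean
    protected Step step1() throws Exception {
        // the epoch suffix gives the step a fresh name on every application start
        String epochStr = String.valueOf(new Date().getTime());
        return this.steps.get("step1v" + epochStr)
                .tasklet(new CourseGetTasklet()).throttleLimit(1).build();
    }

    public static void main(String[] args) throws Exception {
        // System.exit is common for Batch applications since the exit code can be used to
        // drive a workflow
        System.exit(SpringApplication.exit(SpringApplication.run(
                SampleBatchApplication.class, args)));
    }
}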
Provide the data source properties in application.properties; Spring Boot uses them to auto-configure the data source for the job repository. The spring.batch.jdbc.initialize-schema property initializes the database with the tables required for jobs and steps.
application.properties:
## Spring DATASOURCE (DataSourceAutoConfiguration & DataSourceProperties)
spring.datasource.url = jdbc:mysql://localhost:3306/courseradump?createDatabaseIfNotExist=true&allowPublicKeyRetrieval=true&useSSL=false
spring.datasource.username = root
spring.datasource.password = ****
spring.datasource.platform=mysql
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.initialization-mode=always
spring.batch.jdbc.initialize-schema=always
spring.jpa.hibernate.ddl-auto=create
The Coursera API documentation can be found in this location. The list-of-courses GET API helps us to get all the courses. We will fetch the courses with a page size of 100 using CloseableHttpClient. Before each request, we read the current offset from the step's execution context. After getting the response, we put the next offset back into the execution context. The number of times the tasklet has run is also kept in the execution context, and it is incremented and updated each time.
Get the response entity as a string and use Jackson to deserialize it into the CourseResponse model, which has the course elements and the paging parameters. The FileWriter is opened in a try-with-resources block; once we have the response, we serialize the course elements with Jackson and write them with the fileWriter.write method.
Now check the paging values: if the next value is null or greater than the total value, we have reached the last page, so we return the repeat status FINISHED. Otherwise, update the page offset and the number of times in the step context and return the repeat status CONTINUABLE.
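Because the tasklet runs once per page, how the file is opened matters: a writer opened without append mode replaces the file contents on every call. The sketch below shows one possible way to add each page to the end of the file using only the JDK (Java 11 or later). PageDumpSketch and appendPage are hypothetical names, and writing one page per line is an assumption for illustration, not the layout used by the original project.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical helper: appends each fetched page to the dump file as one line.
final class PageDumpSketch {

    // Creates the file if needed and appends the already-serialized page JSON.
    static void appendPage(Path file, String pageJson) throws IOException {
        Files.writeString(file, pageJson + System.lineSeparator(), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("courses", ".jsonl");
        appendPage(file, "[{\"id\":\"a\"}]");
        appendPage(file, "[{\"id\":\"b\"}]");
        List<String> lines = Files.readAllLines(file);
        System.out.println(lines.size() + " pages written");
        Files.delete(file);
    }
}
```

Each line of the resulting file is a JSON array for one page, so a consumer can read the dump line by line instead of parsing one large document.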
@Override
public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext)
        throws Exception {
    // ExecutionContext is org.springframework.batch.item.ExecutionContext
    ExecutionContext context = stepContribution.getStepExecution().getExecutionContext();
    int noOfTimes = context.getInt("noOfTimes", 0);
    int offset = context.getInt("offset", 0);
    String courseraUrl = "https://api.coursera.org/api/courses.v1?start=" + offset
            + "&limit=" + PAGE_LIMIT;
    CourseResponse courseResponse = null;
    logger.info("Get the courseurl {}", courseraUrl);
    // open the file in append mode so each page is added instead of
    // overwriting the page written by the previous call
    try (CloseableHttpClient httpClient = HttpClients.createDefault();
            FileWriter fileWriter = new FileWriter("output.json", true)) {
        HttpGet request = new HttpGet(courseraUrl);
        // add request headers
        request.addHeader("Accept", "application/json");
        try (CloseableHttpResponse response = httpClient.execute(request)) {
            HttpEntity entity = response.getEntity();
            if (entity != null) {
                // read the body as a String and map it to the response model
                String result = EntityUtils.toString(entity);
                courseResponse = objectMapper.readValue(result, CourseResponse.class);
                fileWriter.write(objectMapper
                        .writeValueAsString(courseResponse.getElements()));
                logger.info("elements {} paging {}",
                        courseResponse.getElements(),
                        courseResponse.getPaging());
            }
        }
    } catch (IOException exception) {
        // pass the exception as the last argument so the stack trace is logged
        logger.error("io exception happened", exception);
    } catch (Exception exception) {
        logger.error("general exception happened", exception);
    }
    if (courseResponse == null) {
        // fail the step with a clear message instead of a NullPointerException
        throw new IllegalStateException("No course response for offset " + offset);
    }
    CourseResponse.PageModel paging = courseResponse.getPaging();
    noOfTimes++;
    if (paging.isNextNull() || paging.getNextValue() > paging.getTotalValue()) {
        return RepeatStatus.FINISHED;
    } else {
        context.putInt("noOfTimes", noOfTimes);
        context.putInt("offset", paging.getNextValue());
        return RepeatStatus.CONTINUABLE;
    }
}
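The page URL and the stop condition can also be looked at in isolation. The following is a minimal sketch in plain Java, assuming the paging values arrive as strings and that total is always present; CoursePagingSketch, pageUri and isLastPage are hypothetical names used only for illustration.

```java
import java.net.URI;

// Hypothetical helper that isolates the URL building and the end-of-data check.
final class CoursePagingSketch {

    static final String BASE_URL = "https://api.coursera.org/api/courses.v1";

    // Builds the URL for one page; note there is no space after the '?'.
    static URI pageUri(int start, int limit) {
        return URI.create(BASE_URL + "?start=" + start + "&limit=" + limit);
    }

    // A missing next value, or a next value beyond the total, means the last page.
    // Assumes total is never null; a null total would throw NumberFormatException.
    static boolean isLastPage(String next, String total) {
        if (next == null) {
            return true;
        }
        return Integer.parseInt(next) > Integer.parseInt(total);
    }

    public static void main(String[] args) {
        System.out.println(pageUri(100, 100));
        System.out.println(isLastPage("100", "250"));
        System.out.println(isLastPage(null, "250"));
    }
}
```

Keeping the check in a small pure function like this makes it possible to test the paging logic without calling the API.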
The CourseResponse model mirrors the course GET API response, which has the elements and the paging. Each element is defined by the CourseMetaData model.
CourseResponse.java
package com.springbatch.tutorials.batch.model;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonInclude;
import java.util.List;

@JsonInclude(JsonInclude.Include.NON_NULL)
@JsonIgnoreProperties(ignoreUnknown = true)
public class CourseResponse {

    private List<CourseMetaData> elements;
    private PageModel paging;

    public List<CourseMetaData> getElements() {
        return elements;
    }

    public void setElements(List<CourseMetaData> elements) {
        this.elements = elements;
    }

    public PageModel getPaging() {
        return paging;
    }

    public void setPaging(PageModel paging) {
        this.paging = paging;
    }

    public static class PageModel {

        private String next;
        private String total;

        public String getNext() {
            return next;
        }

        // call isNextNull() first; parsing a null next value throws an exception
        public Integer getNextValue() {
            return Integer.parseInt(next);
        }

        public void setNext(String next) {
            this.next = next;
        }

        public String getTotal() {
            return total;
        }

        public Integer getTotalValue() {
            return Integer.parseInt(total);
        }

        public boolean isNextNull() {
            return this.next == null;
        }

        public void setTotal(String total) {
            this.total = total;
        }

        @Override
        public String toString() {
            return "PageModel{" +
                    "next='" + next + '\'' +
                    ", total='" + total + '\'' +
                    '}';
        }
    }
}
CourseMetaData.java
package com.springbatch.tutorials.batch.model;

public class CourseMetaData {

    private String courseType;
    private String id;
    private String slug;
    private String name;

    public String getCourseType() {
        return courseType;
    }

    public void setCourseType(String courseType) {
        this.courseType = courseType;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getSlug() {
        return slug;
    }

    public void setSlug(String slug) {
        this.slug = slug;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return "CourseMetaData{" +
                "courseType='" + courseType + '\'' +
                ", id='" + id + '\'' +
                ", slug='" + slug + '\'' +
                ", name='" + name + '\'' +
                '}';
    }
}
In the afterStep method, we can access the step's execution context. It provides type-based getters, with which we can read the number of times and the offset and write them to the log.
@Override
public ExitStatus afterStep(StepExecution stepExecution) {
    try {
        int noOfTimes = stepExecution.getExecutionContext().getInt("noOfTimes", 0);
        int offset = stepExecution.getExecutionContext().getInt("offset", 0);
        // a Java string literal cannot span lines, so the message is concatenated
        logger.info("After step execution completed {} "
                + "after running {} times and last offset {}",
                stepExecution.getStartTime(),
                noOfTimes, offset);
    } catch (Exception exception) {
        logger.error("exception while reading the step context", exception);
    }
    // returning null leaves the step's exit status unchanged
    return null;
}
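The type-based getters with a default value can be pictured with a simplified stand-in. This is only a sketch of the behaviour relied on above (a missing key yields the supplied default); StepContextSketch is a hypothetical class backed by a plain map, not the Spring Batch execution context itself.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical stand-in for a context with typed getters.
final class StepContextSketch {

    private final Map<String, Object> values = new HashMap<>();

    void putInt(String key, int value) {
        values.put(key, value);
    }

    // Returns the stored value, or defaultValue when the key was never set.
    int getInt(String key, int defaultValue) {
        Object value = values.get(key);
        return value == null ? defaultValue : (Integer) value;
    }

    public static void main(String[] args) {
        StepContextSketch context = new StepContextSketch();
        System.out.println(context.getInt("offset", 0));
        context.putInt("offset", 100);
        System.out.println(context.getInt("offset", 0));
    }
}
```

This is why the first run of the tasklet can read the offset and the number of times without any setup: both fall back to 0 until a value has been stored.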
Screenshots of the batch job while it runs and of the final output file are given below.
The complete source code, including the output JSON file, is available in the GitHub repo.