Read and Write PDF using OpenPDF




OpenPDF is based on a fork of iText version 4. iText is a widely used PDF library, but it changed its license and moved to AGPL. In this article, we will see how to create and read PDF files, how to extract text from a PDF, and how to create a password-protected PDF.

Maven Dependency

<dependency>
   <groupId>com.github.librepdf</groupId>
   <artifactId>openpdf</artifactId>
  <version>1.2.7</version>
</dependency>

<!-- Bouncy Castle is required if you want to encrypt or password-protect a PDF -->

<dependency>
  <groupId>org.bouncycastle</groupId>
  <artifactId>bcprov-jdk15on</artifactId>
  <version>1.60</version>
</dependency>

 

The sample below creates a two-page PDF. Create a Document instance and pass it to the PDF writer, then add text to the document as Paragraph objects.

import java.io.FileOutputStream;

import com.lowagie.text.Document;
import com.lowagie.text.PageSize;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;

public void createPDF(String filename) {
    try {
        // A4 page with 50-point margins (left, right, top, bottom)
        Document document = new Document(PageSize.A4, 50, 50, 50, 50);

        // Create a PDF writer instance and pass it the output stream
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(filename));

        document.open();
        document.addAuthor("Author-Name");
        document.addCreationDate();

        document.add(new Paragraph("Hello World -- Page 1"));
        document.add(new Paragraph("This is my first PDF."));

        // Start the second page
        document.newPage();

        document.add(new Paragraph("Hello World -- Page 2"));
        document.close();
    } catch (Exception exp) {
        System.out.println(exp.getMessage());
    }
}

Let's create a PDF that contains an image.

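import java.io.FileOutputStream;

import com.lowagie.text.Document;
import com.lowagie.text.Image;
import com.lowagie.text.PageSize;
import com.lowagie.text.pdf.PdfWriter;

public void createImagePDF(String inImageFilename, String outFilename) {
    try {
        Document document = new Document(PageSize.A4, 50, 50, 50, 50);

        PdfWriter.getInstance(document, new FileOutputStream(outFilename));
        document.open();
        document.addAuthor("Author-Name");
        document.addCreationDate();

        // Load the image from the given file and add it to the page
        document.add(Image.getInstance(inImageFilename));
        document.close();
    } catch (Exception exp) {
        System.out.println(exp.getMessage());
    }
}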

The sample below creates a password-protected PDF. The setEncryption method of the PdfWriter instance takes the user password, the owner password, the document permissions and the encryption type as arguments. When the document is opened with the user password, access is restricted: here only copying and printing are allowed. When the document is opened with the owner password, it has full access.

import java.io.FileOutputStream;

import com.lowagie.text.Document;
import com.lowagie.text.PageSize;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;

public void createPasswordProtectedPDF(String outFilename, String password) {
    try {
        Document document = new Document(PageSize.A4, 50, 50, 50, 50);
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(outFilename));

        // Set the user password ("Hello") and the owner password ("World"), in that order.
        // Note: both passwords are hardcoded; the password parameter is not used in this sample.
        // Encryption must be configured before the document is opened.
        writer.setEncryption("Hello".getBytes(),
                "World".getBytes(),
                PdfWriter.ALLOW_COPY | PdfWriter.ALLOW_PRINTING,
                PdfWriter.STANDARD_ENCRYPTION_128);

        document.open();
        document.addAuthor("Author-Name");
        document.addCreationDate();

        document.add(new Paragraph("Hello World -- Page 1"));
        document.add(new Paragraph("This is my first PDF."));

        document.newPage();

        document.add(new Paragraph("Hello World -- Page 2"));
        document.close();
    } catch (Exception exp) {
        System.out.println(exp.getMessage());
    }
}
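The permissions argument is a bit mask: each permission is a flag, and flags are combined with the bitwise OR operator. The following is a minimal, library-free sketch of that idea. The class name, the method names and the constant values are assumptions made for illustration only; they are not the actual values of the PdfWriter constants.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of permission bit flags; names and values are illustrative assumptions. */
final class PermissionFlags {
    // Hypothetical stand-ins for permission constants such as PdfWriter.ALLOW_PRINTING.
    static final int ALLOW_PRINTING = 1 << 2;
    static final int ALLOW_MODIFY_CONTENTS = 1 << 3;
    static final int ALLOW_COPY = 1 << 4;

    /** Returns true when every bit of the given flag is set in the permission mask. */
    static boolean isAllowed(int permissions, int flag) {
        return (permissions & flag) == flag;
    }

    /** Lists the names of the flags that are set in the permission mask. */
    static List<String> describe(int permissions) {
        List<String> names = new ArrayList<>();
        if (isAllowed(permissions, ALLOW_PRINTING)) names.add("printing");
        if (isAllowed(permissions, ALLOW_MODIFY_CONTENTS)) names.add("modify contents");
        if (isAllowed(permissions, ALLOW_COPY)) names.add("copy");
        return names;
    }

    public static void main(String[] args) {
        // Same combination as in the sample above: copy and print, nothing else.
        int permissions = ALLOW_COPY | ALLOW_PRINTING;
        System.out.println("Granted: " + describe(permissions));
        System.out.println("Modify allowed? " + isAllowed(permissions, ALLOW_MODIFY_CONTENTS));
    }
}
```

Any permission whose flag is left out of the mask is denied to a reader who opens the file with the user password.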

Let's read the PDF that we created. The sample below prints the document metadata and the raw content of each page. If you want to see how the pages of a PDF are structured internally, this code can help.

import com.lowagie.text.pdf.PdfReader;

public void readPDF(String filename) {
    try {
        PdfReader reader = new PdfReader(filename);
        System.out.println("Document Metadata");
        System.out.println(reader.getInfo().toString());
        System.out.println("--");

        // Page numbers start at 1
        int numPages = reader.getNumberOfPages();
        for (int index = 1; index <= numPages; index++) {
            // Raw page content (PDF drawing operators), not plain text
            byte[] pageBuf = reader.getPageContent(index);
            String pageContent = new String(pageBuf);
            System.out.println("Page - " + index);
            System.out.println(pageContent);
            System.out.println("");
        }
        reader.close();
    } catch (Exception exp) {
        System.out.println(exp.getMessage());
    }
}
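The page content printed by this sample is a sequence of PDF operators rather than readable text. In simple cases the text appears as strings in parentheses followed by the Tj (show text) operator, between the BT and ET operators that begin and end a text object. The following toy sketch pulls such strings out of a content stream with a regular expression. The class name, the method name and the sample stream are hypothetical and written by hand for illustration. The sketch ignores escaped or nested parentheses, TJ arrays, hex strings and font encodings, so it is not a substitute for a real text extractor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy reader for literal strings in a PDF content stream; illustration only. */
final class ContentStreamPeek {
    // A literal string operand "(...)" followed by the Tj operator.
    // Simplification: strings containing parentheses are not matched.
    private static final Pattern SHOW_TEXT = Pattern.compile("\\(([^()]*)\\)\\s*Tj");

    /** Returns the literal strings passed to Tj, in order of appearance. */
    static List<String> literalStrings(String contentStream) {
        List<String> result = new ArrayList<>();
        Matcher matcher = SHOW_TEXT.matcher(contentStream);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-written sample stream (an assumption, not actual OpenPDF output).
        String sample = String.join("\n",
                "BT",
                "/F1 12 Tf",
                "50 780 Td",
                "(Hello World -- Page 1) Tj",
                "0 -16 Td",
                "(This is my first PDF.) Tj",
                "ET");
        for (String text : literalStrings(sample)) {
            System.out.println(text);
        }
    }
}
```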

 

Now that you have a PDF with some text, you may want to extract that text. This is a typical use case when we want to search the PDF content.

import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.parser.PdfTextExtractor;

public void extractPDF(String filename) {
    try {
        PdfReader reader = new PdfReader(filename);
        System.out.println("Document Metadata");
        System.out.println(reader.getInfo().toString());
        System.out.println("--");

        // Text extraction, one page at a time (page numbers start at 1)
        int numPages = reader.getNumberOfPages();
        PdfTextExtractor textExtractor = new PdfTextExtractor(reader);
        for (int index = 1; index <= numPages; index++) {
            String pageContent = textExtractor.getTextFromPage(index);
            System.out.println("Page - " + index);
            System.out.println(pageContent);
        }
        reader.close();
    } catch (Exception exp) {
        System.out.println(exp.getMessage());
    }
}
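Once the text of each page is available as a string, searching is plain string handling. The following is a minimal sketch, assuming the page texts have already been collected into a list in page order. The class and method names are hypothetical, and the sketch does not depend on OpenPDF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Simple keyword search over already extracted page texts; illustration only. */
final class PageSearch {
    /** Returns the 1-based numbers of the pages whose text contains the keyword, ignoring case. */
    static List<Integer> pagesContaining(List<String> pageTexts, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            if (pageTexts.get(i).toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(i + 1); // PDF page numbers start at 1
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Stand-in for the strings returned per page by the extraction sample above.
        List<String> pages = Arrays.asList(
                "Hello World -- Page 1 This is my first PDF.",
                "Hello World -- Page 2");
        System.out.println("Pages mentioning 'first': " + pagesContaining(pages, "first"));
    }
}
```

For large collections of documents, the extracted text would more commonly be fed into a search index than scanned page by page like this.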

OpenPDF is a nice library with many more features. If you need to generate PDFs in the backend, this library is worth considering.

Reference:

Github Location - https://github.com/LibrePDF/OpenPDF

Javadocs - https://librepdf.github.io/OpenPDF/docs-1-2-7/

 

 


   
