Read and Write PDF using OpenPDF




OpenPDF is a fork of iText version 4. iText is a widely used PDF library, but its maintainers changed the license and moved to AGPL. In this article, we will see how to write and read a PDF, how to extract text from a PDF, and how to create a password-protected PDF.

Maven Dependency

<dependency>
  <groupId>com.github.librepdf</groupId>
  <artifactId>openpdf</artifactId>
  <version>1.2.7</version>
</dependency>

<!-- Bouncy Castle is required if you want to encrypt or password-protect a PDF -->

<dependency>
  <groupId>org.bouncycastle</groupId>
  <artifactId>bcprov-jdk15on</artifactId>
  <version>1.60</version>
</dependency>

 

The sample below creates a two-page PDF. Create a Document instance and pass it to PdfWriter together with an output stream, then add text to the document using Paragraph objects.

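import java.io.FileOutputStream;

import com.lowagie.text.Document;
import com.lowagie.text.PageSize;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;

public void createPDF(String filename) {
    try {
        // A4 page with 50-point margins (left, right, top, bottom)
        Document document = new Document(PageSize.A4, 50, 50, 50, 50);

        // Create a PDF writer instance and pass it the output stream
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(filename));

        document.open();
        document.addAuthor("Author-Name");
        document.addCreationDate();

        document.add(new Paragraph("Hello World -- Page 1"));
        document.add(new Paragraph("This is my first PDF."));

        // Start the second page
        document.newPage();

        document.add(new Paragraph("Hello World -- Page 2"));

        // Closing the document also flushes and closes the writer
        document.close();
    }
    catch (Exception exp) {
        System.out.println(exp.getMessage());
    }
}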

Let's create a PDF that contains an image.

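import java.io.FileOutputStream;

import com.lowagie.text.Document;
import com.lowagie.text.Image;
import com.lowagie.text.PageSize;
import com.lowagie.text.pdf.PdfWriter;

public void createImagePDF(String inImageFilename, String outFilename) {
    try {
        Document document = new Document(PageSize.A4, 50, 50, 50, 50);

        PdfWriter.getInstance(document, new FileOutputStream(outFilename));
        document.open();
        document.addAuthor("Author-Name");
        document.addCreationDate();

        // Load the image from a file path and add it to the page
        document.add(Image.getInstance(inImageFilename));
        document.close();
    }
    catch (Exception exp) {
        System.out.println(exp.getMessage());
    }
}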

The sample below creates a password-protected PDF. The setEncryption method of the PdfWriter instance takes the user password, the owner password, the document permissions and the encryption type as arguments, in that order, and must be called before the document is opened. When the document is opened with the user password, access is restricted: in this sample only copying and printing are allowed. When it is opened with the owner password, the document has full access.

import java.io.FileOutputStream;

import com.lowagie.text.Document;
import com.lowagie.text.PageSize;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;

public void createPasswordProtectedPDF(String outFilename, String password) {
    try {
        Document document = new Document(PageSize.A4, 50, 50, 50, 50);
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(outFilename));

        // Set the user password first, then the owner password.
        // The owner password is hard-coded here only to keep the sample short.
        // This call must come before document.open().
        writer.setEncryption(password.getBytes(),
                "World".getBytes(),
                PdfWriter.ALLOW_COPY | PdfWriter.ALLOW_PRINTING,
                PdfWriter.STANDARD_ENCRYPTION_128);

        document.open();
        document.addAuthor("Author-Name");
        document.addCreationDate();

        document.add(new Paragraph("Hello World -- Page 1"));
        document.add(new Paragraph("This is my first PDF."));

        document.newPage();

        document.add(new Paragraph("Hello World -- Page 2"));
        document.close();
    }
    catch (Exception exp) {
        System.out.println(exp.getMessage());
    }
}
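
The permissions argument is a bitmask: the flags are combined with a bitwise OR, and a PDF viewer checks individual bits to decide what the user may do. The sketch below illustrates that idea in plain Java, without OpenPDF. The class name and the constants PRINT, MODIFY and COPY are hypothetical stand-ins whose values follow the permission bit positions in the PDF specification; real code should use the PdfWriter.ALLOW_* constants instead.

```java
public class PermissionFlags {
    // Stand-in values (assumption): bit positions taken from the PDF
    // specification's user access permissions. Use PdfWriter.ALLOW_* in real code.
    public static final int PRINT = 1 << 2;   // bit 3: print the document
    public static final int MODIFY = 1 << 3;  // bit 4: modify the contents
    public static final int COPY = 1 << 4;    // bit 5: copy or extract text and graphics

    // Returns true when every bit of 'flag' is set in 'permissions'.
    public static boolean isAllowed(int permissions, int flag) {
        return (permissions & flag) == flag;
    }

    public static void main(String[] args) {
        // Same pattern as PdfWriter.ALLOW_COPY | PdfWriter.ALLOW_PRINTING
        int permissions = COPY | PRINT;

        System.out.println("print allowed:  " + isAllowed(permissions, PRINT));
        System.out.println("copy allowed:   " + isAllowed(permissions, COPY));
        System.out.println("modify allowed: " + isAllowed(permissions, MODIFY));
    }
}
```

Any flag left out of the OR expression stays unset, which is why modifying the document is not permitted for a user who opens it with the user password.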

Let's try to read the PDF we created. The sample below prints the document metadata, followed by the raw content stream of each page. If you want to see how a PDF page is structured internally, this code can help.

import com.lowagie.text.pdf.PdfReader;

public void readPDF(String filename) {
    try {
        PdfReader reader = new PdfReader(filename);
        System.out.println("Document Metadata");
        System.out.println(reader.getInfo().toString());
        System.out.println("--");

        // Page numbers start at 1
        int numPages = reader.getNumberOfPages();
        for (int index = 1; index <= numPages; index++) {
            // Raw page content: PDF drawing and text operators, not plain text
            byte[] pageBuf = reader.getPageContent(index);
            String pageContent = new String(pageBuf);
            System.out.println("Page - " + index);
            System.out.println(pageContent);
            System.out.println("");
        }
        reader.close();
    }
    catch (Exception exp) {
        System.out.println(exp.getMessage());
    }
}
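
The page content printed above is a sequence of PDF operators, where text is typically shown by a string operand followed by the Tj operator. As a rough illustration of that structure, the plain-Java sketch below pulls such strings out of a content stream with a regular expression. The class name, method name and sample stream are hypothetical, and the approach is deliberately simplistic: it does not decode escape sequences, font encodings, hex strings or TJ arrays, so it is not a substitute for a real text extractor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentStreamPeek {
    // Matches a literal string operand followed by the Tj (show text) operator,
    // e.g. "(Hello)Tj". Backslash escapes inside the string are tolerated
    // but not decoded.
    private static final Pattern SHOW_TEXT =
            Pattern.compile("\\(((?:\\\\.|[^\\\\()])*)\\)\\s*Tj");

    // Returns the raw string operands of all Tj operators, in order.
    public static List<String> shownStrings(String contentStream) {
        List<String> result = new ArrayList<>();
        Matcher m = SHOW_TEXT.matcher(contentStream);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-written sample (assumption), shaped like a simple page content stream.
        String sample = "BT\n/F1 12 Tf\n50 780 Td\n(Hello World -- Page 1)Tj\n"
                + "0 -18 Td\n(This is my first PDF.) Tj\nET\n";
        System.out.println(shownStrings(sample));
    }
}
```

In practice the string returned by getPageContent could be passed to shownStrings in place of the sample, but the result depends on how the PDF was produced.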

 

Now that you have a PDF with some text, you may want to extract that text. This is a typical use case when you want to search the PDF content.

import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.parser.PdfTextExtractor;

public void extractPDF(String filename) {
    try {
        PdfReader reader = new PdfReader(filename);
        System.out.println("Document Metadata");
        System.out.println(reader.getInfo().toString());
        System.out.println("--");

        // Text extraction, one page at a time (page numbers start at 1)
        int numPages = reader.getNumberOfPages();
        PdfTextExtractor textExtractor = new PdfTextExtractor(reader);
        for (int index = 1; index <= numPages; index++) {
            String pageContent = textExtractor.getTextFromPage(index);
            System.out.println("Page - " + index);
            System.out.println(pageContent);
        }
        reader.close();
    }
    catch (Exception exp) {
        System.out.println(exp.getMessage());
    }
}
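
Once the text of each page is available as a string, a basic search is straightforward. The sketch below is one possible approach in plain Java: it takes the per-page strings and returns the page numbers that contain a keyword, ignoring case. The class and method names are hypothetical, and the sample strings stand in for the values returned by getTextFromPage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class PageSearch {
    // pageTexts.get(0) is assumed to hold the text of page 1, and so on.
    public static List<Integer> pagesContaining(List<String> pageTexts, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<Integer> pages = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            if (pageTexts.get(i).toLowerCase(Locale.ROOT).contains(needle)) {
                pages.add(i + 1); // report 1-based page numbers
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        // Stand-in strings (assumption) for the text extracted from each page.
        List<String> pageTexts = Arrays.asList(
                "Hello World -- Page 1\nThis is my first PDF.",
                "Hello World -- Page 2");
        System.out.println(pagesContaining(pageTexts, "first pdf"));
    }
}
```

A plain substring match like this misses phrases that the extractor splits across lines, so a real search feature would usually normalize whitespace first or hand the text to a search library.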

OpenPDF is a nice library with many more features. If you need to generate PDFs on the backend, this library is worth considering.

Reference:

GitHub location - https://github.com/LibrePDF/OpenPDF

Javadocs - https://librepdf.github.io/OpenPDF/docs-1-2-7/

 

 

