Read and Write PDF using OpenPDF

OpenPDF is a Java library for creating and editing PDF files, based on a fork of iText version 4. iText is a widely used PDF library, but its later versions moved to the AGPL license, while OpenPDF remains available under the LGPL and MPL licenses. In this article, we will see how to write and read PDF files, how to extract text from a PDF and how to create a password-protected PDF.

Maven Dependency

<dependency>
  <groupId>com.github.librepdf</groupId>
  <artifactId>openpdf</artifactId>
  <version>1.2.7</version>
</dependency>

<!-- Bouncy Castle is required if you want to encrypt or password-protect a PDF -->

<dependency>
  <groupId>org.bouncycastle</groupId>
  <artifactId>bcprov-jdk15on</artifactId>
  <version>1.60</version>
</dependency>

 

The sample below creates a two-page PDF. Create a Document instance and pass it to the PdfWriter along with an output stream. Then add text to the document as Paragraph objects.

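// Imports used by the samples in this article
import java.io.FileOutputStream;

import com.lowagie.text.Document;
import com.lowagie.text.Image;
import com.lowagie.text.PageSize;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfWriter;
import com.lowagie.text.pdf.parser.PdfTextExtractor;

public void createPDF(String filename) {
    try {
        // A4 page with 50-point margins (left, right, top, bottom)
        Document document = new Document(PageSize.A4, 50, 50, 50, 50);

        // Create a PDF writer instance and pass it the output stream
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(filename));

        document.open();
        document.addAuthor("Author-Name");
        document.addCreationDate();

        document.add(new Paragraph("Hello World -- Page 1"));
        document.add(new Paragraph("This is my first PDF."));

        // Start the second page
        document.newPage();

        document.add(new Paragraph("Hello World -- Page 2"));
        document.close();
    } catch (Exception exp) {
        System.out.println(exp.getMessage());
    }
}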

Let's create a PDF that contains an image.

public void createImagePDF(String inImageFilename, String outFilename) {
    try {
        Document document = new Document(PageSize.A4, 50, 50, 50, 50);

        PdfWriter.getInstance(document, new FileOutputStream(outFilename));
        document.open();
        document.addAuthor("Author-Name");
        document.addCreationDate();

        document.add(Image.getInstance(inImageFilename));
        document.close();
    } catch (Exception exp) {
        System.out.println(exp.getMessage());
    }
}

The sample below creates a password-protected PDF. The setEncryption method of the PdfWriter instance takes the user password, the owner password, the document permissions and the encryption type as arguments, in that order. When the document is opened with the user password, access is restricted: it can be read, but only copying and printing are permitted. When it is opened with the owner password, the document has full access.

public void createPasswordProtectedPDF(String outFilename, String userPassword, String ownerPassword) {
    try {
        Document document = new Document(PageSize.A4, 50, 50, 50, 50);
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(outFilename));

        // Set the user password first, then the owner password.
        // Encryption must be configured before the document is opened.
        writer.setEncryption(userPassword.getBytes(),
                ownerPassword.getBytes(),
                PdfWriter.ALLOW_COPY | PdfWriter.ALLOW_PRINTING,
                PdfWriter.STANDARD_ENCRYPTION_128);

        document.open();
        document.addAuthor("Author-Name");
        document.addCreationDate();

        document.add(new Paragraph("Hello World -- Page 1"));
        document.add(new Paragraph("This is my first PDF."));

        document.newPage();

        document.add(new Paragraph("Hello World -- Page 2"));
        document.close();
    } catch (Exception exp) {
        System.out.println(exp.getMessage());
    }
}
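
The permissions argument is a bit mask: each permission is an int with its own bits set, and the bitwise OR operator combines them into one value. The sketch below uses only the JDK to show the mechanics. The class name PermissionMask, the allows helper and the numeric values are assumptions made for illustration and are not the OpenPDF constants; real code should pass the PdfWriter.ALLOW_* constants.

```java
public class PermissionMask {
    // Illustrative values only (hypothetical); use PdfWriter.ALLOW_* in real code.
    public static final int PRINT = 1 << 2;
    public static final int MODIFY = 1 << 3;
    public static final int COPY = 1 << 4;

    /** Returns true when every bit of flag is set in permissions. */
    public static boolean allows(int permissions, int flag) {
        return (permissions & flag) == flag;
    }

    public static void main(String[] args) {
        // Combine two permissions, as the sample does with ALLOW_COPY | ALLOW_PRINTING
        int permissions = COPY | PRINT;

        System.out.println("copy allowed:   " + allows(permissions, COPY));
        System.out.println("print allowed:  " + allows(permissions, PRINT));
        System.out.println("modify allowed: " + allows(permissions, MODIFY));
    }
}
```

Any permission that is not OR-ed into the mask stays switched off, which is why a reader who opens the file with the user password can copy and print but not modify it.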

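Let's read the PDF we created. The sample below prints the document metadata followed by the raw content stream of each page, which is useful if you want to see how a PDF page is described internally. To open the password-protected PDF, use the PdfReader constructor that also takes the password as a byte array.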

public void readPDF(String filename) {
    try {
        PdfReader reader = new PdfReader(filename);
        System.out.println("Document Metadata");
        System.out.println(reader.getInfo().toString());
        System.out.println("--");

        // Page numbers start at 1
        int numPages = reader.getNumberOfPages();
        for (int index = 1; index <= numPages; index++) {
            // Raw content stream of the page (PDF drawing and text operators)
            byte[] pageBuf = reader.getPageContent(index);
            String pageContent = new String(pageBuf);
            System.out.println("Page - " + index);
            System.out.println(pageContent);
            System.out.println("");
        }
        reader.close();
    } catch (Exception exp) {
        System.out.println(exp.getMessage());
    }
}
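
The page content printed by this sample is only one part of a PDF file. At the file level, a PDF starts with a header line of the form %PDF-x.y that names the PDF version. As a small illustration of that structure, the JDK-only sketch below reads the version from the first bytes of a file. The class name PdfHeader and the version method are hypothetical helpers, not part of OpenPDF, and loading the bytes from disk is left to the caller.

```java
import java.nio.charset.StandardCharsets;

public class PdfHeader {
    /**
     * Returns the version from a "%PDF-x.y" header, or null when the
     * bytes do not start with such a header.
     */
    public static String version(byte[] fileStart) {
        // Only the first few bytes are needed to inspect the header
        int length = Math.min(fileStart.length, 16);
        String head = new String(fileStart, 0, length, StandardCharsets.ISO_8859_1);
        if (!head.startsWith("%PDF-")) {
            return null;
        }
        // Collect the digits and dots that follow "%PDF-"
        int end = 5;
        while (end < head.length()
                && (Character.isDigit(head.charAt(end)) || head.charAt(end) == '.')) {
            end++;
        }
        return end > 5 ? head.substring(5, end) : null;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(version(sample)); // prints 1.4
    }
}
```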

 

Now you have a PDF with some text and you want to extract that text. This is a typical requirement when you want to search the PDF content.

public void extractPDF(String filename) {
    try {
        PdfReader reader = new PdfReader(filename);
        System.out.println("Document Metadata");
        System.out.println(reader.getInfo().toString());
        System.out.println("--");

        // Text extraction, one page at a time (page numbers start at 1)
        int numPages = reader.getNumberOfPages();
        PdfTextExtractor textExtractor = new PdfTextExtractor(reader);
        for (int index = 1; index <= numPages; index++) {
            String pageContent = textExtractor.getTextFromPage(index);
            System.out.println("Page - " + index);
            System.out.println(pageContent);
        }
        reader.close();
    } catch (Exception exp) {
        System.out.println(exp.getMessage());
    }
}
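
Once the text of each page is available as a String, searching it needs nothing beyond the JDK. The following is a minimal sketch, assuming the page texts have already been collected into a list in page order; the class name PageSearch and the findPages method are hypothetical and not part of OpenPDF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class PageSearch {
    /**
     * Returns the 1-based numbers of the pages whose text contains the
     * keyword, ignoring case. pageTexts.get(0) holds the text of page 1.
     */
    public static List<Integer> findPages(List<String> pageTexts, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            if (pageTexts.get(i).toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Stand-in for the strings returned by the text extractor
        List<String> pages = Arrays.asList(
                "Hello World -- Page 1 This is my first PDF.",
                "Hello World -- Page 2");

        System.out.println(findPages(pages, "first pdf")); // prints [1]
        System.out.println(findPages(pages, "hello"));     // prints [1, 2]
    }
}
```

A plain substring match like this is enough for small documents; for large collections, a search engine would be the more usual choice.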

OpenPDF is a capable library with many more features than those shown here. If you need to generate PDFs on the backend, it is worth considering.

Reference:

GitHub Location - https://github.com/LibrePDF/OpenPDF

Javadocs - https://librepdf.github.io/OpenPDF/docs-1-2-7/

 

 

