Read and Write PDF using OpenPDF




OpenPDF is a fork of iText version 4. iText is a widely used PDF library, but its later versions moved to the AGPL license, while OpenPDF keeps the LGPL/MPL license of the version it was forked from. In this article, we will see how to write and read PDF files, how to extract text from a PDF, and how to create a password-protected PDF.

Maven Dependency

<dependency>
  <groupId>com.github.librepdf</groupId>
  <artifactId>openpdf</artifactId>
  <version>1.2.7</version>
</dependency>

<!-- Bouncy Castle is required if you want to encrypt or password-protect a PDF -->

<dependency>
  <groupId>org.bouncycastle</groupId>
  <artifactId>bcprov-jdk15on</artifactId>
  <version>1.60</version>
</dependency>

 

The sample below creates a two-page PDF. Create a Document instance and pass it, together with an output stream, to PdfWriter. Then open the document and add text to it using Paragraph objects; newPage() starts the second page.

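import java.io.FileOutputStream;
import java.io.OutputStream;

import com.lowagie.text.Document;
import com.lowagie.text.PageSize;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;

public void createPDF(String filename) {
    // A4 page with 50-point margins (left, right, top, bottom)
    Document document = new Document(PageSize.A4, 50, 50, 50, 50);

    // try-with-resources closes the stream even if writing fails
    try (OutputStream out = new FileOutputStream(filename)) {
        // Create a PDF writer instance bound to the document and the output stream
        PdfWriter.getInstance(document, out);

        document.open();
        document.addAuthor("Author-Name");
        document.addCreationDate();

        document.add(new Paragraph("Hello World -- Page 1"));
        document.add(new Paragraph("This is my first PDF."));

        // Start a new page
        document.newPage();

        document.add(new Paragraph("Hello World -- Page 2"));

        // Closing the document writes the remaining PDF data
        document.close();
    }
    catch (Exception exp) {
        System.err.println("Could not create PDF: " + exp.getMessage());
    }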
}
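A quick way to sanity-check the generated file is to look at its first bytes: the PDF specification requires a PDF file to begin with a `%PDF-` header followed by the version number. The sketch below uses only the JDK; `PdfHeaderCheck` and `hasPdfHeader` are hypothetical helper names, not part of OpenPDF.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper (not part of OpenPDF): checks for the "%PDF-" header.
final class PdfHeaderCheck {
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfHeaderCheck() {}

    // Returns true if the given leading bytes start with "%PDF-".
    static boolean hasPdfHeader(byte[] leadingBytes) {
        if (leadingBytes == null || leadingBytes.length < MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(leadingBytes, MAGIC.length), MAGIC);
    }

    // Reads only the first few bytes of the file instead of the whole document.
    static boolean hasPdfHeader(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return hasPdfHeader(in.readNBytes(MAGIC.length));
        }
    }
}
```

For example, after calling createPDF("sample.pdf"), `PdfHeaderCheck.hasPdfHeader(Path.of("sample.pdf"))` should return true. The check only confirms the header, not that the rest of the file is valid.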

Let's create a PDF that contains an image.

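import java.io.FileOutputStream;
import java.io.OutputStream;

import com.lowagie.text.Document;
import com.lowagie.text.Image;
import com.lowagie.text.PageSize;
import com.lowagie.text.pdf.PdfWriter;

public void createImagePDF(String inImageFilename, String outFilename) {
    Document document = new Document(PageSize.A4, 50, 50, 50, 50);

    try (OutputStream out = new FileOutputStream(outFilename)) {
        PdfWriter.getInstance(document, out);
        document.open();
        document.addAuthor("Author-Name");
        document.addCreationDate();

        // Load the image file and scale it proportionally to the area inside the margins
        Image image = Image.getInstance(inImageFilename);
        image.scaleToFit(document.right() - document.left(), document.top() - document.bottom());
        document.add(image);

        document.close();
    }
    catch (Exception exp) {
        System.err.println("Could not create PDF: " + exp.getMessage());
    }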
}
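A large image will not fit on the page as-is. With `PageSize.A4` (595 x 842 points) and the 50-point margins used in these samples, the writable area is 495 x 742 points. OpenPDF's `Image` class offers `scaleToFit(width, height)` for this; the JDK-only sketch below shows the proportional arithmetic behind "scale to fit". It illustrates the calculation and is not OpenPDF's implementation; `FitCalculator` is a hypothetical name.

```java
// Hypothetical helper illustrating proportional "scale to fit" arithmetic.
final class FitCalculator {
    private FitCalculator() {}

    // Returns {scaledWidth, scaledHeight} so the image fits inside boxWidth x boxHeight
    // while keeping its aspect ratio: the smaller of the two ratios decides the scale.
    static float[] scaleToFit(float imageWidth, float imageHeight, float boxWidth, float boxHeight) {
        if (imageWidth <= 0 || imageHeight <= 0) {
            throw new IllegalArgumentException("image dimensions must be positive");
        }
        float scale = Math.min(boxWidth / imageWidth, boxHeight / imageHeight);
        return new float[] {imageWidth * scale, imageHeight * scale};
    }

    public static void main(String[] args) {
        // A4 is 595 x 842 points; 50-point margins on each side leave 495 x 742
        float boxWidth = 595f - 50f - 50f;
        float boxHeight = 842f - 50f - 50f;
        float[] size = scaleToFit(1000f, 800f, boxWidth, boxHeight);
        System.out.println(size[0] + " x " + size[1]);
    }
}
```

For a 1000 x 800 image the width ratio (0.495) is smaller than the height ratio (0.9275), so the image is drawn at 495 x 396 points. Note that this arithmetic also enlarges images smaller than the box; cap the scale at 1 if you only want to shrink.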

The sample below creates a password-protected PDF. The setEncryption method of the PdfWriter instance takes the user password, the owner password, the permissions granted to the user and the encryption type as arguments, and it must be called before the document is opened. When the document is opened with the user password, compliant PDF viewers allow only the permitted operations (here, copying and printing). When it is opened with the owner password, the document has full access.

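import java.io.FileOutputStream;
import java.io.OutputStream;

import com.lowagie.text.Document;
import com.lowagie.text.PageSize;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;

// Requires Bouncy Castle on the classpath (see the Maven dependency above)
public void createPasswordProtectedPDF(String outFilename, String userPassword, String ownerPassword) {
    Document document = new Document(PageSize.A4, 50, 50, 50, 50);

    try (OutputStream out = new FileOutputStream(outFilename)) {
        PdfWriter writer = PdfWriter.getInstance(document, out);

        // Encryption must be set before document.open().
        // Arguments: user password, owner password, permissions granted to the user, encryption type
        writer.setEncryption(userPassword.getBytes(),
                ownerPassword.getBytes(),
                PdfWriter.ALLOW_COPY | PdfWriter.ALLOW_PRINTING,
                PdfWriter.STANDARD_ENCRYPTION_128);

        document.open();
        document.addAuthor("Author-Name");
        document.addCreationDate();

        document.add(new Paragraph("Hello World -- Page 1"));
        document.add(new Paragraph("This is my first PDF."));

        document.newPage();

        document.add(new Paragraph("Hello World -- Page 2"));
        document.close();
    }
    catch (Exception exp) {
        System.err.println("Could not create PDF: " + exp.getMessage());
    }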
}
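The permissions argument passed to setEncryption is a bit mask, and `|` combines the individual flags. The JDK-only sketch below decodes such a mask using the bit positions the PDF specification defines for the standard security handler (bit 1 is the lowest bit). `PermissionBits` and its constant names are hypothetical. The assumption that OpenPDF's `ALLOW_PRINTING` sets bits 3 and 12 (4 + 2048) and `ALLOW_COPY` sets bit 5 (16) comes from the iText lineage and should be checked against the OpenPDF Javadocs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of OpenPDF) that decodes a PDF permission mask.
// Bit numbers follow the PDF specification's standard security handler (bit 1 = value 1).
final class PermissionBits {
    static final int PRINT = 1 << 2;               // bit 3: print the document
    static final int MODIFY_CONTENTS = 1 << 3;     // bit 4: modify the contents
    static final int COPY = 1 << 4;                // bit 5: copy or extract text and graphics
    static final int MODIFY_ANNOTATIONS = 1 << 5;  // bit 6: add or modify annotations
    static final int FILL_IN = 1 << 8;             // bit 9: fill in existing form fields
    static final int SCREEN_READERS = 1 << 9;      // bit 10: extract text for accessibility
    static final int ASSEMBLY = 1 << 10;           // bit 11: insert, rotate or delete pages
    static final int HIGH_QUALITY_PRINT = 1 << 11; // bit 12: print at full quality

    private PermissionBits() {}

    // Returns the names of the operations allowed by the given mask, in bit order.
    static List<String> describe(int mask) {
        List<String> allowed = new ArrayList<>();
        if ((mask & PRINT) != 0) allowed.add("print");
        if ((mask & MODIFY_CONTENTS) != 0) allowed.add("modify contents");
        if ((mask & COPY) != 0) allowed.add("copy");
        if ((mask & MODIFY_ANNOTATIONS) != 0) allowed.add("modify annotations");
        if ((mask & FILL_IN) != 0) allowed.add("fill in forms");
        if ((mask & SCREEN_READERS) != 0) allowed.add("screen readers");
        if ((mask & ASSEMBLY) != 0) allowed.add("assemble");
        if ((mask & HIGH_QUALITY_PRINT) != 0) allowed.add("high-quality print");
        return allowed;
    }

    public static void main(String[] args) {
        // Assumed equivalent of PdfWriter.ALLOW_PRINTING | PdfWriter.ALLOW_COPY
        int mask = PRINT | HIGH_QUALITY_PRINT | COPY;
        System.out.println(describe(mask));
    }
}
```

Under that assumption, the mask used in the sample grants printing and copying only; modifying, annotating, form filling and page assembly stay disabled for a reader who opens the file with the user password.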

Let's read back the PDF we created. The sample below prints the document metadata and the raw content stream of each page. The content stream holds the PDF operators that draw the page's text and graphics, so it is a good way to see how a PDF page is built internally.

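import java.nio.charset.StandardCharsets;

import com.lowagie.text.pdf.PdfReader;

public void readPDF(String filename) {
    try {
        PdfReader reader = new PdfReader(filename);

        // Document information dictionary (author, creation date, producer, ...)
        System.out.println("Document Metadata");
        System.out.println(reader.getInfo());
        System.out.println("--");

        // PdfReader page numbers are 1-based
        int numPages = reader.getNumberOfPages();
        for (int index = 1; index <= numPages; index++) {
            // Content stream of the page: the operators that draw text and graphics
            byte[] pageBuf = reader.getPageContent(index);
            // ISO-8859-1 maps each byte to exactly one char, so no bytes are lost
            String pageContent = new String(pageBuf, StandardCharsets.ISO_8859_1);
            System.out.println("Page - " + index);
            System.out.println(pageContent);
            System.out.println();
        }
        reader.close();
    }
    catch (Exception exp) {
        System.err.println("Could not read PDF: " + exp.getMessage());
    }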
}
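The page content printed by readPDF is a sequence of PDF operators; simple text typically appears as a literal string in parentheses followed by the `Tj` operator, inside a `BT ... ET` block. To show why a dedicated text extractor is useful, the JDK-only sketch below collects the literal strings drawn with `Tj`. It is deliberately naive and hypothetical (`NaiveTjScanner` is not an OpenPDF class): it ignores `TJ` arrays, hex strings, escape codes such as `\n` or octal values, and font encodings, all of which a real extractor has to deal with.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, deliberately naive scanner: collects literal strings "(...)" that are
// followed by the Tj operator. TJ arrays, hex strings <...> and font encodings are ignored.
final class NaiveTjScanner {
    private NaiveTjScanner() {}

    static List<String> literalTjStrings(String content) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < content.length()) {
            if (content.charAt(i) != '(') {
                i++;
                continue;
            }
            // Read one literal string, honouring nested parentheses.
            StringBuilder text = new StringBuilder();
            int depth = 1;
            i++;
            while (i < content.length() && depth > 0) {
                char ch = content.charAt(i);
                if (ch == '\\' && i + 1 < content.length()) {
                    // A backslash escapes the next character, which is copied as-is
                    // (so \( \) \\ work, but \n, \t and octal codes are not decoded).
                    text.append(content.charAt(i + 1));
                    i += 2;
                    continue;
                }
                if (ch == '(') depth++;
                if (ch == ')') depth--;
                if (depth > 0) text.append(ch);
                i++;
            }
            // Skip whitespace, then keep the string only if the operator is Tj.
            int j = i;
            while (j < content.length() && Character.isWhitespace(content.charAt(j))) j++;
            if (content.startsWith("Tj", j)) {
                result.add(text.toString());
            }
        }
        return result;
    }
}
```

Strings that are part of a `TJ` array are skipped entirely, which is one reason to prefer PdfTextExtractor for real documents.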

 

Now that you have a PDF with some text, you may want to extract that text. This is a typical requirement when you want to index or search PDF content. PdfTextExtractor interprets each page's content stream and returns its text as a string.

import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.parser.PdfTextExtractor;

public void extractPDF(String filename) {
    try {
        PdfReader reader = new PdfReader(filename);
        System.out.println("Document Metadata");
        System.out.println(reader.getInfo());
        System.out.println("--");

        // Text extraction, page by page (page numbers are 1-based)
        int numPages = reader.getNumberOfPages();
        PdfTextExtractor textExtractor = new PdfTextExtractor(reader);
        for (int index = 1; index <= numPages; index++) {
            String pageContent = textExtractor.getTextFromPage(index);
            System.out.println("Page - " + index);
            System.out.println(pageContent);
        }
        reader.close();
    }
    catch (Exception exp) {
        System.err.println("Could not extract text: " + exp.getMessage());
    }
}
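Once the text of each page is available (for example from getTextFromPage), searching it is plain string handling. The JDK-only sketch below returns the 1-based page numbers whose text contains a search term, ignoring case. `PageSearch` and `findPages` are hypothetical names, and collecting the extracted text into a `List<String>` with one entry per page is an assumption about how you store it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: case-insensitive search over extracted page texts.
final class PageSearch {
    private PageSearch() {}

    // pageTexts.get(0) holds page 1, matching PdfReader's 1-based page numbers.
    static List<Integer> findPages(List<String> pageTexts, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (int index = 0; index < pageTexts.size(); index++) {
            String text = pageTexts.get(index);
            if (text != null && text.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(index + 1);
            }
        }
        return hits;
    }
}
```

For larger collections of documents, a full-text index is a better fit than scanning every page for each query.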

OpenPDF is a capable library with many more features than shown here. If you need to generate PDFs in a backend service, it is worth considering.

Reference:

GitHub repository - https://github.com/LibrePDF/OpenPDF

Javadocs - https://librepdf.github.io/OpenPDF/docs-1-2-7/

 

 





