php-pdf-2-text - Simple PHP PDF to Text class

This version supports Composer and PSR-4 autoloading. The original code is maintained by Darren Inwood. It is licensed under the GNU General Public License version 2 or later.



Related Projects

TCPDF - PHP class for generating PDF

  •    PHP

TCPDF is a PHP class for generating PDF documents without requiring external extensions. TCPDF supports UTF-8, Unicode, RTL languages, XHTML, JavaScript, digital signatures, barcodes and much more.

PDFBox - Java PDF library

  •    Java

Apache PDFBox is an open source Java PDF library for working with PDF documents. The library supports creating new PDF documents, manipulating existing documents and extracting content from documents. It also provides support for bookmarks, fonts, text extraction, encryption, PDF printing and a lot more.

iText - Java PDF library

  •    Java

iText is one of the most popular and widely used PDF libraries. It is used to generate PDF documents dynamically. Web developers commonly use it to generate PDF documents and reports from data in an XML file or a database and serve them to the browser. It supports adding bookmarks, watermarks, encryption, form filling and a lot more.

PDF Library - PDF manipulation in .NET

  •    VBNET

A library for PDF manipulation implementing the Adobe PDF standard, version 1.7. The library can read PDF files and apply changes to them; it is written in .NET 2.0 using Visual Studio 2005. Both writing and parsing PDF are supported.

MuPDF - A Lightweight PDF, XPS and E-book viewer

  •    C

MuPDF is a lightweight PDF, XPS, and E-book viewer. It renders text with metrics and spacing accurate to within fractions of a pixel for the highest fidelity in reproducing the look of a printed page on screen. It supports PDF 1.7 with transparency, encryption, hyperlinks, annotations, searching and more. The viewer is small, fast, yet complete. It supports many document formats, such as PDF, XPS, OpenXPS, CBZ, EPUB, and FictionBook 2. You can annotate PDF documents and fill out forms with the mobile viewers (this feature is coming soon to the desktop viewer as well).

PDFClown - PDF library

  •    Java

PDFClown is a PDF library that helps to generate, read and edit PDF files. It can split and merge PDF documents. It supports adding images, fonts, barcodes, bookmarks, annotations and form fields (such as checkboxes, buttons and list boxes), as well as compression and text extraction.

HexaPDF - A Versatile PDF Creation and Manipulation Library for Ruby

  •    Ruby

HexaPDF is a pure Ruby library with an accompanying application for working with PDF files. It supports creating new PDF files, manipulating existing PDF files, merging multiple PDF files into one, extracting meta information, text, images and files from PDF files, securing PDF files by encrypting them, and optimizing PDF files for smaller file size or other criteria.

PDFJet - PDF library for Java and .NET

  •    Java

PDFjet is a high-performance PDF library for Java and .NET. It supports drawing points, lines, boxes, polygons, etc. It also supports Unicode text, embedded images, embedded hyperlinks and a lot more. Its simple-to-use table class helps to generate flexible reports.

canvas - Cairo in Go: vector to raster, SVG, PDF, EPS, WASM, OpenGL, Gio, etc.

  •    Go

Canvas is a common vector drawing target that can output SVG, PDF, EPS, raster images (PNG, JPG, GIF, ...), HTML Canvas through WASM, OpenGL, and Gio. It implements a wide range of path manipulation functionality, such as flattening, stroking and dashing. Additionally, it has a text formatter, and it embeds and subsets fonts (TTF, OTF, WOFF, WOFF2, or EOT) or converts them to outlines. It can be considered a Cairo or node-canvas alternative in Go. See Figure 1 below for an overview of the functionality.

Figure 1: In the top left, text is fitted into a box and justified using Donald Knuth's line-breaking algorithm, which stretches the spaces between words to fill the whole width. A variety of styles and text decorations are applied, along with support for LTR/RTL mixing and complex scripts. In the bottom right, the word "stroke" is stroked and drawn as a path. In the top right, a LaTeX formula has been converted to a path. To its left, an ellipse showcases precise dashing: notably, a short dash has the same length wherever it lies on the curve. The dashes themselves are elliptical arcs as well, so they remain exact even when magnified greatly. To the right, a closed polygon of four points is smoothed by cubic Béziers that are smooth along the whole path, and the blue line on the left shows a smoothed open path. At the bottom is a rotated rasterized image. The result is equivalent for all renderers (PNG, PDF, SVG, etc.).

PDFSharp - Create and process PDF in .NET

  •    CSharp

PDFsharp is an open source .NET library that easily creates and processes PDF documents on the fly from any .NET language. The same drawing routines can be used to create PDF documents, draw on the screen, or send output to any printer. Neither Adobe's PDF Library nor Acrobat is required.

PDFedit - PDF editor and library

  •    C++

PDFedit is a free, open source PDF editor and a library for manipulating PDF documents. It includes a PDF manipulation library based on xpdf, a GUI, a set of command-line tools and a PDF editor. You can use it to read, change and extract information from a PDF file.

unipdf - Golang PDF library for creating and processing PDF files (pure go)

  •    Go

UniDoc's UniPDF (formerly unidoc) is a PDF library for Go (golang) with capabilities for creating, reading and processing PDF files. The library is written and supported by, where the library is used to power many of its services. Multiple examples are provided in the project's example repository, and documented examples are available on its website.

Images-to-PDF - An app to convert images to PDF file!

  •    Java

You can also join the Images To PDF team on Slack and chat with the developers.

docconv - Converts PDF, DOC, DOCX, XML, HTML, RTF, etc to plain text

  •    Go

A Go wrapper library to convert PDF, DOC, DOCX, XML, HTML, RTF, ODT, Pages documents and images (see optional dependencies below) to plain text. Note for returning users: the Go code path for this package has been moved to Follow the installation instructions to check out a version of the code in the correct place.

borb - Library for reading, creating and manipulating PDF files in Python

  •    Python

borb is a pure Python library to read, write and manipulate PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries and primitives (numbers, strings, booleans, etc.).

Solr - Blazing-fast, open source enterprise search platform

  •    Java

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

Pandoc - General Markup Converter

  •    Haskell

Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can convert documents in Markdown, reStructuredText, Textile, HTML, DocBook, or LaTeX to HTML formats, word processor formats, PDF and other markup formats.

pdfextract - A tool and library that can extract various areas of text from a PDF, especially a scholarly article PDF

  •    Ruby

A tool and library that can extract various areas of text from a PDF, especially a scholarly article PDF. It performs structural analysis to determine column bounds, headers, footers, sections, titles and so on. It can analyse and categorise sections into reference and non-reference sections and can split reference sections into individual references. The latest version is 0.1.1. Earlier versions are far less reliable.

HTML 2 PDF - a PHP script

  •    PHP

Takes HTML text and generates a PDF file from it to make it printer-friendly. This PHP script is based on the FPDF PHP script. More info can be found on the home page.

jPod - PDF manipulating and rendering framework

  •    Java

jPod is a PDF manipulation and rendering framework. It provides functionality to read documents and verify them against the PDF specification. It also provides a content stream and rendering framework. It can create new documents and perform incremental updates.
