pdf-to-text - Extract text from a pdf

  •        49

This package provides a class to extract text from a pdf. Spatie is a webdesign agency based in Antwerp, Belgium. You'll find an overview of all our open source projects on our website.

https://murze.be/2016/01/a-package-to-extract-text-from-a-pdf/
https://github.com/spatie/pdf-to-text

Tags
Implementation
License
Platform

   




Related Projects

PDFBox - Java PDF library

  •    Java

Apache PDFBox is an open source Java PDF library for working with PDF documents. This library allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents. It provides support for adding bookmarks, fonts, text extraction, Encryption, PDF printing and lot more.

Images-to-PDF - An app to convert images to PDF file!

  •    Java

You can also join the Images To PDF Team on Slack https://imagestopdf.slack.com/ and chat with developers. Use this link to join our slack team.

docconv - Converts PDF, DOC, DOCX, XML, HTML, RTF, etc to plain text

  •    Go

A Go wrapper library to convert PDF, DOC, DOCX, XML, HTML, RTF, ODT, Pages documents and images (see optional dependencies below) to plain text. Note for returning users: the Go code path for this pkg been moved to code.sajari.com/docconv. Follow the installation instructions to checkout a version of the code in the correct place.

TCPDF - PHP class for generating PDF

  •    PHP

TCPDF is a PHP class for generating PDF documents without requiring external extensions. TCPDF Supports UTF-8, Unicode, RTL languages, XHTML, Javascript, digital signatures, barcodes and much more.

Pandoc - General Markup Converter

  •    Haskell

Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It an convert documents in markdown, reStructuredText, textile, HTML, DocBook, or LaTeX to HTML formats, Word processor formats, PDF and other markup formats.


asciidoctor-pdf - :page_with_curl: Asciidoctor PDF: A native PDF converter for AsciiDoc based on Asciidoctor and Prawn, written entirely in Ruby

  •    Ruby

Prawn is a nimble PDF writer for Ruby. More important, it’s a hackable platform that offers both high level APIs for the most common needs and low level APIs for bending the document model to accommodate special circumstances. With Prawn, you can write text, draw lines and shapes and place images anywhere on the page and add as much color as you like. In addition, it brings a fluent API and aggressive code re-use to the printable document space.

iText - Java PDF library

  •    Java

iText is one of the popular and widely used PDF library. It is used to generate PDF documents dynamically. Mostly web developers will love it to generate PDF documents and reports based on data from an XML file or a database and serves it to the browser. It has support of adding bookmarks, watermarks, Encryption, Form filling and lot more.

PDF Library - PDF manipulation in .NET

  •    VBNET

A library for PDF manipulation implementing Adobe PDF standard version 1.7. This library allows to read PDF files and apply changes to them, it is written in .NET 2.0 using Visual Studio 2005. Writing and Parsing PDF is supported.

PDFClown - PDF library

  •    Java

PDFClown is a PDF library helps to generate, read and edit PDF. It helps to split and merge the PDF documents. It has support to add Images, Fonts, Barcodes, Bookmarks, Annotations, Form fields like checkbox, button, list box etc, Compression, text extraction.

SWFTools - Utilities to work with Adobe Flash files

  •    C

SWFTools is a collection of utilities for working with Adobe Flash files (SWF files). The tool collection includes programs for reading SWF files, combining them, and creating them from other content (like images, sound files, videos or sourcecode).

HexaPDF - A Versatile PDF Creation and Manipulation Library for Ruby

  •    Ruby

HexaPDF is a pure Ruby library with an accompanying application for working with PDF files. It supports Creating new PDF files, Manipulating existing PDF files, Merging multiple PDF files into one, Extracting meta information, text, images and files from PDF files, Securing PDF files by encrypting them and optimizing PDF files for smaller file size or other criteria.

PDFJet - PDF library for Java and .NET

  •    Java

PDFjet is a high performance PDF library for Java and .NET. It has support of drawing points, lines, box, polygons etc. It supports unicode text, embedding images, embedding hyperlinks and lot more. Its simple to use table class helps to generate flexible reports.

PDFSharp - Create and process PDF in .NET

  •    CSharp

PDFsharp is the Open Source .NET library that easily creates and processes PDF documents on the fly from any .NET language. The same drawing routines can be used to create PDF documents, draw on the screen, or send output to any printer. Neither Adobe's PDF Library nor Acrobat are required.

html-pdf-chrome - HTML to PDF converter via Chrome/Chromium

  •    TypeScript

HTML to PDF converter via Chrome/Chromium. Note: It is strongly recommended that you keep Chrome running side-by-side with Node.js. There is significant overhead starting up Chrome for each PDF generation which can be easily avoided.

CCP

  •    Ruby

Convert copy. Pipes existing converter tools together to convert files from a format to another. E.g. add the converters 'pdf -gt; text' and 'text -gt; wav' and you can convert pdf to wav. Automatically installs missing converters in Debian.

PDFEdit

  •    C++

PDFedit is a free open source pdf editor and a library for manipulating PDF documents. It includes PDF manipulating library based on xpdf, GUI, set of command line tools and a pdf editor. You can use it to read, change and extract information from a PDF file. It is based on xpdf library.

unipdf - Golang PDF library for creating and processing PDF files (pure go)

  •    Go

UniDoc's UniPDF (formerly unidoc) is a PDF library for Go (golang) with capabilities for creating and reading, processing PDF files. The library is written and supported by FoxyUtils.com, where the library is used to power many of its services. Multiple examples are provided in our example repository https://github.com/unidoc/unidoc-examples as well as documented examples on our website.

MuPDF - A Lightweight PDF, XPS and E-book viewer

  •    C

MuPDF is a lightweight PDF, XPS, and E-book viewer. It renders text with metrics and spacing accurate to within fractions of a pixel for the highest fidelity in reproducing the look of a printed page on screen. It supports PDF 1.7 with transparency, encryption, hyperlinks, annotations, searching and more. The viewer is small, fast, yet complete. It supports many document formats, such as PDF, XPS, OpenXPS, CBZ, EPUB, and FictionBook 2. You can annotate PDF documents and fill out forms with the mobile viewers (this feature is coming soon to the desktop viewer as well).

PST to PDF Converter Eligible to Batch Export Outlook Emails as PDF

  •    

PST to PDF converter let concur to repair and export Outlook emails to PDF. Start to batch convert PST file without Outlook via optimum PST to PDF converter.

docx4j - JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files

  •    Java

docx4j is a library which helps you to work with the Office OpenXML file format as used in docx documents, pptx presentations, and xlsx spreadsheets.