
hOcr2Pdf.NET is a .NET library, written in C#, that converts hOCR HTML files (.hocr) into searchable PDFs using HtmlAgilityPack and iTextSharp. It currently supports hOCR output from Tesseract and Cuneiform.
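To make the input side of such a conversion concrete, here is a minimal sketch in Python (standard library only) of reading recognized words and their bounding boxes from an hOCR file. It is not hOcr2Pdf.NET's code or API. It assumes Tesseract-style markup, where each word is a span with class `ocrx_word` and a `title` attribute holding `bbox x0 y0 x1 y1`. The names `HocrWordParser`, `parse_bbox`, `extract_words` and the `SAMPLE` fragment are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title attribute, or None if absent."""
    # hOCR properties are separated by ';', e.g. "bbox 10 20 90 50; x_wconf 95".
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs from spans with class 'ocrx_word' (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.words = []     # list of (text, (x0, y0, x1, y1))
        self._bbox = None   # bbox of the word span currently open, if any
        self._text = []     # text fragments seen inside that span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            # Word spans without a parsable bbox are skipped (bbox stays None).
            self._bbox = parse_bbox(attrs.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def extract_words(hocr):
    """Parse an hOCR string and return its words with pixel bounding boxes."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


# Hypothetical hOCR fragment, made up for this example.
SAMPLE = """<div class='ocr_page' title='bbox 0 0 600 800'>
<span class='ocr_line' title='bbox 10 20 300 50'>
<span class='ocrx_word' title='bbox 10 20 90 50; x_wconf 95'>Hello</span>
<span class='ocrx_word' title='bbox 100 20 180 50; x_wconf 91'>world</span>
</span></div>"""

if __name__ == "__main__":
    for text, bbox in extract_words(SAMPLE):
        print(text, bbox)
```

A converter would typically place each word as invisible text at its box position over the scanned page image, which is what makes the resulting PDF searchable. Output from other OCR engines may use different class names, so the `ocrx_word` check is an assumption to adjust.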





Related Projects

Html Agility Pack

This is an HTML parser that builds a read/write DOM from "real world" HTML files. It supports XPath and XSLT and is tolerant of malformed "real world" HTML.


An open source C# PDF library

PDFClown - PDF library

PDFClown is a PDF library that helps to generate, read and edit PDF documents, and to split and merge them. It supports images, fonts, barcodes, bookmarks, annotations, form fields (checkbox, button, list box, etc.), compression and text extraction.

jPDF Tweak

A Swiss Army Knife GUI application for PDF documents: combine, split, rotate, reorder (n-up, booklet), watermark, edit bookmarks/fileinfo/pagetransition, compress, encrypt, decrypt, sign, repair, edit attachments and more.

SWFTools - Utilities to work with Adobe Flash files

SWFTools is a collection of utilities for working with Adobe Flash files (SWF files). The collection includes programs for reading SWF files, combining them, and creating them from other content (such as images, sound files, videos or source code).


Plone lets non-technical people create and maintain information using only a web browser. Perfect for web sites or intranets, Plone offers superior security without sacrificing extensibility or ease of use.

Kiwix - Offline Reader For Wikipedia

Kiwix lets you have the whole of Wikipedia at hand wherever you go. You don't need an Internet connection: everything is stored on your computer, USB flash drive or DVD. It is basically an offline reader for web content. It supports the ZIM format, a highly compressed open format with additional metadata.

TCPDF - PHP class for generating PDF

TCPDF is a PHP class for generating PDF documents without requiring external extensions. TCPDF supports UTF-8, Unicode, RTL languages, XHTML, JavaScript, digital signatures, barcodes and much more.

Pdfrecompressor - PDF recompressor using JBIG2 called pdfJbIm

PDF recompressor that uses the jbig2enc encoder with improved perceptually lossless coding for symbol coding. To use this project with Maven, it is enough to declare it as a dependency in pom.xml:

    <dependency>
      <groupId>cz.muni</groupId>
      <artifactId>pdfJbIm</artifactId>
      <version>1.1</version>
    </dependency>

Links to relevant publications: Document Engineering for a Digital Library: PDF recompression using JBIG2 and other optimization of PDF documents (2010); Sojka, P. - Hatlapatka, R.


Experimental browser that natively supports HTML, PDF, man pages, TeX DVI, scanned paper. Annotate with hyperlinks, highlights, notes, executable copy editor markup. [PDF tools for compress, impose, decrypt/encrypt, split/merge have been moved.]