
paperwork - Personal document manager (Linux/Windows)

Paperwork is a personal document manager. It manages scanned documents and PDFs. It's designed to be easy and fast to use. The idea behind Paperwork is "scan & forget": you can just scan a new document and forget about it until the day you need it again.

pyocr - A Python wrapper for Tesseract and Cuneiform

PyOCR is an optical character recognition (OCR) tool wrapper for Python. That is, it helps you use various OCR tools from a Python program. It has been tested only on GNU/Linux systems. It should also work on similar systems (*BSD, etc.). It may or may not work on Windows, Mac OS X, etc.

ocrad.js - OCR in Javascript via Emscripten

As with any minor stepping stone on the relentless trajectory of Atwood's Law, I probably don't need to justify the existence of yet another "x, but now in Javascript!", but I might as well try. After all, we all would like to think that there's some ulterior motive to fulfilling that prophecy. On tablets or other touchscreen devices, of which there are quite a number nowadays (as this is the New Year's Eve post, I am obliged to include conjecture about the technological zeitgeist), a library such as Ocrad.js might be used to add handwriting input in a device- and operating-system-agnostic manner. Oftentimes, capturing the strokes and sending them over to a server to process might entail unacceptably high latency. Maybe you're working on an offline-capable note-taking app, or a browser extension which indexes all the doge memes that you stumble upon while prowling the dark corners of the internet.


GOCR is an OCR (Optical Character Recognition) program, developed under the GNU Public License. It converts scanned images of text back to text files. Joerg Schulenburg started the program, and now leads a team of developers.

OCRmyPDF - OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched or copy-pasted. For details, please consult the documentation.


Java OCR is an Optical Character Recognition algorithm based on a mean squared recognizer. This tool also includes utilities to trace and extract characters.
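To make the "mean squared recognizer" idea concrete, here is a minimal sketch in Python. It is not Java OCR's actual code: the 3x3 glyph templates and the function names are hypothetical, chosen only to show how a character can be picked by the smallest mean squared error against stored templates.

```python
# Toy glyph templates on a 3x3 grid (1 = ink, 0 = background).
# These templates are illustrative assumptions, not data from Java OCR.
TEMPLATES = {
    "I": [0, 1, 0,
          0, 1, 0,
          0, 1, 0],
    "L": [1, 0, 0,
          1, 0, 0,
          1, 1, 1],
    "T": [1, 1, 1,
          0, 1, 0,
          0, 1, 0],
}


def mean_squared_error(a, b):
    """Average of squared pixel differences between two equal-length images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def recognize(pixels, templates=TEMPLATES):
    """Return the template character with the lowest error against `pixels`."""
    return min(templates, key=lambda ch: mean_squared_error(pixels, templates[ch]))


# An "L" with one pixel missing still lands closest to the "L" template.
noisy_l = [1, 0, 0,
           1, 0, 0,
           1, 1, 0]
print(recognize(noisy_l))  # prints "L"
```

A real recognizer would first trace and extract each character from the scanned image and scale it to the template size before comparing.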


A .NET 2.0 Open Source OCR assembly using Tesseract engine.

gosseract - Go package for OCR (Optical Character Recognition), by using Tesseract C++ library

Golang OCR package, using the Tesseract C++ library. See the Dockerfile for installation details, or just try it with: docker run -it --rm otiai10/gosseract

JS-OCR-demo - JavaScript optical character recognition demo

JavaScript optical character recognition demo. Check it out here.

Neural network - digit recognition

Digit recognition contains a simple and effective implementation of a neural network. The neural network is used to recognize handwritten digits (an OCR system). The core functionality is developed in native C++ with the STL and Boost; the GUI is written in C++ .NET.


WS-FileConvertor is a .NET application that converts image files into a readable text format. The application also uploads the converted files (text format) into SharePoint. All image formats are supported, including GIF, JPG, BMP, TIFF, etc.

SharePoint OCR image files indexing

IFilter plugin for the Microsoft Indexing Service (and SharePoint in particular) to index and search image files (including TIFF, PDF, JPEG, BMP...) using OCR technology.

Images 2 OpenXML

Images2OpenXML is an application that uses the Office 2007 OCR API to convert images of scanned documents to OpenXML documents. There's no longer any need for third-party applications to convert documents; you can use this tool for free. It was developed in C#.


OCR in .NET. Puma.NET is a wrapper library for the Cognitive Technologies CuneiForm recognition engine that makes it easy to incorporate OCR functionality in any .NET Framework 2.0 (or higher) application. The API is provided through a number of simple classes.

ambar - Ambar: Document Search System

Ambar is an open-source document search and management system with automated crawling, OCR, tagging and instant full-text search. There are two editions available: Community and Enterprise. Enterprise Edition is a full-featured document search and management system that can handle terabytes of data.

receipt-parser - A fuzzy (supermarket) receipt parser written in Python using tesseract

This is a fuzzy receipt parser written in Python. You give it any dirty old receipt lying around and it will try its best to find the correct data for you. It started as a hackathon project. Read more about it on the trivago tech blog. Also read the comments on Hacker News. Oh hey! And there's also a talk online now if you're the visual kind of person.
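As a rough illustration of what "fuzzy" parsing can mean here, the sketch below uses Python's standard difflib to tolerate OCR misreads when looking for the total on a receipt. It is not the project's own code: the keyword list, the cutoff value, and the find_total helper are assumptions made for this example.

```python
import difflib
import re

# Hypothetical keywords that might label the total on a receipt.
SUM_KEYWORDS = ["total", "sum", "amount"]


def find_total(ocr_lines, keywords=SUM_KEYWORDS, cutoff=0.7):
    """Return the first price found on a line whose words fuzzily match a keyword.

    `ocr_lines` is a list of text lines as an OCR engine might produce them.
    Returns the price as a string like "4.58", or None if nothing matches.
    """
    for line in ocr_lines:
        # Split into lowercase alphanumeric words; OCR may swap letters for digits.
        words = re.findall(r"[a-z0-9]+", line.lower())
        for word in words:
            if difflib.get_close_matches(word, keywords, n=1, cutoff=cutoff):
                # Accept both "4.58" and "4,58" as price formats.
                match = re.search(r"(\d+)[.,](\d{2})", line)
                if match:
                    return f"{match.group(1)}.{match.group(2)}"
    return None


# "T0TAL" (zero instead of the letter O) is a typical OCR misread.
receipt = ["SUPERMARKT 24", "Milk 1,29", "T0TAL 4,58"]
print(find_total(receipt))  # prints "4.58"
```

Lowering the cutoff accepts noisier text at the cost of more false matches, so it would need tuning against real receipts.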

Android-OCRSample - Android OCR example application which uses Google Text Recognition API

This is an example Android application for OCR. The current version uses the Google Text Recognition API, while the old version used Tesseract.

ristretto - Ristretto is an Optical Character Recognition library and API for fetching remote images from the web

Ristretto is an Optical Character Recognition library and API for fetching remote images from the web. It has nothing to do with coffee. Ristretto uses the open source OCR library Tesseract.

tesseract - A PHP wrapper for the Tesseract OCR engine

A small PHP >=5.3 library that makes working with the open source Tesseract OCR engine easier. You need a working Tesseract installation. For more information about installation and adding language support, see Tesseract’s README.