Pandoc - General Markup Converter

Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can convert documents in markdown, reStructuredText, Textile, HTML, DocBook, or LaTeX to HTML formats, word processor formats, PDF, and other markup formats.

It can read [markdown] and (subsets of) [Textile], [reStructuredText], [HTML], [LaTeX], [MediaWiki markup], and [DocBook XML]; and it can write plain text, [markdown], [reStructuredText], [XHTML], [HTML 5], [LaTeX] (including [beamer] slide shows), [ConTeXt], [RTF], [DocBook XML], [OpenDocument XML], [ODT], [Word docx], [GNU Texinfo], [MediaWiki markup], [EPUB] (v2 or v3), [FictionBook2], [Textile], [groff man] pages, [Emacs Org-Mode], [AsciiDoc], and [Slidy], [Slideous], [DZSlides], or [S5] HTML slide shows. It can also produce [PDF] output on systems where LaTeX is installed.
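A minimal sketch of driving the command-line tool from a script, assuming `pandoc` is installed and on the PATH. The helper names `build_pandoc_command` and `convert` and the file names are hypothetical; only the `-f`, `-t`, and `-o` options come from pandoc itself.

```python
import subprocess


def build_pandoc_command(source, target, from_format=None, to_format=None):
    """Build the argument list for a pandoc conversion (hypothetical helper).

    When no formats are given, pandoc is left to infer them from the
    file extensions of `source` and `target`.
    """
    cmd = ["pandoc", source, "-o", target]
    if from_format:
        cmd += ["-f", from_format]  # input format, e.g. "markdown"
    if to_format:
        cmd += ["-t", to_format]    # output format, e.g. "html5"
    return cmd


def convert(source, target, from_format=None, to_format=None):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    cmd = build_pandoc_command(source, target, from_format, to_format)
    subprocess.run(cmd, check=True)
```

For example, `convert("notes.md", "notes.html")` would ask pandoc to turn a markdown file into HTML. Building the argument list separately from running it keeps the command easy to inspect or log before anything is executed.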



Related Projects

Ghostscript - Document Rendering and Conversion

Ghostscript is a rendering and conversion engine for page description languages, including PostScript and PDF. It can convert PostScript language files to many raster formats, view them on displays, and print them on printers that don't have PostScript language capability built in.

Tika - A content analysis toolkit

Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.

documents4j - Java library for converting documents into another document format

documents4j is a Java library for converting documents into another document format. This is achieved by delegating the conversion to any native application which understands the conversion of the given file into the desired target format.

docx4j - JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files

docx4j is a library which helps you to work with the Office OpenXML file format as used in docx documents, pptx presentations, and xlsx spreadsheets.

jPod - PDF manipulating and rendering framework

jPod is a PDF manipulation and rendering framework. It provides functionality to read a document and verify it against the PDF specification, along with a content stream and rendering framework. It can create new documents and apply incremental updates.

TCPDF - PHP class for generating PDF

TCPDF is a PHP class for generating PDF documents without requiring external extensions. TCPDF supports UTF-8, Unicode, RTL languages, XHTML, JavaScript, digital signatures, barcodes, and much more.

PDFBox - Java PDF library

Apache PDFBox is an open source Java PDF library for working with PDF documents. This library allows creation of new PDF documents, manipulation of existing documents, and extraction of content from documents. It supports bookmarks, fonts, text extraction, encryption, PDF printing, and more.

JODConverter - Automates document conversions using OpenOffice

JODConverter automates conversions between office document formats using OpenOffice or LibreOffice. Supported formats include OpenDocument, PDF, RTF, HTML, Word, Excel, PowerPoint, and Flash. It can be used as a Java library, a command line tool, or a web application.

iText - Java PDF library

iText is a popular and widely used PDF library for generating PDF documents dynamically. It is aimed largely at web developers who need to generate PDF documents and reports from data in an XML file or a database and serve them to the browser. It supports bookmarks, watermarks, encryption, form filling, and more.

PDF Library - PDF manipulation in .NET

A library for PDF manipulation that implements version 1.7 of the Adobe PDF standard. It can read PDF files and apply changes to them, and it is written in .NET 2.0 using Visual Studio 2005. Both writing and parsing PDF are supported.

Apache POI - Java API To Access Microsoft Document File Formats

APIs for manipulating various file formats based upon Office Open XML (OOXML, ECMA-376) and Microsoft's OLE 2 Compound Document formats using pure Java. Apache POI is your Java Excel, Word and PowerPoint solution. We have a complete API for porting other OOXML and OLE 2 Compound Document formats and welcome others to participate.

LaTeX Helper

GUI to help create a LaTeX document

OpenPipe - Document Pipeline

OpenPipe is an open source scalable platform for manipulating a stream of documents. A pipeline is an ordered set of steps / operations performed on a document to convert from its raw form to something ready to be put into the index. The operations performed on documents include language detection, field manipulation, POS tagging, entity extraction or submitting the document to a search engine.

SWFTools - Utilities to work with Adobe Flash files

SWFTools is a collection of utilities for working with Adobe Flash files (SWF files). The tool collection includes programs for reading SWF files, combining them, and creating them from other content (such as images, sound files, videos, or source code).

Solr - Blazing-fast, open source enterprise search platform

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

Aperture - Java framework for getting data and metadata

Aperture is a Java framework for extracting and querying full-text content and metadata from various information systems. It can crawl and extract information from file systems, websites, mailboxes, and mail servers. It supports various file formats such as Office, PDF, Zip, and more. Metadata is also extracted from image files. Aperture has a strong focus on semantics: extracted metadata can be mapped to predefined properties.

doorstop - Requirements management using version control.

Talks: GRDevDay, BarCamp. Sample: Generated HTML. Documentation: API, Demo. Requirements: Python 3.3+; a version

tif - Text Interchange Formats

This package describes and validates formats for storing common objects arising in text analysis as native R objects. Representations of a text corpus, document term matrix, and tokenized text are included. The tokenized text format is extensible to include other annotations. There are two versions of the corpus and tokens objects; packages should accept both and return or coerce to at least one of these.

corpus (data frame) - A valid corpus data frame object is a data frame with at least two columns. The first column is called doc_id and is a character vector with UTF-8 encoding. Document ids must be unique. The second column is called text and must also be a character vector in UTF-8 encoding. Each individual document is represented by a single row in the data frame. Additional document-level metadata columns and corpus-level attributes are allowed but not required.
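The corpus data frame rules above can be sketched as a small check. This is an illustration in Python rather than R, and it is not the tif package's own API: the function name `is_valid_corpus` is hypothetical, a dict of column lists stands in for a data frame, and Python `str` values stand in for UTF-8 character vectors.

```python
def is_valid_corpus(columns):
    """Check a column-oriented table against the corpus data frame rules.

    `columns` maps column name -> list of values; dict insertion order
    stands in for column order. Hypothetical helper, not tif's API.
    """
    names = list(columns)
    # At least two columns: the first named doc_id, the second named text.
    if len(names) < 2 or names[0] != "doc_id" or names[1] != "text":
        return False
    doc_ids = list(columns["doc_id"])
    texts = list(columns["text"])
    # One row per document, so both columns must have the same length.
    if len(doc_ids) != len(texts):
        return False
    # Both columns must hold character data.
    if not all(isinstance(value, str) for value in doc_ids + texts):
        return False
    # Document ids must be unique.
    return len(set(doc_ids)) == len(doc_ids)
```

Extra metadata columns after the first two are accepted without inspection, matching the "allowed but not required" wording.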

jandoc - A wrapper for the pandoc document conversion tool with more options.

A wrapper for the pandoc document conversion tool with more options.

meteor-unimark - Universal text document markup syntax

Universal text document markup syntax