docx2tex - Converts Microsoft Word docx to LaTeX

…or get source via Git. Please note that you have to add the --recursive option in order to clone docx2hub with submodules. You can specify a custom configuration file for docx2tex. There are two different formats to write a configuration.



Related Projects

PHPWord - A pure PHP library for reading and writing word processing documents

  •    PHP

PHPWord is a library written in pure PHP that provides a set of classes to write to and read from different document file formats. The current version of PHPWord supports Microsoft Office Open XML (OOXML or OpenXML), OASIS Open Document Format for Office Applications (OpenDocument or ODF), Rich Text Format (RTF), HTML, and PDF. PHPWord is an open source project licensed under the terms of LGPL version 3. PHPWord aims to be a high-quality software product by incorporating continuous integration and unit testing. You can learn more about PHPWord by reading the Developers' Documentation.

Simple OOXML


Simple OOXML makes the creation of Office Open XML documents easier for developers. Modify or create any .docx or .xlsx document without Microsoft Word or Microsoft Excel. Uses the Open XML SDK 2.0.

LaTeX to RTF converter

  •    C

LaTeX to RTF converter that handles equations, figures, and cross-references ...

Pandoc - Universal markup converter

  •    Haskell

Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. Pandoc can read Markdown, CommonMark, PHP Markdown Extra, GitHub-Flavored Markdown, MultiMarkdown, and (subsets of) Textile, reStructuredText, HTML, LaTeX, MediaWiki markup, TWiki markup, TikiWiki markup, Creole 1.0, Haddock markup, OPML, Emacs Org mode, DocBook, JATS, Muse, txt2tags, Vimwiki, EPUB, ODT, and Word docx.

Pandoc - General Markup Converter

  •    Haskell

Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can convert documents in markdown, reStructuredText, textile, HTML, DocBook, or LaTeX to HTML formats, word processor formats, PDF, and other markup formats.

docx2tex: Word 2007 to TeX

  •    WPF

Docx2tex is a small command-line tool that uses standard technologies to help Word 2007 users publish in venues where typography matters or where only papers produced with TeX are accepted. Behind the scenes, docx2tex uses common technologies to interpret Word 2007 OOXML ...

Office Open XML for C++

  •    C++

Office Open XML for C++ is a project to create an API for working with OOXML documents such as docx, pptx, xlsx, and any other files that conform to the Open Packaging Conventions, in C++.
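
As a rough illustration of what "conforming to the Open Packaging Conventions" means in practice: an OPC package is a ZIP archive that carries a `[Content_Types].xml` part at its root. The Python sketch below is not related to the C++ project's API; the function names `list_package_parts` and `looks_like_opc_package` are hypothetical, and the in-memory sample uses placeholder XML rather than a valid Word document.

```python
import io
import zipfile


def list_package_parts(data: bytes) -> list[str]:
    """Return the sorted part names stored in a ZIP-based package."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        return sorted(package.namelist())


def looks_like_opc_package(data: bytes) -> bool:
    """Cheap structural check only: a ZIP that has a [Content_Types].xml part."""
    try:
        return "[Content_Types].xml" in list_package_parts(data)
    except zipfile.BadZipFile:
        return False


# Build a tiny in-memory archive with placeholder content (assumption:
# real packages hold proper XML and relationship parts as well).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("[Content_Types].xml", "<Types/>")
    package.writestr("word/document.xml", "<document/>")
sample = buffer.getvalue()

print(list_package_parts(sample))
print(looks_like_opc_package(sample))
print(looks_like_opc_package(b"plain text, not a ZIP archive"))
```

A check like this only confirms the container shape; it says nothing about whether the parts inside are valid OOXML.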

pandoc-ruby - Ruby wrapper for Pandoc

  •    Ruby

PandocRuby is a wrapper for Pandoc, a Haskell library with command line tools for converting one markup format to another. Pandoc can convert documents from a variety of formats including markdown, reStructuredText, textile, HTML, DocBook, LaTeX, and MediaWiki markup to a variety of other formats, including markdown, reStructuredText, HTML, LaTeX, ConTeXt, PDF, RTF, DocBook XML, OpenDocument XML, ODT, GNU Texinfo, MediaWiki markup, groff man pages, HTML slide shows, EPUB, Microsoft Word docx, and more.

OOXML Validator


Validates all types of OOXML-packages (docx, xlsx, pptx, etc) and presents clear information on the errors.

Standalone tool for extracting embedded Office Open XML objects from files

  •    CSharp

This tool is a standalone tool that will allow users to extract all embedded Open XML Format objects from a document as standalone Open XML Format files that can be opened and edited. With the release of Office 2007 came the introduction of new default file formats, called Of...

unioffice - Pure go library for creating and processing Office Word (

  •    Go

Announcement (2019/04/29): UniDoc acquires gooxml. UniDoc has acquired gooxml from Baliance and plans to add it to its suite of document format support for Go. The repository (gooxml) will be moving to a new home, and the package name will become unioffice.

poi - Mirror of Apache POI

  •    Java

A Java library for reading and writing Microsoft Office binary and OOXML file formats. The Apache POI Project's mission is to create and maintain Java APIs for manipulating various file formats based upon the Office Open XML standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2). In short, you can read and write MS Excel files using Java. In addition, you can read and write MS Word and MS PowerPoint files using Java. Apache POI is your Java Excel solution (for Excel 97-2008). We have a complete API for porting other OOXML and OLE2 formats and welcome others to participate.

DOTX to DOCX Converter

  •    CSharp

DOTX to DOCX Converter converts Office Open XML templates (DOTX/DOTM) to Office Open XML documents (DOCX/DOCM). The program is an effective supplement to the Microsoft Office Compatibility Pack, which cannot convert these files.

DocX - Fast and easy to use

  •    CSharp

DocX is a .NET library that allows developers to manipulate Word 2007/2010/2013 files in an easy and intuitive manner. DocX is fast, lightweight, and best of all it does not require Microsoft Word or Office to be installed. NOTE: There is a new Master branch as of Oct. 3, 2017. Please read about the Classic branch if you were using this project before the change.

ONLYOFFICE Desktop Editors - An office suite that combines text, spreadsheet and presentation editors for creating, viewing and editing local documents

  •    C

ONLYOFFICE Desktop Editors is a free and open source office suite that comprises text document, spreadsheet, and presentation editors. It lets you create, view, and edit documents of any size and complexity, and switch easily to online mode for real-time co-editing and collaboration. Features such as reviewing, commenting, and chat are available as well. The tab-based user interface lets you work with multiple files in a single window.

MarkLogic Toolkit for PowerPoint

  •    CSharp

The MarkLogic Toolkit for PowerPoint allows you to quickly build content applications with MarkLogic Server that extend the functionality of Microsoft PowerPoint and leverage Open XML (OOXML). Includes a PowerPoint add-in with supporting C#, JavaScript, and XQuery APIs.

MarkLogic Toolkit for Word

  •    CSharp

The MarkLogic Toolkit for Word allows you to quickly build content applications with MarkLogic Server that extend the functionality of Microsoft Word and leverage Open XML (OOXML). Includes a Word add-in with supporting C#, JavaScript, and XQuery APIs.

DocBook to LaTeX Publishing

  •    Python

DocBook to LaTeX Publishing transforms your SGML/XML DocBook documents to DVI, PostScript, or PDF by first translating them into pure LaTeX. MathML 2.0 markup is supported too. It started as a clone of DB2LaTeX.

docx4j - JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files

  •    Java

docx4j is a library which helps you to work with the Office OpenXML file format as used in docx documents, pptx presentations, and xlsx spreadsheets.

mammoth.js - Convert Word documents (.docx files) to HTML

  •    Javascript

Mammoth is designed to convert .docx documents, such as those created by Microsoft Word, to HTML. Mammoth aims to produce simple and clean HTML by using semantic information in the document and ignoring other details. For instance, Mammoth converts any paragraph with the style Heading 1 to an h1 element, rather than attempting to exactly copy the styling (font, text size, colour, etc.) of the heading. There's a large mismatch between the structure used by .docx and the structure of HTML, meaning that the conversion is unlikely to be perfect for more complicated documents. Mammoth works best if you only use styles to semantically mark up your document.
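
The style-driven approach described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Mammoth's API or implementation: the `STYLE_TO_TAG` table and the `paragraphs_to_html` function are hypothetical, and the input is assumed to be already-parsed (style name, text) pairs rather than a real .docx file.

```python
from html import escape

# Hypothetical style map: Word paragraph style name -> HTML element name.
STYLE_TO_TAG = {
    "Heading 1": "h1",
    "Heading 2": "h2",
}


def paragraphs_to_html(paragraphs):
    """Convert (style_name, text) pairs to HTML, ignoring visual formatting."""
    parts = []
    for style, text in paragraphs:
        # Styles without a mapping fall back to a plain paragraph.
        tag = STYLE_TO_TAG.get(style, "p")
        parts.append(f"<{tag}>{escape(text)}</{tag}>")
    return "".join(parts)


html_output = paragraphs_to_html([
    ("Heading 1", "Introduction"),
    ("Normal", "Fonts & colours are ignored."),
])
print(html_output)
# prints: <h1>Introduction</h1><p>Fonts &amp; colours are ignored.</p>
```

Because only the style name decides the output element, two headings that look different in Word but share the Heading 1 style produce identical markup, which is the trade-off the description points out.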