docx_converter - This Ruby Gem converts Word docx files into html or LaTeX via the kramdown syntax

This Ruby library (gem) parses .docx Word documents and translates them into kramdown syntax, which can then be converted to HTML or TeX code with the kramdown library. kramdown is a superset of Markdown. See the kramdown project site for more details. A .docx file as written by modern versions of Microsoft Office is just a .zip file in disguise. It contains a directory tree of XML files. Parsing these compressed XML trees is rather straightforward, thanks to the zip and nokogiri Ruby libraries.
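
The "zip file in disguise" structure can be illustrated with a short sketch. It is written in Python with only the standard library, not with the Ruby zip and nokogiri libraries the gem uses, and it is not the gem's own code. The function names paragraphs_from_docx and make_minimal_docx are hypothetical, and the sketch assumes the main document body lives at word/document.xml inside the archive, as it does in standard Word files.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraphs_from_docx(source):
    """Return the plain text of each paragraph (w:p) in a .docx archive.

    `source` may be a file path or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Text is stored in w:t elements nested inside runs (w:r)
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def make_minimal_docx(texts):
    """Build an in-memory archive holding only word/document.xml.

    This is enough for the sketch above but is NOT a complete Word file
    (content types and relationships are omitted).
    """
    body = "".join(
        f"<w:p><w:r><w:t>{escape(t)}</w:t></w:r></w:p>" for t in texts
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf
```

Calling paragraphs_from_docx with the path of a real .docx file should work the same way, since the function only relies on the archive containing word/document.xml.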



Related Projects

Pandoc - Universal markup converter

  •    Haskell

Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. Pandoc can read Markdown, CommonMark, PHP Markdown Extra, GitHub-Flavored Markdown, MultiMarkdown, and (subsets of) Textile, reStructuredText, HTML, LaTeX, MediaWiki markup, TWiki markup, TikiWiki markup, Creole 1.0, Haddock markup, OPML, Emacs Org mode, DocBook, JATS, Muse, txt2tags, Vimwiki, EPUB, ODT, and Word docx.

docx4j - JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files

  •    Java

docx4j is a library which helps you to work with the Office OpenXML file format as used in docx documents, pptx presentations, and xlsx spreadsheets.

DOTX to DOCX Converter

  •    CSharp

DOTX to DOCX Converter converts Office Open XML templates (DOTX/DOTM) to Office Open XML documents (DOCX/DOCM). The program is an effective supplement to the Microsoft Office Compatibility Pack, which cannot convert these files.

kramdown - kramdown is a fast, pure Ruby Markdown superset converter, using a strict syntax definition and supporting several common extensions

  •    Ruby

kramdown was licensed under the GPL until the 1.0.0 release. Following many requests, it is now released under the MIT license and can therefore easily be used in commercial projects, too. kramdown is a fast, pure Ruby Markdown superset converter, using a strict syntax definition and supporting several common extensions.

mammoth.js - Convert Word documents (.docx files) to HTML

  •    Javascript

Mammoth is designed to convert .docx documents, such as those created by Microsoft Word, to HTML. Mammoth aims to produce simple and clean HTML by using semantic information in the document and ignoring other details. For instance, Mammoth converts any paragraph with the style Heading 1 to an h1 element, rather than attempting to exactly copy the styling (font, text size, colour, etc.) of the heading. There's a large mismatch between the structure used by .docx and the structure of HTML, meaning that the conversion is unlikely to be perfect for more complicated documents. Mammoth works best if you only use styles to semantically mark up your document.
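
The idea of mapping paragraph styles to semantic elements, rather than copying visual styling, can be sketched as follows. This is a Python illustration under stated assumptions, not Mammoth's API: the STYLE_TO_TAG table and the paragraph_to_html function are hypothetical names, and unknown styles are assumed to fall back to a plain paragraph.

```python
from html import escape

# Hypothetical style map: Word paragraph style name -> HTML tag.
# Fonts, sizes and colours are deliberately ignored.
STYLE_TO_TAG = {
    "Heading 1": "h1",
    "Heading 2": "h2",
    "Normal": "p",
}


def paragraph_to_html(style, text, style_map=STYLE_TO_TAG):
    """Render one paragraph using only its style name and its text."""
    # Assumption: styles missing from the map become ordinary paragraphs
    tag = style_map.get(style, "p")
    return f"<{tag}>{escape(text)}</{tag}>"
```

Because only the style name drives the output, a document that uses styles consistently converts cleanly, while one that relies on manual formatting loses that formatting.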

docconv - Converts PDF, DOC, DOCX, XML, HTML, RTF, etc to plain text

  •    Go

A Go wrapper library to convert PDF, DOC, DOCX, XML, HTML, RTF, ODT, Pages documents and images (see optional dependencies below) to plain text. Note for returning users: the Go code path for this package has been moved. Follow the installation instructions to check out a version of the code in the correct place.

Pandoc - General Markup Converter

  •    Haskell

Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can convert documents in markdown, reStructuredText, textile, HTML, DocBook, or LaTeX to HTML formats, word processor formats, PDF and other markup formats.

phishery - An SSL Enabled Basic Auth Credential Harvester with a Word Document Template URL Injector

  •    Go

Phishery is a simple SSL-enabled HTTP server with the primary purpose of phishing credentials via Basic Authentication. Phishery also provides the ability to easily inject the URL into a .docx Word document. The power of phishery is best demonstrated by setting a Word document's template to a phishery URL. This causes Microsoft Word to make a request to the URL, resulting in an authentication dialog being shown to the end user. Injecting a URL into any .docx file is possible using phishery's -i [in docx], -o [out docx], and -u [url] options.

DocX - Fast and easy to use

  •    CSharp

DocX is a .NET library that allows developers to manipulate Word 2007/2010/2013 files in an easy and intuitive manner. DocX is fast, lightweight and, best of all, it does not require Microsoft Word or Office to be installed. NOTE: There is a new Master branch as of Oct. 3, 2017. Please read about the Classic branch if you were using this project before the change.

HTML to docx Converter


This converts HTML into Word documents (docx format). The code is written in PHP and works with PHPWord.

Marker - Markdown editor for Linux made with GTK+-3.0

  •    C++

Marker is a Markdown editor for Linux made with GTK+-3.0. It provides support to view and edit Markdown documents. It supports TeX math rendering with KaTeX or MathJax. It also supports Mermaid diagrams, Charter for plotting, syntax highlighting for code blocks with highlight.js, an integrated sketch editor, and flexible export options to PDF, RTF, ODT, and DOCX.

SharePoint Document Converter


The SharePoint Document Converter solution is a starting point for using the Word Automation Service to convert documents to the formats that Word supports. This project converts documents of type "docx" or "doc" to any file type that Word supports, such as PDF, XPS, D...

sablon - Ruby Document Template Processor based on docx templates and Mail Merge fields.

  •    Ruby

Sablon is a document template processor for Word docx files. It leverages Word's built-in formatting and layout capabilities to make template creation easy and efficient. Sablon templates are normal Word documents (.docx) sprinkled with MailMerge fields to perform operations. The following section uses the notation «=title» to refer to Word MailMerge fields.
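
The «=title» notation can be illustrated with a toy substitution over plain text. This Python sketch is only an analogy and is not how Sablon works internally, since Sablon operates on MailMerge fields inside the Word document rather than on strings. The FIELD pattern and the render function are hypothetical, and leaving unknown fields untouched is an assumption made for the example.

```python
import re

# Matches the «=name» notation used above for a MailMerge field
FIELD = re.compile(r"«=(\w+)»")


def render(template, context):
    """Replace each «=name» marker with the matching value from `context`."""

    def substitute(match):
        key = match.group(1)
        # Assumption: fields without a value are left as they are
        return str(context[key]) if key in context else match.group(0)

    return FIELD.sub(substitute, template)
```

For example, render("Report: «=title»", {"title": "Q3"}) yields "Report: Q3".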

python-docx - Create and modify Word documents with Python

  •    Python

python-docx is a Python library for creating and updating Microsoft Word (.docx) files. More information is available in the python-docx documentation.

Docx to Any

  •    Java

Get different varieties of output from a docx! It uses XSL, which the user can customize to produce whatever kind of output is expected. A default XSL stylesheet is included that converts docx to text.

academicmarkdown - Academic writing with Markdown

  •    Python

Academic Markdown is a Python module for generating .md, .html, .pdf, .docx, and .odt files from Markdown source. Pandoc is used for most of the heavy lifting, so refer to the Pandoc website for detailed information about writing in Pandoc Markdown. However, Academic Markdown offers some additional functionality that is useful for writing scientific documents, such as integration with Zotero references, and a number of useful Academic Markdown extensions. At present, the main target for Academic Markdown is the OpenSesame documentation site, although it may in time grow into a more comprehensive and user-friendly tool.


  •    DotNet

DocX is a .NET library written in C# which allows a developer to manipulate Word 2007 files in an easy and intuitive way.

docx2md - Convert Microsoft Word Document to Markdown

  •    Go

Convert Microsoft Word Document to Markdown

MOSS Document Converter


Microsoft Office SharePoint Server (MOSS) Document Converters with Word & Excel 2007 on the server. Converts Office 2003 file types (doc, xls) to pdf and xps. Could easily be altered to work with docx and xlsx file types. Desktop Automation on the Server: Previously, us...

python-docx - Reads, queries and modifies Microsoft Word 2007/2008 docx files.

  •    Python

Reads, queries and modifies Microsoft Word 2007/2008 docx files.
