word2html - a quick and dirty script to convert a Word (docx) document to html.


Running the script on a Word document such as /path/to/MyGloriousDoc.docx will give you a new file, /path/to/MyGloriousDoc.html, that's (hopefully) decent-looking html. While this code is MIT-licensed, it uses both pypandoc and pytidylib, which depend on other software that may not be MIT-licensed and must be installed for this to work.
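The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `html_path_for` and `convert` are hypothetical, and the tidy option shown is only an example. It assumes pypandoc (plus the pandoc binary) and pytidylib (plus the HTML Tidy library) are installed.

```python
import os


def html_path_for(docx_path):
    """Return the output path: same directory and base name, .html extension."""
    base, _ext = os.path.splitext(docx_path)
    return base + ".html"


def convert(docx_path):
    """Convert a .docx file to tidied HTML and return the output path.

    Hypothetical helper; the real word2html script may work differently.
    """
    # Imported lazily so html_path_for() works without these packages.
    import pypandoc                    # needs the pandoc binary installed
    from tidylib import tidy_document  # needs the HTML Tidy C library

    # pandoc does the docx -> html conversion.
    raw_html = pypandoc.convert_file(docx_path, "html", format="docx")

    # Tidy cleans up the markup; it returns (document, errors).
    tidy_html, _errors = tidy_document(raw_html, options={"indent": 1})

    out_path = html_path_for(docx_path)
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(tidy_html)
    return out_path
```

Calling `convert("/path/to/MyGloriousDoc.docx")` would then write the result next to the source document.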

https://github.com/bradmontgomery/word2html





Related Projects

mammoth.js - Convert Word documents (.docx files) to HTML

  •    Javascript

Mammoth is designed to take .docx documents, such as those created by Microsoft Word, and convert them to HTML. Mammoth aims to produce simple and clean HTML by using semantic information in the document, and ignoring other details. For instance, Mammoth converts any paragraph with the style Heading 1 to h1 elements, rather than attempting to exactly copy the styling (font, text size, colour, etc.) of the heading. There's a large mismatch between the structure used by .docx and the structure of HTML, meaning that the conversion is unlikely to be perfect for more complicated documents. Mammoth works best if you only use styles to semantically mark up your document.
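The idea of mapping semantic styles to elements can be illustrated with a small standard-library sketch. This is not Mammoth's API: the `STYLE_TO_TAG` table and the `paragraphs_to_html` function are hypothetical, and the input is assumed to be already-extracted (style name, text) pairs rather than a real .docx file.

```python
from html import escape

# Hypothetical style map: Word paragraph style name -> HTML tag.
STYLE_TO_TAG = {
    "Heading 1": "h1",
    "Heading 2": "h2",
    "Normal": "p",
}


def paragraphs_to_html(paragraphs):
    """Render (style_name, text) pairs as HTML, ignoring visual styling."""
    parts = []
    for style, text in paragraphs:
        # Styles without a mapping fall back to a plain paragraph.
        tag = STYLE_TO_TAG.get(style, "p")
        parts.append(f"<{tag}>{escape(text)}</{tag}>")
    return "".join(parts)
```

Fonts, sizes and colours never enter the picture here; only the style name decides which element is emitted, which is the trade-off the description above makes.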

Pandoc - General Markup Converter

  •    Haskell

Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can convert documents in markdown, reStructuredText, textile, HTML, DocBook, or LaTeX to HTML formats, word processor formats, PDF and other markup formats.

docx4j - JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files

  •    Java

docx4j is a library which helps you to work with the Office OpenXML file format as used in docx documents, pptx presentations, and xlsx spreadsheets.

ONLYOFFICE Desktop Editors - An office suite that combines text, spreadsheet and presentation editors for creating, viewing and editing local documents

  •    C

ONLYOFFICE Desktop Editors is a free and open source office suite that comprises editors for text documents, spreadsheets and presentations. It lets you create, view and edit documents of any size and complexity, and easily switch to the online mode for real-time co-editing and collaboration. Features such as reviewing, commenting and chat are available as well. Deal with multiple files within one and the same window thanks to the tab-based user interface.

word-to-markdown - A ruby gem to liberate content from Microsoft Word documents

  •    Ruby

Our default content publishing workflow is terribly broken. We've all been trained to make paper, yet today, content authored once is more commonly consumed in multiple formats, and rarely, if ever, does it embody physical form. Put another way, our go-to content authoring workflow remains relatively unchanged since it was conceived in the early 80s. I'm asked regularly by government employees — knowledge workers who fire up a desktop word processor as the first step to any project — for an automated pipeline to convert Microsoft Word documents to Markdown, the lingua franca of the internet, but as my recent foray into building just such a converter proves, it's not that simple.


docconv - Converts PDF, DOC, DOCX, XML, HTML, RTF, etc to plain text

  •    Go

A Go wrapper library to convert PDF, DOC, DOCX, XML, HTML, RTF, ODT, Pages documents and images (see optional dependencies below) to plain text. Note for returning users: the Go code path for this pkg has been moved to code.sajari.com/docconv. Follow the installation instructions to check out a version of the code in the correct place.

MarkLogic Sample Authoring App for Word

  •    CSharp

The MarkLogic Authoring Sample App for Word lets authors enrich Word documents using Content Controls, associate and manage metadata with those Controls, as well as search and reuse existing Controls and their metadata in new Word documents.

Word document generator using Open Xml 2.0 SDK


WordDocumentGenerator is a utility to generate Word documents from templates using Visual Studio 2010 and Open XML 2.0 SDK. WordDocumentGenerator helps generate both non-refreshable and refreshable Word documents based on predefined templates using minimum code chang...

SharePoint Document Converter


The SharePoint Document Converter solution gives a starting point for leveraging the Word Automation Service to convert documents to formats that Word can support. This project converts documents of type "docx" or "doc" to any file type that Word supports, such as PDF, XPS, D...

Word Template Generator


The Custom Template Generator is a plugin for Microsoft Word, developed in C#. The plugin enables you to manage templates for Word documents, generate new documents based on a template, and automatically fill out some of the values (title, author, etc) for you through...

Word Document Print Manger - Printing Documents in the Background

  •    CSharp

Prints selected Word documents (or any other text file) in the background.

Word To SharePoint (Transform Word Documents to MOSS / WSS)


A SharePoint Feature for easy conversion of Word 2007 documents to SharePoint/MOSS. The solution also extracts, transfers and re-links images to a selected ImageLibrary, includes styles, tables, etc.

HTML to docx Converter


This converts HTML into Word documents (docx format). The code is written in PHP and works with PHPWord.

Jasper Reports

  •    Java

JasperReports is the world's most popular open source reporting engine. It is entirely written in Java and it is able to use data coming from any kind of data source and produce pixel-perfect documents that can be viewed, printed or exported in a variety of document formats including HTML, PDF, Excel, OpenOffice and Word.

RTF to HTML Lite Converter

  •    C++

RTF2HTML is a name for a cross-platform C++ library (DLL, OCX) and command-line utility, which is intended to convert documents from Rich Text Format (e.g. Word, OO Writer) to HTML. Its features are tiny size, speed, low mem usage and compact output.

PHPWord - A pure PHP library for reading and writing word processing documents

  •    PHP

PHPWord is a library written in pure PHP that provides a set of classes to write to and read from different document file formats. The current version of PHPWord supports Microsoft Office Open XML (OOXML or OpenXML), OASIS Open Document Format for Office Applications (OpenDocument or ODF), Rich Text Format (RTF), HTML, and PDF. PHPWord is an open source project licensed under the terms of LGPL version 3. PHPWord is aimed to be a high quality software product by incorporating continuous integration and unit testing. You can learn more about PHPWord by reading the Developers' Documentation.

gotenberg - :scroll: A stateless API for converting Markdown files, HTML files and Office documents to PDF

  •    Go

At TheCodingMachine, we build a lot of web applications (intranets, extranets and so on) which need to generate PDFs from various sources. Each time, we ended up using some well-known libraries like wkhtmltopdf or unoconv and lost time reimplementing a solution from one project to another. Meh. The API is now available on your host under http://127.0.0.1:3000.

nteditor


A custom user control to read and edit word processing documents such as doc, rtf, etc. Attempts are being made to read and write, or maybe simply convert, DOCX and other documents. But the current focus is on Word documents, especially RTF, which has a different version for different software.

ghostword

  •    Delphi

GhostWord is an interface for the GhostScript package, which enables you to create PDF documents from Microsoft Word, Excel or PowerPoint documents. GhostWord installs itself in Word, Excel and PowerPoint, and you convert the documents by simply clicking

Paperless-ng - A supercharged version of paperless: scan, index and archive all your physical documents

  •    Python

Paperless is an application by Daniel Quinn and contributors that indexes your scanned documents and allows you to easily search for documents and store metadata alongside your documents. It performs OCR on your documents, adds selectable text to image only documents and adds tags, correspondents and document types to your documents. It supports PDF documents, images, plain text files, and Office documents (Word, Excel, Powerpoint, and LibreOffice equivalents).





