Mammoth is designed to convert .docx documents, such as those created by Microsoft Word, to HTML. Mammoth aims to produce simple and clean HTML by using semantic information in the document and ignoring other details. For instance, Mammoth converts any paragraph with the style Heading 1 to an h1 element, rather than attempting to exactly copy the styling (font, text size, colour, etc.) of the heading. There's a large mismatch between the structure used by .docx and the structure of HTML, meaning that the conversion is unlikely to be perfect for more complicated documents. Mammoth works best if you only use styles to semantically mark up your document.
https://github.com/mwilliamson/mammoth.js
Tags | docx html office word markdown md |
Implementation | JavaScript |
License | Public |
Platform | OS-Independent |
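The style-based conversion described for Mammoth can be illustrated with a minimal sketch. This is not Mammoth's code or API: it uses only the Python standard library, the `STYLE_TO_TAG` map and both function names are hypothetical, and it assumes the paragraph style IDs are `Heading1` and `Heading2`. It reads `word/document.xml` from the .docx ZIP container, maps each paragraph's style to an HTML element, and ignores run-level formatting such as colour.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# WordprocessingML namespace used inside word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical style map (an assumption for this sketch):
# paragraph style ID -> HTML element name.
STYLE_TO_TAG = {"Heading1": "h1", "Heading2": "h2"}


def docx_to_html(docx_bytes: bytes) -> str:
    """Convert paragraphs to HTML using only their style, not their formatting."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    parts = []
    for para in root.iterfind(".//w:p", NS):
        style = para.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W}}}val") if style is not None else None
        tag = STYLE_TO_TAG.get(style_id, "p")  # unknown styles fall back to <p>
        text = "".join(t.text or "" for t in para.iterfind(".//w:t", NS))
        parts.append(f"<{tag}>{escape(text)}</{tag}>")
    return "".join(parts)


def make_sample_docx() -> bytes:
    """Build a tiny in-memory .docx holding just word/document.xml (sketch only)."""
    document_xml = (
        f'<w:document xmlns:w="{W}"><w:body>'
        '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr>'
        '<w:r><w:rPr><w:color w:val="FF0000"/></w:rPr><w:t>Title</w:t></w:r></w:p>'
        '<w:p><w:r><w:t>Body &amp; text</w:t></w:r></w:p>'
        '</w:body></w:document>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    return buf.getvalue()


print(docx_to_html(make_sample_docx()))
# prints: <h1>Title</h1><p>Body &amp; text</p>
```

The red colour on the heading run is dropped on purpose, which mirrors the "semantic information only" idea; a real converter also has to handle lists, tables, images and nested formatting.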
Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. Pandoc can read Markdown, CommonMark, PHP Markdown Extra, GitHub-Flavored Markdown, MultiMarkdown, and (subsets of) Textile, reStructuredText, HTML, LaTeX, MediaWiki markup, TWiki markup, TikiWiki markup, Creole 1.0, Haddock markup, OPML, Emacs Org mode, DocBook, JATS, Muse, txt2tags, Vimwiki, EPUB, ODT, and Word docx.
document-conversion markup markup-converter text-extraction
docx4j is a library which helps you to work with the Office OpenXML file format as used in docx documents, pptx presentations, and xlsx spreadsheets.
document-processing document-conversion text-extraction microsoft-documents
ONLYOFFICE Desktop Editors is a free and open source office suite comprising editors for text documents, spreadsheets and presentations. It lets you create, view and edit documents of any size and complexity, and easily switch to the online mode for real-time co-editing and collaboration. Features such as reviewing, commenting and chat are available as well. You can work with multiple files in a single window thanks to the tab-based user interface.
onlyoffice office word excel spreadsheet presentation desktop docx xlsx pptx doc xls ppt odt ods odp collaboration node nodejs office-suite document-editor
DocX is a .NET library that allows developers to manipulate Word 2007/2010/2013 files in an easy and intuitive manner. DocX is fast, lightweight and, best of all, does not require Microsoft Word or Office to be installed. NOTE: There is a new Master branch as of Oct. 3, 2017. Please read about the Classic branch if you were using this project before the change.
docx office microsoft-word c-sharp
Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can convert documents in Markdown, reStructuredText, Textile, HTML, DocBook, or LaTeX to HTML formats, word processor formats, PDF and other markup formats.
text-extraction document-conversion document markup text-to-pdf
Academic Markdown is a Python module for generating .md, .html, .pdf, .docx, and .odt files from Markdown source. Pandoc is used for most of the heavy lifting, so refer to the Pandoc website for detailed information about writing in Pandoc Markdown. However, Academic Markdown offers some additional functionality that is useful for writing scientific documents, such as integration with Zotero references, and a number of useful Academic Markdown extensions. At present, the main target for Academic Markdown is the OpenSesame documentation site, http://osdoc.cogsci.nl/, although it may in time grow into a more comprehensive and user-friendly tool.
PHPWord is a library written in pure PHP that provides a set of classes to write to and read from different document file formats. The current version of PHPWord supports Microsoft Office Open XML (OOXML or OpenXML), OASIS Open Document Format for Office Applications (OpenDocument or ODF), Rich Text Format (RTF), HTML, and PDF. PHPWord is an open source project licensed under the terms of LGPL version 3. PHPWord aims to be a high-quality software product by incorporating continuous integration and unit testing. You can learn more about PHPWord by reading the Developers' Documentation.
libreoffice-writer msword doc docx html odt pdf rtf
DocX is a .NET library written in C# which allows a developer to manipulate Word 2007 files in an easy and intuitive way.
docx office word automation
Simple OOXML makes the creation of Office Open XML documents easier for developers. Modify or create any .docx or .xlsx document without Microsoft Word or Microsoft Excel. Uses the Open XML SDK v 2.0.
docx excel excelpackage office-open-xml ooxml openoffice
Joeffice is the first open source office suite written in Java. Its features include a docking system that shows several documents in the same window, so you can keep many documents open at the same time and easily switch from one to another. It works with Microsoft document formats (docx, xlsx, pptx). It can get data through web services (RMI, SOAP, REST).
office office-suite word excel powerpoint
A Go wrapper library to convert PDF, DOC, DOCX, XML, HTML, RTF, ODT, Pages documents and images (see optional dependencies below) to plain text. Note for returning users: the Go code path for this package has been moved to code.sajari.com/docconv. Follow the installation instructions to check out a version of the code in the correct place.
rtf docx xml html rtf-files docs conversion pdf pdf-converter word
At TheCodingMachine, we build a lot of web applications (intranets, extranets and so on) that need to generate PDFs from various sources. Each time, we ended up using some well-known libraries like wkhtmltopdf or unoconv and lost time reimplementing the same solution from one project to the next. Meh. The API is now available on your host under http://127.0.0.1:3000.
api stateless pdf pdf-converter word excel powerpoint html wkhtmltopdf unoconv docker markdown
For Word-to-Markdown scripts, first navigate to this directory using cd doc-to-md. Run './accept.sh' to generate new Markdown, which you can compare to the original Markdown using git.
Phishery is a simple SSL-enabled HTTP server with the primary purpose of phishing credentials via Basic Authentication. Phishery can also easily inject the URL into a .docx Word document. The power of phishery is best demonstrated by setting a Word document's template to a phishery URL. This causes Microsoft Word to make a request to the URL, resulting in an Authentication Dialog being shown to the end-user. Injecting a URL into any .docx file is possible using phishery's -i [in docx], -o [out docx], and -u [url] options.
A SharePoint Feature for easy conversion of Word 2007 documents to Sharepoint/MOSS. The solution also extracts, transfers and re-links images to a selected ImageLibrary, includes styles, tables, etc.
sharepoint docx microsoft-office moss transfer transformation translation
Create polished résumés and CVs in multiple formats from your command line or shell. Author in clean Markdown and JSON, export to Word, HTML, PDF, LaTeX, plain text, and other arbitrary formats. Fight the power, save trees. Compatible with FRESH and JRS resumes. HackMyResume is built with Node.js and runs on recent versions of OS X, Linux, or Windows. View the FAQ.
resume cv portfolio employment career markdown json word pdf yaml html latex cli handlebars underscore template
DOTX to DOCX Converter converts Office Open XML templates (DOTX/DOTM) to Office Open XML documents (DOCX/DOCM). The program is an effective supplement to the Microsoft Office Compatibility Pack, which cannot convert these files.
Microsoft Office SharePoint Server (MOSS) Document Converters with Word & Excel 2007 on the server. Converts Office 2003 file types (doc, xls) to pdf and xps. Could easily be altered to work for docx and xlsx file types. Desktop Automation on the Server: Previously, us...
sharepoint document-converter document-conversion excel-to-pdf
Announcement (2019/04/29): UniDoc acquires gooxml. UniDoc (https://unidoc.io and https://github.com/unidoc) has acquired gooxml from Baliance and we plan to add it to our suite of document format support for Go. The repository (gooxml) will be moving to a new home: https://github.com/unidoc/unioffice and the package name will become unioffice.
ooxml openoffice docx pptx xlsx ecma-376 spreadsheet word powerpoint excel
This converts HTML into Word documents (docx format). The code is written in PHP and works with PHPWord.