Source code XML metadata extraction tools


Tools for extracting XML-like markup embedded in source code comments and transforming it into proper external entities or well-formed XML files. They can be used for JavaDoc-like "literate programming" or for embedding other build-related or configuration-management (CM) metadata.
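As a rough illustration of the idea, the Python sketch below collects XML-like markup from comment lines and wraps it into one well-formed XML document. The "///" comment prefix, the "metadata" root element and the function name extract_metadata are hypothetical choices made for this example. The tools described here may use other comment conventions and may emit external entities rather than a single document.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical convention: embedded metadata lives in comment lines starting with "///".
META_LINE = re.compile(r"^\s*///\s?(.*)$")


def extract_metadata(source: str, root_tag: str = "metadata") -> ET.Element:
    """Collect '///' comment lines and parse them as a single XML document.

    The fragments are wrapped in one root element so that several top-level
    tags still form a well-formed document. ET.fromstring raises
    ET.ParseError if the embedded markup is malformed.
    """
    fragments = []
    for line in source.splitlines():
        match = META_LINE.match(line)
        if match:
            fragments.append(match.group(1))
    xml_text = f"<{root_tag}>" + "\n".join(fragments) + f"</{root_tag}>"
    return ET.fromstring(xml_text)


if __name__ == "__main__":
    sample = """
/// <module name="parser" owner="build-team"/>
int parse(void);
/// <depends on="lexer"/>
"""
    root = extract_metadata(sample)
    print(ET.tostring(root, encoding="unicode"))
```

Wrapping everything in a synthetic root keeps the output well-formed even when a file carries several independent metadata tags; a malformed comment surfaces as a parse error instead of silently producing broken XML.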



Related Projects

.net HTTP Data Extractor

The nWeb Data Extractor Library supports extracting data from the HTML of an HTTP response: it lets users convert that HTML to XML and then extract the desired data from the generated XML file.

Content-extractor - Easy way to extract data from web pages

Content Extractor is professional data-mining software that organizes the collected information for convenient use. You can use it for regular automatic data collection or to extract any web content manually. The program is very accurate and collects data from pages associated with the specified source. It comes with a handy built-in browser and can save data in MS Excel .xml, .html and .csv formats. Please read the Content Extractor Quickstart to start working with Content Extractor. If you wan

Shawty - XPath-based Text Extractor

Welcome to Shawty. Shawty is an XPath-based text extractor written in Groovy. The idea is to extract a Map/Table of fields from any marked-up source, like XML, SGML, HTML, etc. Building: you'll need Groovy 1.6+, Java 1.5+, and Buildr 1.4+. Once you've downloaded and installed all that stuff, running "buildr test package" will produce shawty-{version}.jar in the "target" sub-directory of the Shawty tree. Example usage: see $SHAWTY_DIR/src/test/groovy/com/google/shawty/XPathExtractorTests.groovy for
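The "map of fields via path expressions" idea can be sketched in a few lines. This is not Shawty's Groovy API; it is a hypothetical Python illustration in which extract_fields takes a dictionary from field names to path expressions. It relies on the limited XPath subset supported by Python's ElementTree, so full XPath features such as text() or functions are not available here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def extract_fields(xml_text: str, field_paths: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each field name to the text found at its path expression.

    Paths are evaluated relative to the document root using ElementTree's
    XPath subset (e.g. "title", "./book/title", ".//author"). A field whose
    path matches nothing maps to None.
    """
    root = ET.fromstring(xml_text)
    return {name: root.findtext(path) for name, path in field_paths.items()}


if __name__ == "__main__":
    doc = "<paper id='p1'><title>Sample</title><author>A. Author</author></paper>"
    print(extract_fields(doc, {"title": "title", "author": "author", "abstract": "abstract"}))
```

Keeping the field-to-path mapping as plain data means a new source format only needs a new mapping, not new extraction code.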

Wikipedianerdata - Training data generator for Named Entity Recognition, using Wikipedia

This project contains several separate programs: splitter, which splits large Wikipedia XML files into smaller files of a specified size; parser, which removes Wikipedia markup (to a certain extent) and saves the data into a specified schema; and entityExtractor, which identifies and tags named entities and saves them in a database.
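A splitter of this kind is usually built on streaming parsing so the full dump never has to fit in memory. The Python sketch below is a hypothetical illustration, not the project's code: it assumes un-namespaced page elements that are direct children of the root (real MediaWiki dumps use an XML namespace), measures "size" as UTF-8 bytes of the serialized pages, and yields each chunk as a string wrapped in a made-up "pages" root instead of writing files.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, List


def split_pages(source: IO[bytes], max_bytes: int, page_tag: str = "page") -> Iterator[str]:
    """Yield well-formed XML chunks, each holding whole <page> elements.

    A chunk is flushed before adding a page would push its payload past
    max_bytes; a single oversized page still gets a chunk of its own.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the root element's start
    batch: List[str] = []
    size = 0
    for event, elem in context:
        if event != "end" or elem.tag != page_tag:
            continue
        elem.tail = None  # drop whitespace that follows the page
        text = ET.tostring(elem, encoding="unicode")
        n = len(text.encode("utf-8"))
        if batch and size + n > max_bytes:
            yield "<pages>" + "".join(batch) + "</pages>"
            batch, size = [], 0
        batch.append(text)
        size += n
        root.clear()  # detach finished pages from the root to bound memory
    if batch:
        yield "<pages>" + "".join(batch) + "</pages>"


if __name__ == "__main__":
    dump = io.BytesIO(b"<mediawiki><page><title>A</title></page><page><title>B</title></page></mediawiki>")
    for chunk in split_pages(dump, max_bytes=40):
        print(chunk)
```

Splitting only at page boundaries keeps every output file independently parseable, at the cost of chunks that are only approximately the requested size.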

Al-ole - OLE compound file extractor & packager

With AL-OLE, you can extract all streams of an OLE compound file (doc, xls, ppt, ...) as separate files, together with an XML file describing its structure; modify the extracted files with any tools; and re-package the modified streams into a compound file.

Gml-extractor - Open Source General Markup Language Extractor for Java

GML-Extractor can extract any data from XML, SGML or HTML with little coding work. If you want to extract some data from XML, SGML or HTML, but do not want to build an in-memory tree as DOM4j does, nor write as much code as SAX requires, this tool will help you! Sample: XML example. Here is an example (example.xml). This XML contains three papers, and each paper has the properties id, title, author and abstract. With this tool, you can extract them one by one. <?xml version="1.0" encoding="
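The "one record at a time, without a full DOM" approach can be illustrated with Python's streaming ElementTree.iterparse. This is not GML-Extractor's Java API. The paper layout (an id attribute plus title, author and abstract child elements) is an assumption, since the example.xml above is truncated, and the function name iter_papers is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, IO, Iterator, Optional


def iter_papers(source: IO[bytes]) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per <paper> element as soon as it has been parsed.

    Assumed layout: <paper id="..."><title/><author/><abstract/></paper>.
    Missing child elements map to None.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "paper":
            yield {
                "id": elem.get("id"),
                "title": elem.findtext("title"),
                "author": elem.findtext("author"),
                "abstract": elem.findtext("abstract"),
            }
            elem.clear()  # free the finished paper's children and text


if __name__ == "__main__":
    xml_bytes = b"<papers><paper id='1'><title>T1</title><author>A1</author></paper></papers>"
    for paper in iter_papers(io.BytesIO(xml_bytes)):
        print(paper)
```

Compared with a raw SAX handler, the caller only deals with finished records; compared with a DOM, each paper's content is released once it has been converted.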


The aim of MIEX (Metadata and Information Extractor from small XML documents) is to create a wrapper for the Stanford Parser, to extract and store metadata (syntactic structures, relationships among words...) from simple XML documents.

Pediasuckr - Wikipedia Information Extractor

Tries to extract useful information, or hopefully knowledge, from Wikipedia database dumps. It currently reads from an uncompressed database dump in XML format. The parser understands wiki-links, HTML attributes, HTTP links and Wikipedia category markup. It is written in Groovy for now; I might just switch to Java 1.6 and its scripting support, since Groovy is very slow for some tasks. The information will be stored in a triple store of some kind, most likely a relational database. May
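To make the wiki-link and category handling concrete, here is a hypothetical Python sketch (not Pediasuckr's Groovy code) that turns [[Target]], [[Target|label]] and [[Category:Name]] markup into subject-predicate-object triples. The predicate names "linksTo" and "inCategory" are made up for illustration, and nested links, templates and section anchors are deliberately not handled.

```python
import re
from typing import List, Tuple

# Matches [[Target]] or [[Target|label]]; group 1 is the link target.
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")

Triple = Tuple[str, str, str]


def link_triples(page: str, wikitext: str) -> List[Triple]:
    """Convert wiki-links and category markup on one page into triples."""
    triples: List[Triple] = []
    for match in WIKI_LINK.finditer(wikitext):
        target = match.group(1).strip()
        if target.startswith("Category:"):
            triples.append((page, "inCategory", target[len("Category:"):]))
        else:
            triples.append((page, "linksTo", target))
    return triples


if __name__ == "__main__":
    text = "A [[programming language]] for the [[Java virtual machine|JVM]].\n[[Category:Scripting languages]]"
    for triple in link_triples("Groovy", text):
        print(triple)
```

Emitting plain triples keeps the extractor independent of the storage layer, whether that ends up being a dedicated triple store or a relational table with three columns.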

Freshdox - Lets you ensure code samples in documentation are up-to-date

The primary purpose of this project is to ensure code samples in documentation are up-to-date. This project is typically only useful for software libraries rather than standalone applications. For instance, the Jakarta Commons components documentation contains code samples. By marking these samples up with simple XML tags, you can run freshdox against your documentation to ensure the code samples work correctly. Tests can be embedded in documents using XML tags, with a default style sheet

Rejuvenate-pc - Rejuvenate Pointcut: A Tool for Pointcut Expression Recovery in Evolving Aspect-Orie

Rejuvenate Pointcut is an open source, research prototype Eclipse plugin that limits the problems associated with fragile pointcuts in AspectJ by assisting developers in rejuvenating pointcuts as the base code evolves. Development of the tool is currently in its early stages, and we welcome interested participants to join the development effort by contacting the initial author. It is built as an extension to the AspectJ Development Tools (AJDT) and leverages the JayFX fact extractor plugin, the