Light HTML to XML converter Umbraco xslt wrapper


An XSLT extension for Umbraco that wraps the functionality of the Light HTML to XML converter by Alain COUTHURES (http://sourceforge.net/projects/light-html2xml/). The extension helps reformat bad HTML into XML when fetching external content, e.g. for screen scraping.

http://umbhtml2xml.codeplex.com/
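To make "reformat bad HTML into XML" concrete, here is a minimal sketch in Python using only the standard library. It is not the Light HTML to XML converter's algorithm and not the Umbraco extension's API; the names `TolerantBuilder`, `html_to_xml`, the `VOID` set, and the synthetic `root` wrapper element are assumptions made for illustration. It closes unclosed elements, ignores stray end tags, and treats a few void elements as empty.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumption: a small, incomplete set of HTML elements that never take an end tag.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class TolerantBuilder(HTMLParser):
    """Builds an ElementTree from possibly malformed HTML (illustrative only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("root")  # synthetic wrapper so output has one root
        self.stack = [self.root]        # chain of currently open elements

    def handle_starttag(self, tag, attrs):
        # Attributes without a value (e.g. <input disabled>) become empty strings.
        attrib = {name: (value or "") for name, value in attrs}
        elem = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Find the nearest open element with this name and close everything
        # opened after it; a stray end tag with no match is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text after a child element belongs to that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html: str) -> str:
    """Return a well-formed XML string for the given HTML fragment."""
    builder = TolerantBuilder()
    builder.feed(html)
    builder.close()
    return ET.tostring(builder.root, encoding="unicode")


if __name__ == "__main__":
    # <b> and <i> are never closed, and <br> has no end tag.
    print(html_to_xml("<p>Hello <b>world<br>again</p><i>x"))
```

The sketch does not sanitize element or attribute names that are legal in HTML but not in XML, so a real converter needs more care than this.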





Related Projects

TagSoup - HTML/XML parser for Haskell


TagSoup is a library for parsing HTML/XML. It supports the HTML 5 specification, and can be used to parse either well-formed XML, or unstructured and malformed HTML from the web. The library also provides useful functions to extract information from an HTML document, making it ideal for screen-scraping.

Nokogiri - HTML, XML, SAX, and Reader parser with XPath and CSS selector support


Nokogiri (鋸) is an HTML, XML, SAX, and DOM parser. Among Nokogiri's many features are the ability to search documents via XPath or CSS3 selectors, an XML/HTML builder, and an XSLT transformer. Nokogiri parses and searches XML/HTML using native libraries (either C or Java, depending on your Ruby), which means it's fast and standards-compliant.

TagSoup - SAX-compliant parser in Java


TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.

Hpricot - HTML parser for Ruby


Hpricot is a fast, flexible HTML parser. Hpricot can be handy for reading broken XML files, since many of the same techniques can be used. If a quote is missing, Hpricot tries to figure it out. If tags overlap, Hpricot works on sorting them out.

Beautiful Soup - Python HTML/XML parser


Beautiful Soup is a Python HTML/XML parser designed for quick-turnaround projects like screen-scraping. Beautiful Soup parses anything you give it, and does the tree traversal stuff for you. You can tell it "Find all the links", or "Find all the links of class externalLink", or "Find all the links whose urls match 'foo.com'", or "Find the table heading that's got bold text, then give me that text."
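The first three queries quoted above can be sketched with only Python's standard library. This is not Beautiful Soup's API, which expresses the same queries far more compactly; `LinkCollector`, `find_links`, and the sample markup are hypothetical names and data invented for this illustration.

```python
from html.parser import HTMLParser
import re


class LinkCollector(HTMLParser):
    """Collects the attributes of every <a> start tag as a dict."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Valueless attributes are stored as empty strings.
            self.links.append({name: (value or "") for name, value in attrs})


def find_links(html, css_class=None, href_pattern=None):
    """Return link attribute dicts, optionally filtered by class or href regex."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    links = collector.links
    if css_class is not None:
        links = [a for a in links if css_class in a.get("class", "").split()]
    if href_pattern is not None:
        links = [a for a in links if re.search(href_pattern, a.get("href", ""))]
    return links


# Deliberately sloppy markup: the <p> and the second <a> are never closed.
sample = (
    '<p><a href="http://foo.com/a" class="externalLink">A</a>'
    '<a href="/local">B'
    '<a class="externalLink other" href="http://bar.org">C</a>'
)

if __name__ == "__main__":
    print(find_links(sample))                             # all the links
    print(find_links(sample, css_class="externalLink"))   # links of one class
    print(find_links(sample, href_pattern=r"foo\.com"))   # links matching a URL
```

Because the collector only looks at start tags, the unclosed elements in the sample do not stop any link from being found.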

Neko HTML Parser - simple HTML scanner


NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and fix up many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements, automatically closes elements with optional end tags, and can handle mismatched inline element tags.

Arabica


Arabica is an XML and HTML processing toolkit, providing SAX, DOM, XPath, and partial XSLT implementations, written in Standard C++.

HtmlCleaner - HTML parser in Java


HtmlCleaner is an HTML parser written in Java. HTML found on the Web is usually dirty, ill-formed and unsuitable for further processing. HtmlCleaner reorders individual elements and produces well-formed XML. By default, it follows rules similar to those most web browsers use to create the Document Object Model. However, the user may provide a custom tag and rule set for tag filtering and balancing.

Dynamic Grid Data Type for Umbraco


The Dynamic Grid Data Type for Umbraco is a custom ASCX/C# control that was created to store tabular data as an Umbraco "Data Type". Rows and columns can be added or removed, and the whole grid is written to the database as an XML string. All done via UpdatePanels.

perl-XML-SAX-ExpatXS - XML::SAX::ExpatXS - Perl SAX 2 XS extension to Expat parser


XML::SAX::ExpatXS - Perl SAX 2 XS extension to Expat parser

perl-XML-DOM - DOM extension to XML::Parser


DOM extension to XML::Parser

System.Html


An HTML parser that turns badly formatted HTML into XPath-queryable XML. Similar to HTML Tidy and Html Agility Pack; I suppose you can call it "Just Another Html Parser". Written in C# and does not require anything that isn't found in the .NET Framework.

Contact Form for Umbraco


Contact Form for Umbraco is an extension for Umbraco (surprise!). It's supposed to be just like Umbraco: Simple, flexible and friendly.

Doodle for Umbraco


This is an Umbraco extension based on the www.doodle.ch application. One of the main goals with this application is to create an Umbraco extension that you can modify and style into your organization's design model without editing any code. Special thanks to: Tobias Svane & ...

uHelpsy - Umbraco Helper Library


uHelpsy is a tiny (but growing) library which makes programmatically interacting with Umbraco 4.5.2 much more pleasant. It provides helper methods for creating and updating nodes, working with the Umbraco cache, and dealing with unpublished nodes.

.net HTTP Data Extractor


The nWeb Data Extractor Library supports extracting data from HTTP response HTML: it lets the user convert the response HTML to XML and then extract the desired data from the generated XML file.

well_formed_html - A simple coffeescript parser to check for well-formed html/xml


A simple coffeescript parser to check for well-formed html/xml

GDataXML-HTML


HTML/XML parser for iOS and OS X, based on Google's GDataXML. It implements parts of NSXML so it's easy to parse XML files with a DOM API or XPath. This fork of the original GDataXML adds support for the HTMLparser module of libxml2 and allows you to deal with non-validating XML or HTML.

Commerce for Umbraco


Commerce for Umbraco is an ecommerce extension that runs entirely within an Umbraco instance. It is easily extended to accommodate nearly any ecommerce scenario.