(X)HTML Markup Sanitizer

The XHTML Markup Sanitizer takes untrusted (X)HTML and converts it into well-formed, trusted XHTML. It is particularly useful in content management systems where users control the markup but the output must target XHTML 1.1.
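The general idea of turning untrusted markup into trusted XHTML can be sketched with a whitelist filter. The snippet below is only an illustration in Python using the standard library; it is not the project's implementation, and the names `ALLOWED_TAGS`, `ALLOWED_ATTRS`, `WhitelistSanitizer` and `sanitize` are hypothetical, as is the choice of allowed elements.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; the real project's rules are not described here.
ALLOWED_TAGS = {"p", "em", "strong", "a", "br"}
ALLOWED_ATTRS = {"a": {"href"}}
VOID_TAGS = {"br"}
SAFE_URL_PREFIXES = ("http://", "https://", "/")


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # drop the tag itself, keep any text inside it
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and not value.lower().startswith(SAFE_URL_PREFIXES):
                continue  # rejects javascript: and other schemes
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        if tag in VOID_TAGS:
            self.out.append("<%s%s />" % (tag, "".join(kept)))
        else:
            self.out.append("<%s%s>" % (tag, "".join(kept)))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return  # stray or disallowed end tag
        # Close inner elements first so the output stays well-formed.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(markup):
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()
    while parser.open_tags:  # close anything the author left open
        parser.out.append("</%s>" % parser.open_tags.pop())
    return "".join(parser.out)
```

A whitelist is used rather than a blacklist because unknown tags and attributes are then dropped by default. A production sanitizer would need many more rules (DTD validation, nesting constraints, character encoding) than this sketch covers.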

http://markupsanitizer.codeplex.com/


Related Projects

perl-HTML-SAX - HTML::SAX - HTML/XHTML parser that outputs SAX events


TagSoup - SAX-compliant parser in Java


TagSoup, a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.

Delphi Dom HTML Parser and Converter


DOM core interface, HTML parser, HTML -> Unicode converter, HTML -> XHTML converter

Nokogiri - HTML, XML, SAX, and Reader parser with XPath and CSS selector support


Nokogiri (鋸) is an HTML, XML, SAX, and DOM parser. Its many features include searching documents via XPath or CSS3 selectors, an XML/HTML builder, and an XSLT transformer. Nokogiri parses and searches XML/HTML using native libraries (either C or Java, depending on your Ruby), which means it's fast and standards-compliant.
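To illustrate what searching a document via XPath looks like, here is a small sketch in Python using the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset. It is not Nokogiri's API (Nokogiri is a Ruby library); the sample document and variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace, so queries need a prefix mapping.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p class="intro">Hello</p><p>World</p>'
    '<a href="/docs">Docs</a></body></html>'
)

# All <p> elements whose class attribute equals "intro".
intro = [p.text for p in doc.findall(".//x:p[@class='intro']", XHTML_NS)]

# The href of every <a> element, at any depth.
links = [a.get("href") for a in doc.findall(".//x:a", XHTML_NS)]
```

Note that this only works on well-formed input; parsing HTML as found in the wild is what the tolerant parsers listed here are for.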

HTML-XHTML-Lite - Release history of HTML-XHTML-Lite




HTML-XHTML-DVSM - Release history of HTML-XHTML-DVSM


Neko HTML Parser - simple HTML scanner


NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and fix up many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements, automatically closes elements with optional end tags, and can handle mismatched inline element tags.
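Tag balancing of this kind can be sketched with a stack of open elements. The Python snippet below is a simplified illustration, not NekoHTML's algorithm or API (NekoHTML is a Java library); `Balancer`, `balance`, `AUTO_CLOSE` and `VOID` are hypothetical names, and it does not attempt to add missing parent elements.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical subsets chosen for the example.
AUTO_CLOSE = {"p", "li"}  # end tag optional; a same-named sibling closes it
VOID = {"br", "hr", "img", "input", "meta", "link"}


class Balancer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        # Simplification: only closes a directly preceding sibling.
        if tag in AUTO_CLOSE and self.stack and self.stack[-1] == tag:
            self.out.append("</%s>" % self.stack.pop())
        rendered = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append("<%s%s />" % (tag, rendered))
        else:
            self.out.append("<%s%s>" % (tag, rendered))
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag, e.g. the </i> in <b><i>x</b></i>
        # Close inner elements first, then the requested one.
        while self.stack:
            top = self.stack.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def balance(markup):
    parser = Balancer()
    parser.feed(markup)
    parser.close()
    while parser.stack:  # close whatever is still open at end of input
        parser.out.append("</%s>" % parser.stack.pop())
    return "".join(parser.out)
```

For example, `balance("<b><i>text</b></i>")` reorders the mismatched inline tags, and `balance("<p>one<p>two")` supplies the omitted `</p>` end tags.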

TagSoup - HTML/XML parser for Haskell


TagSoup is a library for parsing HTML/XML. It supports the HTML 5 specification, and can be used to parse either well-formed XML, or unstructured and malformed HTML from the web. The library also provides useful functions to extract information from an HTML document, making it ideal for screen-scraping.

Hpricot - HTML parser for Ruby


Hpricot is a fast, flexible HTML parser. Hpricot can be handy for reading broken XML files, since many of the same techniques can be used. If a quote is missing, Hpricot tries to figure it out. If tags overlap, Hpricot works on sorting them out.

break


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <script type="text/javascript" src="raf.js"></script> <script type="text/javascript" src="Paddle.js"></script> <script type="text/javascript" src="ball.js"></script> <script> var paddle; var ball; var brick; var ctx; var canvas; var WIDTH; var HEIGHT; var rightDown = false, leftDown = false; var ballNum = 3;

simple-arrays-in-php


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>Untitled Document</title> </head> <body> <?php $coffee = array("starbucks", "figaro", "seattles best"); $coffee_places = array ("a" => "starbucks", "b" => "figaro"); $preferred = $coffee[1]; echo $preferred; $prefer = $coffee_places ["b"]; echo $

multiplication-table-in-PHP


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>Untitled Document</title> </head> <body> <?php echo "<h1>Multiplication table</h1>"; echo "<table border=2 width=50%>"; for ($i = 1; $i <= 12; $i++ ) { echo "<tr>"; echo "<td>".$i."</td>"; for ( $j = 2; $j <= 12; $j++ ) { echo

computation-loops


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>Untitled Document</title> </head> <body> <?php /* variable declaration */ $iteration = 10; $a = 0; $b = 1; for ($i=0 ; $i <= $iteration; $i++) { $sum = $a + $b; $a = $b; $b = $sum; ?> <tr<?php if ($i % 2 != 0 ) echo ' class="alt"' ?>> <td>F<sub><?php ec

xiasteven-Hello


<%@ page contentType="text/html; charset=utf-8" language="java" errorPage=""%> <%@ page isELIgnored="false"%> <%@ taglib uri="http://java.sun.com/jsp/jstl/fmt" prefix="fmt"%> <%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c"%> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <meta

perl-HTML-StripScripts-Parser - HTML::StripScripts::Parser - XSS filter using HTML::Parser


HtmlCleaner - HTML parser in Java


HtmlCleaner is an HTML parser written in Java. HTML found on the Web is usually dirty, ill-formed, and unsuitable for further processing. HtmlCleaner reorders individual elements and produces well-formed XML. By default, it follows rules similar to those most web browsers use to build the Document Object Model. However, the user may provide a custom tag and rule set for tag filtering and balancing.

JTidy - HTML parser and pretty printer in Java


JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document being processed, so you can effectively use JTidy as a DOM parser for real-world HTML.

Html Agility Pack


This is an HTML parser that builds a read/write DOM from "real world" HTML files. It supports XPath or XSLT and is tolerant of "real world" malformed HTML.

XHTML - HTML, XHTML, CSS