Ganon - Fast (HTML DOM) parser written in PHP


Ganon

The Ganon library gives access to HTML/XML documents in a very simple object-oriented way. It eases modifying the DOM and makes finding elements easy with CSS3-like queries.

Ganon is:

- A universal tokenizer
- An HTML/XML/RSS DOM parser
  - Ability to manipulate elements and their attributes
  - Supports invalid HTML
  - Supports UTF8
  - Can perform advanced CSS3-like queries on elements (like jQuery -- namespaces supported)
- An HTML beautifier (like HTML Tidy)
  - Minify CSS and Javascript
  - Sort attributes, change character case, correct indentation, etc.
- Extensible
  - Parsing documents using callbacks based on current character/token
  - Operations separated in smaller functions for easy overriding
- Fast
- Easy

Ganon is designed for and written in PHP5, but there is also a PHP4 version available. All code in the repository is designed for PHP5, and a simple converter is used to make it compatible with PHP4. Although PHP4 isn't officially supported, a lot of features will still work with it. New bug reports for PHP4 will be taken into consideration and might be fixed if they don't require an overhaul of the current model. Learn more about using Ganon with PHP4.

NOTE: Ganon is written in PHP version 5.3.1. If you are using an earlier version of PHP5 and experience problems, please try the PHP4 version.

Quick start

First off, you need to download the latest version of Ganon.

    include('path/ganon.php');

    // Parse the google code website into a DOM
    $html = file_get_dom('http://code.google.com/');

After including Ganon and loading the DOM, it is time to get started.

Access

Accessing elements is made easy through the CSS3-like selectors and the object model.

    // Find all the paragraph tags with a class attribute and print the
    // value of the class attribute
    foreach($html('p[class]') as $element) {
        echo $element->class, "\n";
    }

    // Find the first div with ID "gc-header" and print the plain text of
    // the parent element (plain text means no HTML tags, just the text)
    echo $html('div#gc-header', 0)->parent->getPlainText();

    // Find out how many tags there are which are "ns:tag" or "div", but not
    // "a" and do not have a class attribute
    echo count($html('(ns|tag, div + !a)[!class]'));

Learn more about accessing elements.

Modification

Elements can be easily modified after you've found them.

    // Find all paragraph tags which are nested inside a div tag, change
    // their ID attribute and print the new HTML code
    foreach($html('div p') as $index => $element) {
        $element->id = "id$index";
    }
    echo $html;

    // Center all the links inside a document which start with "http://"
    // and print out the new HTML
    foreach($html('a[href ^= "http://"]') as $element) {
        $element->wrap('center');
    }
    echo $html;

    // Find all odd indexed "td" elements and change the HTML to make them links
    foreach($html('table td:odd') as $element) {
        $element->setInnerText('<a href="#">'.$element->getPlainText().'</a>');
    }
    echo $html;

Learn more about modifying elements.

Beautify

Ganon can also help you beautify your code and format it properly.

    // Beautify the old HTML code and print out the new, formatted code
    dom_format($html, array('attributes_case' => CASE_LOWER));
    echo $html;

Learn more about beautifying your HTML.

Related Projects

PHP Simple HTML DOM - This project started because Simple HTML DOM didn't perform quite well for me with complex HTML.
phpQuery - This project started because phpQuery wasn't fast enough for me.
SimpleXML - PHP extension with similar functionality for XML only.

http://code.google.com/p/ganon
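
The access pattern in the Ganon examples (call the DOM object with a selector, then read attributes as properties) can be wrapped in a small reusable function. The following is a minimal sketch, not part of Ganon: `collect_hrefs` is a hypothetical helper name, and the default `a[href]` selector is an assumption modeled on the `p[class]` selector used in the description. It relies only on the DOM object being callable with a selector string and on attributes being readable as properties.

```php
<?php
// Hypothetical helper (not part of Ganon): collect the href attribute of
// every element matched by a selector.
// $dom is anything that can be called with a selector string and returns
// the matching elements, as the Ganon DOM object does in $html('p[class]').
function collect_hrefs($dom, $selector = 'a[href]') {
    $hrefs = array();
    foreach ($dom($selector) as $element) {
        // Attributes are read as properties, like $element->class
        $hrefs[] = $element->href;
    }
    return $hrefs;
}

// Usage with Ganon, assuming ganon.php sits at this path:
// include('path/ganon.php');
// $html = file_get_dom('http://code.google.com/');
// print_r(collect_hrefs($html, 'a[href ^= "http://"]'));
```

Because the helper takes the DOM as a parameter instead of loading it itself, it can be tried with a stand-in callable before pointing it at a real document.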





Related Projects

raspBerry+


raspBerry+ is a web-based administration platform for BlackBerry Enterprise Server for MS Exchange (BES). You can activate, kill, delete, and add users by group, and get the status of users, their handhelds and services. With a little download-area and a comment-system

RASP


RASP's A Sneakernet Proxy; download using a thumbdrive.

RasmusDSP


RasmusDSP is an embeddable Audio/MIDI processor. It contains various filters and generators (including a SoundFont 2.0 compatible synthesizer). It has a script interpreter, which is used to describe instruments and route Audio/MIDI signals between processor units.

Rasea


An acronym for cRoss-plAtform accesS control for Enterprise Applications. Rasea aims to become a reference in access control as a service based on the RBAC model.

Rascal


Rascal, the Advanced Scientific CALculator, is a platform-independent modular calculator. Based on modules for integers, doubles, strings, vectors and matrices, it can be easily extended with existing C or C++ code.



Rars


RARS is the Robot Auto Racing Simulation, in which the drivers are robot programs. It is intended as a competition among programmers. It consists of a simulation of the physics of cars, a graphic display of the race, and a robot driver for each car.

RARPlayer


This small program allows you to play a video directly from a RAR file and do so in real-time. Both VLC and MPlayer are supported video players.

RAReXtract


RAReXtract is a Front-End for the UnRAR command line utility for Mac OS X 10.5 (Leopard). Its purpose is the rapid and convenient extraction of RAR archives with a double click.

RAR Expander


Rar Expander is a MacOSX program which extracts the files contained in single or multi-volume RAR archives. It uses the official unRAR library internally so it is fully compatible with archives produced by WinRAR.

rarcrack


This program uses a brute force algorithm to guess your encrypted compressed file's password. If you forget your encrypted file password, this program is the solution. This program can crack zip, 7z and rar file passwords.

RArcInfo


RArcInfo is a package for R (http://www.r-project.org) to import data from binary Arc/Info V7.X coverages and E00 files. This will allow R users to use it as a primary GIS tool.

rar brute force shell script - rarbrute


This is rarbrute, a shell script to brute force encrypted rar files under Unix and Linux. A long wordlist and a paper about security in internet cafes are included.

Raquel Database System


The system will: 1. use RAQUEL (= Relational Algebra Query, Update and Executive Language) for programming, implementing Third Manifesto principles; 2. have a 'Lego-like' architecture of building blocks and plug-ins, for wider applicability.

RAPv4


RAPv4 is an engine for building web applications with only a business description (in XML format). NEW 04/2006: Stable 2006 release. Adds new functions like mail, SMS, web services, graph, map engine (GIS), Excel output, QBE... and also a beta release of

Rafkill


2D scroller. Clone of Raptor: Call of the Shadows and Tyrian. Fun game written in C++ using Allegro.

rapple


Lightweight XML-based transformation tool written in C that builds upon expat, tidylib and XSLT to transform authored web content (incl. word processor generated HTML) into styled web content suitable for publication.

RapidSMS


RapidSMS is an open-source internet and communications platform

RapidSmith


RapidSmith is a research-based FPGA CAD tool framework written in Java for modern Xilinx FPGAs. Based on XDL, its objective is to serve as a rapid prototyping platform for research ideas and algorithms relating to low level FPGA CAD tools.

Rapidshare Mass Downloader


This program removes the need for human interaction while downloading files from RapidShare (without a premium account). It downloads all the RapidShare links sequentially to the specified location.

rapido visual profiler


rapido is a visual profiler for linux-x86. It traces function calls using the ptrace interface and displays the collected information in a visual flow chart. rapido does not require recompilation of the application.