This draft of the documentation remains immature. While we have attempted to be thorough and accurate, you may encounter errors. If you discover any deficiencies, please let us know at info@kingdesk.com.
This page is a subset of the documentation of the functionality provided by the PHP Parser project.
class parseHTML
Description
parseHTML is a class designed for efficient parsing and reconstruction of valid xHTML markup. In particular, the following must be true of the provided HTML:
- every tag must be closed,
- every attribute must have a value enclosed in quotes, and
- tag names and attributes must be lowercase.
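The three markup rules above can be checked before handing a string to the parser. The following is a minimal sketch only: find_markup_problems() is a hypothetical helper that is not part of the PHP Parser project, and its regular expressions are a heuristic rather than a full XHTML validator.

```php
<?php
// Hypothetical helper for illustration; it is not part of the PHP Parser project.
// Returns a list of problems that would violate the three rules listed above.
function find_markup_problems($html) {
    $problems = array();

    // Drop comments and CDATA sections so their contents are not read as tags.
    $html = preg_replace('~<!--.*?-->|<!\[CDATA\[.*?\]\]>~s', '', $html);

    // Match opening, closing, and self-closing tags (not the DTD or XML declaration).
    $tag_pattern = '~<(/?)([A-Za-z][A-Za-z0-9:-]*)((?:"[^"]*"|\'[^\']*\'|[^>"\'])*)>~';
    preg_match_all($tag_pattern, $html, $tags, PREG_SET_ORDER);

    $open = array(); // stack of tag names still waiting for a closing tag
    foreach ($tags as $tag) {
        list(, $slash, $name, $attrs) = $tag;

        // Rule 3: tag names must be lowercase.
        if ($name !== strtolower($name)) {
            $problems[] = "tag name is not lowercase: $name";
        }

        if ($slash === '/') {
            // Rule 1: a closing tag must match the most recently opened tag.
            if (array_pop($open) !== $name) {
                $problems[] = "unexpected closing tag: $name";
            }
            continue;
        }

        // A trailing slash marks a self-closing tag such as <br />.
        $attrs = rtrim($attrs);
        $self_closing = substr($attrs, -1) === '/';
        if ($self_closing) {
            $attrs = substr($attrs, 0, -1);
        }

        // Rule 2: every attribute needs a quoted value; rule 3: lowercase names.
        $attr_pattern = '~\s+([^\s=/>"\']+)\s*=\s*(?:"[^"]*"|\'[^\']*\')~';
        preg_match_all($attr_pattern, $attrs, $found);
        foreach ($found[1] as $attr_name) {
            if ($attr_name !== strtolower($attr_name)) {
                $problems[] = "attribute name is not lowercase: $attr_name";
            }
        }
        // Anything left after removing well-formed attributes breaks rule 2.
        if (trim(preg_replace($attr_pattern, '', $attrs)) !== '') {
            $problems[] = "attribute without a quoted value in <$name>";
        }

        if (!$self_closing) {
            $open[] = $name;
        }
    }

    // Rule 1: anything left on the stack was never closed.
    foreach ($open as $name) {
        $problems[] = "unclosed tag: $name";
    }
    return $problems;
}

// Usage: an empty array means no problems were detected by this heuristic.
print_r(find_markup_problems('<p class="a">some <br /> text</p>'));
print_r(find_markup_problems('<P CLASS=a>not ok'));
```

An empty result only means this heuristic found nothing; it does not guarantee that the markup is valid XHTML.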
Additionally, the parseHTML class has the following requirements:
- text must be encoded as UTF-8, and
- the host server must run PHP 5 or later.
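Both requirements can be tested up front. The sketch below is one possible pre-flight check, assuming a hypothetical function name, meets_parser_requirements(), that does not exist in the PHP Parser project. It relies on the PCRE "u" modifier, which makes preg_match() return false when the subject is not valid UTF-8.

```php
<?php
// Hypothetical pre-flight check; not part of the PHP Parser project.
function meets_parser_requirements($text) {
    // The host must run PHP 5 or later.
    if (version_compare(PHP_VERSION, '5.0.0', '<')) {
        return false;
    }
    // With the "u" modifier PCRE validates the subject as UTF-8;
    // preg_match() returns false if it contains invalid byte sequences.
    return preg_match('//u', $text) !== false;
}

// Usage: "\xC3\xA9" is the UTF-8 encoding of "é"; a lone "\xE9" is Latin-1.
var_dump(meets_parser_requirements("caf\xC3\xA9"));
var_dump(meets_parser_requirements("caf\xE9"));
```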
parseHTML will tokenize the provided HTML into the following content types:
- the XML declaration
- the Document Type Definition
- HTML tags
- plain text
- CDATA
- HTML comments
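To make the six content types concrete, here is a rough sketch of how a string could be split into typed tokens. It illustrates the general idea only and is not the parseHTML implementation: the function name sketch_tokenize(), the "type" key, and the type labels are all assumptions made for this example, and the simple tag pattern does not handle a ">" inside an attribute value or a DTD with an internal subset.

```php
<?php
// Illustrative tokenizer; an assumption about the general approach,
// not the parseHTML implementation.
function sketch_tokenize($html) {
    // Each alternative captures one kind of markup; whatever lies between is text.
    // Specific forms come before the generic tag pattern so they win.
    $pattern = '~(<\?xml.*?\?>|<!DOCTYPE.*?>|<!\[CDATA\[.*?\]\]>|<!--.*?-->|<[^>]+>)~is';
    $parts = preg_split($pattern, $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $tokens = array();
    foreach ($parts as $part) {
        if (stripos($part, '<?xml') === 0) {
            $type = 'xml';
        } elseif (stripos($part, '<!DOCTYPE') === 0) {
            $type = 'dtd';
        } elseif (stripos($part, '<![CDATA[') === 0) {
            $type = 'cdata';
        } elseif (strpos($part, '<!--') === 0) {
            $type = 'comment';
        } elseif (preg_match('~^<[^>]+>$~s', $part)) {
            $type = 'tag';
        } else {
            $type = 'text';
        }
        $tokens[] = array('type' => $type, 'value' => $part);
    }
    return $tokens;
}

// Usage: print each token with its type.
foreach (sketch_tokenize('<p>some <!-- note -->text</p>') as $token) {
    echo $token['type'], ': ', $token['value'], "\n";
}
```

Because no characters are discarded, concatenating the "value" entries in order reproduces the original string, which is what makes reconstruction after editing individual tokens possible.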
Examples
A basic example:
<?php
$html = "<p>some text</p>";
include('path/to/php-parser.php');
$parsedHTML = new parseHTML();
$parsedHTML->load($html);
$parsedHTML->unlock_text();
$unlockedTexts = $parsedHTML->get_unlocked_text();
foreach($unlockedTexts as &$unlockedText) {
    // do something here like... SHOUT!
    $unlockedText["value"] = strtoupper($unlockedText["value"]);
}
unset($unlockedText); // break the reference left over from the loop
$parsedHTML->update($unlockedTexts);
$html = $parsedHTML->unload();
echo $html; // <p>SOME TEXT</p>
?>
parseHTML can also be combined with the parseText class for even more granular access:
<?php
$html = "<p>Go to http://example.com.</p>";
include('path/to/php-parser.php');
$parsedHTML = new parseHTML();
$parsedHTML->load($html);
$parsedHTML->unlock_text();
$unlockedTexts = $parsedHTML->get_unlocked_text();
foreach($unlockedTexts as &$unlockedText) {
    $parsedText = new parseText();
    $parsedText->load($unlockedText);
    $words = $parsedText->get_words();
    foreach($words as &$word) {
        $word["value"] = strtoupper($word["value"]);
    }
    unset($word); // break the reference left over from the inner loop
    $parsedText->update($words);
    $unlockedText = $parsedText->unload();
}
unset($unlockedText); // break the reference left over from the outer loop
$parsedHTML->update($unlockedTexts);
$html = $parsedHTML->unload();
echo $html; // <p>GO TO http://example.com.</p>
?>
parseHTML Methods
- load()
- reload()
- unload()
- update()
- clear()
- lock()
- unlock()
- lock_comments()
- unlock_comments()
- lock_dtd()
- unlock_dtd()
- lock_cdata()
- unlock_cdata()
- lock_xml()
- unlock_xml()
- lock_tags()
- unlock_tags()
- lock_text()
- unlock_text()
- lock_children()
- unlock_children()
- get_all()
- get_locked()
- get_unlocked()
- get_comments()
- get_locked_comments()
- get_unlocked_comments()
- get_dtd()
- get_locked_dtd()
- get_unlocked_dtd()
- get_cdata()
- get_locked_cdata()
- get_unlocked_cdata()
- get_xml()
- get_locked_xml()
- get_unlocked_xml()
- get_tags()
- get_locked_tags()
- get_unlocked_tags()
- get_text()
- get_locked_text()
- get_unlocked_text()
- get_tags_by_name()
- get_tag_by_id()
- get_tags_by_class()
- get_tags_by_attribute()
- get_children()
- in_tag()
- in_id()
- in_class()
- in_attribute()

