class parseHTML Documentation

This draft of the documentation remains immature. While we have attempted to be thorough and accurate, you may encounter errors. If you discover any deficiencies, please let us know at info@kingdesk.com.

This page covers a subset of the functionality provided by the PHP Parser project.

class parseHTML

Description

parseHTML is a class designed for efficient parsing and reconstruction of valid XHTML markup. In particular, the following must be true of the provided HTML:

  • every tag must be closed,
  • every attribute must have a value enclosed in quotes, and
  • tag names and attribute names must be lowercase.
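The three markup rules above can be checked before handing a string to parseHTML. The following is a minimal sketch, not part of the PHP Parser project: the helper name looks_like_valid_xhtml() is hypothetical, and it assumes PHP's DOM extension is available and that the input is a fragment without an XML declaration or DOCTYPE. Because it relies on a strict XML parser, it will also reject named entities such as &nbsp; that are not declared in the fragment.

```php
<?php
// Hypothetical pre-flight check; NOT part of parseHTML itself.
function looks_like_valid_xhtml($html) {
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrap the fragment so that several top-level elements are allowed.
    // An XML parser rejects unclosed tags and unquoted attribute values.
    $wellFormed = $doc->loadXML('<root>' . $html . '</root>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$wellFormed) {
        return false;
    }

    // XML is case-sensitive, so names keep their original case here.
    foreach ($doc->getElementsByTagName('*') as $element) {
        if ($element->nodeName !== strtolower($element->nodeName)) {
            return false;
        }
        foreach ($element->attributes as $attribute) {
            if ($attribute->nodeName !== strtolower($attribute->nodeName)) {
                return false;
            }
        }
    }
    return true;
}

var_dump(looks_like_valid_xhtml('<p class="a">text<br /></p>')); // bool(true)
var_dump(looks_like_valid_xhtml('<p>text<br></p>'));             // bool(false)
```

Markup that fails a check like this would need to be cleaned up before it is loaded.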

Additionally, the parseHTML class has the following requirements:

  • text must be encoded as UTF-8
  • the host server must run PHP 5 or later
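Both requirements can be verified at runtime. The sketch below is an illustration only: the helper name is_valid_utf8() is hypothetical, and it relies on the PCRE "u" modifier, which makes preg_match() return false when the subject is not valid UTF-8.

```php
<?php
// Hypothetical helper; NOT part of parseHTML itself.
function is_valid_utf8($text) {
    // An empty pattern always matches, so the result is 1 for valid UTF-8.
    // With the "u" modifier, invalid UTF-8 makes preg_match() return false.
    return preg_match('//u', $text) === 1;
}

// True when the running interpreter is PHP 5 or later.
$meetsPhpRequirement = version_compare(PHP_VERSION, '5.0.0', '>=');

var_dump(is_valid_utf8("caf\xC3\xA9")); // bool(true): "é" encoded as UTF-8
var_dump(is_valid_utf8("caf\xE9"));     // bool(false): "é" encoded as Latin-1
```

Text that fails the check would need to be converted to UTF-8 first, for example with the mbstring or iconv extension if one is installed.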

parseHTML will tokenize the provided HTML into the following content types:

  • the XML declaration
  • the Document Type Definition
  • HTML tags
  • plain text
  • CDATA
  • HTML comments
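To make the six content types concrete, here is a rough regular-expression sketch that splits a string into the same categories. It is not how parseHTML is implemented, and the function name sketch_tokenize() and the type labels are hypothetical; the tokens that parseHTML actually returns may be structured differently.

```php
<?php
// Illustrative sketch only; this is NOT the parseHTML implementation.
function sketch_tokenize($html) {
    // Order matters: the specific constructs are tried before generic tags.
    $pattern = '/(<\?xml.*?\?>|<!DOCTYPE[^>]*>|<!\[CDATA\[.*?\]\]>|<!--.*?-->|<[^>]+>)/s';
    $parts = preg_split($pattern, $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $tokens = array();
    foreach ($parts as $part) {
        if (strpos($part, '<?xml') === 0) {
            $type = 'xml';      // the XML declaration
        } elseif (strpos($part, '<!DOCTYPE') === 0) {
            $type = 'dtd';      // the Document Type Definition
        } elseif (strpos($part, '<![CDATA[') === 0) {
            $type = 'cdata';    // CDATA
        } elseif (strpos($part, '<!--') === 0) {
            $type = 'comment';  // HTML comments
        } elseif ($part[0] === '<') {
            $type = 'tag';      // HTML tags
        } else {
            $type = 'text';     // plain text
        }
        $tokens[] = array('type' => $type, 'value' => $part);
    }
    return $tokens;
}

$sample = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html>'
        . '<p>Hi<!-- note --><![CDATA[x < y]]></p>';
foreach (sketch_tokenize($sample) as $token) {
    echo $token['type'], ': ', $token['value'], "\n";
}
```

For the sample string, the sketch yields seven tokens in the order xml, dtd, tag, text, comment, cdata, tag.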

Examples

A basic example:


<?php
$html = "<p>some text</p>";

include('path/to/php-parser.php');
$parsedHTML = new parseHTML();
$parsedHTML->load($html);
$parsedHTML->unlock_text();
$unlockedTexts = $parsedHTML->get_unlocked_text();
foreach($unlockedTexts as &$unlockedText) {
	// do something here like... SHOUT!
	$unlockedText["value"] = strtoupper($unlockedText["value"]);
}
unset($unlockedText); // break the reference left over from the foreach
$parsedHTML->update($unlockedTexts);
$html = $parsedHTML->unload();

echo $html; // <p>SOME TEXT</p>
?>

parseHTML can also be combined with class parseText for even more granular access:


<?php
$html = "<p>Go to http://example.com.</p>";

include('path/to/php-parser.php');
$parsedHTML = new parseHTML();
$parsedHTML->load($html);
$parsedHTML->unlock_text();
$unlockedTexts = $parsedHTML->get_unlocked_text();
foreach($unlockedTexts as &$unlockedText) {
	$parsedText = new parseText();
	$parsedText->load($unlockedText);
	$words = $parsedText->get_words();
	foreach($words as &$word) {
		// only the tokens returned by get_words() are changed
		$word["value"] = strtoupper($word["value"]);
	}
	unset($word); // break the reference left over from the inner foreach
	$parsedText->update($words);
	$unlockedText = $parsedText->unload();
}
unset($unlockedText); // break the reference left over from the outer foreach
$parsedHTML->update($unlockedTexts);
$html = $parsedHTML->unload();

echo $html; // <p>GO TO http://example.com.</p>
?>


parseHTML Methods