While developing our upcoming PHP Typography project, it became clear that we needed a PHP-based solution to parse HTML, work on the text it contains, and then reassemble the document. We couldn't find anything that suited our needs, so we built one. Once it was done, we realized that what we had built was powerful, flexible, efficient, and valuable… even outside the PHP Typography project. So we are releasing PHP Parser as a stand-alone project for your use. We hope you find it as helpful as we have.
PHP Parser consists of two classes: parseHTML and parseText. As you may have guessed, parseHTML parses HTML and parseText parses text. Using PHP Parser is as easy as this:
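<?php
$html = "raw html...";
include( 'path/to/php-parser.php' );

// Parse the HTML and unlock its text so it can be edited.
$parsedHTML = new parseHTML( );
$parsedHTML->load( $html );
$parsedHTML->unlock_text( );

// Re-lock the text inside tags whose content must not be altered.
$tagsToIgnore = $parsedHTML->get_tags_by_name(
	array( "code", "pre" /* , ... */ )
);
$parsedHTML->lock_children( $tagsToIgnore );

// Process each remaining unlocked text, word by word.
$unlockedTexts = $parsedHTML->get_unlocked_text( );
foreach ( $unlockedTexts as &$unlockedText ) {
	$parsedText = new parseText( );
	$parsedText->load( $unlockedText );
	$words = $parsedText->get_words( );
	foreach ( $words as &$word ) {
		// do stuff to $word["value"]
	}
	unset( $word ); // break the reference left behind by foreach
	$parsedText->update( $words );
	$unlockedText[ "value" ] = $parsedText->unload( );
}
unset( $unlockedText ); // break the reference left behind by foreach

// Write the changes back and rebuild the HTML.
$parsedHTML->update( $unlockedTexts );
$html = $parsedHTML->unload( );
?>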
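The `// do stuff to $word["value"]` step in the listing above is left open. As a hedged illustration, here is a minimal, self-contained sketch of one such per-word transformation. It assumes only what that loop implies, namely that each entry of `$words` is an associative array whose text lives under the `"value"` key. The function `curl_apostrophes` and the sample data are hypothetical and are not part of PHP Parser. The sketch also assumes PHP 7.0 or later, for the `\u{...}` string escape and the return type declaration.

```php
<?php
// Hypothetical helper, NOT part of PHP Parser.
// Assumes each word is an array with its text under the "value" key.
// Replaces a straight apostrophe that sits between two letters with a
// typographic apostrophe (U+2019), e.g. "don't" becomes "don’t".
function curl_apostrophes( array $words ): array {
	foreach ( $words as $i => $word ) {
		// \p{L} matches any Unicode letter; the /u flag enables UTF-8 mode.
		$words[ $i ]["value"] = preg_replace(
			"/(?<=\\p{L})'(?=\\p{L})/u",
			"\u{2019}",
			$word["value"]
		);
	}
	return $words;
}

// Hypothetical sample data shaped like the words in the loop above.
$words = array(
	array( "value" => "don't" ),
	array( "value" => "PHP" ),
);
$words = curl_apostrophes( $words );
echo $words[0]["value"], "\n"; // prints "don’t"
```

Writing through the array index, rather than iterating by reference, returns a modified copy and avoids leaving a dangling `&$word` reference after the loop.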
If you use PHP Parser in a project, please let us know. We’d love to link to it.
Your feedback is much appreciated. How can we make this project better? Email us at info@kingdesk.com.

