This draft of the documentation is still immature. While we have made every attempt to be thorough and accurate, you may encounter errors. If you discover any deficiencies, please let us know at info@kingdesk.com.
This page is a subset of the documentation of the functionality provided by the PHP Parser project.
class parseText
Description
parseText is a class designed for efficient parsing and reconstruction of plain text. The parseText class has the following requirements:
- the text provided for parsing must be free of HTML markup (except for special HTML characters like >)
- text must be encoded in UTF-8
- the host server must run PHP 5 or later
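The first two requirements can be checked before text is handed to parseText. The following is a minimal sketch, not part of PHP Parser: the function name is_parsable_plain_text is hypothetical, and it assumes the mbstring extension is available. It treats text as markup-free when PHP's strip_tags() leaves it unchanged, which is only an approximation of "free of HTML markup".

```php
<?php
// Hypothetical helper (not provided by PHP Parser): returns true when $text
// appears to satisfy the encoding and markup requirements listed above.
function is_parsable_plain_text($text)
{
    // Requirement: the text must be valid UTF-8 (needs the mbstring extension).
    if (!mb_check_encoding($text, 'UTF-8')) {
        return false;
    }
    // Requirement: no HTML markup. strip_tags() returns the string unchanged
    // when it finds no tags to remove.
    return strip_tags($text) === $text;
}

var_dump(is_parsable_plain_text("sample text and an email@example.com"));
var_dump(is_parsable_plain_text("<p>Go to http://example.com.</p>"));
```

Text that fails the markup check is better routed through class parseHTML, as in the second example under Examples.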
parseText will tokenize the provided plain text into the following content types:
- space
- punctuation
- word
- other
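To make the four content types concrete, here is a simplified sketch of how such a classification could work. It is illustrative only and does not reproduce parseText's actual rules: the function sketch_tokenize, the regular expressions, and the "type" key are assumptions made for this example (only the "value" key appears in the examples on this page).

```php
<?php
// Illustrative only: a simplified tokenizer, NOT parseText's real implementation.
function sketch_tokenize($text)
{
    // Split on runs of whitespace and keep the whitespace runs as tokens.
    $parts = preg_split('/(\s+)/u', $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $tokens = array();
    foreach ($parts as $part) {
        if (preg_match('/^\s+$/u', $part)) {
            $type = 'space';
        } elseif (preg_match('/^\p{P}+$/u', $part)) {
            $type = 'punctuation';
        } elseif (preg_match('/^[\p{L}\p{M}\'-]+$/u', $part)) {
            $type = 'word';
        } else {
            // Anything else, such as email addresses, URLs, or numbers.
            $type = 'other';
        }
        $tokens[] = array('type' => $type, 'value' => $part);
    }
    return $tokens;
}

$tokens = sketch_tokenize("sample text and an email@example.com");
foreach ($tokens as $token) {
    echo $token['type'], ': "', $token['value'], '"', "\n";
}
```

Under these assumed rules, email@example.com is classified as "other" rather than "word", which is consistent with the first example below, where the address is left untouched when words are uppercased. Because every character ends up in exactly one token, concatenating the token values in order gives back the original text.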
Examples
A basic example:
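<?php
$text = "sample text and an email@example.com";
include('path/to/php-parser.php');

// Tokenize the plain text.
$parsedText = new parseText();
$parsedText->load($text);

// Fetch only the word tokens and uppercase each one.
$words = $parsedText->get_words();
foreach ($words as &$word) {
    $word["value"] = strtoupper($word["value"]);
}
unset($word); // break the reference left over from the loop

// Write the modified tokens back and rebuild the text.
$parsedText->update($words);
$text = $parsedText->unload();
echo $text; // SAMPLE TEXT AND AN email@example.com
?>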
parseText can also be combined with class parseHTML for granular parsing of HTML documents:
<?php
$html = "<p>Go to http://example.com.</p>";
include('path/to/php-parser.php');

// Parse the HTML and expose its text for editing.
$parsedHTML = new parseHTML();
$parsedHTML->load($html);
$parsedHTML->unlock_text();
$unlockedTexts = $parsedHTML->get_unlocked_text();

foreach ($unlockedTexts as &$unlockedText) {
    // Tokenize each unlocked text and uppercase its words.
    $parsedText = new parseText();
    $parsedText->load($unlockedText);
    $words = $parsedText->get_words();
    foreach ($words as &$word) {
        $word["value"] = strtoupper($word["value"]);
    }
    unset($word); // break the reference left over from the inner loop
    $parsedText->update($words);
    $unlockedText = $parsedText->unload();
}
unset($unlockedText); // break the reference left over from the outer loop

// Write the modified text back and rebuild the HTML.
$parsedHTML->update($unlockedTexts);
$html = $parsedHTML->unload();
echo $html; // <p>GO TO http://example.com.</p>
?>

