class parseText Documentation

This draft of the documentation remains immature. While we have made every attempt to be thorough and accurate, you may encounter errors. If you discover any deficiencies, please let us know at info@kingdesk.com.

This page is a subset of the documentation of the functionality provided by the PHP Parser project.

class parseText

Description

parseText is a class designed for efficient parsing and reconstruction of plain text. The parseText class has the following requirements:

  • the text provided for parsing must be free of HTML markup (except for special HTML characters like >)
  • the text must be encoded as UTF-8
  • the host server must run PHP 5 or later
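
The first two requirements can be checked before text is handed to the parser. The following is a minimal sketch only: looks_like_parseable_text() is a hypothetical helper, not part of the PHP Parser project, and its markup test is a heuristic based on strip_tags().

```php
<?php
// Hypothetical helper (not part of the PHP Parser project): returns true when
// $text appears to meet the encoding and markup requirements listed above.
function looks_like_parseable_text($text) {
	// With the /u modifier, preg_match() returns false when the subject is
	// not valid UTF-8, so an empty pattern works as an encoding check.
	if (preg_match('//u', $text) !== 1) {
		return false;
	}
	// strip_tags() alters the string only when it finds something that looks
	// like a tag, so an unchanged string is assumed to be free of markup.
	return strip_tags($text) === $text;
}

var_dump(looks_like_parseable_text("sample text"));        // bool(true)
var_dump(looks_like_parseable_text("<p>sample text</p>")); // bool(false)
```

Text that fails the markup check would presumably be routed through parseHTML first, as in the second example on this page.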

parseText will tokenize the provided plain text into the following content types:

  • space
  • punctuation
  • word
  • other
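
To give a feel for this kind of classification, here is a simplified sketch that splits a string into typed tokens and rebuilds it from them. It is an illustration under stated assumptions, not parseText's actual tokenizer: the functions sketch_tokenize() and sketch_rebuild(), the regular expression, and the "type"/"value" array shape are all invented for this example, and the real class applies its own rules (for instance, the basic example on this page shows an email address being left alone by get_words(), which this sketch does not reproduce).

```php
<?php
// Illustrative only: a naive tokenizer that labels each piece of text as
// space, punctuation, word, or other. Not the parseText implementation.
function sketch_tokenize($text) {
	// Match runs of whitespace, runs of letters, single punctuation marks,
	// or runs of anything else (digits, symbols, and so on).
	preg_match_all('/\s+|\p{L}+|\p{P}|[^\s\p{L}\p{P}]+/u', $text, $matches);

	$tokens = array();
	foreach ($matches[0] as $value) {
		if (preg_match('/^\s/u', $value)) {
			$type = 'space';
		} elseif (preg_match('/^\p{L}/u', $value)) {
			$type = 'word';
		} elseif (preg_match('/^\p{P}/u', $value)) {
			$type = 'punctuation';
		} else {
			$type = 'other';
		}
		$tokens[] = array('type' => $type, 'value' => $value);
	}
	return $tokens;
}

// Reconstruction is just the token values joined back together in order.
function sketch_rebuild($tokens) {
	$text = '';
	foreach ($tokens as $token) {
		$text .= $token['value'];
	}
	return $text;
}

$tokens = sketch_tokenize("Hello, world 42");
echo sketch_rebuild($tokens); // Hello, world 42
```

Because every character lands in exactly one token, editing only the tokens of one type (for example, the words) and rebuilding leaves the rest of the text untouched, which is the pattern the examples on this page follow.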

Examples

A basic example:


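<?php
$text = "sample text and an email@example.com";

include('path/to/php-parser.php');

// Tokenize the text, then fetch only the word tokens.
$parsedText = new parseText();
$parsedText->load($text);
$words = $parsedText->get_words();

// Modify each word in place (note the reference in the foreach).
foreach($words as &$word) {
	$word["value"] = strtoupper($word["value"]);
}
unset($word); // break the reference left over from the loop

// Write the modified words back and rebuild the full string.
$parsedText->update($words);
$text = $parsedText->unload();

echo $text; // SAMPLE TEXT AND AN email@example.com
?>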

parseText can also be combined with class parseHTML for granular parsing of HTML documents:


<?php
$html = "<p>Go to http://example.com.</p>";

include('path/to/php-parser.php');
$parsedHTML = new parseHTML();
$parsedHTML->load($html);
$parsedHTML->unlock_text();
$unlockedTexts = $parsedHTML->get_unlocked_text();

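// Run each unlocked text fragment through parseText, uppercasing its words.
foreach($unlockedTexts as &$unlockedText) {
	$parsedText = new parseText();
	$parsedText->load($unlockedText);
	$words = $parsedText->get_words();
	foreach($words as &$word) {
		$word["value"] = strtoupper($word["value"]);
	}
	unset($word); // break the reference left over from the inner loop
	$parsedText->update($words);
	$unlockedText = $parsedText->unload();
}
unset($unlockedText); // break the reference left over from the outer loop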

$parsedHTML->update($unlockedTexts);
$html = $parsedHTML->unload();

echo $html; // <p>GO TO http://example.com.</p>
?>

parseText Methods