TYPO3 Documentation by Ameos

tx_indexedsearch_lexer Class Reference

Public Member Functions

 tx_indexedsearch_lexer ()
 split2Words ($wordString)
 addWords (&$words, &$wordString, $start, $len)
 get_word (&$str, $pos=0)
 utf8_is_letter (&$str, &$len, $pos=0)
 charType ($cp)
 utf8_ord (&$str, &$len, $pos=0, $hex=false)

Public Attributes

 $debug = FALSE
 $debugString = ''

Detailed Description

Definition at line 73 of file class.lexer.php.

Member Function Documentation

tx_indexedsearch_lexer::tx_indexedsearch_lexer (  ) 

Constructor. Initializes the charset class, t3lib_cs.


Definition at line 105 of file class.lexer.php.

References t3lib_div::makeInstance().

tx_indexedsearch_lexer::split2Words ( $wordString )

Splits a string into words. Used for indexing; can also be used to find words in a query.

Parameters:
 string $wordString  String with UTF-8 content to process.
Returns:
 array  Array of words in UTF-8.

Definition at line 116 of file class.lexer.php.

References addWords(), and get_word().
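
The find-a-word, store-it, advance loop implied by the references above can be sketched as follows. This is a minimal illustration, not the code from class.lexer.php: the function names sketch_get_word() and sketch_split2words() are hypothetical, and a PCRE Unicode pattern stands in for the character-by-character scan done by get_word() and utf8_is_letter().

```php
<?php
// Hypothetical helper mirroring the documented get_word() contract:
// returns array(start, len) in bytes, or false when no word is left.
// Assumption: a "word" is a run of Unicode letters and digits.
function sketch_get_word(string $str, int $pos = 0)
{
    if (preg_match('/[\p{L}\p{N}]+/u', $str, $m, PREG_OFFSET_CAPTURE, $pos) !== 1) {
        return false;
    }
    // $m[0] is array(matched text, byte offset of the match).
    return [$m[0][1], strlen($m[0][0])];
}

// Hypothetical stand-in for split2Words(): collect words until none is left.
function sketch_split2words(string $wordString): array
{
    $words = [];
    $pos = 0;
    while (($hit = sketch_get_word($wordString, $pos)) !== false) {
        [$start, $len] = $hit;
        $words[] = substr($wordString, $start, $len);
        $pos = $start + $len; // continue right after the word just found
    }
    return $words;
}
```

Positions and lengths are byte offsets here, which is why substr() and strlen() are used rather than multibyte functions.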

tx_indexedsearch_lexer::addWords ( &$words, &$wordString, $start, $len )

Adds a word to the word array. This function should be used to make sure CJK sequences are split up in the right way.

Parameters:
 array &$words  Array of accumulated words (reference).
 string &$wordString  Complete input string from which to extract the word (reference).
 integer $start  Start position of the word in the input string.
 integer $len  Length of the word, counted from the start position.

Definition at line 178 of file class.lexer.php.

References charType(), and utf8_ord().

Referenced by split2Words().
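
This page does not say how CJK sequences are split. One possible strategy, shown here purely as an assumption, is to add every character of a CJK run as its own word while keeping other words whole. The function name sketch_add_words() is hypothetical, and PCRE script properties are used in place of charType().

```php
<?php
// Hypothetical stand-in for addWords(); the per-character CJK split is an
// assumed strategy, not necessarily what class.lexer.php does.
function sketch_add_words(array &$words, string $wordString, int $start, int $len): void
{
    // Cut the word out of the input string (byte offsets).
    $word = substr($wordString, $start, $len);

    // Does the word start with a Chinese, Japanese or Korean character?
    if (preg_match('/^[\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]/u', $word) === 1) {
        // Split into single UTF-8 characters and add each one.
        foreach (preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY) as $char) {
            $words[] = $char;
        }
    } else {
        $words[] = $word;
    }
}
```

The sketch assumes well-formed UTF-8 input; preg_split() returns false on invalid sequences.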

tx_indexedsearch_lexer::get_word ( &$str, $pos = 0 )

Gets the first word in a given UTF-8 string (initial non-letters are skipped).

Parameters:
 string &$str  Input string (reference).
 integer $pos  Starting position in the input string.
Returns:
 array  0: start, 1: len; or false if no word has been found.

Definition at line 239 of file class.lexer.php.

References utf8_is_letter().

Referenced by split2Words().

tx_indexedsearch_lexer::utf8_is_letter ( &$str, &$len, $pos = 0 )

Checks whether a character is a letter (or a string of letters or non-letters).

Parameters:
 string &$str  Input string (reference).
 integer &$len  Byte length of the character sequence (reference, return value).
 integer $pos  Starting position in the input string.
Returns:
 boolean  Whether a letter (or word) was found.

Definition at line 264 of file class.lexer.php.

References charType(), t3lib_div::inList(), and utf8_ord().

Referenced by get_word().

tx_indexedsearch_lexer::charType ( $cp )

Determines the type of a character.

Parameters:
 integer $cp  Unicode number to evaluate.
Returns:
 array  Type of char; index 0 is the main type: num, alpha or CJK (Chinese / Japanese / Korean).

Definition at line 329 of file class.lexer.php.

Referenced by addWords(), and utf8_is_letter().
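
The return contract described above (an array whose index 0 holds the main type) can be illustrated with a small range-based classifier. The function name sketch_char_type(), the lower-case type strings, the empty array for unclassified characters and the choice of ranges are all assumptions for illustration; the ranges in class.lexer.php may be wider or different.

```php
<?php
// Hypothetical classifier; the ranges below are assumptions, not the
// ranges used in class.lexer.php.
function sketch_char_type(int $cp): array
{
    // ASCII digits 0-9.
    if ($cp >= 0x30 && $cp <= 0x39) {
        return ['num'];
    }
    // ASCII letters plus the Latin-1 letters (0xD7 and 0xF7 are math signs).
    if (($cp >= 0x41 && $cp <= 0x5A) || ($cp >= 0x61 && $cp <= 0x7A)
        || ($cp >= 0xC0 && $cp <= 0xFF && $cp !== 0xD7 && $cp !== 0xF7)) {
        return ['alpha'];
    }
    // Hiragana and Katakana, CJK Unified Ideographs, Hangul Syllables.
    if (($cp >= 0x3040 && $cp <= 0x30FF)
        || ($cp >= 0x4E00 && $cp <= 0x9FFF)
        || ($cp >= 0xAC00 && $cp <= 0xD7AF)) {
        return ['cjk'];
    }
    // Assumption: an empty array means "none of the known types".
    return [];
}
```

A caller would typically read only the main type, for example with list($type) = sketch_char_type($cp) after checking that the array is not empty.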

tx_indexedsearch_lexer::utf8_ord ( &$str, &$len, $pos = 0, $hex = false )

Converts a UTF-8 multibyte character to a Unicode code point.

Parameters:
 string &$str  UTF-8 multibyte character string (reference).
 integer &$len  The length of the character (reference, return value).
 integer $pos  Starting position in the input string.
 boolean $hex  If set, a hexadecimal number is returned.
Returns:
 integer  Unicode code point.

Definition at line 383 of file class.lexer.php.

Referenced by addWords(), and utf8_is_letter().
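
The conversion described above follows the standard UTF-8 layout: the lead byte tells how many bytes belong to the character, and each continuation byte contributes six bits. The sketch below is a minimal illustration under that assumption, with the hypothetical name sketch_utf8_ord(). The exact format of the hexadecimal return value is not documented on this page, so the plain dechex() string used here is an assumption.

```php
<?php
// Hypothetical stand-in for utf8_ord(); assumes well-formed UTF-8 input.
function sketch_utf8_ord(string $str, ?int &$len, int $pos = 0, bool $hex = false)
{
    $lead = ord($str[$pos]);

    if ($lead < 0x80) {                    // 0xxxxxxx: plain ASCII
        $len = 1;
        $cp = $lead;
    } elseif (($lead & 0xE0) === 0xC0) {   // 110xxxxx: 2-byte sequence
        $len = 2;
        $cp = $lead & 0x1F;
    } elseif (($lead & 0xF0) === 0xE0) {   // 1110xxxx: 3-byte sequence
        $len = 3;
        $cp = $lead & 0x0F;
    } elseif (($lead & 0xF8) === 0xF0) {   // 11110xxx: 4-byte sequence
        $len = 4;
        $cp = $lead & 0x07;
    } else {                               // stray continuation byte: pass through
        $len = 1;
        $cp = $lead;
    }

    // Each continuation byte (10xxxxxx) adds six payload bits.
    for ($i = 1; $i < $len; $i++) {
        $cp = ($cp << 6) | (ord($str[$pos + $i]) & 0x3F);
    }

    // Assumed hex format: lower-case digits without a prefix.
    return $hex ? dechex($cp) : $cp;
}
```

After the call, $len holds the byte length of the decoded character, so a caller can advance with $pos += $len to reach the next character.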

Member Data Documentation


Initial value:

        'printjoins' => array(  // This is the Unicode numbers of chars that are allowed INSIDE a sequence of letter chars (alphanum + CJK)
                0x2e,   // "."
                0x2d,   // "-"
                0x5f,   // "_"
                0x3a,   // ":"
                0x2f,   // "/"
                0x27,   // "'"
                // 0x615,       // ARABIC SMALL HIGH TAH
        ),
        'casesensitive' => FALSE,       // Set, if case sensitive indexing is wanted.
        'removeChars' => array(         // List of unicode numbers of chars that will be removed before words are returned (eg. "-")
                0x2d    // "-"
        )

Definition at line 83 of file class.lexer.php.
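
Read together, the comments in the initial value suggest two steps: the printjoins characters may appear inside a run of letters without ending the word, and the removeChars characters are stripped before the word is returned. The sketch below shows one way these two lists could interact. The function names and the regex-based approach are assumptions, not the code from class.lexer.php.

```php
<?php
// Build a regex alternation such as (?:\x{2e}|\x{2d}) from code points.
// Hypothetical helper, not part of the documented class.
function sketch_cp_alternation(array $codePoints): string
{
    $parts = [];
    foreach ($codePoints as $cp) {
        $parts[] = sprintf('\x{%x}', $cp);
    }
    return '(?:' . implode('|', $parts) . ')';
}

// Hypothetical tokenizer: join characters are kept inside a word while
// scanning, then the removeChars are stripped from each word.
function sketch_tokenize(string $text, array $conf): array
{
    $letters = '[\p{L}\p{N}]+';
    $joins = sketch_cp_alternation($conf['printjoins']);
    preg_match_all('/' . $letters . '(?:' . $joins . '+' . $letters . ')*/u', $text, $m);

    $remove = '/' . sketch_cp_alternation($conf['removeChars']) . '/u';
    $words = [];
    foreach ($m[0] as $token) {
        $words[] = preg_replace($remove, '', $token);
    }
    return $words;
}

// Same values as the initial value shown above.
$sketchConf = [
    'printjoins'  => [0x2e, 0x2d, 0x5f, 0x3a, 0x2f, 0x27],
    'removeChars' => [0x2d],
];
```

With these settings, a hyphenated term such as "well-known" would be kept together while scanning and then indexed without the hyphen, while "file_name.txt" would keep its underscore and dot.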

The documentation for this class was generated from the following file:

 class.lexer.php

Generated by the TYPO3 specialists with doxygen 1.4.6