TYPO3 Documentation by Ameos
Public Member Functions
  tx_indexedsearch_lexer ()
  split2Words ($wordString)
  addWords (&$words, &$wordString, $start, $len)
  get_word (&$str, $pos=0)
  utf8_is_letter (&$str, &$len, $pos=0)
  charType ($cp)
  utf8_ord (&$str, &$len, $pos=0, $hex=false)
Public Attributes
  $debug = FALSE
  $debugString = ''
  $csObj
  $lexerConf
Definition at line 73 of file class.lexer.php.
tx_indexedsearch_lexer::tx_indexedsearch_lexer ()
Constructor: Initializes the charset class, t3lib_cs.
Definition at line 105 of file class.lexer.php.
References t3lib_div::makeInstance().
tx_indexedsearch_lexer::split2Words ($wordString)
Splits a string into words. Used for indexing; can also be used to find the words in a query.
Parameters:
  string $wordString: String with UTF-8 content to process.
Definition at line 116 of file class.lexer.php.
References addWords(), and get_word().
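The word-by-word scan described above can be illustrated with a minimal, self-contained sketch. It is not the class's implementation, which works through get_word() and addWords(). Instead it assumes that a "word" is any run of Unicode letters or digits and relies on PCRE Unicode properties. The function name sketch_split_words() is hypothetical.

```php
<?php
/**
 * Sketch only: split a UTF-8 string into runs of letters/digits.
 * Assumption: a "word" is a run of \p{L} or \p{N} characters; the real
 * lexer additionally honours $lexerConf (printjoins, removeChars, ...).
 */
function sketch_split_words(string $text): array
{
    $words = [];
    $pos = 0; // byte offset where the next search starts

    // Find the next run of letters/digits at or after $pos; any leading
    // non-letters are skipped by the regex search itself.
    while (preg_match('/[\p{L}\p{N}]+/u', $text, $m, PREG_OFFSET_CAPTURE, $pos) === 1) {
        [$word, $start] = $m[0];
        $words[] = $word;
        // Continue right after the word just found (offsets are in bytes).
        $pos = $start + strlen($word);
    }
    return $words;
}

// Usage: "w\xC3\xB6rld" is "wörld" written as raw UTF-8 bytes.
$result = sketch_split_words("Hello, w\xC3\xB6rld! 123");
// $result is ['Hello', "w\xC3\xB6rld", '123']
```

Advancing a byte offset after each match mirrors the idea of repeatedly fetching "the first word from position $pos", but the details of the real loop may differ.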
tx_indexedsearch_lexer::addWords (&$words, &$wordString, $start, $len)
Adds a word to the word array. This function should be used to make sure CJK sequences are split up correctly.
Parameters:
  array &$words: Array of accumulated words.
  string &$wordString: Complete input string from which to extract the word.
  integer $start: Start position of the word in the input string.
  integer $len: Length of the word string from the start position.
Definition at line 178 of file class.lexer.php.
References charType(), and utf8_ord().
Referenced by split2Words().
tx_indexedsearch_lexer::get_word (&$str, $pos = 0)
Gets the first word in a given UTF-8 string (initial non-letters are skipped).
Parameters:
  string &$str: Input string (reference).
  integer $pos: Starting position in the input string.
Definition at line 239 of file class.lexer.php.
References utf8_is_letter().
Referenced by split2Words().
tx_indexedsearch_lexer::utf8_is_letter (&$str, &$len, $pos = 0)
See if a character is a letter (or a string of letters or non-letters).
Parameters:
  string &$str: Input string (reference).
  integer &$len: Byte-length of character sequence (reference, return value).
  integer $pos: Starting position in input string.
Definition at line 264 of file class.lexer.php.
References charType(), t3lib_div::inList(), and utf8_ord().
Referenced by get_word().
tx_indexedsearch_lexer::charType ($cp)
Determines the type of a character.
Parameters:
  integer $cp: Unicode number to evaluate.
Definition at line 329 of file class.lexer.php.
Referenced by addWords(), and utf8_is_letter().
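As a rough illustration of classifying a character by its Unicode number, the sketch below maps a code point to a category string. The category names ('num', 'alpha', 'cjk', 'other'), the function name sketch_char_type(), and the ranges checked are all assumptions for this example. The source does not state what charType() returns, and the ranges here are deliberately incomplete.

```php
<?php
/**
 * Sketch only: classify a Unicode code point into a coarse category.
 * The categories and ranges are illustrative assumptions, not the values
 * used by tx_indexedsearch_lexer::charType().
 */
function sketch_char_type(int $cp): string
{
    // ASCII digits 0-9
    if ($cp >= 0x30 && $cp <= 0x39) {
        return 'num';
    }
    // ASCII letters A-Z and a-z
    if (($cp >= 0x41 && $cp <= 0x5A) || ($cp >= 0x61 && $cp <= 0x7A)) {
        return 'alpha';
    }
    // Latin-1 letters, excluding the multiplication and division signs
    if ($cp >= 0xC0 && $cp <= 0xFF && $cp !== 0xD7 && $cp !== 0xF7) {
        return 'alpha';
    }
    // Hiragana/Katakana, CJK Unified Ideographs, Hangul Syllables
    if (($cp >= 0x3040 && $cp <= 0x30FF)
        || ($cp >= 0x4E00 && $cp <= 0x9FFF)
        || ($cp >= 0xAC00 && $cp <= 0xD7AF)) {
        return 'cjk';
    }
    return 'other';
}

// Usage
$typeOfDigit = sketch_char_type(0x35);   // 'num'   ("5")
$typeOfHan   = sketch_char_type(0x4E2D); // 'cjk'   (U+4E2D)
$typeOfDash  = sketch_char_type(0x2D);   // 'other' ("-")
```

Separating CJK from alphabetic characters matters because CJK text is not delimited by spaces, which is presumably why addWords() treats such sequences specially.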
tx_indexedsearch_lexer::utf8_ord (&$str, &$len, $pos = 0, $hex = false)
Converts a UTF-8 multibyte character to a Unicode code point.
Parameters:
  string &$str: UTF-8 multibyte character string (reference).
  integer &$len: The length of the character (reference, return value).
  integer $pos: Starting position in the input string.
  boolean $hex: If set, a hexadecimal number is returned.
Definition at line 383 of file class.lexer.php.
Referenced by addWords(), and utf8_is_letter().
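The conversion from a UTF-8 byte sequence to a code point follows the standard UTF-8 bit layout, which the following sketch spells out. It is not the code from class.lexer.php; the name sketch_utf8_ord() is hypothetical, and the format of the hexadecimal return value (a plain lowercase hex string) is an assumption, since the source does not specify it.

```php
<?php
/**
 * Sketch only: decode the UTF-8 character starting at byte offset $pos.
 * $len receives the number of bytes the character occupies.
 * Invalid or truncated input yields code point 0 with $len = 1 (or 0 at
 * end of string); this error handling is an assumption of the sketch.
 */
function sketch_utf8_ord(string $str, &$len, int $pos = 0, bool $hex = false)
{
    if ($pos >= strlen($str)) {          // nothing left to decode
        $len = 0;
        return $hex ? '0' : 0;
    }
    $lead = ord($str[$pos]);
    if ($lead < 0x80) {                  // 0xxxxxxx: 1 byte (ASCII)
        $len = 1;
        $cp = $lead;
    } elseif (($lead & 0xE0) === 0xC0) { // 110xxxxx: 2 bytes
        $len = 2;
        $cp = $lead & 0x1F;
    } elseif (($lead & 0xF0) === 0xE0) { // 1110xxxx: 3 bytes
        $len = 3;
        $cp = $lead & 0x0F;
    } elseif (($lead & 0xF8) === 0xF0) { // 11110xxx: 4 bytes
        $len = 4;
        $cp = $lead & 0x07;
    } else {                             // stray continuation or invalid byte
        $len = 1;
        return $hex ? '0' : 0;
    }
    if ($pos + $len > strlen($str)) {    // truncated sequence
        $len = 1;
        return $hex ? '0' : 0;
    }
    // Each continuation byte (10xxxxxx) contributes 6 more bits.
    for ($i = 1; $i < $len; $i++) {
        $cp = ($cp << 6) | (ord($str[$pos + $i]) & 0x3F);
    }
    return $hex ? dechex($cp) : $cp;
}

// Usage: "\xE2\x82\xAC" is the euro sign (U+20AC) as raw UTF-8 bytes.
$len = 0;
$cp = sketch_utf8_ord("a\xE2\x82\xAC", $len, 1); // $cp === 0x20AC, $len === 3
```

Returning the byte length through the reference parameter lets a caller advance its position by $len and decode the next character, which matches how the $len parameter is documented above.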
tx_indexedsearch_lexer::$lexerConf
Initial value:
array(
    'printjoins' => array(    // This is the Unicode numbers of chars that are allowed INSIDE a sequence of letter chars (alphanum + CJK)
        0x2e,    // "."
        0x2d,    // "-"
        0x5f,    // "_"
        0x3a,    // ":"
        0x2f,    // "/"
        0x27,    // "'"
        // 0x615,    // ARABIC SMALL HIGH TAH
    ),
    'casesensitive' => FALSE,    // Set, if case sensitive indexing is wanted.
    'removeChars' => array(    // List of unicode numbers of chars that will be removed before words are returned (eg. "-")
        0x2d    // "-"
    )
)
Definition at line 83 of file class.lexer.php.
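To make the effect of the 'removeChars' and 'casesensitive' settings concrete, here is a small sketch of how such a configuration might be applied to a word that has already been extracted. The function sketch_normalize_word() is hypothetical, and the order of the two steps is an assumption; the source only says that the listed characters are removed before words are returned.

```php
<?php
// A reduced copy of the configuration shown above (only the keys used here).
$lexerConf = [
    'casesensitive' => false,
    'removeChars'   => [0x2d], // "-"
];

/**
 * Sketch only: apply 'removeChars' and 'casesensitive' to a single word.
 * Assumption: all configured code points are ASCII, so chr() is enough;
 * non-ASCII code points would need a UTF-8 aware conversion instead.
 */
function sketch_normalize_word(string $word, array $conf): string
{
    foreach ($conf['removeChars'] as $cp) {
        $word = str_replace(chr($cp), '', $word);
    }
    if (!$conf['casesensitive']) {
        // ASCII-only lowercasing; the real lexer works with a charset class.
        $word = strtolower($word);
    }
    return $word;
}

// Usage: "-" is allowed inside a word via 'printjoins', then stripped here.
$normalized = sketch_normalize_word('E-Mail', $lexerConf); // 'email'
```

Under these assumptions, "E-Mail" and "email" would end up as the same indexed word.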