TYPO3 documentation by Ameos
Public Member Functions
    tx_indexedsearch_lexer ()
    split2Words ($wordString)
    addWords (&$words, &$wordString, $start, $len)
    get_word (&$str, $pos=0)
    utf8_is_letter (&$str, &$len, $pos=0)
    charType ($cp)
    utf8_ord (&$str, &$len, $pos=0, $hex=false)
Public Attributes
    $debug = FALSE
    $debugString = ''
    $csObj
    $lexerConf
Definition at line 73 of file class.lexer.php.
tx_indexedsearch_lexer::tx_indexedsearch_lexer ()
Constructor: initializes the charset class, t3lib_cs.
Definition at line 105 of file class.lexer.php.
References t3lib_div::makeInstance().
tx_indexedsearch_lexer::split2Words ($wordString)
Splits a string into words. Used for indexing; can also be used to find the words in a query.
Parameters:
    string    $wordString    String with UTF-8 content to process.
Definition at line 116 of file class.lexer.php.
References addWords(), and get_word().
tx_indexedsearch_lexer::addWords (&$words, &$wordString, $start, $len)
Adds a word to the word array. Use this function to make sure CJK sequences are split up in the right way.
Parameters:
    array      &$words         Array of accumulated words
    string     &$wordString    Complete input string from which to extract the word
    integer    $start          Start position of the word in the input string
    integer    $len            Length of the word string from the start position
Definition at line 178 of file class.lexer.php.
References charType(), and utf8_ord().
Referenced by split2Words().
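This page does not say how CJK sequences are split. One common technique for text without word separators is to index overlapping two-character pairs, so that ABCD becomes AB, BC, CD. The sketch below illustrates that technique only; whether addWords() works this way is an assumption, and sketch_cjk_pairs() is a hypothetical name, not part of the class.

```php
<?php
// Hypothetical helper, NOT the TYPO3 method: split a UTF-8 sequence
// into overlapping two-character pairs (an assumed CJK strategy).
function sketch_cjk_pairs($sequence)
{
    // Split into single UTF-8 characters; the /u flag makes PCRE UTF-8 aware.
    $chars = preg_split('//u', $sequence, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return array(); // invalid UTF-8 input
    }
    $count = count($chars);
    if ($count <= 1) {
        return $chars; // zero or one character: nothing to pair
    }
    $pairs = array();
    for ($i = 0; $i < $count - 1; $i++) {
        $pairs[] = $chars[$i] . $chars[$i + 1];
    }
    return $pairs;
}
```

Because the index and the search query would be split the same way, a query only matches where the characters appear in the same order.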
tx_indexedsearch_lexer::get_word (&$str, $pos=0)
Gets the first word in a given UTF-8 string. Leading non-letters are skipped.
Parameters:
    string     &$str    Input string (reference)
    integer    $pos     Starting position in input string
Definition at line 239 of file class.lexer.php.
References utf8_is_letter().
Referenced by split2Words().
tx_indexedsearch_lexer::utf8_is_letter (&$str, &$len, $pos=0)
See if a character is a letter (or a string of letters or non-letters).
Parameters:
    string     &$str    Input string (reference)
    integer    &$len    Byte-length of character sequence (reference, return value)
    integer    $pos     Starting position in input string
Definition at line 264 of file class.lexer.php.
References charType(), t3lib_div::inList(), and utf8_ord().
Referenced by get_word().
tx_indexedsearch_lexer::charType ($cp)
Determines the type of a character.
Parameters:
    integer    $cp    Unicode number to evaluate
Definition at line 329 of file class.lexer.php.
Referenced by addWords(), and utf8_is_letter().
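The return values of charType() are not listed on this page. As a rough illustration of classifying a Unicode number by range, the sketch below uses a hypothetical function sketch_char_type() with assumed category labels ('num', 'alpha', 'cjk') and a small, deliberately incomplete set of ranges.

```php
<?php
// Hypothetical sketch, NOT the TYPO3 method. Labels and ranges are
// assumptions chosen for illustration; real coverage would be far wider.
function sketch_char_type($cp)
{
    if ($cp >= 0x30 && $cp <= 0x39) {
        return 'num';   // ASCII digits 0-9
    }
    if (($cp >= 0x41 && $cp <= 0x5A) || ($cp >= 0x61 && $cp <= 0x7A)) {
        return 'alpha'; // ASCII letters A-Z, a-z
    }
    if (($cp >= 0x3040 && $cp <= 0x30FF) || ($cp >= 0x4E00 && $cp <= 0x9FFF)) {
        return 'cjk';   // Hiragana, Katakana, CJK Unified Ideographs
    }
    return '';          // anything else: not a word character in this sketch
}
```

A caller could use such a label to decide whether a sequence needs CJK-specific handling or can be stored as a single word.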
tx_indexedsearch_lexer::utf8_ord (&$str, &$len, $pos=0, $hex=false)
Converts a UTF-8 multibyte character to a Unicode code point.
Parameters:
    string     &$str    UTF-8 multibyte character string (reference)
    integer    &$len    The length of the character (reference, return value)
    integer    $pos     Starting position in input string
    boolean    $hex     If set, a hexadecimal number is returned
Definition at line 383 of file class.lexer.php.
Referenced by addWords(), and utf8_is_letter().
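The conversion described above can be sketched as a standalone function. This is a minimal illustration of UTF-8 decoding with the same parameter layout, not the class's own code: sketch_utf8_ord() is a hypothetical name, the format of the hexadecimal return value is an assumption, and malformed input is not validated.

```php
<?php
// Hypothetical sketch, NOT the TYPO3 method: decode the UTF-8 character
// starting at byte offset $pos and report its byte length through $len.
function sketch_utf8_ord($str, &$len, $pos = 0, $hex = false)
{
    $ord = ord($str[$pos]);
    $len = 1;
    if ($ord >= 0xC0) {
        // The lead byte tells how many continuation bytes follow.
        if ($ord >= 0xF0) {
            $extra = 3;
            $ord &= 0x07;
        } elseif ($ord >= 0xE0) {
            $extra = 2;
            $ord &= 0x0F;
        } else {
            $extra = 1;
            $ord &= 0x1F;
        }
        // Each continuation byte contributes its low six bits.
        for ($i = 1; $i <= $extra; $i++) {
            $ord = ($ord << 6) | (ord($str[$pos + $i]) & 0x3F);
        }
        $len += $extra;
    }
    // Assumption: the hex form is a plain lowercase hex string.
    return $hex ? dechex($ord) : $ord;
}
```

For the two-byte character "é" (bytes C3 A9) the sketch returns 0xE9 and sets $len to 2, so a caller can advance its byte position by $len to reach the next character.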
tx_indexedsearch_lexer::$lexerConf
Initial value:
array(
    'printjoins' => array(    // These are the Unicode numbers of chars that are allowed INSIDE a sequence of letter chars (alphanum + CJK)
        0x2e,    // "."
        0x2d,    // "-"
        0x5f,    // "_"
        0x3a,    // ":"
        0x2f,    // "/"
        0x27,    // "'"
        // 0x615,    // ARABIC SMALL HIGH TAH
    ),
    'casesensitive' => FALSE,    // Set if case-sensitive indexing is wanted.
    'removeChars' => array(    // List of Unicode numbers of chars that will be removed before words are returned (e.g. "-")
        0x2d    // "-"
    )
)
Definition at line 83 of file class.lexer.php.
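To make the roles of 'printjoins', 'casesensitive' and 'removeChars' concrete, here is a rough regex-based sketch. It is not the lexer's implementation: sketch_tokenize() and $sketchConf are hypothetical names, the lowercasing is ASCII-only, and the rule that a join character must sit between two letter or digit runs is an assumption.

```php
<?php
// Hypothetical configuration mirroring the initial value shown above.
$sketchConf = array(
    'printjoins'    => array(0x2e, 0x2d, 0x5f, 0x3a, 0x2f, 0x27),
    'casesensitive' => false,
    'removeChars'   => array(0x2d),
);

// Hypothetical sketch, NOT the TYPO3 lexer: tokenize with a regex.
function sketch_tokenize($text, array $conf)
{
    if (!$conf['casesensitive']) {
        $text = strtolower($text); // ASCII-only lowercasing, for brevity
    }
    // Build a character class from the join characters.
    // chr() is enough here because all configured values are ASCII.
    $joins = '';
    foreach ($conf['printjoins'] as $cp) {
        $joins .= preg_quote(chr($cp), '~');
    }
    // A word: runs of letters/digits, optionally linked by one join char.
    $pattern = '~[\p{L}\p{N}]+(?:[' . $joins . '][\p{L}\p{N}]+)*~u';
    preg_match_all($pattern, $text, $matches);
    $words = array();
    foreach ($matches[0] as $word) {
        // Strip the configured characters before returning the word.
        foreach ($conf['removeChars'] as $cp) {
            $word = str_replace(chr($cp), '', $word);
        }
        $words[] = $word;
    }
    return $words;
}
```

With this sketch, sketch_tokenize('E-mail info@typo3.org', $sketchConf) yields 'email', 'info' and 'typo3.org': the hyphen keeps "e-mail" together during matching and is then removed, the "." joins "typo3" and "org", and "@" is not a join character so it separates words.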