TYPO3 documentation by Ameos

t3lib_cs Class Reference

Public Member Functions

 parse_charset ($charset)
 get_locale_charset ($locale)
 conv ($str, $fromCS, $toCS, $useEntityForNoChar=0)
 convArray (&$array, $fromCS, $toCS, $useEntityForNoChar=0)
 utf8_encode ($str, $charset)
 utf8_decode ($str, $charset, $useEntityForNoChar=0)
 utf8_to_entities ($str)
 entities_to_utf8 ($str, $alsoStdHtmlEnt=0)
 utf8_to_numberarray ($str, $convEntities=0, $retChar=0)
 UnumberToChar ($cbyte)
 utf8CharToUnumber ($str, $hex=0)
 initCharset ($charset)
 initUnicodeData ($mode=null)
 initCaseFolding ($charset)
 initToASCII ($charset)
 substr ($charset, $string, $start, $len=null)
 strlen ($charset, $string)
 crop ($charset, $string, $len, $crop='')
 strtrunc ($charset, $string, $len)
 conv_case ($charset, $string, $case)
 specCharsToASCII ($charset, $string)
 sb_char_mapping ($str, $charset, $mode, $opt='')
 utf8_substr ($str, $start, $len=null)
 utf8_strlen ($str)
 utf8_strtrunc ($str, $len)
 utf8_strpos ($haystack, $needle, $offset=0)
 utf8_strrpos ($haystack, $needle)
 utf8_char2byte_pos ($str, $pos)
 utf8_byte2char_pos ($str, $pos)
 utf8_char_mapping ($str, $mode, $opt='')
 euc_strtrunc ($str, $len, $charset)
 euc_substr ($str, $start, $charset, $len=null)
 euc_strlen ($str, $charset)
 euc_char2byte_pos ($str, $pos, $charset)
 euc_char_mapping ($str, $charset, $mode, $opt='')

Public Attributes

 $noCharByteVal = 63
 $parsedCharsets = array()
 $caseFolding = array()
 $toASCII = array()
 $twoByteSets
 $fourByteSets
 $eucBasedSets
 $synonyms
 $lang_to_script
 $script_to_charset_unix
 $script_to_charset_windows
 $locale_to_charset
 $charSetArray
 $isoArray

Detailed Description

Definition at line 136 of file class.t3lib_cs.php.


Member Function Documentation

t3lib_cs::parse_charset ( charset  ) 

Normalize - changes input character set to lowercase letters.

Parameters:
string Input charset
Returns:
string Normalized charset
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 527 of file class.t3lib_cs.php.

Referenced by get_locale_charset().

t3lib_cs::get_locale_charset ( locale  ) 

Get the charset of a locale.

The locale string can take the following forms:

ln              language
ln_CN           language / country
ln_CN.cs        language / country / charset
ln_CN.cs@mod    language / country / charset / modifier

Parameters:
string Locale string
Returns:
string Charset resolved for locale string
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 546 of file class.t3lib_cs.php.

References parse_charset().
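
The locale forms described above follow the common language_COUNTRY.charset@modifier layout. As a minimal sketch of how such a string can be taken apart, the following uses a hypothetical helper, splitLocaleString(), which is not part of t3lib_cs and does not reproduce its lookup tables:

```php
<?php
// Hypothetical helper, not part of t3lib_cs: splits a locale string of the
// form ln[_CN][.cs][@mod] into its parts. Missing parts are empty strings.
function splitLocaleString($locale)
{
    $parts = array('language' => '', 'country' => '', 'charset' => '', 'modifier' => '');

    // Strip the modifier first: everything after '@'.
    $at = strpos($locale, '@');
    if ($at !== false) {
        $parts['modifier'] = substr($locale, $at + 1);
        $locale = substr($locale, 0, $at);
    }

    // Then the charset: everything after '.', lowercased for later lookups.
    $dot = strpos($locale, '.');
    if ($dot !== false) {
        $parts['charset'] = strtolower(substr($locale, $dot + 1));
        $locale = substr($locale, 0, $dot);
    }

    // What remains is the language, optionally followed by '_' and the country.
    $us = strpos($locale, '_');
    if ($us !== false) {
        $parts['country'] = substr($locale, $us + 1);
        $locale = substr($locale, 0, $us);
    }
    $parts['language'] = $locale;

    return $parts;
}

print_r(splitLocaleString('de_AT.ISO-8859-15@euro'));
```

A locale without an explicit charset part would still need a language or script lookup (compare $locale_to_charset and $script_to_charset_unix below) to arrive at a charset.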

t3lib_cs::conv ( str,
fromCS,
toCS,
useEntityForNoChar = 0 
)

Convert from one charset to another charset.

Parameters:
string Input string
string From charset (the current charset of the string)
string To charset (the output charset wanted)
boolean If set, then characters that are not available in the destination character set will be encoded as numeric entities
Returns:
string Converted string
See also:
convArray()

Definition at line 599 of file class.t3lib_cs.php.

References utf8_decode(), and utf8_encode().

Referenced by convArray().

t3lib_cs::convArray ( & array,
fromCS,
toCS,
useEntityForNoChar = 0 
)

Convert all elements in ARRAY from one charset to another charset. NOTICE: Array is passed by reference!

Parameters:
array Input array, possibly multidimensional
string From charset (the current charset of the string)
string To charset (the output charset wanted)
boolean If set, then characters that are not available in the destination character set will be encoded as numeric entities
Returns:
void
See also:
conv()

Definition at line 639 of file class.t3lib_cs.php.

References conv().
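
Because the array is passed by reference and may be nested, the conversion has to recurse into sub-arrays and write results back in place. A minimal sketch of that pattern, with a hypothetical function name and an arbitrary callback standing in for conv() (this is not the t3lib_cs implementation):

```php
<?php
// Hypothetical sketch, not the t3lib_cs implementation: applies a string
// converter to every string leaf of a nested array, modifying it in place.
function convertArrayInPlace(array &$array, $convert)
{
    foreach ($array as $key => $value) {
        if (is_array($value)) {
            // Recurse on the element itself (not the copy in $value)
            // so that changes made deeper down are kept.
            convertArrayInPlace($array[$key], $convert);
        } elseif (is_string($value)) {
            $array[$key] = call_user_func($convert, $value);
        }
    }
}

$data = array('title' => 'abc', 'items' => array('x', 'y'), 'count' => 3);
convertArrayInPlace($data, 'strtoupper');
// $data is now array('title' => 'ABC', 'items' => array('X', 'Y'), 'count' => 3)
```

The function returns nothing; as with convArray(), the caller's variable itself is changed.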

t3lib_cs::utf8_encode ( str,
charset 
)

Converts $str from $charset to UTF-8

Parameters:
string String in local charset to convert to UTF-8
string Charset, lowercase. Must be found in csconvtbl/ folder.
Returns:
string Output string, converted to UTF-8

Definition at line 656 of file class.t3lib_cs.php.

References initCharset(), strlen(), and substr().

Referenced by conv(), and entities_to_utf8().

t3lib_cs::utf8_decode ( str,
charset,
useEntityForNoChar = 0 
)

Converts $str from UTF-8 to $charset

Parameters:
string String in UTF-8 to convert to local charset
string Charset, lowercase. Must be found in csconvtbl/ folder.
boolean If set, then characters that are not available in the destination character set will be encoded as numeric entities
Returns:
string Output string, converted to local charset

Definition at line 702 of file class.t3lib_cs.php.

References initCharset(), strlen(), and substr().

Referenced by conv(), initCaseFolding(), and initToASCII().
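
The effect of $useEntityForNoChar can be illustrated with a reduced sketch. It assumes ISO-8859-1 as the only target (code points 0 to 255 map directly to byte values, so no csconvtbl/ table is needed), assumes that unmappable characters otherwise become byte 63 ('?', the value of $noCharByteVal listed under Public Attributes), and uses a hypothetical function name; it is not the t3lib_cs implementation:

```php
<?php
// Hypothetical sketch: UTF-8 to ISO-8859-1 only. Assumes well-formed UTF-8
// input with at most 4 bytes per character.
function utf8ToLatin1Sketch($str, $useEntityForNoChar = false)
{
    $out = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($str[$i]);
        if ($byte < 0x80) {
            $out .= $str[$i];               // plain ASCII passes through
            continue;
        }
        // The lead byte tells how many continuation bytes follow.
        if (($byte & 0xE0) === 0xC0) {
            $extra = 1; $code = $byte & 0x1F;
        } elseif (($byte & 0xF0) === 0xE0) {
            $extra = 2; $code = $byte & 0x0F;
        } else {
            $extra = 3; $code = $byte & 0x07;
        }
        // Each continuation byte contributes its low 6 bits.
        for ($j = 0; $j < $extra && $i + 1 < $len; $j++) {
            $i++;
            $code = ($code << 6) | (ord($str[$i]) & 0x3F);
        }
        if ($code < 0x100) {
            $out .= chr($code);             // representable in ISO-8859-1
        } elseif ($useEntityForNoChar) {
            $out .= '&#' . $code . ';';     // numeric entity fallback
        } else {
            $out .= chr(63);                // assumed fallback: '?'
        }
    }
    return $out;
}

echo utf8ToLatin1Sketch("100 \xE2\x82\xAC", true); // prints 100 &#8364;
```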

t3lib_cs::utf8_to_entities ( str  ) 

Converts all chars > 127 to numeric entities.

Parameters:
string Input string
Returns:
string Output string

Definition at line 745 of file class.t3lib_cs.php.

References strlen(), and substr().

t3lib_cs::entities_to_utf8 ( str,
alsoStdHtmlEnt = 0 
)

Converts numeric entities (UNICODE, e.g. decimal (&#1234;) or hexadecimal (&#x1b;)) to UTF-8 multibyte chars.

Parameters:
string Input string, UTF-8
boolean If set, then all named HTML entities (such as &amp;) will be converted as well
Returns:
string Output string

Definition at line 778 of file class.t3lib_cs.php.

References substr(), UnumberToChar(), and utf8_encode().

Referenced by utf8_to_numberarray().

t3lib_cs::utf8_to_numberarray ( str,
convEntities = 0,
retChar = 0 
)

Converts all chars in the input UTF-8 string into integer numbers returned in an array

Parameters:
string Input string, UTF-8
boolean If set, then all HTML entities (such as &amp;, &#123; or &#x3f5d;) will be detected as characters.
boolean If set, then instead of integer numbers the real UTF-8 char is returned.
Returns:
array Output array with the char numbers

Definition at line 812 of file class.t3lib_cs.php.

References entities_to_utf8(), strlen(), substr(), and utf8CharToUnumber().

t3lib_cs::UnumberToChar ( cbyte  ) 

Converts a UNICODE number to a UTF-8 multibyte character. The algorithm is based on a script found at http://czyborra.com/utf/. Unit-tested by Kasper.

The binary representation of the character's integer value is thus simply spread across the bytes and the number of high bits set in the lead byte announces the number of bytes in the multibyte sequence:

bytes | bits | representation
    1 |    7 | 0vvvvvvv
    2 |   11 | 110vvvvv 10vvvvvv
    3 |   16 | 1110vvvv 10vvvvvv 10vvvvvv
    4 |   21 | 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
    5 |   26 | 111110vv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
    6 |   31 | 1111110v 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv

Parameters:
integer UNICODE integer
Returns:
string UTF-8 multibyte character string
See also:
utf8CharToUnumber()

Definition at line 862 of file class.t3lib_cs.php.

Referenced by entities_to_utf8(), initCharset(), and initUnicodeData().
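
The byte layout in the table above translates fairly directly into code. The following is a sketch under that table's assumptions (including the 5- and 6-byte forms, which current UTF-8 as defined in RFC 3629 no longer permits); the function name is hypothetical and this is not the t3lib_cs implementation:

```php
<?php
// Hypothetical sketch of the encoding table above; returns '' for values
// outside the 31-bit range the table covers.
function codePointToUtf8($code)
{
    if ($code < 0 || $code > 0x7FFFFFFF) {
        return '';
    }
    if ($code < 0x80) {
        return chr($code);                          // 1 byte: 0vvvvvvv
    }
    // A 2-byte sequence holds 11 value bits; each extra byte adds 5 more.
    $bytes = 2;
    while ($bytes < 6 && $code >= (1 << (5 * $bytes + 1))) {
        $bytes++;
    }
    // Fill the continuation bytes (10vvvvvv) from the last one backwards.
    $out = '';
    for ($i = 1; $i < $bytes; $i++) {
        $out = chr(0x80 | ($code & 0x3F)) . $out;
        $code >>= 6;
    }
    // Lead byte: as many high bits set as there are bytes in the sequence,
    // followed by the remaining value bits.
    $lead = (0xFF << (8 - $bytes)) & 0xFF;
    return chr($lead | $code) . $out;
}

echo bin2hex(codePointToUtf8(0x20AC)); // prints e282ac (the euro sign)
```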

t3lib_cs::utf8CharToUnumber ( str,
hex = 0 
)

Converts a UTF-8 multibyte character to a UNICODE number. Unit-tested by Kasper.

Parameters:
string UTF-8 multibyte character string
boolean If set, then a hex. number is returned.
Returns:
integer UNICODE integer
See also:
UnumberToChar()

Definition at line 907 of file class.t3lib_cs.php.

References substr().

Referenced by utf8_to_numberarray().
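
The reverse direction reads the sequence length from the lead byte and then collects six value bits from each continuation byte. A minimal sketch, assuming the input holds exactly one well-formed UTF-8 sequence; the function name and the exact format of the hexadecimal return value are assumptions, not taken from t3lib_cs:

```php
<?php
// Hypothetical sketch, assuming $char is one well-formed UTF-8 sequence.
function utf8CharToCodePoint($char, $hex = false)
{
    $lead = ord($char[0]);
    if ($lead < 0x80) {
        $code = $lead;                              // single-byte (ASCII)
    } else {
        // Count the leading 1-bits of the lead byte: the sequence length.
        $bytes = 0;
        for ($mask = 0x80; $lead & $mask; $mask >>= 1) {
            $bytes++;
        }
        // $mask now sits on the first 0 bit; the bits below it are value bits.
        $code = $lead & ($mask - 1);
        // Append 6 bits per continuation byte.
        for ($i = 1; $i < $bytes && $i < strlen($char); $i++) {
            $code = ($code << 6) | (ord($char[$i]) & 0x3F);
        }
    }
    // Assumed hex format: plain lowercase hex digits without a prefix.
    return $hex ? dechex($code) : $code;
}

echo utf8CharToCodePoint("\xE2\x82\xAC"); // prints 8364
```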

t3lib_cs::initCharset ( charset  ) 

Initializes a charset for use if it is defined in the PATH_t3lib.'csconvtbl/' folder. This function is called automatically by the conversion functions.

PLEASE SEE: http://www.unicode.org/Public/MAPPINGS/

Parameters:
string The charset to be initialized. Always use the lowercase charset name; it must match a filename in the csconvtbl/ folder exactly ([charset].tbl).
Returns:
integer Returns 1 if the charset is already loaded, 2 if the charset conversion table was found and parsed, or FALSE if the conversion table was not found. This method is marked private.

Definition at line 950 of file class.t3lib_cs.php.

References t3lib_div::getFileAbsFileName(), substr(), t3lib_div::trimExplode(), UnumberToChar(), t3lib_div::validPathStr(), and t3lib_div::writeFileToTypo3tempDir().

Referenced by initCaseFolding(), initToASCII(), utf8_decode(), and utf8_encode().

t3lib_cs::initUnicodeData ( mode = null  ) 

This function initializes all UTF-8 character data tables.

PLEASE SEE: http://www.unicode.org/Public/UNIDATA/

Parameters:
string Mode ("case", "ascii", ...)
Returns:
integer Returns FALSE on error, or a TRUE value on success: 1 = table already loaded, 2 = cached version loaded, 3 = table parsed (and cached). This method is marked private.

Definition at line 1012 of file class.t3lib_cs.php.

References t3lib_div::getFileAbsFileName(), t3lib_div::trimExplode(), UnumberToChar(), t3lib_div::validPathStr(), and t3lib_div::writeFileToTypo3tempDir().

Referenced by initCaseFolding(), initToASCII(), and utf8_char_mapping().

t3lib_cs::initCaseFolding ( charset  ) 

This function initializes the folding table for a charset other than UTF-8. This function is automatically called by the case folding functions.

Parameters:
string Charset for which to initialize case folding.
Returns:
integer Returns FALSE on error, or a TRUE value on success: 1 = table already loaded, 2 = cached version loaded, 3 = table parsed (and cached). This method is marked private.

Definition at line 1237 of file class.t3lib_cs.php.

References t3lib_div::getFileAbsFileName(), initCharset(), initUnicodeData(), utf8_decode(), and t3lib_div::writeFileToTypo3tempDir().

Referenced by euc_char_mapping(), and sb_char_mapping().

t3lib_cs::initToASCII ( charset  ) 

This function initializes the to-ASCII conversion table for a charset other than UTF-8. This function is automatically called by the ASCII transliteration functions.

Parameters:
string Charset for which to initialize conversion.
Returns:
integer Returns FALSE on error, or a TRUE value on success: 1 = table already loaded, 2 = cached version loaded, 3 = table parsed (and cached). This method is marked private.

Definition at line 1299 of file class.t3lib_cs.php.

References t3lib_div::getFileAbsFileName(), initCharset(), initUnicodeData(), utf8_decode(), and t3lib_div::writeFileToTypo3tempDir().

Referenced by euc_char_mapping(), and sb_char_mapping().

t3lib_cs::substr ( charset,
string,
start,
len = null 
)

Returns a part of a string. Unit-tested by Kasper (single byte charsets only)

Parameters:
string The character set
string Character string
integer Start position (character position)
integer Length (in characters)
Returns:
string The substring
See also:
substr(), mb_substr()
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 1370 of file class.t3lib_cs.php.

References euc_substr(), and utf8_substr().

Referenced by crop(), entities_to_utf8(), euc_char_mapping(), euc_strtrunc(), euc_substr(), initCharset(), strtrunc(), utf8_char_mapping(), utf8_decode(), utf8_encode(), utf8_strtrunc(), utf8_substr(), utf8_to_entities(), utf8_to_numberarray(), and utf8CharToUnumber().

t3lib_cs::strlen ( charset,
string 
)

Counts the number of characters. Unit-tested by Kasper (single byte charsets only)

Parameters:
string The character set
string Character string
Returns:
integer The number of characters
See also:
strlen()
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 1423 of file class.t3lib_cs.php.

References euc_strlen(), and utf8_strlen().

Referenced by crop(), euc_char2byte_pos(), euc_char_mapping(), euc_strlen(), euc_strtrunc(), sb_char_mapping(), utf8_byte2char_pos(), utf8_char2byte_pos(), utf8_char_mapping(), utf8_decode(), utf8_encode(), utf8_strlen(), utf8_to_entities(), and utf8_to_numberarray().

t3lib_cs::crop ( charset,
string,
len,
crop = '' 
)

Truncates a string and pre-/appends a string. Unit tested by Kasper

Parameters:
string The character set
string Character string
integer Length (in characters)
string Crop signifier
Returns:
string The shortened string
See also:
substr(), mb_strimwidth()
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 1453 of file class.t3lib_cs.php.

References euc_char2byte_pos(), strlen(), substr(), and utf8_char2byte_pos().

t3lib_cs::strtrunc ( charset,
string,
len 
)

Cuts a string short at a given byte length.

Parameters:
string The character set
string Character string
integer The byte length
Returns:
string The shortened string
See also:
mb_strcut()
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 1506 of file class.t3lib_cs.php.

References euc_strtrunc(), substr(), and utf8_strtrunc().

t3lib_cs::conv_case ( charset,
string,
case 
)

Translates all characters of a string into their respective case values. Unlike strtolower() and strtoupper(), this method is locale independent. Note that the string length may change: for example, the lower case German ß (sharp S) becomes upper case "SS". Real case folding is language dependent; this method ignores that fact. Unit-tested by Kasper.

Parameters:
string Character set of string
string Input string to convert case for
string Case keyword: "toLower" means lowercase conversion, anything else is uppercase (use "toUpper" )
Returns:
string The converted string
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>
See also:
strtolower(), strtoupper()

Definition at line 1540 of file class.t3lib_cs.php.

References euc_char_mapping(), sb_char_mapping(), and utf8_char_mapping().

t3lib_cs::specCharsToASCII ( charset,
string 
)

Converts special chars (like æøå, umlauts etc.) to ASCII equivalents (usually two characters, like æ => ae etc.)

Parameters:
string Character set of string
string Input string to convert
Returns:
string The converted string

Definition at line 1566 of file class.t3lib_cs.php.

References euc_char_mapping(), sb_char_mapping(), and utf8_char_mapping().

t3lib_cs::sb_char_mapping ( str,
charset,
mode,
opt = '' 
)

Maps all characters of a string in a single byte charset.

Parameters:
string the string
string the charset
string mode: 'case' (case folding) or 'ascii' (ASCII transliteration)
string option for mode 'case': the conversion to apply, 'toLower' or 'toUpper'
Returns:
string the converted string
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 1606 of file class.t3lib_cs.php.

References initCaseFolding(), initToASCII(), and strlen().

Referenced by conv_case(), and specCharsToASCII().

t3lib_cs::utf8_substr ( str,
start,
len = null 
)

Returns a part of a UTF-8 string. Unit-tested by Kasper and works 100% like substr() / mb_substr() for full range of $start/$len

Parameters:
string UTF-8 string
integer Start position (character position)
integer Length (in characters)
Returns:
string The substring
See also:
substr()
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 1661 of file class.t3lib_cs.php.

References substr(), and utf8_char2byte_pos().

Referenced by substr().

t3lib_cs::utf8_strlen ( str  ) 

Counts the number of characters of a string in UTF-8. Unit-tested by Kasper and works 100% like strlen() / mb_strlen()

Parameters:
string UTF-8 multibyte character string
Returns:
integer The number of characters
See also:
strlen()
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 1694 of file class.t3lib_cs.php.

References strlen().

Referenced by strlen().
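
Counting characters in UTF-8 does not require decoding them: every character has exactly one byte that is not a continuation byte (10vvvvvv). A minimal sketch of that idea, with a hypothetical function name and assuming well-formed input (not necessarily how utf8_strlen() is implemented):

```php
<?php
// Hypothetical sketch: counts UTF-8 characters by counting the bytes
// that are not continuation bytes.
function countUtf8Chars($str)
{
    $count = 0;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        // Continuation bytes look like 10vvvvvv; any other byte starts a character.
        if ((ord($str[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

$s = "Gr\xC3\xBC\xC3\x9Fe";     // "Grüße" in UTF-8
echo strlen($s), ' bytes, ', countUtf8Chars($s), ' characters'; // prints 7 bytes, 5 characters
```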

t3lib_cs::utf8_strtrunc ( str,
len 
)

Truncates a string in UTF-8 short at a given byte length.

Parameters:
string UTF-8 multibyte character string
integer the byte length
Returns:
string the shortened string
See also:
mb_strcut()
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 1715 of file class.t3lib_cs.php.

References substr().

Referenced by strtrunc().
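
Truncating at a byte length is only safe if the cut does not land inside a multibyte sequence. One way to guarantee that is to move the cut back while the byte just after it is a continuation byte. A sketch with a hypothetical function name, assuming well-formed UTF-8 (not the t3lib_cs implementation):

```php
<?php
// Hypothetical sketch: cuts a UTF-8 string at no more than $len bytes
// without splitting a multibyte character.
function truncateUtf8AtByteLength($str, $len)
{
    if ($len <= 0) {
        return '';
    }
    if ($len >= strlen($str)) {
        return $str;
    }
    // If the first byte after the cut is a continuation byte (10vvvvvv),
    // the cut falls inside a character: back up to that character's lead byte.
    while ($len > 0 && (ord($str[$len]) & 0xC0) === 0x80) {
        $len--;
    }
    return substr($str, 0, $len);
}

$s = "Gr\xC3\xBC\xC3\x9Fe";                 // "Grüße" in UTF-8
echo truncateUtf8AtByteLength($s, 3);       // prints Gr (the ü would not fit completely)
```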

t3lib_cs::utf8_strpos ( haystack,
needle,
offset = 0 
)

Find position of first occurrence of a string, both arguments are in UTF-8.

Parameters:
string UTF-8 string to search in
string UTF-8 string to search for
integer Position to start the search
Returns:
integer The character position
See also:
strpos()
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 1737 of file class.t3lib_cs.php.

References utf8_byte2char_pos(), and utf8_char2byte_pos().

t3lib_cs::utf8_strrpos ( haystack,
needle 
)

Find position of last occurrence of a char in a string, both arguments are in UTF-8.

Parameters:
string UTF-8 string to search in
string UTF-8 character to search for (single character)
Returns:
integer The character position
See also:
strrpos()
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 1762 of file class.t3lib_cs.php.

References utf8_byte2char_pos().

t3lib_cs::utf8_char2byte_pos ( str,
pos 
)

Translates a character position into an 'absolute' byte position. Unit tested by Kasper.

Parameters:
string UTF-8 string
integer Character position (negative values start from the end)
Returns:
integer Byte position
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 1784 of file class.t3lib_cs.php.

References strlen().

Referenced by crop(), utf8_strpos(), and utf8_substr().
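
The translation from a character position to a byte position can be sketched by stepping over whole characters, forwards for positive positions and backwards from the end for negative ones. The function name is hypothetical, and returning FALSE for out-of-range positions is an assumption rather than documented behaviour of utf8_char2byte_pos():

```php
<?php
// Hypothetical sketch, assuming well-formed UTF-8.
function utf8CharToBytePos($str, $pos)
{
    $len = strlen($str);
    if ($pos >= 0) {
        // Walk forward, one whole character per step.
        $i = 0;
        while ($pos > 0 && $i < $len) {
            $i++;
            while ($i < $len && (ord($str[$i]) & 0xC0) === 0x80) {
                $i++;                       // skip continuation bytes
            }
            $pos--;
        }
    } else {
        // Walk backward from the end, one whole character per step.
        $i = $len;
        while ($pos < 0 && $i > 0) {
            $i--;
            while ($i > 0 && (ord($str[$i]) & 0xC0) === 0x80) {
                $i--;                       // move back to the lead byte
            }
            $pos++;
        }
    }
    // If $pos could not be counted down to 0, the position is out of range.
    return $pos === 0 ? $i : false;
}

$s = "Gr\xC3\xBC\xC3\x9Fe";                 // "Grüße" in UTF-8
echo utf8CharToBytePos($s, 3);              // prints 4 (ß starts at byte 4)
```

With this helper, a character-based substring reduces to two position lookups followed by a plain byte-based substr().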

t3lib_cs::utf8_byte2char_pos ( str,
pos 
)

Translates an 'absolute' byte position into a character position. Unit tested by Kasper.

Parameters:
string UTF-8 string
integer byte position
Returns:
integer character position
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 1825 of file class.t3lib_cs.php.

References strlen().

Referenced by utf8_strpos(), and utf8_strrpos().

t3lib_cs::utf8_char_mapping ( str,
mode,
opt = '' 
)

Maps all characters of a UTF-8 string.

Parameters:
string UTF-8 string
string mode: 'case' (case folding) or 'ascii' (ASCII transliteration)
string option for mode 'case': the conversion to apply, 'toLower' or 'toUpper'
Returns:
string the converted string
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 1848 of file class.t3lib_cs.php.

References initUnicodeData(), strlen(), and substr().

Referenced by conv_case(), and specCharsToASCII().

t3lib_cs::euc_strtrunc ( str,
len,
charset 
)

Cuts a string in the EUC charset family short at a given byte length.

Parameters:
string EUC multibyte character string
integer the byte length
string the charset
Returns:
string the shortened string
See also:
mb_strcut()
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 1924 of file class.t3lib_cs.php.

References strlen(), and substr().

Referenced by strtrunc().

t3lib_cs::euc_substr ( str,
start,
charset,
len = null 
)

Returns a part of a string in the EUC charset family.

Parameters:
string EUC multibyte character string
integer start position (character position)
string the charset
integer length (in characters)
Returns:
string the substring
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 1953 of file class.t3lib_cs.php.

References euc_char2byte_pos(), and substr().

Referenced by substr().

t3lib_cs::euc_strlen ( str,
charset 
)

Counts the number of characters of a string in the EUC charset family.

Parameters:
string EUC multibyte character string
string the charset
Returns:
integer the number of characters
See also:
strlen()
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 1978 of file class.t3lib_cs.php.

References strlen().

Referenced by strlen().

t3lib_cs::euc_char2byte_pos ( str,
pos,
charset 
)

Translates a character position into an 'absolute' byte position.

Parameters:
string EUC multibyte character string
integer character position (negative values start from the end)
string the charset
Returns:
integer byte position
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 2005 of file class.t3lib_cs.php.

References strlen().

Referenced by crop(), and euc_substr().

t3lib_cs::euc_char_mapping ( str,
charset,
mode,
opt = '' 
)

Maps all characters of a string in the EUC charset family.

Parameters:
string EUC multibyte character string
string the charset
string mode: 'case' (case folding) or 'ascii' (ASCII transliteration)
string option for mode 'case': the conversion to apply, 'toLower' or 'toUpper'
Returns:
string the converted string
Author:
Martin Kutschker <martin.t.kutschker@blackbox.net>

Definition at line 2046 of file class.t3lib_cs.php.

References initCaseFolding(), initToASCII(), strlen(), and substr().

Referenced by conv_case(), and specCharsToASCII().


Member Data Documentation

t3lib_cs::$twoByteSets

Initial value:

array(
                'ucs-2'=>1,     // 2-byte Unicode
        )

Definition at line 149 of file class.t3lib_cs.php.

t3lib_cs::$fourByteSets

Initial value:

array(
                'ucs-4'=>1,     // 4-byte Unicode
                'utf-32'=>1,    // 4-byte Unicode (limited to the 21-bits of UTF-16)
        )

Definition at line 154 of file class.t3lib_cs.php.

t3lib_cs::$eucBasedSets

Initial value:

array(
                'gb2312'=>1,            // Chinese, simplified.
                'big5'=>1,              // Chinese, traditional.
                'euc-kr'=>1,            // Korean
                'shift_jis'=>1,         // Japanese - WARNING: Shift-JIS includes half-width katakana single-bytes characters above 0x80!
        )

Definition at line 160 of file class.t3lib_cs.php.

t3lib_cs::$script_to_charset_unix

Initial value:

array(
                'west_european' => 'iso-8859-1',
                'estonian' => 'iso-8859-1',
                'east_european' => 'iso-8859-2',
                'baltic' => 'iso-8859-4',
                'cyrillic' => 'iso-8859-5',
                'arabic' => 'iso-8859-6',
                'greek' => 'iso-8859-7',
                'hebrew' => 'iso-8859-8',
                'turkish' => 'iso-8859-9',
                'thai' => 'iso-8859-11', // = TIS-620
                'lithuanian' => 'iso-8859-13',
                'chinese' => 'gb2312', // = euc-cn
                'japanese' => 'euc-jp',
                'korean' => 'euc-kr',
                'simpl_chinese' => 'gb2312',
                'trad_chinese' => 'big5',
                'vietnamese' => '',
                'unicode' => 'utf-8',
        )

Definition at line 398 of file class.t3lib_cs.php.

t3lib_cs::$script_to_charset_windows

Initial value:

array(
                'east_european' => 'windows-1250',
                'cyrillic' => 'windows-1251',
                'west_european' => 'windows-1252',
                'greek' => 'windows-1253',
                'turkish' => 'windows-1254',
                'hebrew' => 'windows-1255',
                'arabic' => 'windows-1256',
                'baltic' => 'windows-1257',
                'estonian' => 'windows-1257',
                'lithuanian' => 'windows-1257',
                'vietnamese' => 'windows-1258',
                'thai' => 'cp874',
                'korean' => 'cp949',
                'chinese' => 'gb2312',
                'japanese' => 'shift_jis',
                'simpl_chinese' => 'gb2312',
                'trad_chinese' => 'big5',
        )

Definition at line 420 of file class.t3lib_cs.php.

t3lib_cs::$locale_to_charset

Initial value:

array(
                'japanese.euc' => 'euc-jp',
                'ja_jp.ujis' => 'euc-jp',
                'korean.euc' => 'euc-kr',
                'sr@Latn' => 'iso-8859-2',
                'zh_cn' => 'gb2312',
                'zh_hk' => 'big5',
                'zh_tw' => 'big5',
        )

Definition at line 441 of file class.t3lib_cs.php.

t3lib_cs::$isoArray

Initial value:

 array(
                'ba' => 'bs',
                'br' => 'pt_BR',
                'ch' => 'zh_CN',
                'cz' => 'cs',
                'dk' => 'da',
                'si' => 'sl',
                'se' => 'sv',
                'gl' => 'kl',
                'gr' => 'el',
                'hk' => 'zh_HK',
                'kr' => 'ko',
                'ua' => 'uk',
                'jp' => 'ja',
                'vn' => 'vi',
        )

Definition at line 503 of file class.t3lib_cs.php.


The documentation for this class was generated from the following file:

class.t3lib_cs.php

Generated by L'expert TYPO3 with doxygen 1.4.6