TYPO3 Documentation by Ameos |
Public Member Functions | |
hook_indexContent (&$pObj) | |
init () | |
initExternalReaders () | |
indexTypo3PageContent () | |
splitHTMLContent ($content) | |
bodyDescription ($contentArr) | |
extractLinks ($content) | |
getJumpurl ($query) | |
splitPdfInfo ($pdfInfoArray) | |
indexRegularDocument ($file) | |
readFileContent ($ext, $absFile, $cPKey) | |
fileContentParts ($ext, $absFile) | |
embracingTags ($string, $tagName, &$tagContent, &$stringAfter, &$paramList) | |
indexAnalyze ($content) | |
analyzeHeaderinfo (&$retArr, $content, $key, $offset) | |
analyzeBody (&$retArr, $content) | |
typoSearchTags (&$body) | |
split2words (&$string) | |
wordOK ($w) | |
metaphone ($word) | |
strtolower_all ($str) | |
freqMap ($freq) | |
getRootLineFields (&$fieldArr) | |
removeIndexedPhashRow ($phashList, $clearPageCache=1) | |
checkMtimeTstamp ($mtime, $maxAge, $minAge, $phash) | |
update_grlist ($phash, $phash_x) | |
is_grlist_set ($phash_x) | |
checkContentHash () | |
removeLoginpagesWithContentHash () | |
removeOldIndexedPages ($phash) | |
checkExternalDocContentHash ($hashGr, $content_md5h) | |
updateTstamp ($phash, $mtime=0) | |
updateParsetime ($phash, $parsetime) | |
updateRootline () | |
submitPage () | |
submit_grlist ($hash, $phash_x) | |
submit_section ($hash, $hash_t3) | |
submitFilePage ($hash, $file, $subinfo, $ext, $mtime, $ctime, $size, $content_md5h, $contentParts) | |
submitFile_grlist ($hash) | |
submitFile_section ($hash) | |
checkWordList ($wl) | |
submitWords ($wl, $phash) | |
setT3Hashes () | |
setExtHashes ($file, $subinfo=array()) | |
md5inthash ($str) | |
Public Attributes | |
$reasons | |
$convChars | |
$excludeSections = 'script,style' | |
$supportedExtensions | |
$pdf_mode = -20 | |
$app | |
$defaultGrList = '0,-1' | |
$tstamp_maxAge = 0 | |
$tstamp_minAge = 0 | |
$defaultContentArray | |
$wordcount = 0 | |
$Itypes | |
$conf = array() | |
$hash = array() | |
$contentParts = array() | |
$pObj = '' | |
$content_md5h = '' | |
$cHashParams = array() | |
$mtime = 0 | |
$rootLine = array() | |
$freqRange = 65000 | |
$freqMax = 0.1 |
Definition at line 118 of file class.indexer.php.
|
Calculates relevant information for bodycontent
Definition at line 820 of file class.indexer.php. |
|
Calculates relevant information for headercontent
Definition at line 801 of file class.indexer.php. |
|
Returns bodyDescription
Definition at line 482 of file class.indexer.php. References t3lib_div::intInRange(). |
|
Checks the content hash. Returns true if the page needs to be indexed (that is, there was no result).
Definition at line 1140 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Checks the content hash for external documents. Returns true if the document needs to be indexed (that is, there was no result).
Definition at line 1190 of file class.indexer.php. |
|
Checks the mtime / tstamp of the currently indexed page or file (based on phash). Returns a positive integer if the page needs to be indexed.
Definition at line 1083 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Adds new words to db
Definition at line 1436 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Finds the first occurrence of embracing tags and returns the embraced content and the original string with the tag removed in the two passed variables. Returns false if no match is found. Useful, for example, for finding the <title> of a document or removing <script> sections.
Definition at line 754 of file class.indexer.php. Referenced by splitHTMLContent(). |
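The behaviour described for embracingTags() can be sketched as follows. This is an illustrative Python sketch, not the PHP source: the function name `embracing_tags`, the regular-expression approach, and the tuple return value (instead of by-reference parameters) are all assumptions made for the example.

```python
import re


def embracing_tags(string, tag_name):
    """Find the first <tag_name ...>...</tag_name> pair.

    Returns (tag_content, string_without_tag, param_list), or None when
    no matching pair is found. Hypothetical helper for illustration only.
    """
    pattern = re.compile(
        r"<" + re.escape(tag_name) + r"\b([^>]*)>(.*?)</" + re.escape(tag_name) + r"\s*>",
        re.IGNORECASE | re.DOTALL,
    )
    match = pattern.search(string)
    if match is None:
        return None
    param_list = match.group(1).strip()   # attributes of the opening tag
    tag_content = match.group(2)          # text between the tags
    # Original string with the whole tag (and its content) cut out.
    string_after = string[:match.start()] + string[match.end():]
    return tag_content, string_after, param_list


html = "<html><TITLE>Hi</TITLE><script type='a'>x()</script><p>Text</p></html>"
title_result = embracing_tags(html, "title")
script_result = embracing_tags(html, "script")
```

Calling the helper repeatedly on the returned remainder is one way to strip every `<script>` section in turn.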
|
Extracts links; if indexable media is found, it is indexed.
Definition at line 499 of file class.indexer.php. References t3lib_div::htmlspecialchars_decode(), and t3lib_div::makeInstance(). Referenced by indexTypo3PageContent(). |
|
[Describe function...]
Definition at line 711 of file class.indexer.php. References t3lib_div::intInRange(). |
|
Maps a frequency from a real number in [0;1] to an integer in [0;$this->freqRange], with anything above $this->freqMax treated as 1, and back.
Definition at line 985 of file class.indexer.php. |
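One possible reading of this mapping is a linear scale that saturates at $freqMax. The Python sketch below assumes exactly that, using the documented defaults ($freqRange = 65000, $freqMax = 0.1); the names `freq_map` and `freq_unmap` and the rounding are assumptions, and the actual PHP method may compute the value differently.

```python
FREQ_RANGE = 65000  # documented default of $freqRange
FREQ_MAX = 0.1      # documented default of $freqMax


def freq_map(freq, freq_range=FREQ_RANGE, freq_max=FREQ_MAX):
    """Map a real frequency in [0, 1] to an integer in [0, freq_range].

    Assumption: linear scaling, with anything above freq_max saturating
    at the top of the integer range.
    """
    clamped = min(max(freq, 0.0), freq_max)
    return int(round(clamped / freq_max * freq_range))


def freq_unmap(value, freq_range=FREQ_RANGE, freq_max=FREQ_MAX):
    """Approximate inverse: integer in [0, freq_range] back to a frequency."""
    return value / freq_range * freq_max


half = freq_map(0.05)       # halfway to freq_max
saturated = freq_map(0.5)   # above freq_max, so it saturates
```

Storing the frequency as a bounded integer keeps the column compact while spending the whole integer range on the low frequencies that occur in practice.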
|
[Describe function...]
Definition at line 531 of file class.indexer.php. |
|
[Describe function...]
Definition at line 1003 of file class.indexer.php. |
|
Parent Object (TSFE)
Definition at line 200 of file class.indexer.php. References $pObj, indexTypo3PageContent(), and init(). |
|
Analyzes content to use for indexing. The parameter must be an array with the keys title, keywords, description and body, each of which contains an array of words.
Definition at line 780 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Indexing a regular document given as $file (relative to PATH_site, local file)
Definition at line 564 of file class.indexer.php. References t3lib_div::milliseconds(). |
|
Start indexing of the TYPO3 page
Definition at line 325 of file class.indexer.php. References checkContentHash(), checkMtimeTstamp(), checkWordList(), extractLinks(), indexAnalyze(), md5inthash(), t3lib_div::milliseconds(), splitHTMLContent(), submitPage(), submitWords(), update_grlist(), updateParsetime(), updateRootline(), and updateTstamp(). Referenced by hook_indexContent(). |
|
Initializes the object
Definition at line 242 of file class.indexer.php. References initExternalReaders(), and setT3Hashes(). Referenced by hook_indexContent(). |
|
Initializes external readers, if any
Definition at line 271 of file class.indexer.php. References t3lib_div::intInRange(). Referenced by init(). |
|
Definition at line 1129 of file class.indexer.php. |
|
md5 integer hash
Definition at line 1563 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
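A minimal sketch of an "md5 integer hash", assuming the integer is taken from a short hexadecimal prefix of the MD5 digest. The 7-digit prefix length (chosen here so the value fits in a signed 32-bit integer) and the name `md5_int_hash` are assumptions of this Python example, not confirmed by this page.

```python
import hashlib


def md5_int_hash(text, hex_digits=7):
    """Return an integer built from the first hex digits of md5(text).

    Assumption: 7 hex digits, so the result is below 16**7 and fits in a
    signed 32-bit integer column.
    """
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return int(digest[:hex_digits], 16)


word_hash = md5_int_hash("typo3")
```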
|
metaphone
Definition at line 942 of file class.indexer.php. |
|
[Describe function...]
Definition at line 647 of file class.indexer.php. References t3lib_div::tempnam(). |
|
Removes ALL data regarding a certain indexed phash-row
Definition at line 1043 of file class.indexer.php. References t3lib_div::trimExplode(). |
|
Removes any indexed pages with user logins that have the same content hash.
Definition at line 1154 of file class.indexer.php. |
|
Removes records for the indexed page, $phash
Definition at line 1172 of file class.indexer.php. |
|
Get search hash, external files
Definition at line 1540 of file class.indexer.php. |
|
Get search hash, T3 pages
Definition at line 1517 of file class.indexer.php. Referenced by init(). |
|
Splits the incoming string into words. The $string parameter is passed by reference and will be turned into an array!
Definition at line 891 of file class.indexer.php. |
|
Splits HTML content and returns an associative array, with title, a list of metatags, and a list of words in the body.
Definition at line 400 of file class.indexer.php. References embracingTags(), and t3lib_div::get_tag_attributes(). Referenced by indexTypo3PageContent(). |
|
Splitting PDF info
Definition at line 544 of file class.indexer.php. |
|
Converts string-to-lower including special characters.
Definition at line 954 of file class.indexer.php. |
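The conversion can be pictured as a character translation using the two strings documented as the initial value of $convChars, followed by an ordinary lowercase step. This Python sketch assumes that pairing (upper-case string mapped position by position onto the lower-case string); the helper name mirrors the method but is otherwise hypothetical.

```python
# The two strings documented as the initial value of $convChars.
CONV_CHARS = ("ÁÉÚÍÄËÜÖÏÆØÅ", "áéúíâêûôîæøå")

# Position-by-position translation table (assumed pairing).
_CONV_TABLE = str.maketrans(CONV_CHARS[0], CONV_CHARS[1])


def strtolower_all(text):
    """Lowercase text, applying the special-character table first."""
    return text.translate(_CONV_TABLE).lower()


lowered = strtolower_all("ÆBLE Ø")
```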
|
Stores gr_list
Definition at line 1317 of file class.indexer.php. |
|
Stores section
Definition at line 1335 of file class.indexer.php. |
|
Stores file gr_list for a file IF it does not exist
Definition at line 1402 of file class.indexer.php. |
|
Stores file section for a file IF it does not exist
Definition at line 1419 of file class.indexer.php. |
|
Updates db with information about the file
Definition at line 1361 of file class.indexer.php. |
|
Updates db with information about the page
Definition at line 1264 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Submits information about words on the page to the db
Definition at line 1473 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Removes content that shouldn't be indexed according to TYPO3SEARCH-tags.
Definition at line 840 of file class.indexer.php. |
|
Checks whether a grlist entry for this hash exists and, if not, writes one.
Definition at line 1117 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Update parsetime
Definition at line 1221 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Update section rootline for the page
Definition at line 1234 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Update tstamp
Definition at line 1205 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Checks whether a word is supposed to be indexed. The word must be between 1 and 50 characters long. The more exotic requirement is that no more than 30 percent of the word may be non-alphanumeric characters; this excludes binary nonsense. It is done with a little trick: the method counts how many characters are converted by a rawurlencode call. This is not an exact method, but it is fast.
Definition at line 924 of file class.indexer.php. |
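The check described above can be sketched in Python as follows. The inclusive length bounds, the per-character counting, and the use of `urllib.parse.quote` as a stand-in for PHP's rawurlencode are assumptions of this example; the PHP method may draw the boundaries slightly differently.

```python
from urllib.parse import quote


def word_ok(word, min_len=1, max_len=50, max_ratio=0.3):
    """Decide whether a word looks indexable.

    Assumptions: length bounds are inclusive, and a character counts as
    "converted" when percent-encoding changes it.
    """
    if not (min_len <= len(word) <= max_len):
        return False
    # Count characters that URL-encoding would rewrite (non-alphanumerics,
    # control bytes, non-ASCII characters, ...).
    converted = sum(1 for ch in word if quote(ch, safe="") != ch)
    return converted / len(word) <= max_ratio


plain = word_ok("indexer")
binary_junk = word_ok("\x00\x01\x02ab")
```

A side effect of this heuristic is that words written mostly in non-ASCII characters are also rejected, which is part of why the original description calls the method inexact.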
|
Initial value: array(
'pdftotext' => '/usr/local/bin/pdftotext',
'pdfinfo' => '/usr/local/bin/pdfinfo',
'catdoc' => '/usr/local/bin/catdoc'
)
Definition at line 150 of file class.indexer.php. |
|
Initial value: array( 'ÁÉÚÍÄËÜÖÏÆØÅ', 'áéúíâêûôîæøå' )
Definition at line 129 of file class.indexer.php. |
|
Initial value: array(
'title' => '',
'description' => '',
'keywords' => '',
'body' => '',
)
Definition at line 164 of file class.indexer.php. |
|
Initial value: array(
'html' => 1,
'htm' => 1,
'pdf' => 2,
'doc' => 3,
'txt' => 4
)
Definition at line 171 of file class.indexer.php. |
|
Initial value: array(
-1 => 'mtime matched the document, so no changes detected and no content updated',
-2 => 'The minimum age was not exceeded',
1 => "The configured max-age was exceeded for the document and thus it's indexed.",
2 => 'The minimum age was exceed and mtime was set and the mtime was different, so the page was indexed.',
3 => 'The minimum age was exceed, but mtime was not set, so the page was indexed.',
4 => 'Page has never been indexed (is not represented in the index_phash table).'
)
Definition at line 121 of file class.indexer.php. |
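The reason codes listed above suggest a decision order for checkMtimeTstamp(). The Python sketch below is reconstructed from the reason texts alone, so the order of the checks is an assumption; the `row` dictionary and its keys `tstamp` and `item_mtime` are hypothetical stand-ins for the stored index_phash record, and `now` stands for the current time.

```python
def check_mtime_tstamp(row, mtime, max_age, min_age, now):
    """Return a reason code: positive means "index it", negative means "skip".

    row is the stored index record (or None if the page was never indexed).
    """
    if row is None:
        return 4                      # never indexed before
    age = now - row["tstamp"]
    if max_age and age > max_age:
        return 1                      # configured max-age exceeded
    if min_age and age <= min_age:
        return -2                     # minimum age not yet exceeded
    if not mtime:
        return 3                      # min age exceeded, no mtime available
    if row["item_mtime"] != mtime:
        return 2                      # min age exceeded and mtime differs
    return -1                         # mtime matched, nothing to update


stored = {"tstamp": 0, "item_mtime": 4}
never_indexed = check_mtime_tstamp(None, 5, 0, 0, now=1000)
too_old = check_mtime_tstamp(stored, 5, 500, 0, now=1000)
```

With the documented defaults ($tstamp_maxAge = 0, $tstamp_minAge = 0) neither age limit applies, so under this reading the outcome depends only on whether an mtime is known and whether it changed.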
|
Initial value: array(
'pdf' => 1,
'doc' => 1,
'txt' => 1,
'html' => 1,
'htm' => 1
)
Definition at line 138 of file class.indexer.php. |