TYPO3 documentation by Ameos |
Public Member Functions | |
hook_indexContent (&$pObj) | |
backend_initIndexer ($id, $type, $sys_language_uid, $MP, $uidRL, $cHash_array=array(), $createCHash=FALSE) | |
backend_setFreeIndexUid ($freeIndexUid) | |
backend_indexAsTYPO3Page ($title, $keywords, $description, $content, $charset, $mtime, $crdate=0, $recordUid=0) | |
init () | |
initializeExternalParsers () | |
indexTypo3PageContent () | |
splitHTMLContent ($content) | |
getHTMLcharset ($content) | |
convertHTMLToUtf8 ($content, $charset='') | |
embracingTags ($string, $tagName, &$tagContent, &$stringAfter, &$paramList) | |
typoSearchTags (&$body) | |
extractLinks ($content) | |
extractHyperLinks ($string) | |
indexExternalUrl ($externalUrl) | |
getUrlHeaders ($url, $timeout=2) | |
indexRegularDocument ($file, $force=FALSE, $contentTmpFile='', $altExtension='') | |
readFileContent ($ext, $absFile, $cPKey) | |
fileContentParts ($ext, $absFile) | |
splitRegularContent ($content) | |
charsetEntity2utf8 (&$contentArr, $charset) | |
procesWordsInArrays ($contentArr) | |
bodyDescription ($contentArr) | |
indexAnalyze ($content) | |
analyzeHeaderinfo (&$retArr, $content, $key, $offset) | |
analyzeBody (&$retArr, $content) | |
metaphone ($word, $retRaw=FALSE) | |
submitPage () | |
submit_grlist ($hash, $phash_x) | |
submit_section ($hash, $hash_t3) | |
removeOldIndexedPages ($phash) | |
submitFilePage ($hash, $file, $subinfo, $ext, $mtime, $ctime, $size, $content_md5h, $contentParts) | |
submitFile_grlist ($hash) | |
submitFile_section ($hash) | |
removeOldIndexedFiles ($phash) | |
checkMtimeTstamp ($mtime, $phash) | |
checkContentHash () | |
checkExternalDocContentHash ($hashGr, $content_md5h) | |
is_grlist_set ($phash_x) | |
update_grlist ($phash, $phash_x) | |
updateTstamp ($phash, $mtime=0) | |
updateParsetime ($phash, $parsetime) | |
updateRootline () | |
getRootLineFields (&$fieldArr) | |
removeLoginpagesWithContentHash () | |
checkWordList ($wl) | |
submitWords ($wl, $phash) | |
freqMap ($freq) | |
setT3Hashes () | |
setExtHashes ($file, $subinfo=array()) | |
md5inthash ($str) | |
makeCHash ($paramArray) | |
log_push ($msg, $key) | |
log_pull () | |
log_setTSlogMessage ($msg, $errorNum=0) | |
fe_headerNoCache (&$params, $ref) | |
Public Attributes | |
$reasons | |
$excludeSections = 'script,style' | |
$external_parsers = array() | |
$defaultGrList = '0,-1' | |
$tstamp_maxAge = 0 | |
$tstamp_minAge = 0 | |
$maxExternalFiles = 0 | |
$forceIndexing = FALSE | |
$crawlerActive = FALSE | |
$defaultContentArray | |
$wordcount = 0 | |
$externalFileCounter = 0 | |
$conf = array() | |
$indexerConfig = array() | |
$hash = array() | |
$file_phash_arr = array() | |
$contentParts = array() | |
$content_md5h = '' | |
$internal_log = array() | |
$indexExternalUrl_content = '' | |
$cHashParams = array() | |
$freqRange = 32000 | |
$freqMax = 0.1 | |
$csObj | |
$metaphoneObj | |
$lexerObj |
Definition at line 138 of file class.indexer.php.
|
Calculates relevant information for body content
Definition at line 1200 of file class.indexer.php. |
|
Calculates relevant information for header content
Definition at line 1181 of file class.indexer.php. |
|
Indexing records as the content of a TYPO3 page.
Definition at line 357 of file class.indexer.php. References indexTypo3PageContent(). |
|
Initializing the "combined ID" of the page (phash) being indexed (or for which external media is attached)
Definition at line 303 of file class.indexer.php. References init(), and makeCHash(). |
|
Sets the free-index uid. Can be called right after backend_initIndexer()
Definition at line 340 of file class.indexer.php. |
|
Extracts the sample description text from the content array.
Definition at line 1138 of file class.indexer.php. References t3lib_div::intInRange(). |
|
Convert character set and HTML entities in the value of input content array keys
Definition at line 1092 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Check content hash in phash table
Definition at line 1574 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Checks the content hash for external documents. Returns true if the document needs to be indexed (that is, there was no result).
Definition at line 1591 of file class.indexer.php. |
|
Checks the mtime / tstamp of the currently indexed page/file (based on phash). Returns a positive integer if the page needs to be indexed.
Definition at line 1538 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
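The return codes of this check correspond to the keys of the $reasons attribute documented on this page. As a minimal sketch of how such a decision could be made, assuming the index row is passed in as an array (or null when the page was never indexed): the function name, the $row keys 'tstamp' and 'item_mtime', and the explicit $now parameter are hypothetical and chosen only to keep the example self-contained. This is not the code from class.indexer.php.

```php
<?php
// Hypothetical sketch: returns one of the codes listed in $reasons.
// $row is the stored index row (assumed keys: 'tstamp', 'item_mtime') or null.
function checkMtimeTstampSketch(?array $row, int $mtime, int $maxAge, int $minAge, int $now): int
{
    if ($row === null) {
        return 4;   // never indexed before
    }
    if ($maxAge > 0 && $row['tstamp'] + $maxAge < $now) {
        return 1;   // configured max-age exceeded
    }
    if ($minAge > 0 && $row['tstamp'] + $minAge >= $now) {
        return -2;  // minimum age not exceeded yet
    }
    if ($mtime === 0) {
        return 3;   // min-age exceeded, but no mtime available
    }
    // min-age exceeded and mtime set: index only if mtime differs
    return $row['item_mtime'] !== $mtime ? 2 : -1;
}

$row = ['tstamp' => 100, 'item_mtime' => 50];
var_dump(checkMtimeTstampSketch($row, 60, 0, 0, 1000) > 0); // bool(true)
```

A positive result means "index now"; a negative result means the existing index entry can be kept.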
|
Adds new words to db
Definition at line 1731 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Converts a HTML document to utf-8
Definition at line 648 of file class.indexer.php. |
|
Finds the first occurrence of embracing tags and returns the embraced content and the original string with the tag removed in the two passed variables. Returns false if no match is found. Useful, for example, for finding the <title> of a document or removing <script> sections.
Definition at line 676 of file class.indexer.php. Referenced by splitHTMLContent(). |
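To illustrate the by-reference contract described above, here is a minimal regex-based sketch. The function name is hypothetical, the $paramList output of the real signature is left out for brevity, and the matching strategy is an assumption; the actual method may work differently.

```php
<?php
// Hypothetical sketch of "find first embracing tag pair".
// Outputs are written to $tagContent and $stringAfter by reference.
function embracingTagsSketch(string $string, string $tagName, &$tagContent, &$stringAfter): bool
{
    $tag = preg_quote($tagName, '/');
    // Case-insensitive, dot matches newlines, non-greedy content capture
    $pattern = '/<' . $tag . '\b[^>]*>(.*?)<\/' . $tag . '>/is';
    if (!preg_match($pattern, $string, $m, PREG_OFFSET_CAPTURE)) {
        return false;
    }
    $start = $m[0][1];               // byte offset of the opening tag
    $length = strlen($m[0][0]);      // length of the whole tag pair
    $tagContent = $m[1][0];
    $stringAfter = substr($string, 0, $start) . substr($string, $start + $length);
    return true;
}

$html = '<html><head><title>Hello</title></head><body>Text</body></html>';
if (embracingTagsSketch($html, 'title', $title, $rest)) {
    echo $title, "\n"; // Hello
    echo $rest, "\n";  // <html><head></head><body>Text</body></html>
}
```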
|
Extracts all links to external documents from content string.
Definition at line 775 of file class.indexer.php. References t3lib_div::makeInstance(). |
|
Extract links (hrefs) from HTML content and if indexable media is found, it is indexed.
Definition at line 732 of file class.indexer.php. References t3lib_div::getFileAbsFileName(), and t3lib_div::htmlspecialchars_decode(). Referenced by indexTypo3PageContent(). |
|
Frontend hook: If the page is not being re-generated this is our chance to force it to be (because re-generation of the page is required in order to have the indexer called!)
Definition at line 1962 of file class.indexer.php. References t3lib_extMgm::isLoaded(). |
|
Creates an array with pointers to divisions of document.
Definition at line 1041 of file class.indexer.php. |
|
Maps a frequency from a real number in [0;1] to an integer in [0;$this->freqRange], with anything above $this->freqMax treated as 1, and back.
Definition at line 1792 of file class.indexer.php. |
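One possible reading of the forward direction of this mapping is a linear scale that saturates at $freqMax. The sketch below is an interpretation of the description, not the code from class.indexer.php; the function name is hypothetical, the defaults are taken from the $freqMax and $freqRange attributes listed on this page, and the reverse mapping is not shown.

```php
<?php
// Hypothetical sketch: scale a relative frequency to an integer range,
// treating everything above $freqMax as the maximum.
function freqMapSketch(float $freq, float $freqMax = 0.1, int $freqRange = 32000): int
{
    // Normalize so that $freqMax corresponds to 1.0, then clamp
    $normalized = min($freq / $freqMax, 1.0);
    return (int) round($normalized * $freqRange);
}

echo freqMapSketch(0.05), "\n"; // 16000
echo freqMapSketch(0.5), "\n";  // 32000
```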
|
Extract the charset value from HTML meta tag.
Definition at line 633 of file class.indexer.php. |
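A charset lookup of this kind can be sketched with a single regular expression. The function name and the pattern are assumptions made for illustration; the real method may parse the meta tag differently.

```php
<?php
// Hypothetical sketch: return the charset named in a <meta> tag,
// lowercased, or an empty string if none is found.
function htmlCharsetSketch(string $content): string
{
    // Covers <meta charset="..."> as well as
    // <meta http-equiv="Content-Type" content="text/html; charset=...">
    if (preg_match('/<meta\b[^>]*charset\s*=\s*["\']?([a-z0-9._-]+)/i', $content, $m)) {
        return strtolower($m[1]);
    }
    return '';
}

echo htmlCharsetSketch('<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'), "\n"; // iso-8859-1
```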
|
Adding values for root-line fields. rl0, rl1 and rl2 are standard. A hook might add more.
Definition at line 1677 of file class.indexer.php. |
|
Getting HTTP request headers of URL
Definition at line 858 of file class.indexer.php. References t3lib_div::trimExplode(). |
|
Parent Object (TSFE) Initialization
Definition at line 204 of file class.indexer.php. References $indexerConfig, indexTypo3PageContent(), init(), t3lib_extMgm::isLoaded(), log_pull(), log_push(), and log_setTSlogMessage(). |
|
Analyzes content to use for indexing.
Definition at line 1160 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Index External URLs HTML content
Definition at line 827 of file class.indexer.php. References t3lib_div::tempnam(), and t3lib_div::writeFile(). |
|
Indexing a regular document given as $file (relative to PATH_site, local file)
Definition at line 918 of file class.indexer.php. References t3lib_div::getFileAbsFileName(), t3lib_div::isAbsPath(), t3lib_div::isAllowedAbsPath(), and t3lib_div::milliseconds(). |
|
Start indexing of the TYPO3 page
Definition at line 501 of file class.indexer.php. References charsetEntity2utf8(), checkContentHash(), checkMtimeTstamp(), checkWordList(), extractLinks(), indexAnalyze(), is_grlist_set(), log_pull(), log_push(), log_setTSlogMessage(), md5inthash(), t3lib_div::milliseconds(), procesWordsInArrays(), splitHTMLContent(), submitPage(), submitWords(), update_grlist(), updateParsetime(), updateRootline(), and updateTstamp(). Referenced by backend_indexAsTYPO3Page(), and hook_indexContent(). |
|
Initializes the object. $this->conf MUST be set with proper values prior to this call!!!
Definition at line 408 of file class.indexer.php. References t3lib_div::getUserObj(), initializeExternalParsers(), t3lib_div::intInRange(), t3lib_div::makeInstance(), metaphone(), and setT3Hashes(). Referenced by backend_initIndexer(), and hook_indexContent(). |
|
Initialize external parsers
Definition at line 460 of file class.indexer.php. References t3lib_div::getUserObj(). Referenced by init(). |
|
Checks if a grlist record has been set for the phash value input (looking at the "real" phash of the current content, not the linked-to phash of the common search result page)
Definition at line 1605 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Pull function wrapper for TT logging
Definition at line 1926 of file class.indexer.php. Referenced by hook_indexContent(), and indexTypo3PageContent(). |
|
Push function wrapper for TT logging
Definition at line 1917 of file class.indexer.php. Referenced by hook_indexContent(), and indexTypo3PageContent(). |
|
Set log message function wrapper for TT logging
Definition at line 1937 of file class.indexer.php. Referenced by hook_indexContent(), and indexTypo3PageContent(). |
|
Calculates the cHash value of input GET array (for constructing cHash values if needed)
Definition at line 1885 of file class.indexer.php. References t3lib_div::cHashParams(), t3lib_div::implodeArrayForUrl(), and t3lib_div::shortMD5(). Referenced by backend_initIndexer(). |
|
MD5 integer hash. Uses 7 hex characters instead of 8 because that keeps the integers below 32 bits (28 bits), so they do not interfere with UNSIGNED integers or with PHP versions that have varying output from the hexdec function.
Definition at line 1875 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
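The 28-bit reasoning can be made concrete with a short sketch: seven hexadecimal digits carry 7 × 4 = 28 bits, so the result never exceeds 0xFFFFFFF. The function name is hypothetical and the body is an assumption based on the description, built only from PHP's standard md5(), substr() and hexdec() functions.

```php
<?php
// Hypothetical sketch: integer hash from the first 7 hex digits of an MD5 sum.
function md5IntHashSketch(string $str): int
{
    // 7 hex digits = 28 bits, so the value is at most 0xFFFFFFF (268435455)
    return (int) hexdec(substr(md5($str), 0, 7));
}

var_dump(md5IntHashSketch('typo3') <= 0xFFFFFFF); // bool(true)
```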
|
Creating metaphone based hash from input word
Definition at line 1220 of file class.indexer.php. Referenced by init(). |
|
Processing words in the array from the split*Content functions
Definition at line 1115 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Reads the content of an external file being indexed. The content from the external parser MUST be returned in utf-8!
Definition at line 1024 of file class.indexer.php. |
|
Removes any indexed pages with user logins that have the same contentHash. NOT USED anywhere inside this class!
Definition at line 1696 of file class.indexer.php. |
|
Removes records for the indexed page, $phash
Definition at line 1502 of file class.indexer.php. |
|
Removes records for the indexed page, $phash
Definition at line 1369 of file class.indexer.php. |
|
Get search hash, external files
Definition at line 1851 of file class.indexer.php. |
|
Get search hash, T3 pages
Definition at line 1825 of file class.indexer.php. Referenced by init(). |
|
Splits HTML content and returns an associative array, with title, a list of metatags, and a list of words in the body.
Definition at line 587 of file class.indexer.php. References embracingTags(), and t3lib_div::get_tag_attributes(). Referenced by indexTypo3PageContent(). |
|
Splits non-HTML content (from external files for instance)
Definition at line 1059 of file class.indexer.php. |
|
Stores gr_list in the database.
Definition at line 1331 of file class.indexer.php. |
|
Stores section. $hash and $hash_t3 are the same for TYPO3 pages, but different for external files.
Definition at line 1351 of file class.indexer.php. |
|
Stores file gr_list for a file IF it does not exist already
Definition at line 1474 of file class.indexer.php. |
|
Stores file section for a file IF it does not exist
Definition at line 1488 of file class.indexer.php. |
|
Updates db with information about the file
Definition at line 1412 of file class.indexer.php. |
|
Updates db with information about the page (TYPO3 page, not external media)
Definition at line 1262 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Submits RELATIONS between words and phash
Definition at line 1768 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Removes content that shouldn't be indexed according to TYPO3SEARCH-tags.
Definition at line 703 of file class.indexer.php. |
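As a simplified sketch of marker-based filtering, assuming the HTML comment markers `<!--TYPO3SEARCH_begin-->` and `<!--TYPO3SEARCH_end-->` (the marker spelling is not given on this page and is stated here as an assumption): the hypothetical function below keeps only the text between begin/end pairs. The real method may treat unpaired markers differently.

```php
<?php
// Hypothetical sketch: reduce $body to the parts enclosed by
// TYPO3SEARCH begin/end markers. Returns false (and leaves $body
// untouched) when no complete marker pair is present.
function typoSearchTagsSketch(string &$body): bool
{
    $pattern = '/<!--\s*TYPO3SEARCH_begin\s*-->(.*?)<!--\s*TYPO3SEARCH_end\s*-->/s';
    if (!preg_match_all($pattern, $body, $m)) {
        return false;
    }
    // Concatenate every enclosed section in document order
    $body = implode('', $m[1]);
    return true;
}

$body = 'nav<!--TYPO3SEARCH_begin-->main<!--TYPO3SEARCH_end-->footer';
typoSearchTagsSketch($body);
echo $body, "\n"; // main
```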
|
Checks if a grlist entry for this hash exists and, if not, writes one.
Definition at line 1618 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Update parsetime for phash row.
Definition at line 1649 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Update section rootline for the page
Definition at line 1662 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Update tstamp for a phash row.
Definition at line 1633 of file class.indexer.php. Referenced by indexTypo3PageContent(). |
|
Initial value: array(
'title' => '',
'description' => '',
'keywords' => '',
'body' => '',
)
Definition at line 168 of file class.indexer.php. |
|
Initial value: array(
-1 => 'mtime matched the document, so no changes detected and no content updated',
-2 => 'The minimum age was not exceeded',
1 => "The configured max-age was exceeded for the document and thus it's indexed.",
2 => 'The minimum age was exceed and mtime was set and the mtime was different, so the page was indexed.',
3 => 'The minimum age was exceed, but mtime was not set, so the page was indexed.',
4 => 'Page has never been indexed (is not represented in the index_phash table).'
)
Definition at line 141 of file class.indexer.php. |