TYPO3 4.0.1: tx_indexedsearch_indexer Class Reference
Generated on Sat Dec 2 19:27:14 2006 for TYPO3 4.0.1 by Doxygen 1.4.6
Public Member Functions
hook_indexContent (&$pObj)
backend_initIndexer ($id, $type, $sys_language_uid, $MP, $uidRL, $cHash_array=array(), $createCHash=FALSE)
backend_setFreeIndexUid ($freeIndexUid, $freeIndexSetId=0)
backend_indexAsTYPO3Page ($title, $keywords, $description, $content, $charset, $mtime, $crdate=0, $recordUid=0)
init ()
initializeExternalParsers ()
indexTypo3PageContent ()
splitHTMLContent ($content)
getHTMLcharset ($content)
convertHTMLToUtf8 ($content, $charset='')
embracingTags ($string, $tagName, &$tagContent, &$stringAfter, &$paramList)
typoSearchTags (&$body)
extractLinks ($content)
extractHyperLinks ($string)
indexExternalUrl ($externalUrl)
getUrlHeaders ($url)
indexRegularDocument ($file, $force=FALSE, $contentTmpFile='', $altExtension='')
readFileContent ($ext, $absFile, $cPKey)
fileContentParts ($ext, $absFile)
splitRegularContent ($content)
charsetEntity2utf8 (&$contentArr, $charset)
processWordsInArrays ($contentArr)
procesWordsInArrays ($contentArr)
bodyDescription ($contentArr)
indexAnalyze ($content)
analyzeHeaderinfo (&$retArr, $content, $key, $offset)
analyzeBody (&$retArr, $content)
metaphone ($word, $retRaw=FALSE)
submitPage ()
submit_grlist ($hash, $phash_x)
submit_section ($hash, $hash_t3)
removeOldIndexedPages ($phash)
submitFilePage ($hash, $file, $subinfo, $ext, $mtime, $ctime, $size, $content_md5h, $contentParts)
submitFile_grlist ($hash)
submitFile_section ($hash)
removeOldIndexedFiles ($phash)
checkMtimeTstamp ($mtime, $phash)
checkContentHash ()
checkExternalDocContentHash ($hashGr, $content_md5h)
is_grlist_set ($phash_x)
update_grlist ($phash, $phash_x)
updateTstamp ($phash, $mtime=0)
updateSetId ($phash)
updateParsetime ($phash, $parsetime)
updateRootline ()
getRootLineFields (&$fieldArr)
removeLoginpagesWithContentHash ()
includeCrawlerClass ()
checkWordList ($wl)
submitWords ($wl, $phash)
freqMap ($freq)
setT3Hashes ()
setExtHashes ($file, $subinfo=array())
md5inthash ($str)
makeCHash ($paramArray)
log_push ($msg, $key)
log_pull ()
log_setTSlogMessage ($msg, $errorNum=0)
fe_headerNoCache (&$params, $ref)
Public Attributes
$reasons
$excludeSections = 'script,style'
$external_parsers = array()
$defaultGrList = '0,-1'
$tstamp_maxAge = 0
$tstamp_minAge = 0
$maxExternalFiles = 0
$forceIndexing = FALSE
$crawlerActive = FALSE
$defaultContentArray
$wordcount = 0
$externalFileCounter = 0
$conf = array()
$indexerConfig = array()
$hash = array()
$file_phash_arr = array()
$contentParts = array()
$content_md5h = ''
$internal_log = array()
$indexExternalUrl_content = ''
$cHashParams = array()
$freqRange = 32000
$freqMax = 0.1
$csObj
$metaphoneObj
$lexerObj
Definition at line 141 of file class.indexer.php.
Member Function Documentation
analyzeBody (&$retArr, $content)
Calculates relevant information for the body content.
Definition at line 1257 of file class.indexer.php.
analyzeHeaderinfo (&$retArr, $content, $key, $offset)
Calculates relevant information for the header content.
Definition at line 1238 of file class.indexer.php.
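backend_indexAsTYPO3Page ($title, $keywords, $description, $content, $charset, $mtime, $crdate=0, $recordUid=0)
Indexes records as the content of a TYPO3 page.
Definition at line 365 of file class.indexer.php. References indexTypo3PageContent().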
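backend_initIndexer ($id, $type, $sys_language_uid, $MP, $uidRL, $cHash_array=array(), $createCHash=FALSE)
Initializes the "combined ID" (phash) of the page being indexed, or of the page to which external media is attached.
Definition at line 308 of file class.indexer.php. References init(), and makeCHash().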
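The following sketch makes the "combined ID" concrete. It folds the identifying parameters of backend_initIndexer() into one integer: the page id, type, language, MP and the cHash parameters. It assumes, without confirmation from this reference, that these parameters are serialized and then hashed with the 7-hex-digit MD5 scheme described for md5inthash(). `combinedIdSketch()` and `md5IntHashSketch()` are hypothetical names.

```php
<?php
// Hypothetical sketch: fold the parameters that identify one page view into
// a single integer "combined ID". Not the class's own setT3Hashes() code.

// First 7 hex digits of an MD5 sum -> an integer of at most 28 bits.
function md5IntHashSketch(string $str): int
{
    return (int) hexdec(substr(md5($str), 0, 7));
}

function combinedIdSketch(int $id, int $type, int $sysLanguageUid, string $mp, array $cHashParams): int
{
    // Sort the extra GET parameters so their order in the URL does not matter
    // (an assumption of this sketch).
    ksort($cHashParams);
    $identity = [
        'id'       => $id,
        'type'     => $type,
        'sys_lang' => $sysLanguageUid,
        'MP'       => $mp,
        'cHash'    => $cHashParams,
    ];
    return md5IntHashSketch(serialize($identity));
}

// The same page in two languages gets two different IDs (barring a hash collision).
$de = combinedIdSketch(42, 0, 1, '', ['tx_news[id]' => '7']);
$en = combinedIdSketch(42, 0, 0, '', ['tx_news[id]' => '7']);
printf("%d %d\n", $de, $en);
```

Because every identifying parameter goes into the hash, two views of the same page id get separate index entries when they differ only in language or parameters.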
backend_setFreeIndexUid ($freeIndexUid, $freeIndexSetId=0)
Sets the free-index uid. Can be called right after backend_initIndexer().
Definition at line 347 of file class.indexer.php.
bodyDescription ($contentArr)
Extracts the sample description text from the content array.
Definition at line 1195 of file class.indexer.php. References t3lib_div::intInRange().
charsetEntity2utf8 (&$contentArr, $charset)
Converts the character set and HTML entities in the values of the input content array keys.
Definition at line 1137 of file class.indexer.php. Referenced by indexTypo3PageContent().
checkContentHash ()
Checks the content hash in the phash table.
Definition at line 1640 of file class.indexer.php. Referenced by indexTypo3PageContent().
checkExternalDocContentHash ($hashGr, $content_md5h)
Checks the content hash for external documents. Returns true if the document needs to be indexed (that is, the lookup returned no result).
Definition at line 1657 of file class.indexer.php.
checkMtimeTstamp ($mtime, $phash)
Checks the mtime/tstamp of the page or file currently being indexed (based on phash). Returns a positive integer if the page needs to be indexed.
Definition at line 1604 of file class.indexer.php. Referenced by indexTypo3PageContent().
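The reason codes in the $reasons array at the end of this page suggest the order in which these checks are made. The sketch below is reconstructed from those reason texts only, not from the method body. The stored index_phash row is passed in as a plain array with the assumed keys `tstamp` and `item_mtime`, or as `null` if the page was never indexed. The $tstamp_maxAge / $tstamp_minAge settings are taken as seconds, with 0 meaning "not set". `checkMtimeTstampSketch()` is a hypothetical name.

```php
<?php
// Hypothetical sketch of the decision order behind checkMtimeTstamp().
// Return codes follow the keys of tx_indexedsearch_indexer::$reasons;
// a positive value means "index this page/file".
function checkMtimeTstampSketch(?array $row, int $mtime, int $maxAge, int $minAge, int $now): int
{
    if ($row === null) {
        return 4;  // never indexed: no index_phash row for this phash
    }
    if ($maxAge > 0 && $row['tstamp'] + $maxAge < $now) {
        return 1;  // configured max-age exceeded
    }
    if ($minAge > 0 && $row['tstamp'] + $minAge >= $now) {
        return -2; // minimum age not exceeded yet
    }
    if ($mtime > 0) {
        // mtime known: re-index only if it changed since the last run
        return ((int) $row['item_mtime'] !== $mtime) ? 2 : -1;
    }
    return 3;      // minimum age exceeded, but no mtime to compare
}

$now = 1165090000;
$row = ['tstamp' => $now - 7200, 'item_mtime' => 1165080000];
echo checkMtimeTstampSketch($row, 1165080000, 0, 3600, $now), "\n"; // -1: mtime unchanged
```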
checkWordList ($wl)
Adds new words to the database.
Definition at line 1820 of file class.indexer.php. Referenced by indexTypo3PageContent().
convertHTMLToUtf8 ($content, $charset='')
Converts an HTML document to UTF-8.
Definition at line 657 of file class.indexer.php.
embracingTags ($string, $tagName, &$tagContent, &$stringAfter, &$paramList)
Finds the first occurrence of embracing tags. The embraced content is returned in $tagContent, and the original string with the tag removed in $stringAfter. Returns false if no match is found. This is useful, for example, for finding the <title> of a document or removing <script> sections.
Definition at line 685 of file class.indexer.php. Referenced by splitHTMLContent().
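A minimal sketch of that contract, using case-insensitive string search. `embracingTagsSketch()` is a hypothetical stand-in, not the class's code. It only looks at the first occurrence, returns false when the closing tag is missing, and does not handle nested tags of the same name.

```php
<?php
// Hypothetical sketch of embracingTags(): find the first <$tagName ...>...</$tagName>,
// return its inner content, its attribute string and the input with that element removed.
function embracingTagsSketch(string $string, string $tagName, &$tagContent, &$stringAfter, &$paramList): bool
{
    $startTag = '<' . $tagName;
    $endTag   = '</' . $tagName . '>';

    $startPos = stripos($string, $startTag);
    if ($startPos === false) {
        return false;                              // no opening tag
    }
    $gtPos = strpos($string, '>', $startPos);      // end of the opening tag
    if ($gtPos === false) {
        return false;
    }
    $endPos = stripos($string, $endTag, $gtPos);
    if ($endPos === false) {
        return false;                              // no closing tag
    }
    $nameEnd     = $startPos + strlen($startTag);
    $paramList   = trim(substr($string, $nameEnd, $gtPos - $nameEnd));
    $tagContent  = substr($string, $gtPos + 1, $endPos - $gtPos - 1);
    $stringAfter = substr($string, 0, $startPos) . substr($string, $endPos + strlen($endTag));
    return true;
}

$html = '<html><head><title>Welcome</title></head><body>Hi</body></html>';
if (embracingTagsSketch($html, 'title', $title, $rest, $params)) {
    echo $title, "\n"; // Welcome
    echo $rest, "\n";  // <html><head></head><body>Hi</body></html>
}
```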
extractHyperLinks ($string)
Extracts all links to external documents from the content string.
Definition at line 827 of file class.indexer.php. References t3lib_div::makeInstance(), and t3lib_div::shortMD5().
extractLinks ($content)
Extracts links (hrefs) from HTML content and, if indexable media is found, indexes it.
Definition at line 741 of file class.indexer.php. References t3lib_div::getFileAbsFileName(), t3lib_div::htmlspecialchars_decode(), t3lib_div::isAllowedAbsPath(), t3lib_extMgm::isLoaded(), and t3lib_div::makeInstance(). Referenced by indexTypo3PageContent().
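Collecting the href targets, which is the first half of that job, can be sketched with a regular expression. `extractHrefsSketch()` is hypothetical. It only handles quoted href attributes on `<a>` tags. It decodes entities with PHP's html_entity_decode(), whereas the class references t3lib_div::htmlspecialchars_decode(). Deciding which targets are indexable media is left out.

```php
<?php
// Hypothetical sketch: collect the href targets of <a> tags in an HTML string.
function extractHrefsSketch(string $html): array
{
    $hrefs = [];
    // <a ... href="..."> or href='...', case-insensitive; tags may span lines.
    if (preg_match_all('/<a\s[^>]*href\s*=\s*(["\'])(.*?)\1/is', $html, $matches)) {
        foreach ($matches[2] as $href) {
            // "&amp;" in the markup stands for "&" in the actual URL.
            $hrefs[] = html_entity_decode($href, ENT_QUOTES, 'UTF-8');
        }
    }
    return $hrefs;
}

$html = '<p><a href="fileadmin/manual.pdf">Manual</a> and '
      . "<A class='x' HREF='index.php?id=5&amp;type=1'>Page</A></p>";
print_r(extractHrefsSketch($html));
```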
fe_headerNoCache (&$params, $ref)
Frontend hook: if the page is not being regenerated, this is the chance to force regeneration, because the page must be regenerated for the indexer to be called.
Definition at line 2051 of file class.indexer.php. References t3lib_extMgm::isLoaded().
fileContentParts ($ext, $absFile)
Creates an array with pointers to the divisions of the document.
Definition at line 1086 of file class.indexer.php.
freqMap ($freq)
Maps a frequency from a real number in [0;1] to an integer in [0;$this->freqRange], treating anything above $this->freqMax as 1. It also maps back in the other direction.
Definition at line 1881 of file class.indexer.php.
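One way to realise both directions of this mapping is a pair of small functions that take the class defaults ($freqRange = 32000, $freqMax = 0.1) as parameters. Splitting the directions into `freqToInt()` and `intToFreq()` (hypothetical names) is a choice of this sketch; freqMap() itself takes a single argument.

```php
<?php
// Hypothetical sketch of the two directions described for freqMap().
// Forward: real frequency in [0;1] -> integer in [0;$freqRange];
// every frequency >= $freqMax saturates at $freqRange (i.e. counts as 1).
function freqToInt(float $freq, int $freqRange = 32000, float $freqMax = 0.1): int
{
    $scaled = min($freq / $freqMax, 1.0);   // normalise to [0;1], clip at 1
    return (int) round($scaled * $freqRange);
}

// Backward: integer in [0;$freqRange] -> approximate real frequency.
function intToFreq(int $value, int $freqRange = 32000, float $freqMax = 0.1): float
{
    return $value / $freqRange * $freqMax;
}

echo freqToInt(0.05), "\n";  // 16000: halfway to freqMax
echo freqToInt(0.5), "\n";   // 32000: above freqMax, saturated
echo intToFreq(16000), "\n"; // 0.05
```

Storing an integer in a fixed range rather than a float keeps the database column compact. Frequencies above $freqMax lose their distinction, which is the trade-off the description implies.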
getHTMLcharset ($content)
Extracts the charset value from an HTML meta tag.
Definition at line 642 of file class.indexer.php.
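A sketch of the charset lookup with a single regular expression. `getHtmlCharsetSketch()` is hypothetical. It finds `charset=...` inside a `<meta>` tag, for example `content="text/html; charset=ISO-8859-1"`, and returns it in lower case. If nothing is declared, it returns an empty string.

```php
<?php
// Hypothetical sketch: return the charset declared in an HTML <meta> tag,
// lower-cased, or '' if none is found.
function getHtmlCharsetSketch(string $content): string
{
    // "charset=..." anywhere inside a <meta ...> tag, quoted or not.
    if (preg_match('/<meta\b[^>]*\bcharset\s*=\s*["\']?([a-z0-9_.:-]+)/i', $content, $m)) {
        return strtolower($m[1]);
    }
    return '';
}

$page = '<head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"></head>';
echo getHtmlCharsetSketch($page), "\n"; // iso-8859-1
```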
getRootLineFields (&$fieldArr)
Adds values for the root-line fields. rl0, rl1 and rl2 are standard; a hook might add more.
Definition at line 1757 of file class.indexer.php.
getUrlHeaders ($url)
Gets the HTTP headers of a URL.
Definition at line 917 of file class.indexer.php. References t3lib_div::getURL(), and t3lib_div::trimExplode().
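Once the header block has been fetched as a string, turning it into an associative array is straightforward. `parseHeadersSketch()` is a hypothetical helper, not the class's code. It skips lines without a colon, such as the status line, and keeps the last value when a header name repeats.

```php
<?php
// Hypothetical sketch: split a raw HTTP header block into name => value pairs.
function parseHeadersSketch(string $raw): array
{
    $headers = [];
    foreach (preg_split('/\r?\n/', trim($raw)) as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            // "Content-Type: text/html" -> ['Content-Type' => 'text/html']
            $headers[trim($parts[0])] = trim($parts[1]);
        }
        // Lines without a colon (e.g. "HTTP/1.1 200 OK") are skipped.
    }
    return $headers;
}

$raw = "HTTP/1.1 200 OK\r\nContent-Type: application/pdf\r\nContent-Length: 51234\r\n\r\n";
print_r(parseHeadersSketch($raw));
```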
hook_indexContent (&$pObj)
Parent object (TSFE) initialization.
Definition at line 207 of file class.indexer.php. References $indexerConfig, indexTypo3PageContent(), init(), t3lib_extMgm::isLoaded(), log_pull(), log_push(), and log_setTSlogMessage().
includeCrawlerClass ()
Includes the crawler class.
Definition at line 1793 of file class.indexer.php. References t3lib_extMgm::extPath().
indexAnalyze ($content)
Analyzes content to use for indexing.
Definition at line 1217 of file class.indexer.php. Referenced by indexTypo3PageContent().
indexExternalUrl ($externalUrl)
Indexes the HTML content of an external URL.
Definition at line 886 of file class.indexer.php. References t3lib_div::tempnam(), and t3lib_div::writeFile().
indexRegularDocument ($file, $force=FALSE, $contentTmpFile='', $altExtension='')
Indexes a regular document given as $file (a local file, relative to PATH_site).
Definition at line 963 of file class.indexer.php. References t3lib_div::getFileAbsFileName(), t3lib_div::isAbsPath(), t3lib_div::isAllowedAbsPath(), and t3lib_div::milliseconds().
indexTypo3PageContent ()
Starts indexing of the TYPO3 page.
Definition at line 509 of file class.indexer.php. References charsetEntity2utf8(), checkContentHash(), checkMtimeTstamp(), checkWordList(), extractLinks(), indexAnalyze(), is_grlist_set(), log_pull(), log_push(), log_setTSlogMessage(), md5inthash(), t3lib_div::milliseconds(), processWordsInArrays(), splitHTMLContent(), submitPage(), submitWords(), update_grlist(), updateParsetime(), updateRootline(), updateSetId(), and updateTstamp(). Referenced by backend_indexAsTYPO3Page(), and hook_indexContent().
init ()
Initializes the object. $this->conf MUST be set with proper values prior to this call.
Definition at line 416 of file class.indexer.php. References t3lib_div::getUserObj(), initializeExternalParsers(), t3lib_div::intInRange(), t3lib_div::makeInstance(), metaphone(), and setT3Hashes(). Referenced by backend_initIndexer(), and hook_indexContent().
initializeExternalParsers ()
Initializes the external parsers.
Definition at line 468 of file class.indexer.php. References t3lib_div::getUserObj(). Referenced by init().
is_grlist_set ($phash_x)
Checks if a grlist record has been set for the input phash value. This looks at the "real" phash of the current content, not the linked-to phash of the common search result page.
Definition at line 1671 of file class.indexer.php. Referenced by indexTypo3PageContent().
log_pull ()
Pull function wrapper for TT logging.
Definition at line 2015 of file class.indexer.php. Referenced by hook_indexContent(), and indexTypo3PageContent().
log_push ($msg, $key)
Push function wrapper for TT logging.
Definition at line 2006 of file class.indexer.php. Referenced by hook_indexContent(), and indexTypo3PageContent().
log_setTSlogMessage ($msg, $errorNum=0)
Set-log-message function wrapper for TT logging.
Definition at line 2026 of file class.indexer.php. Referenced by hook_indexContent(), and indexTypo3PageContent().
makeCHash ($paramArray)
Calculates the cHash value of the input GET array, for constructing cHash values if needed.
Definition at line 1974 of file class.indexer.php. References t3lib_div::cHashParams(), t3lib_div::implodeArrayForUrl(), and t3lib_div::shortMD5(). Referenced by backend_initIndexer().
md5inthash ($str)
MD5 integer hash. Uses 7 hex digits instead of 8 so that the integers stay below 32 bits (28 bits). This keeps them from interfering with UNSIGNED integers or with PHP versions whose hexdec() output varies.
Definition at line 1964 of file class.indexer.php. Referenced by indexTypo3PageContent().
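The hashing scheme described above fits in one line. This sketch follows the description (the first 7 hex digits of the MD5 sum, converted with hexdec()) rather than the class source, so treat `md5IntHashSketch()` as illustrative.

```php
<?php
// Sketch of the described md5 integer hash: 7 hex digits = 28 bits,
// so the result always fits in a signed 32-bit integer.
function md5IntHashSketch(string $str): int
{
    return (int) hexdec(substr(md5($str), 0, 7));
}

echo md5IntHashSketch(''), "\n";                 // 222419149: md5('') begins with "d41d8cd"
var_dump(md5IntHashSketch('typo3') < (1 << 28)); // always true: at most 0xFFFFFFF
```

The largest possible value is 0xFFFFFFF = 268435455. This is well below 2^31, so the value is the same whether a column or PHP build treats integers as signed or unsigned.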
metaphone ($word, $retRaw=FALSE)
Creates a metaphone-based hash from the input word.
Definition at line 1277 of file class.indexer.php. Referenced by init().
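A sketch of a metaphone-based hash, assuming PHP's built-in metaphone() as the phonetic key and the same 7-hex-digit hashing as md5inthash(). The `$retRaw` flag mirrors the method signature; in this sketch it returns the phonetic key itself. The class's $metaphoneObj attribute is not modelled. `metaphoneHashSketch()` is a hypothetical name.

```php
<?php
// Hypothetical sketch: hash words by how they sound, so spelling variants
// that share a metaphone key end up with the same integer.
function metaphoneHashSketch(string $word, bool $retRaw = false)
{
    $key = (string) metaphone($word);             // phonetic key (PHP built-in)
    if ($retRaw) {
        return $key;                              // caller wants the key itself
    }
    if ($key === '') {
        return 0;                                 // nothing to hash
    }
    return (int) hexdec(substr(md5($key), 0, 7)); // 28-bit integer
}

// "Knight" and "Night" share a metaphone key (an initial "KN" is read as "N"),
// so they hash to the same integer.
var_dump(metaphoneHashSketch('Knight') === metaphoneHashSketch('Night'));
```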
processWordsInArrays ($contentArr)
Processes the words in the array returned by the split*Content functions.
Definition at line 1160 of file class.indexer.php. Referenced by indexTypo3PageContent().
procesWordsInArrays ($contentArr)
Processes the words in the array returned by the split*Content functions. This function is only a wrapper, because the function under this name has been removed (see processWordsInArrays() above).
Definition at line 1185 of file class.indexer.php.
readFileContent ($ext, $absFile, $cPKey)
Reads the content of an external file being indexed. The content from the external parser MUST be returned in UTF-8.
Definition at line 1069 of file class.indexer.php.
removeLoginpagesWithContentHash ()
Removes any indexed pages with user logins that have the same contentHash. Note: this method is NOT used anywhere inside this class.
Definition at line 1776 of file class.indexer.php.
removeOldIndexedFiles ($phash)
Removes records for the indexed file, $phash.
Definition at line 1568 of file class.indexer.php.
removeOldIndexedPages ($phash)
Removes records for the indexed page, $phash.
Definition at line 1431 of file class.indexer.php.
setExtHashes ($file, $subinfo=array())
Gets the search hash for external files.
Definition at line 1940 of file class.indexer.php.
setT3Hashes ()
Gets the search hash for TYPO3 pages.
Definition at line 1914 of file class.indexer.php. Referenced by init().
splitHTMLContent ($content)
Splits HTML content and returns an associative array with the title, a list of metatags, and a list of words in the body.
Definition at line 596 of file class.indexer.php. References embracingTags(), and t3lib_div::get_tag_attributes(). Referenced by indexTypo3PageContent().
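The returned array has the same keys as $defaultContentArray (title, description, keywords, body). The sketch below fills them with regular expressions and drops the sections listed in $excludeSections (default 'script,style'). `splitHtmlContentSketch()` is hypothetical and simplified. It returns the body as one plain-text string rather than a word list. Meta tags must have `name` before `content`, and TYPO3SEARCH markers are ignored.

```php
<?php
// Hypothetical sketch: split an HTML page into the four parts of
// $defaultContentArray. Not the class's own implementation.
function splitHtmlContentSketch(string $html, string $excludeSections = 'script,style'): array
{
    $result = ['title' => '', 'description' => '', 'keywords' => '', 'body' => ''];

    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        $result['title'] = trim($m[1]);
    }
    // <meta name="description|keywords" content="...">
    if (preg_match_all('/<meta\s+name=["\'](description|keywords)["\']\s+content=["\']([^"\']*)["\']/i', $html, $metas, PREG_SET_ORDER)) {
        foreach ($metas as $meta) {
            $result[strtolower($meta[1])] = trim($meta[2]);
        }
    }
    if (preg_match('/<body[^>]*>(.*)<\/body>/is', $html, $m)) {
        $body = $m[1];
        // Drop sections that should never be indexed (default: script, style).
        foreach (explode(',', $excludeSections) as $tag) {
            $tag  = preg_quote(trim($tag), '/');
            $body = preg_replace('/<' . $tag . '\b[^>]*>.*?<\/' . $tag . '>/is', '', $body);
        }
        // Replace remaining tags with spaces, then collapse whitespace.
        $text = preg_replace('/<[^>]+>/', ' ', $body);
        $result['body'] = trim(preg_replace('/\s+/', ' ', $text));
    }
    return $result;
}

$page = '<html><head><title>News</title>'
      . '<meta name="keywords" content="typo3, search"></head>'
      . '<body><h1>Hello</h1><script>var x = 1;</script><p>World</p></body></html>';
print_r(splitHtmlContentSketch($page));
```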
splitRegularContent ($content)
Splits non-HTML content (from external files, for instance).
Definition at line 1104 of file class.indexer.php.
submit_grlist ($hash, $phash_x)
Stores gr_list in the database.
Definition at line 1393 of file class.indexer.php.
submit_section ($hash, $hash_t3)
Stores the section. $hash and $hash_t3 are the same for TYPO3 pages but differ for external files.
Definition at line 1413 of file class.indexer.php.
submitFile_grlist ($hash)
Stores the gr_list for a file if it does not already exist.
Definition at line 1540 of file class.indexer.php.
submitFile_section ($hash)
Stores the section for a file if it does not already exist.
Definition at line 1554 of file class.indexer.php.
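submitFilePage ($hash, $file, $subinfo, $ext, $mtime, $ctime, $size, $content_md5h, $contentParts)
Updates the database with information about the file.
Definition at line 1474 of file class.indexer.php.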
submitPage ()
Updates the database with information about the page (a TYPO3 page, not external media).
Definition at line 1319 of file class.indexer.php. Referenced by indexTypo3PageContent().
submitWords ($wl, $phash)
Submits the relations between words and phash.
Definition at line 1857 of file class.indexer.php. Referenced by indexTypo3PageContent().
typoSearchTags (&$body)
Removes content that should not be indexed according to TYPO3SEARCH tags.
Definition at line 712 of file class.indexer.php.
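A sketch of this filtering, assuming the usual `<!--TYPO3SEARCH_begin-->` and `<!--TYPO3SEARCH_end-->` comment markers. If at least one marked section exists, only the marked sections are kept; otherwise the body is left untouched. `typoSearchTagsSketch()` is hypothetical, and its boolean return value (whether markers were found) is an assumption of this sketch.

```php
<?php
// Hypothetical sketch: keep only the parts of $body between
// <!--TYPO3SEARCH_begin--> and <!--TYPO3SEARCH_end--> markers.
// Returns true if at least one marked section was found.
function typoSearchTagsSketch(string &$body): bool
{
    $pattern = '/<!--\s*TYPO3SEARCH_begin\s*-->(.*?)<!--\s*TYPO3SEARCH_end\s*-->/s';
    if (!preg_match_all($pattern, $body, $matches)) {
        return false;                  // no markers: the whole body stays indexable
    }
    $body = implode(' ', $matches[1]); // join all marked sections
    return true;
}

$body = '<div>Menu</div><!--TYPO3SEARCH_begin--><p>Article text</p><!--TYPO3SEARCH_end--><div>Footer</div>';
typoSearchTagsSketch($body);
echo $body, "\n"; // <p>Article text</p>
```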
update_grlist ($phash, $phash_x)
Checks whether a grlist entry exists for this hash and, if not, writes one.
Definition at line 1684 of file class.indexer.php. Referenced by indexTypo3PageContent().
updateParsetime ($phash, $parsetime)
Updates the parsetime for a phash row.
Definition at line 1729 of file class.indexer.php. Referenced by indexTypo3PageContent().
updateRootline ()
Updates the section rootline for the page.
Definition at line 1742 of file class.indexer.php. Referenced by indexTypo3PageContent().
updateSetId ($phash)
Updates the SetID of the index_phash record.
Definition at line 1714 of file class.indexer.php. Referenced by indexTypo3PageContent().
updateTstamp ($phash, $mtime=0)
Updates the tstamp for a phash row.
Definition at line 1699 of file class.indexer.php. Referenced by indexTypo3PageContent().
Member Data Documentation
$defaultContentArray
Initial value: array(
'title' => '',
'description' => '',
'keywords' => '',
'body' => '',
)
Definition at line 171 of file class.indexer.php.
$reasons
Initial value: array(
-1 => 'mtime matched the document, so no changes detected and no content updated',
-2 => 'The minimum age was not exceeded',
1 => "The configured max-age was exceeded for the document and thus it's indexed.",
2 => 'The minimum age was exceed and mtime was set and the mtime was different, so the page was indexed.',
3 => 'The minimum age was exceed, but mtime was not set, so the page was indexed.',
4 => 'Page has never been indexed (is not represented in the index_phash table).'
)
Definition at line 144 of file class.indexer.php.