TYPO3 documentation by Ameos
Public Member Functions
crawler_init (&$pObj)
crawler_execute ($params, &$pObj)
crawler_execute_type1 ($cfgRec, &$session_data, $params, &$pObj)
crawler_execute_type2 ($cfgRec, &$session_data, $params, &$pObj)
crawler_execute_type3 ($cfgRec, &$session_data, $params, &$pObj)
crawler_execute_type4 ($cfgRec, &$session_data, $params, &$pObj)
cleanUpOldRunningConfigurations ()
checkUrl ($url, $urlLog, $baseUrl)
indexExtUrl ($url, $pageId, $rl, $cfgUid, $setId)
indexSingleRecord ($r, $cfgRec, $rl=NULL)
loadIndexerClass ()
getUidRootLineForClosestTemplate ($id)
generateNextIndexingTime ($cfgRec)
checkDeniedSuburls ($url, $url_deny)
addQueueEntryForHook ($cfgRec, $title)
deleteFromIndex ($id)
processCmdmap_preProcess ($command, $table, $id, $value, &$pObj)
processDatamap_afterDatabaseOperations ($status, $table, $id, $fieldArray, &$pObj)
Public Attributes
$secondsPerExternalUrl = 3
$instanceCounter = 0
$callBack = 'EXT:indexed_search/class.crawler.php:&tx_indexedsearch_crawler'
Definition at line 87 of file class.crawler.php.
tx_indexedsearch_crawler::crawler_init (&$pObj)
Initialization of the crawler hook. This function is called for each instance of the crawler; it must check whether anything is scheduled to happen and, if so, put one or more entries in the crawler's log to start processing. In practice, it selects the indexing configurations and evaluates whether any of them needs to run.
Parameters:
object &$pObj | Parent object (tx_crawler lib)
Definition at line 106 of file class.crawler.php.
References cleanUpOldRunningConfigurations(), t3lib_BEfunc::deleteClause(), generateNextIndexingTime(), t3lib_div::getUserObj(), and t3lib_div::md5int().
tx_indexedsearch_crawler::crawler_execute ($params, &$pObj)
Callback function for the execution of a log element.
Parameters:
array $params | Parameters from the log element. Must contain $params['indexConfigUid']
object &$pObj | Parent object (tx_crawler lib)
Definition at line 219 of file class.crawler.php.
References crawler_execute_type1(), crawler_execute_type2(), crawler_execute_type3(), crawler_execute_type4(), and t3lib_div::getUserObj().
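The reference list above shows that crawler_execute() hands off to one of the four type-specific methods. The following is only a minimal sketch of such a dispatch, not the TYPO3 implementation: the function name dispatchByType, the 'type' key in the configuration record, and the 'lastHandler' session key are all assumptions made for illustration.

```php
<?php
// Hypothetical stand-in for the dispatch step; not taken from class.crawler.php.
// Assumption: the configuration record carries a numeric 'type' key (1 to 4).
function dispatchByType(array $cfgRec, array &$sessionData): string
{
    // Map each type to the handler documented on this page.
    $handlers = array(
        1 => 'crawler_execute_type1', // records from a table
        2 => 'crawler_execute_type2', // files from fileadmin
        3 => 'crawler_execute_type3', // external URLs
        4 => 'crawler_execute_type4', // page tree
    );

    $type = isset($cfgRec['type']) ? (int)$cfgRec['type'] : 0;
    if (!isset($handlers[$type])) {
        return ''; // unknown type: nothing to run
    }

    // Record the choice in the by-reference session array (illustrative key).
    $sessionData['lastHandler'] = $handlers[$type];
    return $handlers[$type];
}
```

The sketch returns the handler name instead of calling it, so that it stays runnable outside TYPO3.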
tx_indexedsearch_crawler::crawler_execute_type1 ($cfgRec, &$session_data, $params, &$pObj)
Indexes records from a table.
Parameters:
array $cfgRec | Indexing configuration record
array &$session_data | Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes made to it are saved for the next call.
array $params | Parameters from the log queue
object &$pObj | Parent object (from the "crawler" extension)
Definition at line 285 of file class.crawler.php.
References t3lib_BEfunc::deleteClause(), getUidRootLineForClosestTemplate(), indexSingleRecord(), and t3lib_div::intInRange().
Referenced by crawler_execute().
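The session data parameter is passed by reference so that state survives from one call to the next. A minimal sketch of that pattern, assuming a hypothetical processBatch() helper and an illustrative 'offset' key (neither appears in class.crawler.php):

```php
<?php
// Hypothetical batch worker; the function name and the 'offset' key are illustrative.
// $sessionData is taken by reference so the offset survives between calls.
function processBatch(array $records, array &$sessionData, int $batchSize): array
{
    // Resume where the previous call stopped (0 on the first call).
    $offset = isset($sessionData['offset']) ? $sessionData['offset'] : 0;

    // Take the next slice of records for this call.
    $batch = array_slice($records, $offset, $batchSize);

    // Persist progress for the next call through the reference.
    $sessionData['offset'] = $offset + count($batch);

    return $batch;
}

$records = array('a', 'b', 'c', 'd', 'e');
$session = array();
$first  = processBatch($records, $session, 2); // first two records
$second = processBatch($records, $session, 2); // next two records
```

Because $session is the same array on both calls, the second call continues after the records handled by the first.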
tx_indexedsearch_crawler::crawler_execute_type2 ($cfgRec, &$session_data, $params, &$pObj)
Indexes files from fileadmin.
Parameters:
array $cfgRec | Indexing configuration record
array &$session_data | Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes made to it are saved for the next call.
array $params | Parameters from the log queue
object &$pObj | Parent object (from the "crawler" extension)
Definition at line 345 of file class.crawler.php.
References t3lib_div::get_dirs(), t3lib_div::getAllFilesAndFoldersInPath(), t3lib_div::getFileAbsFileName(), getUidRootLineForClosestTemplate(), t3lib_div::isAbsPath(), t3lib_div::isAllowedAbsPath(), loadIndexerClass(), t3lib_div::makeInstance(), t3lib_div::removePrefixPathFromList(), and t3lib_div::trimExplode().
Referenced by crawler_execute().
tx_indexedsearch_crawler::crawler_execute_type3 ($cfgRec, &$session_data, $params, &$pObj)
Indexes external URLs.
Parameters:
array $cfgRec | Indexing configuration record
array &$session_data | Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes made to it are saved for the next call.
array $params | Parameters from the log queue
object &$pObj | Parent object (from the "crawler" extension)
Definition at line 414 of file class.crawler.php.
References checkDeniedSuburls(), checkUrl(), getUidRootLineForClosestTemplate(), and indexExtUrl().
Referenced by crawler_execute().
tx_indexedsearch_crawler::crawler_execute_type4 ($cfgRec, &$session_data, $params, &$pObj)
Page tree indexing type.
Parameters:
array $cfgRec | Indexing configuration record
array &$session_data | Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes made to it are saved for the next call.
array $params | Parameters from the log queue
object &$pObj | Parent object (from the "crawler" extension)
Definition at line 458 of file class.crawler.php.
References t3lib_BEfunc::deleteClause(), and t3lib_BEfunc::getRecord().
Referenced by crawler_execute().
tx_indexedsearch_crawler::cleanUpOldRunningConfigurations ()
Looks up all old index configurations that have finished and need to be reset.
Definition at line 513 of file class.crawler.php.
References t3lib_BEfunc::deleteClause().
Referenced by crawler_init().
tx_indexedsearch_crawler::checkUrl ($url, $urlLog, $baseUrl)
Checks whether an input URL is allowed to be indexed. The URL must lie inside the base URL and must not already be present in the URL log.
Parameters:
string $url | URL string to check
array $urlLog | Array of already indexed URLs (the input URL is looked up here and must not exist already)
string $baseUrl | Base URL of the indexing process (the input URL must be "inside" the base URL)
Definition at line 579 of file class.crawler.php.
References t3lib_div::isFirstPartOfStr().
Referenced by crawler_execute_type3().
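The two rules stated for checkUrl() (the URL lies inside the base URL and is not yet in the URL log) can be sketched as follows. This is an illustrative restatement, not the TYPO3 source: the function name isUrlAllowed and the boolean return value are assumptions, and the real method may apply further normalization that this page does not describe.

```php
<?php
// Illustrative restatement of the documented rules; not taken from class.crawler.php.
function isUrlAllowed(string $url, array $urlLog, string $baseUrl): bool
{
    // Rule 1: the URL must start with the base URL ("inside" the base URL).
    if (strncmp($url, $baseUrl, strlen($baseUrl)) !== 0) {
        return false;
    }

    // Rule 2: the URL must not already be present in the log.
    return !in_array($url, $urlLog, true);
}
```

For example, with base URL 'http://example.org/docs/', a URL under that prefix is accepted once and rejected after it has been added to the log.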
tx_indexedsearch_crawler::indexExtUrl ($url, $pageId, $rl, $cfgUid, $setId)
Indexes an external URL.
Parameters:
string $url | URL, http://....
integer $pageId | Page ID to relate indexing to
array $rl | Rootline array to relate indexing to
integer $cfgUid | Configuration UID
integer $setId | Set ID value
Definition at line 602 of file class.crawler.php.
References t3lib_div::htmlspecialchars_decode(), loadIndexerClass(), and t3lib_div::makeInstance().
Referenced by crawler_execute_type3().
tx_indexedsearch_crawler::indexSingleRecord ($r, $cfgRec, $rl = NULL)
Indexes a single record.
Parameters:
array $r | Record to index
array $cfgRec | Configuration record
array $rl | Rootline array to relate indexing to
Definition at line 645 of file class.crawler.php.
References getUidRootLineForClosestTemplate(), loadIndexerClass(), t3lib_div::makeInstance(), and t3lib_div::trimExplode().
Referenced by crawler_execute_type1(), and processDatamap_afterDatabaseOperations().
tx_indexedsearch_crawler::loadIndexerClass ()
Includes the indexer class.
Definition at line 694 of file class.crawler.php.
References t3lib_extMgm::extPath().
Referenced by crawler_execute_type2(), indexExtUrl(), and indexSingleRecord().
tx_indexedsearch_crawler::getUidRootLineForClosestTemplate ($id)
Gets the rootline for the closest TypoScript template root. The algorithm is the same as the one used in Web > Template, Object browser.
Parameters:
integer $id | The page ID to traverse the rootline back from
Definition at line 706 of file class.crawler.php.
References t3lib_div::makeInstance().
Referenced by crawler_execute_type1(), crawler_execute_type2(), crawler_execute_type3(), and indexSingleRecord().
tx_indexedsearch_crawler::generateNextIndexingTime ($cfgRec)
Generates the Unix timestamp for the next visit.
Parameters:
array $cfgRec | Index configuration record
Definition at line 739 of file class.crawler.php.
References t3lib_div::intInRange().
Referenced by crawler_init().
tx_indexedsearch_crawler::checkDeniedSuburls ($url, $url_deny)
Checks whether $url begins with any of the URLs in the $url_deny list and, if so, returns TRUE.
Parameters:
string $url | URL to test
string $url_deny | List of URLs separated by line breaks. If any of them is the first part of $url, the function returns TRUE (to indicate that descent is denied)
Definition at line 770 of file class.crawler.php.
References t3lib_div::isFirstPartOfStr(), and t3lib_div::trimExplode().
Referenced by crawler_execute_type3().
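The prefix test described for checkDeniedSuburls() can be sketched as below. This is a hedged illustration rather than the original code: the function name isDeniedSuburl is hypothetical, and skipping empty lines in the deny list is an assumption of the sketch.

```php
<?php
// Illustrative prefix check against a line-break separated deny list;
// not taken from class.crawler.php.
function isDeniedSuburl(string $url, string $urlDeny): bool
{
    // Split the list on any line-break sequence.
    $lines = preg_split('/\R/', $urlDeny);

    foreach ($lines as $line) {
        $prefix = trim($line);
        if ($prefix === '') {
            continue; // assumption: empty lines are ignored
        }
        // Deny as soon as one entry is the first part of the URL.
        if (strncmp($url, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }

    return false;
}
```

A TRUE result means the crawler should not descend into that URL.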
tx_indexedsearch_crawler::addQueueEntryForHook ($cfgRec, $title)
Adds an entry to the queue for the hook.
Parameters:
array $cfgRec | Configuration record
string $title | Title/URL
Definition at line 790 of file class.crawler.php.
tx_indexedsearch_crawler::deleteFromIndex ($id)
Deletes all data stored by indexed search for a given page.
Parameters:
integer $id | UID of the page whose pHash entries are to be deleted
Definition at line 806 of file class.crawler.php.
Referenced by processCmdmap_preProcess(), and processDatamap_afterDatabaseOperations().
tx_indexedsearch_crawler::processCmdmap_preProcess ($command, $table, $id, $value, &$pObj)
TCEmain hook function for on-the-fly indexing of database records.
Parameters:
string $command | TCEmain command
string $table | Table name
string $id | Record ID. For a new record, it is a string pointing to an index inside t3lib_tcemain::substNEWwithIDs
mixed $value | Target value (ignored)
object &$pObj | Reference to the calling tcemain object
Definition at line 847 of file class.crawler.php.
References deleteFromIndex().
tx_indexedsearch_crawler::processDatamap_afterDatabaseOperations ($status, $table, $id, $fieldArray, &$pObj)
TCEmain hook function for on-the-fly indexing of database records.
Parameters:
string $status | Status, "new" or "update"
string $table | Table name
string $id | Record ID. For a new record, it is a string pointing to an index inside t3lib_tcemain::substNEWwithIDs
array $fieldArray | Field array of the fields updated in the operation
object &$pObj | Reference to the calling tcemain object
Definition at line 865 of file class.crawler.php.
References t3lib_BEfunc::deleteClause(), deleteFromIndex(), t3lib_BEfunc::getRecord(), and indexSingleRecord().