TYPO3 Documentation by Ameos

tx_indexedsearch_crawler Class Reference

Public Member Functions

 crawler_init (&$pObj)
 crawler_execute ($params, &$pObj)
 crawler_execute_type1 ($cfgRec, &$session_data, $params, &$pObj)
 crawler_execute_type2 ($cfgRec, &$session_data, $params, &$pObj)
 crawler_execute_type3 ($cfgRec, &$session_data, $params, &$pObj)
 crawler_execute_type4 ($cfgRec, &$session_data, $params, &$pObj)
 cleanUpOldRunningConfigurations ()
 checkUrl ($url, $urlLog, $baseUrl)
 indexExtUrl ($url, $pageId, $rl, $cfgUid, $setId)
 indexSingleRecord ($r, $cfgRec, $rl=NULL)
 loadIndexerClass ()
 getUidRootLineForClosestTemplate ($id)
 generateNextIndexingTime ($cfgRec)
 checkDeniedSuburls ($url, $url_deny)
 addQueueEntryForHook ($cfgRec, $title)
 processDatamap_afterDatabaseOperations ($status, $table, $id, $fieldArray, &$pObj)

Public Attributes

 $secondsPerExternalUrl = 3
 $instanceCounter = 0
 $callBack = 'EXT:indexed_search/class.crawler.php:&tx_indexedsearch_crawler'

Detailed Description

Definition at line 87 of file class.crawler.php.


Member Function Documentation

tx_indexedsearch_crawler::addQueueEntryForHook ($cfgRec, $title)

Adds an entry to the queue for the hook.

Parameters:
array Configuration record
string Title/URL
Returns:
void

Definition at line 798 of file class.crawler.php.

tx_indexedsearch_crawler::checkDeniedSuburls ($url, $url_deny)

Checks whether $url begins with any of the URLs in the $url_deny list and, if so, returns TRUE.

Parameters:
string URL to test
string String where URLs are separated by line breaks. If any of these strings is the first part of $url, the function returns TRUE (to indicate that descent is denied)
Returns:
boolean TRUE if there is a matching URL (hence, do not index!)

Definition at line 778 of file class.crawler.php.

References t3lib_div::isFirstPartOfStr(), and t3lib_div::trimExplode().

Referenced by crawler_execute_type3().
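
The prefix test described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the PHP implementation from class.crawler.php; the function name check_denied_suburls and the example URLs are hypothetical, and skipping empty lines in the deny list is an assumption.

```python
def check_denied_suburls(url: str, url_deny: str) -> bool:
    """Return True if url starts with any entry of the deny list.

    url_deny holds one URL prefix per line, as in the parameter
    description above. Trimming entries and ignoring empty lines
    is an assumption of this sketch.
    """
    # Split on line breaks, trim whitespace, drop empty entries.
    deny_prefixes = [line.strip() for line in url_deny.splitlines() if line.strip()]
    # A match on any prefix means the URL must not be indexed.
    return any(url.startswith(prefix) for prefix in deny_prefixes)
```

With a deny list of "http://example.org/private/" and "http://example.org/tmp/", a URL such as "http://example.org/private/a.html" would be denied, while "http://example.org/public/" would not.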

tx_indexedsearch_crawler::checkUrl ($url, $urlLog, $baseUrl)

Checks whether an input URL is allowed to be indexed. This depends on whether it is already present in the URL log.

Parameters:
string URL string to check
array Array of already indexed URLs (input url is looked up here and must not exist already)
string Base URL of the indexing process (input URL must be "inside" the base URL!)
Returns:
string Returns the URL if OK, otherwise FALSE

Definition at line 579 of file class.crawler.php.

References t3lib_div::isFirstPartOfStr().

Referenced by crawler_execute_type3().
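
The two conditions named in the parameter list (the URL must lie inside the base URL and must not already be in the log) can be sketched as follows. This is a Python illustration under those stated conditions only, not the PHP code itself; the name check_url is hypothetical, None stands in for PHP's FALSE, and any further normalisation the real method may perform is not modelled here.

```python
from typing import Optional, Sequence


def check_url(url: str, url_log: Sequence[str], base_url: str) -> Optional[str]:
    """Return url if it may be indexed, otherwise None.

    Only the two documented conditions are checked in this sketch.
    """
    # The input URL must be "inside" the base URL (prefix match).
    if not url.startswith(base_url):
        return None
    # The input URL must not have been indexed already.
    if url in url_log:
        return None
    return url
```

A caller would typically append each accepted URL to the log so that the same page is not visited twice within one indexing run.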

tx_indexedsearch_crawler::cleanUpOldRunningConfigurations ()

Looks up all old index configurations that are finished and need to be reset and marked as done.

Returns:
void

Definition at line 513 of file class.crawler.php.

References t3lib_BEfunc::deleteClause().

Referenced by crawler_init().

tx_indexedsearch_crawler::crawler_execute ($params, &$pObj)

Callback function for executing a log element.

Parameters:
array Params from log element. Must contain $params['indexConfigUid']
object Parent object (tx_crawler lib)
Returns:
array Result array

Definition at line 219 of file class.crawler.php.

References crawler_execute_type1(), crawler_execute_type2(), crawler_execute_type3(), crawler_execute_type4(), and t3lib_div::getUserObj().

tx_indexedsearch_crawler::crawler_execute_type1 ($cfgRec, &$session_data, $params, &$pObj)

Indexing records from a table

Parameters:
array Indexing Configuration Record
array Session data for the indexing session spread over multiple instances of the script. Passed by reference so changes hereto will be saved for the next call!
array Parameters from the log queue.
object Parent object (from "crawler" extension!)
Returns:
void

Definition at line 285 of file class.crawler.php.

References t3lib_BEfunc::deleteClause(), getUidRootLineForClosestTemplate(), indexSingleRecord(), and t3lib_div::intInRange().

Referenced by crawler_execute().

tx_indexedsearch_crawler::crawler_execute_type2 ($cfgRec, &$session_data, $params, &$pObj)

Indexing files from fileadmin

Parameters:
array Indexing Configuration Record
array Session data for the indexing session spread over multiple instances of the script. Passed by reference so changes hereto will be saved for the next call!
array Parameters from the log queue.
object Parent object (from "crawler" extension!)
Returns:
void

Definition at line 345 of file class.crawler.php.

References t3lib_div::get_dirs(), t3lib_div::getAllFilesAndFoldersInPath(), t3lib_div::getFileAbsFileName(), getUidRootLineForClosestTemplate(), t3lib_div::isAbsPath(), t3lib_div::isAllowedAbsPath(), loadIndexerClass(), t3lib_div::makeInstance(), t3lib_div::removePrefixPathFromList(), and t3lib_div::trimExplode().

Referenced by crawler_execute().

tx_indexedsearch_crawler::crawler_execute_type3 ($cfgRec, &$session_data, $params, &$pObj)

Indexing External URLs

Parameters:
array Indexing Configuration Record
array Session data for the indexing session spread over multiple instances of the script. Passed by reference so changes hereto will be saved for the next call!
array Parameters from the log queue.
object Parent object (from "crawler" extension!)
Returns:
void

Definition at line 414 of file class.crawler.php.

References checkDeniedSuburls(), checkUrl(), getUidRootLineForClosestTemplate(), and indexExtUrl().

Referenced by crawler_execute().

tx_indexedsearch_crawler::crawler_execute_type4 ($cfgRec, &$session_data, $params, &$pObj)

Page tree indexing type

Parameters:
array Indexing Configuration Record
array Session data for the indexing session spread over multiple instances of the script. Passed by reference so changes hereto will be saved for the next call!
array Parameters from the log queue.
object Parent object (from "crawler" extension!)
Returns:
void

Definition at line 458 of file class.crawler.php.

References t3lib_BEfunc::deleteClause(), and t3lib_BEfunc::getRecord().

Referenced by crawler_execute().

tx_indexedsearch_crawler::crawler_init (&$pObj)

Initialization of the crawler hook. This function is called for each instance of the crawler, and it must check whether anything is scheduled to happen and, if so, put one or more entries in the crawler's log to start processing. In practice, it selects indexing configurations and evaluates whether any of them needs to run.

Parameters:
object Parent object (tx_crawler lib)
Returns:
void

Definition at line 106 of file class.crawler.php.

References cleanUpOldRunningConfigurations(), t3lib_BEfunc::deleteClause(), generateNextIndexingTime(), t3lib_div::getUserObj(), and t3lib_div::md5int().

tx_indexedsearch_crawler::generateNextIndexingTime ($cfgRec)

Generates the Unix timestamp for the next visit.

Parameters:
array Index configuration record
Returns:
integer The next time stamp

Definition at line 739 of file class.crawler.php.

References t3lib_div::intInRange().

Referenced by crawler_init().

tx_indexedsearch_crawler::getUidRootLineForClosestTemplate ($id)

Gets the rootline for the closest TypoScript template root. The algorithm is the same as the one used in Web > Template, Object browser.

Parameters:
integer The page id to traverse rootline back from
Returns:
array Array containing the uid values of the rootline.

Definition at line 706 of file class.crawler.php.

References t3lib_div::makeInstance().

Referenced by crawler_execute_type1(), crawler_execute_type2(), crawler_execute_type3(), and indexSingleRecord().

tx_indexedsearch_crawler::indexExtUrl ($url, $pageId, $rl, $cfgUid, $setId)

Indexing External URL

Parameters:
string URL, http://....
integer Page id to relate indexing to.
array Rootline array to relate indexing to
integer Configuration UID
integer Set ID value
Returns:
array URLs found on this page

Definition at line 602 of file class.crawler.php.

References t3lib_div::htmlspecialchars_decode(), loadIndexerClass(), and t3lib_div::makeInstance().

Referenced by crawler_execute_type3().

tx_indexedsearch_crawler::indexSingleRecord ($r, $cfgRec, $rl=NULL)

Indexing Single Record

Parameters:
array Record to index
array Configuration Record
array Rootline array to relate indexing to
Returns:
void

Definition at line 645 of file class.crawler.php.

References getUidRootLineForClosestTemplate(), loadIndexerClass(), t3lib_div::makeInstance(), and t3lib_div::trimExplode().

Referenced by crawler_execute_type1(), and processDatamap_afterDatabaseOperations().

tx_indexedsearch_crawler::loadIndexerClass ()

Includes the indexer class.

Returns:
void

Definition at line 694 of file class.crawler.php.

References t3lib_extMgm::extPath().

Referenced by crawler_execute_type2(), indexExtUrl(), and indexSingleRecord().

tx_indexedsearch_crawler::processDatamap_afterDatabaseOperations ($status, $table, $id, $fieldArray, &$pObj)

TCEmain hook function for on-the-fly indexing of database records

Parameters:
string Status "new" or "update"
string Table name
string Record ID. For a new record, it is a string pointing to an index inside t3lib_tcemain::substNEWwithIDs
array Field array of updated fields in the operation
object Reference to tcemain calling object
Returns:
void

Definition at line 830 of file class.crawler.php.

References t3lib_BEfunc::deleteClause(), t3lib_BEfunc::getRecord(), and indexSingleRecord().


The documentation for this class was generated from the following file:

class.crawler.php

Generated by Les experts TYPO3 with doxygen 1.4.6