TYPO3 documentation by Ameos

tx_indexedsearch_crawler Class Reference


Public Member Functions

 crawler_init (&$pObj)
 crawler_execute ($params, &$pObj)
 crawler_execute_type1 ($cfgRec, &$session_data, $params, &$pObj)
 crawler_execute_type2 ($cfgRec, &$session_data, $params, &$pObj)
 crawler_execute_type3 ($cfgRec, &$session_data, $params, &$pObj)
 crawler_execute_type4 ($cfgRec, &$session_data, $params, &$pObj)
 cleanUpOldRunningConfigurations ()
 checkUrl ($url, $urlLog, $baseUrl)
 indexExtUrl ($url, $pageId, $rl, $cfgUid, $setId)
 indexSingleRecord ($r, $cfgRec, $rl=NULL)
 loadIndexerClass ()
 getUidRootLineForClosestTemplate ($id)
 generateNextIndexingTime ($cfgRec)
 checkDeniedSuburls ($url, $url_deny)
 addQueueEntryForHook ($cfgRec, $title)
 deleteFromIndex ($id)
 processCmdmap_preProcess ($command, $table, $id, $value, &$pObj)
 processDatamap_afterDatabaseOperations ($status, $table, $id, $fieldArray, &$pObj)

Public Attributes

 $secondsPerExternalUrl = 3
 $instanceCounter = 0
 $callBack = 'EXT:indexed_search/class.crawler.php:&tx_indexedsearch_crawler'

Detailed Description

Definition at line 87 of file class.crawler.php.


Member Function Documentation

tx_indexedsearch_crawler::crawler_init (&$pObj)

Initialization of the crawler hook. This function is called for each instance of the crawler. It must check whether anything is scheduled to happen and, if so, put one or more entries in the crawler's log to start processing. In practice, it selects the indexing configurations and evaluates whether any of them needs to run.

Parameters:
object Parent object (tx_crawler lib)
Returns:
void

Definition at line 106 of file class.crawler.php.

References cleanUpOldRunningConfigurations(), t3lib_BEfunc::deleteClause(), generateNextIndexingTime(), t3lib_div::getUserObj(), and t3lib_div::md5int().

tx_indexedsearch_crawler::crawler_execute ($params, &$pObj)

Callback function for executing a log element.

Parameters:
array Params from log element. Must contain $params['indexConfigUid']
object Parent object (tx_crawler lib)
Returns:
array Result array

Definition at line 219 of file class.crawler.php.

References crawler_execute_type1(), crawler_execute_type2(), crawler_execute_type3(), crawler_execute_type4(), and t3lib_div::getUserObj().
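
The dispatch to the four crawler_execute_type* methods can be pictured with the minimal sketch below. It is an illustration only, not the code in class.crawler.php: the function name select_handler_sketch and the 'type' key of the configuration record are assumptions made for this example, and the numbering simply follows the method names and descriptions on this page.

```php
<?php
// Hypothetical sketch: map an indexing configuration record to the name of
// the handler method that would process it. The 'type' key is an assumption.
function select_handler_sketch(array $cfgRec)
{
    $type = isset($cfgRec['type']) ? (int)$cfgRec['type'] : 0;

    switch ($type) {
        case 1:
            return 'crawler_execute_type1'; // records from a table
        case 2:
            return 'crawler_execute_type2'; // files from fileadmin
        case 3:
            return 'crawler_execute_type3'; // external URLs
        case 4:
            return 'crawler_execute_type4'; // page tree
        default:
            return null; // unknown type: nothing to run
    }
}
```

A caller would look up the configuration record named by $params['indexConfigUid'] and then invoke the selected method with the record, the session data, the parameters and the parent object.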

tx_indexedsearch_crawler::crawler_execute_type1 ($cfgRec, &$session_data, $params, &$pObj)

Indexing records from a table

Parameters:
array Indexing Configuration Record
array Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes made to it are saved for the next call.
array Parameters from the log queue.
object Parent object (from "crawler" extension!)
Returns:
void

Definition at line 285 of file class.crawler.php.

References t3lib_BEfunc::deleteClause(), getUidRootLineForClosestTemplate(), indexSingleRecord(), and t3lib_div::intInRange().

Referenced by crawler_execute().

tx_indexedsearch_crawler::crawler_execute_type2 ($cfgRec, &$session_data, $params, &$pObj)

Indexing files from fileadmin

Parameters:
array Indexing Configuration Record
array Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes made to it are saved for the next call.
array Parameters from the log queue.
object Parent object (from "crawler" extension!)
Returns:
void

Definition at line 345 of file class.crawler.php.

References t3lib_div::get_dirs(), t3lib_div::getAllFilesAndFoldersInPath(), t3lib_div::getFileAbsFileName(), getUidRootLineForClosestTemplate(), t3lib_div::isAbsPath(), t3lib_div::isAllowedAbsPath(), loadIndexerClass(), t3lib_div::makeInstance(), t3lib_div::removePrefixPathFromList(), and t3lib_div::trimExplode().

Referenced by crawler_execute().

tx_indexedsearch_crawler::crawler_execute_type3 ($cfgRec, &$session_data, $params, &$pObj)

Indexing External URLs

Parameters:
array Indexing Configuration Record
array Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes made to it are saved for the next call.
array Parameters from the log queue.
object Parent object (from "crawler" extension!)
Returns:
void

Definition at line 414 of file class.crawler.php.

References checkDeniedSuburls(), checkUrl(), getUidRootLineForClosestTemplate(), and indexExtUrl().

Referenced by crawler_execute().

tx_indexedsearch_crawler::crawler_execute_type4 ($cfgRec, &$session_data, $params, &$pObj)

Page tree indexing type

Parameters:
array Indexing Configuration Record
array Session data for the indexing session, which is spread over multiple instances of the script. Passed by reference, so changes made to it are saved for the next call.
array Parameters from the log queue.
object Parent object (from "crawler" extension!)
Returns:
void

Definition at line 458 of file class.crawler.php.

References t3lib_BEfunc::deleteClause(), and t3lib_BEfunc::getRecord().

Referenced by crawler_execute().

tx_indexedsearch_crawler::cleanUpOldRunningConfigurations ()

Looks up all old index configurations that are finished and need to be reset and marked as done.

Returns:
void

Definition at line 513 of file class.crawler.php.

References t3lib_BEfunc::deleteClause().

Referenced by crawler_init().

tx_indexedsearch_crawler::checkUrl ($url, $urlLog, $baseUrl)

Checks whether an input URL is allowed to be indexed. This depends on whether it is already present in the URL log.

Parameters:
string URL string to check
array Array of already indexed URLs (input url is looked up here and must not exist already)
string Base URL of the indexing process (input URL must be "inside" the base URL!)
Returns:
string Returns the URL if OK, otherwise false

Definition at line 579 of file class.crawler.php.

References t3lib_div::isFirstPartOfStr().

Referenced by crawler_execute_type3().
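
The two conditions described for checkUrl() (the URL must lie inside the base URL and must not already be in the URL log) can be sketched as follows. This is a simplified illustration, not the method's actual body: check_url_sketch and is_first_part_of_str_sketch are hypothetical names, the latter being a local stand-in for the prefix test that t3lib_div::isFirstPartOfStr() performs, and any further checks the real method may apply are left out.

```php
<?php
// Hypothetical stand-in for a "string starts with" test.
function is_first_part_of_str_sketch($str, $partStr)
{
    return $partStr !== '' && strpos($str, $partStr) === 0;
}

// Hypothetical sketch: returns the URL if it may be indexed, otherwise false.
function check_url_sketch($url, array $urlLog, $baseUrl)
{
    // The URL must be "inside" the base URL of the indexing process.
    if (!is_first_part_of_str_sketch($url, $baseUrl)) {
        return false;
    }
    // The URL must not have been indexed already.
    if (in_array($url, $urlLog, true)) {
        return false;
    }
    return $url;
}
```

With a base URL of http://example.org/docs/, a URL such as http://example.org/docs/a.html passes only as long as it is absent from the log, while http://example.org/other/ is rejected regardless of the log.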

tx_indexedsearch_crawler::indexExtUrl ($url, $pageId, $rl, $cfgUid, $setId)

Indexing External URL

Parameters:
string URL, http://....
integer Page id to relate indexing to.
array Rootline array to relate indexing to
integer Configuration UID
integer Set ID value
Returns:
array URLs found on this page

Definition at line 602 of file class.crawler.php.

References t3lib_div::htmlspecialchars_decode(), loadIndexerClass(), and t3lib_div::makeInstance().

Referenced by crawler_execute_type3().

tx_indexedsearch_crawler::indexSingleRecord ($r, $cfgRec, $rl=NULL)

Indexing Single Record

Parameters:
array Record to index
array Configuration Record
array Rootline array to relate indexing to
Returns:
void

Definition at line 645 of file class.crawler.php.

References getUidRootLineForClosestTemplate(), loadIndexerClass(), t3lib_div::makeInstance(), and t3lib_div::trimExplode().

Referenced by crawler_execute_type1(), and processDatamap_afterDatabaseOperations().

tx_indexedsearch_crawler::loadIndexerClass ()

Include indexer class.

Returns:
void

Definition at line 694 of file class.crawler.php.

References t3lib_extMgm::extPath().

Referenced by crawler_execute_type2(), indexExtUrl(), and indexSingleRecord().

tx_indexedsearch_crawler::getUidRootLineForClosestTemplate ($id)

Gets the rootline for the closest TypoScript template root. Uses the same algorithm as Web > Template, Object browser.

Parameters:
integer The page id to traverse rootline back from
Returns:
array Array containing the uid values of the rootline.

Definition at line 706 of file class.crawler.php.

References t3lib_div::makeInstance().

Referenced by crawler_execute_type1(), crawler_execute_type2(), crawler_execute_type3(), and indexSingleRecord().

tx_indexedsearch_crawler::generateNextIndexingTime ($cfgRec)

Generate the unix time stamp for next visit.

Parameters:
array Index configuration record
Returns:
integer The next time stamp

Definition at line 739 of file class.crawler.php.

References t3lib_div::intInRange().

Referenced by crawler_init().

tx_indexedsearch_crawler::checkDeniedSuburls ($url, $url_deny)

Checks whether $url starts with any of the URLs in the $url_deny "list" and, if so, returns true.

Parameters:
string URL to test
string String where URLs are separated by line breaks. If any of these strings is the first part of $url, the function returns TRUE (to indicate that descent is denied)
Returns:
boolean TRUE if there is a matching URL (hence, do not index!)

Definition at line 770 of file class.crawler.php.

References t3lib_div::isFirstPartOfStr(), and t3lib_div::trimExplode().

Referenced by crawler_execute_type3().
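
The prefix test described for checkDeniedSuburls() can be sketched as below. This is a minimal illustration under stated assumptions, not the method's own source: check_denied_suburls_sketch is a hypothetical name, and plain explode(), trim() and strpos() are used in place of the t3lib_div helpers listed under References.

```php
<?php
// Hypothetical sketch: TRUE if $url starts with any non-empty line of $urlDeny.
function check_denied_suburls_sketch($url, $urlDeny)
{
    foreach (explode("\n", $urlDeny) as $line) {
        $prefix = trim($line); // also strips a trailing "\r"
        if ($prefix !== '' && strpos($url, $prefix) === 0) {
            return true; // matching deny prefix: do not index
        }
    }
    return false;
}
```

Empty lines in the deny list are skipped, so a blank or whitespace-only list never denies anything.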

tx_indexedsearch_crawler::addQueueEntryForHook ($cfgRec, $title)

Adding entry in queue for Hook

Parameters:
array Configuration record
string Title/URL
Returns:
void

Definition at line 790 of file class.crawler.php.

tx_indexedsearch_crawler::deleteFromIndex ($id)

Deletes all data stored by indexed search for a given page

Parameters:
integer Uid of the page for which all pHash entries are deleted
Returns:
void

Definition at line 806 of file class.crawler.php.

Referenced by processCmdmap_preProcess(), and processDatamap_afterDatabaseOperations().

tx_indexedsearch_crawler::processCmdmap_preProcess ($command, $table, $id, $value, &$pObj)

TCEmain hook function for on-the-fly indexing of database records

Parameters:
string TCEmain command
string Table name
string Record ID. For a new record, it is a string pointing to an index inside t3lib_tcemain::substNEWwithIDs
mixed Target value (ignored)
object Reference to tcemain calling object
Returns:
void

Definition at line 847 of file class.crawler.php.

References deleteFromIndex().

tx_indexedsearch_crawler::processDatamap_afterDatabaseOperations ($status, $table, $id, $fieldArray, &$pObj)

TCEmain hook function for on-the-fly indexing of database records

Parameters:
string Status "new" or "update"
string Table name
string Record ID. For a new record, it is a string pointing to an index inside t3lib_tcemain::substNEWwithIDs
array Field array of updated fields in the operation
object Reference to tcemain calling object
Returns:
void

Definition at line 865 of file class.crawler.php.

References t3lib_BEfunc::deleteClause(), deleteFromIndex(), t3lib_BEfunc::getRecord(), and indexSingleRecord().


The documentation for this class was generated from the following file:

class.crawler.php

Generated by Les experts TYPO3 with doxygen 1.4.6