Zend Framework 1.12
Zend_Search_Lucene Class Reference

Public Member Functions

 getGeneration ()
 Get generation number associated with this index instance.
 
 getFormatVersion ()
 Get index format version.
 
 setFormatVersion ($formatVersion)
 Set index format version.
 
 __construct ($directory=null, $create=false)
 Opens the index.
 
 addReference ()
 Add reference to the index object.
 
 removeReference ()
 Remove reference from the index object.
 
 __destruct ()
 Object destructor.
 
 getDirectory ()
 Returns the Zend_Search_Lucene_Storage_Directory instance for this index.
 
 count ()
 Returns the total number of documents in this index (including deleted documents).
 
 maxDoc ()
 Returns one greater than the largest possible document number.
 
 numDocs ()
 Returns the total number of non-deleted documents in this index.
 
 isDeleted ($id)
 Checks whether a document is deleted.
 
 getMaxBufferedDocs ()
 Retrieve index maxBufferedDocs option.
 
 setMaxBufferedDocs ($maxBufferedDocs)
 Set index maxBufferedDocs option.
 
 getMaxMergeDocs ()
 Retrieve index maxMergeDocs option.
 
 setMaxMergeDocs ($maxMergeDocs)
 Set index maxMergeDocs option.
 
 getMergeFactor ()
 Retrieve index mergeFactor option.
 
 setMergeFactor ($mergeFactor)
 Set index mergeFactor option.
 
 find ($query)
 Performs a query against the index and returns an array of Zend_Search_Lucene_Search_QueryHit objects.
 
 getFieldNames ($indexed=false)
 Returns a list of all unique field names that exist in this index.
 
 getDocument ($id)
 Returns a Zend_Search_Lucene_Document object for the document number $id in this index.
 
 hasTerm (Zend_Search_Lucene_Index_Term $term)
 Returns true if the index contains documents with the specified term.
 
 termDocs (Zend_Search_Lucene_Index_Term $term, $docsFilter=null)
 Returns IDs of all documents containing term.
 
 termDocsFilter (Zend_Search_Lucene_Index_Term $term, $docsFilter=null)
 Returns a documents filter for all documents containing the term.
 
 termFreqs (Zend_Search_Lucene_Index_Term $term, $docsFilter=null)
 Returns an array of all term frequencies.
 
 termPositions (Zend_Search_Lucene_Index_Term $term, $docsFilter=null)
 Returns an array of all term positions in the documents.
 
 docFreq (Zend_Search_Lucene_Index_Term $term)
 Returns the number of documents in this index containing the $term.
 
 getSimilarity ()
 Retrieve the similarity used by the index reader.
 
 norm ($id, $fieldName)
 Returns a normalization factor for "field, document" pair.
 
 hasDeletions ()
 Returns true if any documents have been deleted from this index.
 
 delete ($id)
 Deletes a document from the index.
 
 addDocument (Zend_Search_Lucene_Document $document)
 Adds a document to this index.
 
 commit ()
 Commit changes resulting from delete() or undeleteAll() operations.
 
 optimize ()
 Optimize index.
 
 terms ()
 Returns an array of all terms in this index.
 
 resetTermsStream ()
 Reset terms stream.
 
 skipTo (Zend_Search_Lucene_Index_Term $prefix)
 Skip terms stream up to the specified term prefix.
 
 nextTerm ()
 Scans terms dictionary and returns next term.
 
 currentTerm ()
 Returns term in current position.
 
 closeTermsStream ()
 Close terms stream.
 
 undeleteAll ()
 Undeletes all documents currently marked as deleted in this index.
 

Static Public Member Functions

static create ($directory)
 Create index.
 
static open ($directory)
 Open index.
 
static getActualGeneration (Zend_Search_Lucene_Storage_Directory $directory)
 Get current generation number.
 
static getSegmentFileName ($generation)
 Get segments file name.
 
static setDefaultSearchField ($fieldName)
 Set default search field.
 
static getDefaultSearchField ()
 Get default search field.
 
static setResultSetLimit ($limit)
 Set result set limit.
 
static getResultSetLimit ()
 Get result set limit.
 
static setTermsPerQueryLimit ($limit)
 Set terms per query limit.
 
static getTermsPerQueryLimit ()
 Get terms per query limit.
 

Public Attributes

const FORMAT_PRE_2_1 = 0
 
const FORMAT_2_1 = 1
 
const FORMAT_2_3 = 2
 
const GENERATION_RETRIEVE_COUNT = 10
 Maximum number of attempts to retrieve the generation number.
 
const GENERATION_RETRIEVE_PAUSE = 50
 Pause between attempts to retrieve the generation number, in milliseconds.
 

Constructor & Destructor Documentation

__construct ($directory = null, $create = false)

Opens the index.

The constructor takes the index directory as a parameter: either a string containing the path to the index folder or a Directory object. If $create is true, a new index is created in that directory.

Parameters
Zend_Search_Lucene_Storage_Directory_Filesystem | string $directory
Exceptions
Zend_Search_Lucene_Exception

__destruct ( )

Object destructor.

Member Function Documentation

addDocument (Zend_Search_Lucene_Document $document)

Adds a document to this index.

Parameters
Zend_Search_Lucene_Document $document

Implements Zend_Search_Lucene_Interface.

addReference ( )

Add reference to the index object.

Implements Zend_Search_Lucene_Interface.

closeTermsStream ( )

Close terms stream.

Should be called to clean up resources if the stream is not read to the end.

Implements Zend_Search_Lucene_Index_TermsStream_Interface.

commit ( )

Commit changes resulting from delete() or undeleteAll() operations.

Todo:
undeleteAll processing.

Implements Zend_Search_Lucene_Interface.

count ( )

Returns the total number of documents in this index (including deleted documents).

Returns
integer

Implements Zend_Search_Lucene_Interface.

static create ($directory)

Create index.

Parameters
mixed $directory
Returns
Zend_Search_Lucene_Interface

currentTerm ( )

Returns term in current position.

Returns
Zend_Search_Lucene_Index_Term|null

Implements Zend_Search_Lucene_Index_TermsStream_Interface.

delete ($id)

Deletes a document from the index.

$id is an internal document id.

Parameters
integer | Zend_Search_Lucene_Search_QueryHit $id
Exceptions
Zend_Search_Lucene_Exception

Implements Zend_Search_Lucene_Interface.

docFreq (Zend_Search_Lucene_Index_Term $term)

Returns the number of documents in this index containing the $term.

Parameters
Zend_Search_Lucene_Index_Term $term
Returns
integer

Implements Zend_Search_Lucene_Interface.

find ($query)

Performs a query against the index and returns an array of Zend_Search_Lucene_Search_QueryHit objects.

Input is a string or a Zend_Search_Lucene_Search_Query object.

Parameters
Zend_Search_Lucene_Search_Query | string $query
Returns
array of Zend_Search_Lucene_Search_QueryHit
Exceptions
Zend_Search_Lucene_Exception

Implements Zend_Search_Lucene_Interface.

static getActualGeneration (Zend_Search_Lucene_Storage_Directory $directory)

Get current generation number.

Returns the generation number: 0 means the pre-2.1 index format, and -1 means there are no segments files.

Parameters
Zend_Search_Lucene_Storage_Directory $directory
Returns
integer
Exceptions
Zend_Search_Lucene_Exception

Zend_Search_Lucene uses the segments.gen file to retrieve the current generation number. The Apache Lucene index format documentation mentions this method only as a fallback, but it is used here for performance reasons.

Todo:
check whether some modification of the Apache Lucene generation determination algorithm can be used without performance problems

Implements Zend_Search_Lucene_Interface.
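
The retrieval strategy described above can be sketched in plain PHP. This is a hypothetical, simplified model rather than the library's code: the directory is replaced by a callable returning an array of file names to contents, and segments.gen is assumed to store the generation number twice (as in the Apache Lucene file format), so a read is trusted only when both copies agree. The retry count and pause mirror GENERATION_RETRIEVE_COUNT and GENERATION_RETRIEVE_PAUSE.

```php
<?php
// Hypothetical sketch of generation lookup; not Zend_Search_Lucene's implementation.
const RETRIEVE_COUNT = 10;     // mirrors GENERATION_RETRIEVE_COUNT
const RETRIEVE_PAUSE_MS = 50;  // mirrors GENERATION_RETRIEVE_PAUSE

// $readFiles returns file name => contents; 'segments.gen' is modelled as
// an array holding the generation number written twice.
function actualGenerationSketch(callable $readFiles): int
{
    for ($attempt = 0; $attempt < RETRIEVE_COUNT; $attempt++) {
        $files = $readFiles();
        if (!isset($files['segments.gen'])) {
            // No segments.gen: a bare 'segments' file means the pre-2.1 format.
            return isset($files['segments']) ? 0 : -1;
        }
        [$first, $second] = $files['segments.gen'];
        if ($first === $second) {
            return $first; // both copies agree, so the read is trusted
        }
        // A writer may be updating the file: wait, then read it again.
        usleep(RETRIEVE_PAUSE_MS * 1000);
    }
    throw new RuntimeException('Could not read a stable generation number');
}

// Example: a stable segments.gen holding generation 3.
$generation = actualGenerationSketch(function () {
    return ['segments.gen' => [3, 3], 'segments_3' => ''];
});
```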

static getDefaultSearchField ( )

Get default search field.

Null means that search is performed through all fields by default.

Returns
string

Implements Zend_Search_Lucene_Interface.

getDirectory ( )

Returns the Zend_Search_Lucene_Storage_Directory instance for this index.

getDocument ($id)

Returns a Zend_Search_Lucene_Document object for the document number $id in this index.

Parameters
integer | Zend_Search_Lucene_Search_QueryHit $id
Returns
Zend_Search_Lucene_Document
Exceptions
Zend_Search_Lucene_Exception: thrown if $id is out of range

Implements Zend_Search_Lucene_Interface.

getFieldNames ($indexed = false)

Returns a list of all unique field names that exist in this index.

Parameters
boolean $indexed
Returns
array

Implements Zend_Search_Lucene_Interface.

getFormatVersion ( )

Get index format version.

Returns
integer

Implements Zend_Search_Lucene_Interface.

getGeneration ( )

Get generation number associated with this index instance.

The same generation number, paired with a document number or a query string, is guaranteed to give the same result when retrieving from the index, so it can be used for caching search results.

Returns
integer
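
Since a generation number paired with a query string identifies a reproducible result, the pair can serve as a cache key. A minimal sketch, assuming a hypothetical in-memory cache class and a caller-supplied search callable standing in for find(); in real code the generation would come from getGeneration().

```php
<?php
// Hypothetical in-memory cache keyed by (index generation, query string).
final class SearchResultCacheSketch
{
    /** @var array<string, mixed> cached results by key */
    private $entries = [];

    public function get(int $generation, string $query, callable $search)
    {
        // A new generation produces a new key, so stale results are never returned.
        $key = $generation . "\0" . $query;
        if (!array_key_exists($key, $this->entries)) {
            $this->entries[$key] = $search($query);
        }
        return $this->entries[$key];
    }
}

$runs = 0;
$search = function (string $query) use (&$runs) {
    $runs++;
    return ['hit for ' . $query];
};
$cache = new SearchResultCacheSketch();
$cache->get(7, 'title:php', $search);  // runs the search
$cache->get(7, 'title:php', $search);  // served from the cache
$cache->get(8, 'title:php', $search);  // index changed: runs the search again
```

This sketch never evicts entries for older generations; a long-running cache would also need an expiry policy.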

getMaxBufferedDocs ( )

Retrieve index maxBufferedDocs option.

maxBufferedDocs is the minimum number of documents required before the buffered in-memory documents are written into a new segment.

Default value is 10

Returns
integer

Implements Zend_Search_Lucene_Interface.
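
The effect of maxBufferedDocs can be illustrated with a toy writer that keeps documents in memory and writes them out as a new segment once the buffer holds maxBufferedDocs documents. The class and its methods are hypothetical; segments are represented only by their document counts, and this is a sketch of the documented behaviour, not the library's writer.

```php
<?php
// Hypothetical buffering writer illustrating the maxBufferedDocs option.
final class BufferingWriterSketch
{
    /** @var array[] documents not yet written to a segment */
    private $buffer = [];
    /** @var int[] document counts of the segments written so far */
    public $segments = [];
    private $maxBufferedDocs;

    public function __construct(int $maxBufferedDocs = 10)
    {
        $this->maxBufferedDocs = $maxBufferedDocs;
    }

    public function addDocument(array $document): void
    {
        $this->buffer[] = $document;
        // Write a new segment only once enough documents have been buffered.
        if (count($this->buffer) >= $this->maxBufferedDocs) {
            $this->flush();
        }
    }

    public function flush(): void
    {
        if ($this->buffer !== []) {
            $this->segments[] = count($this->buffer);
            $this->buffer = [];
        }
    }

    public function bufferedCount(): int
    {
        return count($this->buffer);
    }
}

$writer = new BufferingWriterSketch(10);
for ($i = 0; $i < 25; $i++) {
    $writer->addDocument(['id' => $i]);
}
// Two segments of 10 documents were written; 5 documents are still buffered.
```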

getMaxMergeDocs ( )

Retrieve index maxMergeDocs option.

maxMergeDocs is the largest number of documents ever merged by addDocument(). Small values (e.g., less than 10,000) are best for interactive indexing, as this limits the length of pauses while indexing to a few seconds. Larger values are best for batched indexing and speedier searches.

Default value is PHP_INT_MAX

Returns
integer

Implements Zend_Search_Lucene_Interface.

getMergeFactor ( )

Retrieve index mergeFactor option.

mergeFactor determines how often segment indices are merged by addDocument(). With smaller values, less RAM is used while indexing, and searches on unoptimized indices are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches on unoptimized indices are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.

Default value is 10

Returns
integer

Implements Zend_Search_Lucene_Interface.
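
The trade-off described for mergeFactor, and the cap set by maxMergeDocs, can be illustrated with a simplified logarithmic merge model: segments are represented only by their document counts, and whenever mergeFactor segments of the same size accumulate they are merged into one, unless the merged segment would exceed maxMergeDocs. This is a hypothetical sketch of the general idea, not the exact algorithm Zend_Search_Lucene uses.

```php
<?php
// Hypothetical, simplified merge model; $segments is a list of document counts.
function addSegmentSketch(array $segments, int $newSize, int $mergeFactor, int $maxMergeDocs): array
{
    $segments[] = $newSize;
    $level = $newSize;
    while (true) {
        // Count the trailing segments whose size equals the current level.
        $sameSize = 0;
        for ($i = count($segments) - 1; $i >= 0 && $segments[$i] === $level; $i--) {
            $sameSize++;
        }
        $mergedSize = $level * $mergeFactor;
        if ($sameSize < $mergeFactor || $mergedSize > $maxMergeDocs) {
            return $segments; // nothing (more) to merge
        }
        // Replace the last $mergeFactor equal-sized segments with one larger segment.
        array_splice($segments, -$mergeFactor, $mergeFactor, [$mergedSize]);
        $level = $mergedSize;
    }
}

// 25 flushes of 10 documents each, with mergeFactor = 10 and no maxMergeDocs cap.
$segments = [];
for ($flush = 0; $flush < 25; $flush++) {
    $segments = addSegmentSketch($segments, 10, 10, PHP_INT_MAX);
}
// $segments is [100, 100, 10, 10, 10, 10, 10]
```

A larger mergeFactor means fewer merges (faster indexing) but more segments for searches to visit. With maxMergeDocs set to 50, the same loop never merges and leaves 25 segments of 10 documents.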

static getResultSetLimit ( )

Get result set limit.

0 means no limit

Returns
integer

Implements Zend_Search_Lucene_Interface.

static getSegmentFileName ($generation)

Get segments file name.

Parameters
integer $generation
Returns
string

Implements Zend_Search_Lucene_Interface.
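
A sketch of the naming rule, assuming the Apache Lucene convention in which generation 0 denotes the pre-2.1 segments file and later generations append the generation number in base 36 (segments_N). The function name is hypothetical.

```php
<?php
// Sketch of segments file naming, assuming Apache Lucene's base-36 convention.
function segmentFileNameSketch(int $generation): string
{
    if ($generation === 0) {
        return 'segments'; // pre-2.1 index format
    }
    // base_convert() produces lowercase digits, e.g. 35 => "z", 36 => "10".
    return 'segments_' . base_convert((string) $generation, 10, 36);
}

$names = [
    segmentFileNameSketch(0),   // "segments"
    segmentFileNameSketch(35),  // "segments_z"
    segmentFileNameSketch(36),  // "segments_10"
];
```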

getSimilarity ( )

Retrieve the similarity used by the index reader.

static getTermsPerQueryLimit ( )

Get terms per query limit.

0 (default) means no limit

Returns
integer

hasDeletions ( )

Returns true if any documents have been deleted from this index.

Returns
boolean

Implements Zend_Search_Lucene_Interface.

hasTerm (Zend_Search_Lucene_Index_Term $term)

Returns true if the index contains documents with the specified term.

Used for query optimization.

Parameters
Zend_Search_Lucene_Index_Term $term
Returns
boolean

Implements Zend_Search_Lucene_Interface.

isDeleted ($id)

Checks whether a document is deleted.

Parameters
integer $id
Returns
boolean
Exceptions
Zend_Search_Lucene_Exception: thrown if $id is out of range

Implements Zend_Search_Lucene_Interface.

maxDoc ( )

Returns one greater than the largest possible document number.

This may be used to, e.g., determine how big to allocate a structure which will have an element for every document number in an index.

Returns
integer

Implements Zend_Search_Lucene_Interface.
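
For example, the value of maxDoc() can size an array with one slot per document number. The sketch below uses hypothetical stand-in values instead of a live index; counting the slots not marked as deleted gives the number of non-deleted documents, which is what numDocs() reports.

```php
<?php
// Hypothetical stand-ins for values a live index would report.
$maxDoc = 6;           // e.g. the value returned by maxDoc()
$deletedIds = [1, 4];  // e.g. IDs for which isDeleted() returns true

// One slot per possible document number: 0 .. $maxDoc - 1.
$isLive = array_fill(0, $maxDoc, true);
foreach ($deletedIds as $id) {
    $isLive[$id] = false;
}

// array_filter() drops the false entries, leaving only live documents.
$liveCount = count(array_filter($isLive)); // 4
```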

nextTerm ( )

Scans terms dictionary and returns next term.

Returns
Zend_Search_Lucene_Index_Term|null

Implements Zend_Search_Lucene_Index_TermsStream_Interface.

norm ($id, $fieldName)

Returns a normalization factor for "field, document" pair.

Parameters
integer $id
string $fieldName
Returns
float

Implements Zend_Search_Lucene_Interface.

numDocs ( )

Returns the total number of non-deleted documents in this index.

Returns
integer

Implements Zend_Search_Lucene_Interface.

static open ($directory)

Open index.

Parameters
mixed $directory
Returns
Zend_Search_Lucene_Interface

optimize ( )

Optimize index.

Merges all segments into one

Implements Zend_Search_Lucene_Interface.

removeReference ( )

Remove reference from the index object.

When the reference count reaches zero, the index is closed and its resources are cleaned up.

Implements Zend_Search_Lucene_Interface.
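
The reference-counting contract of addReference() and removeReference() can be sketched with a hypothetical class: each user calls addReference() while it holds the index and removeReference() when done, and cleanup happens only when the last reference is released. Everything other than the two method names is invented for illustration.

```php
<?php
// Hypothetical reference-counted object illustrating addReference()/removeReference().
final class RefCountedIndexSketch
{
    private $refCount = 0;
    public $closed = false;

    public function addReference(): void
    {
        $this->refCount++;
    }

    public function removeReference(): void
    {
        $this->refCount--;
        if ($this->refCount === 0) {
            $this->close(); // last user gone: release files, locks, etc.
        }
    }

    private function close(): void
    {
        $this->closed = true;
    }
}

$index = new RefCountedIndexSketch();
$index->addReference();     // first user
$index->addReference();     // second user
$index->removeReference();  // one reference remains, so the index stays open
$index->removeReference();  // count reaches zero, so the index is closed
```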

resetTermsStream ( )

Reset terms stream.

static setDefaultSearchField ($fieldName)

Set default search field.

Null means that search is performed through all fields by default.

Default value is null

Parameters
string $fieldName

Implements Zend_Search_Lucene_Interface.

setFormatVersion ($formatVersion)

Set index format version.

The index is converted to this format at the next update.

Parameters
int $formatVersion
Exceptions
Zend_Search_Lucene_Exception

Implements Zend_Search_Lucene_Interface.

setMaxBufferedDocs ($maxBufferedDocs)

Set index maxBufferedDocs option.

maxBufferedDocs is the minimum number of documents required before the buffered in-memory documents are written into a new segment.

Default value is 10

Parameters
integer $maxBufferedDocs

Implements Zend_Search_Lucene_Interface.

setMaxMergeDocs ($maxMergeDocs)

Set index maxMergeDocs option.

maxMergeDocs is the largest number of documents ever merged by addDocument(). Small values (e.g., less than 10,000) are best for interactive indexing, as this limits the length of pauses while indexing to a few seconds. Larger values are best for batched indexing and speedier searches.

Default value is PHP_INT_MAX

Parameters
integer $maxMergeDocs

Implements Zend_Search_Lucene_Interface.

setMergeFactor ($mergeFactor)

Set index mergeFactor option.

mergeFactor determines how often segment indices are merged by addDocument(). With smaller values, less RAM is used while indexing, and searches on unoptimized indices are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches on unoptimized indices are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.

Default value is 10

Parameters
integer $mergeFactor

Implements Zend_Search_Lucene_Interface.

static setResultSetLimit ($limit)

Set result set limit.

0 (default) means no limit

Parameters
integer $limit

Implements Zend_Search_Lucene_Interface.
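
A sketch of the "0 means no limit" convention applied to a list of hits. The helper is hypothetical: it simply keeps the first $limit entries in the order given and makes no attempt to pick the best-scoring ones.

```php
<?php
// Hypothetical helper: keep the first $limit hits, where 0 means "no limit".
function applyResultSetLimitSketch(array $hits, int $limit): array
{
    if ($limit === 0) {
        return $hits;
    }
    return array_slice($hits, 0, $limit);
}

$hits = ['doc3', 'doc1', 'doc7', 'doc2'];
$firstTwo = applyResultSetLimitSketch($hits, 2); // ['doc3', 'doc1']
$all = applyResultSetLimitSketch($hits, 0);      // all four hits
```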

static setTermsPerQueryLimit ($limit)

Set terms per query limit.

0 means no limit

Parameters
integer $limit

skipTo (Zend_Search_Lucene_Index_Term $prefix)

Skip terms stream up to the specified term prefix.

The prefix contains the fully specified field and the leading portion of the searched term.

Parameters
Zend_Search_Lucene_Index_Term $prefix

Implements Zend_Search_Lucene_Index_TermsStream_Interface.
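
The terms-stream methods walk a sorted term dictionary. The hypothetical sketch below models terms as [field, text] pairs in a sorted array (the library itself works with Zend_Search_Lucene_Index_Term objects); in this model skipTo() positions the stream at the first term not less than the prefix, which is where a prefix scan starts.

```php
<?php
// Hypothetical terms stream over [field, text] pairs; not the library's implementation.
final class TermsStreamSketch
{
    private $terms;
    private $position = 0;

    public function __construct(array $terms)
    {
        // Order by field name, then by term text, as a term dictionary would.
        usort($terms, function (array $a, array $b): int {
            return strcmp($a[0], $b[0]) ?: strcmp($a[1], $b[1]);
        });
        $this->terms = $terms;
    }

    public function resetTermsStream(): void
    {
        $this->position = 0;
    }

    // Position the stream at the first term not less than $prefix (field first, then text).
    public function skipTo(array $prefix): void
    {
        $this->position = 0;
        while (($term = $this->currentTerm()) !== null
            && (strcmp($term[0], $prefix[0]) ?: strcmp($term[1], $prefix[1])) < 0) {
            $this->position++;
        }
    }

    public function currentTerm(): ?array
    {
        return $this->terms[$this->position] ?? null;
    }

    public function nextTerm(): ?array
    {
        $this->position++;
        return $this->currentTerm();
    }
}

// Prefix scan: all 'title' terms starting with "ph".
$stream = new TermsStreamSketch([
    ['title', 'python'], ['body', 'php'], ['title', 'php'], ['title', 'perl'], ['title', 'phpunit'],
]);
$stream->skipTo(['title', 'ph']);
$matches = [];
for ($term = $stream->currentTerm();
     $term !== null && $term[0] === 'title' && strncmp($term[1], 'ph', 2) === 0;
     $term = $stream->nextTerm()) {
    $matches[] = $term[1];
}
// $matches is ['php', 'phpunit']
```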

termDocs (Zend_Search_Lucene_Index_Term $term, $docsFilter = null)

Returns IDs of all documents containing term.

Parameters
Zend_Search_Lucene_Index_Term $term
Zend_Search_Lucene_Index_DocsFilter | null $docsFilter
Returns
array

Implements Zend_Search_Lucene_Interface.

termDocsFilter (Zend_Search_Lucene_Index_Term $term, $docsFilter = null)

Returns a documents filter for all documents containing the term.

It performs the same operation as termDocs(), but returns the result as a Zend_Search_Lucene_Index_DocsFilter object.

Parameters
Zend_Search_Lucene_Index_Term $term
Zend_Search_Lucene_Index_DocsFilter | null $docsFilter
Returns
Zend_Search_Lucene_Index_DocsFilter

Implements Zend_Search_Lucene_Interface.

termFreqs (Zend_Search_Lucene_Index_Term $term, $docsFilter = null)

Returns an array of all term frequencies.

Result array structure: array(docId => freq, ...)

Parameters
Zend_Search_Lucene_Index_Term $term
Zend_Search_Lucene_Index_DocsFilter | null $docsFilter
Returns
array

Implements Zend_Search_Lucene_Interface.
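
To illustrate the result structure array(docId => freq, ...), here is a hypothetical sketch that computes term frequencies over a tiny in-memory collection of pre-tokenized documents. The optional filter is modelled as a plain list of allowed document IDs, standing in for Zend_Search_Lucene_Index_DocsFilter.

```php
<?php
// Hypothetical: $docs maps document ID => list of tokens for one field.
function termFreqsSketch(array $docs, string $term, ?array $allowedIds = null): array
{
    $freqs = [];
    foreach ($docs as $docId => $tokens) {
        if ($allowedIds !== null && !in_array($docId, $allowedIds, true)) {
            continue; // excluded by the filter
        }
        $freq = count(array_keys($tokens, $term, true));
        if ($freq > 0) {
            $freqs[$docId] = $freq; // only documents containing the term appear
        }
    }
    return $freqs;
}

$docs = [
    0 => ['zend', 'search', 'lucene'],
    1 => ['lucene', 'index', 'lucene'],
    2 => ['php', 'framework'],
];
$freqs = termFreqsSketch($docs, 'lucene'); // [0 => 1, 1 => 2]
```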

termPositions (Zend_Search_Lucene_Index_Term $term, $docsFilter = null)

Returns an array of all term positions in the documents.

Result array structure: array(docId => array(pos1, pos2, ...), ...)

Parameters
Zend_Search_Lucene_Index_Term $term
Zend_Search_Lucene_Index_DocsFilter | null $docsFilter
Returns
array

Implements Zend_Search_Lucene_Interface.
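
A hypothetical sketch of the result structure array(docId => array(pos1, pos2, ...), ...), assuming positions are 0-based token offsets within a pre-tokenized field. In this model, each position list has as many entries as the term's frequency in that document.

```php
<?php
// Hypothetical: $docs maps document ID => list of tokens; positions are 0-based offsets.
function termPositionsSketch(array $docs, string $term): array
{
    $positions = [];
    foreach ($docs as $docId => $tokens) {
        $offsets = array_keys($tokens, $term, true); // offsets where the term occurs
        if ($offsets !== []) {
            $positions[$docId] = $offsets;
        }
    }
    return $positions;
}

$docs = [
    0 => ['zend', 'search', 'lucene'],
    1 => ['lucene', 'index', 'lucene'],
    2 => ['php', 'framework'],
];
$positions = termPositionsSketch($docs, 'lucene'); // [0 => [2], 1 => [0, 2]]
```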

terms ( )

Returns an array of all terms in this index.

Returns
array

Implements Zend_Search_Lucene_Interface.

undeleteAll ( )

Undeletes all documents currently marked as deleted in this index.

Todo:
Implementation

Implements Zend_Search_Lucene_Interface.

Member Data Documentation

const FORMAT_2_1 = 1
const FORMAT_2_3 = 2
const FORMAT_PRE_2_1 = 0
const GENERATION_RETRIEVE_COUNT = 10

Maximum number of attempts to retrieve the generation number.

const GENERATION_RETRIEVE_PAUSE = 50

Pause between attempts to retrieve the generation number, in milliseconds.