Enhancements in this release let you
- Use Data Import Handler for database indexing
- Index and search based on dynamic custom fields
- Reload individual collections
- Add languages for search
- Secure your search system using ColdFusion Administrator (Data & Services > Solr Server > Show Advanced Settings > Use HTTPS connection).
- Autocommit indexed documents
- Boost specific fields or entire document for improved search results.
Modifications to file location and filenames
- In the case of standalone installation, all Solr files reside in the jetty folder ColdFusion 10\cfusion\jetty (previously ColdFusion 9\cfusion\solr).
- On Windows, the Solr service has been renamed as ColdFusion 10 Jetty Service (previously ColdFusion 9 Solr Service, and as of ColdFusion 11, Add-on Service).
- On Windows, the executable file has been renamed as jetty.exe (previously solr.exe)
Using Data Import Handler
In ColdFusion 9, indexing a database was a two-step process: querying the database using the tag cfquery and then indexing the query using the tag cfindex. In ColdFusion 10, you need not use cfquery to get data. Instead, Solr communicates directly with the database and fetches data using Data Import Handler, which improves indexing performance.
You can perform full or partial indexing depending on your requirement. For example, when you index the database for the first time, you may do a full indexing. For any later updates in the database, you can perform partial indexing to update your collection.
Indexing using Data Import Handler
The following steps help you configure Data Import Handler for indexing databases:
- Do the following:
For full import: Create the following dataconfig.xml to define the mapping of database table columns to Solr fields:
<dataConfig>
  <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="user" password="user"/>
  <document name="products">
    <entity name="item" query="select * from item">
      <field column="ID" name="id"/>
      <field column="NAME" name="name"/>
    </entity>
  </document>
</dataConfig>
For delta import: Create the following dataconfig.xml:
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql:/temp/example/ex" user="user" password="password"/>
  <document name="rrr">
    <entity name="item" pk="ID"
        query="select ID,NAME,PRICE,WEIGHT,last_modified from item"
        deltaImportQuery="select ID,NAME,PRICE,WEIGHT,last_modified from item where ID='${dataimporter.delta.id}'"
        deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
      <field column="ID" name="uid"/>
      <field column="NAME" name="name_t"/>
      <field column="PRICE" name="price_f"/>
      <field column="WEIGHT" name="weight_d"/>
      <entity name="feature" pk="ITEM_ID"
          query="select description as features from feature where item_id='${item.ID}'">
        <field name="features_t" column="features"/>
      </entity>
      <entity name="item_category" pk="ITEM_ID, CATEGORY_ID"
          query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'">
        <entity name="category" pk="ID"
            query="select description as cat from category where id = '${item_category.CATEGORY_ID}'">
          <field column="cat" name="cat_t"/>
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>
For details of the attributes, see Schema for the data config in the section Configuration in data-config.xml at the URL http://wiki.apache.org/solr/DataImportHandler.
- Ensure that the table that you index has a column named last_modified and that the column holds a timestamp.
- Unless you have this column mapped, partial import fails.
- The timestamp of the latest import is written to the dataimport.properties file available in the collection location.
- Save the data configuration file in the conf directory of the collection that you have created.
- In the solrconfig.xml (in the conf directory), uncomment the following section.
<!--
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
-->
This enables Data Import Handler.
- Reload the collection.
- Use one of the following cfindex actions: fullImport, deltaImport, status, or abort.
Modifications to the tag cfindex
New value for the attribute type
To use Data Import Handler, specify type=dih.
New actions
The following new actions have been added to the tag cfindex to help Solr directly fetch data from the database.
fullimport: Indexes the full database, for instance, when you index the database for the first time. For example:
<cfindex collection="dih1" type="DIH" action="fullimport" status="st">
<cfsearch collection="dih1" criteria="damaged" name="s" orderby="price_f desc" status="stat">
deltaimport: Performs partial indexing. For instance, for any updates in the database, instead of a full import, you can perform a delta import to update your collection. For example:
<cfindex collection="dih1" type="DIH" action="deltaimport" status="st">
<cfsearch collection="dih1" criteria="damaged" name="s" orderby="price_f desc" status="stat">
status: Provides the status of indexing, such as the total number of documents processed and whether the server is idle or running. For example:
<cfindex collection="bt" type="DIH" action="status" status="s">
<cfoutput>
Rows Indexed : #s.TOTAL_DOCUMENTS_PROCESSED#
<br>
</cfoutput>
<cfoutput>
Status of Solr Server : #s.status#
<br>
</cfoutput>
abort: Aborts an ongoing indexing task. For example:
<cfindex collection="bt" type="DIH" action="abort" status="s">
<cfoutput>
Status of Solr Server : #s.status#
<br>
</cfoutput>
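The actions above can be combined in a single page. The following is a minimal sketch, not a definitive implementation: it assumes the collection dih1 is already configured for Data Import Handler, and isFirstRun is a hypothetical flag that you set yourself.

```cfml
<!--- Sketch only: isFirstRun is a hypothetical flag, not a ColdFusion setting. --->
<cfset isFirstRun = false>

<cfif isFirstRun>
    <!--- First run: index the whole database. --->
    <cfindex collection="dih1" type="DIH" action="fullimport" status="st">
<cfelse>
    <!--- Later runs: index only the rows changed since the last import. --->
    <cfindex collection="dih1" type="DIH" action="deltaimport" status="st">
</cfif>

<!--- Ask Solr how the import is going. --->
<cfindex collection="dih1" type="DIH" action="status" status="s">
<cfoutput>
Rows Indexed : #s.TOTAL_DOCUMENTS_PROCESSED#
<br>
Status of Solr Server : #s.status#
</cfoutput>
```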
Storing your custom data
In addition to indexing, you can store custom information using custom fields that are dynamically defined.
For example, while indexing a PDF, you can store information such as author and date of publication as shown in the following example:
<cfindex collection="CodeColl" |
To specify custom fields, use the following syntax:
<cfindex ... |
Note: Custom fields can contain only lowercase characters.
In the code, _i stands for integer custom data whose value is stored and indexed. Any field name that ends with _i is treated as a Solr integer.
Similarly, _s stands for string custom data.
All the supported datatypes are listed in the schema.xml:
<dynamicField name="*_i" type="sint" indexed="true" stored="true"/>
Note: _dt supports only the date formats supported by ColdFusion.
Example
<cfindex collection="custom1" type="file" action="update" key="#datadir#/text/text1.txt" |
New attribute orderBy in cfsearch
A new attribute, orderBy, has been added to cfsearch. It sorts the search results by the specified custom field column. This is an optional attribute and, by default, it sorts in ascending order.
<cfsearch |
Autocommit indexed documents
Automatically commit the changes to the search server by setting the attribute autoCommit to true in cfindex as shown in the following example:
<cfindex collection="autocommit_check" action="update" type="file" key="#ExpandPath('.')#/_boost1.txt" first_t="fieldboost" second_t="secondfield" fieldboost="first_t:1,second_t:2" docboost="6" autocommit="true">
If autoCommit is false, indexed documents are not committed unless you specifically commit using cfindex action="commit". By default, the value is set to true.
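As a rough sketch of deferring the commit, the following indexes two files with autoCommit set to false and then commits both in one step. The file names doc1.txt and doc2.txt are hypothetical; the collection name is reused from the example above.

```cfml
<!--- Sketch only: doc1.txt and doc2.txt are hypothetical file names. --->
<!--- Index two files without committing after each update. --->
<cfindex collection="autocommit_check" action="update" type="file"
    key="#ExpandPath('./doc1.txt')#" autocommit="false">
<cfindex collection="autocommit_check" action="update" type="file"
    key="#ExpandPath('./doc2.txt')#" autocommit="false">

<!--- Commit both pending updates with one explicit commit. --->
<cfindex collection="autocommit_check" action="commit">
```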
Improving search result rankings
The following attributes in cfindex help you improve the search result rankings:
- fieldBoost: Boost specific fields while indexing. fieldBoost enhances the score of the fields and thereby the ranking in the search results. Multiple fields can be boosted by specifying the values as a comma-separated list.
- docBoost: Boost the entire document while indexing. docBoost enhances the score of the document and thereby its ranking in the search results.
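A minimal sketch of both attributes follows. The collection boost_demo, the files guide.txt and faq.txt, and the custom fields title_t and summary_t are assumptions made for illustration; the boost values are arbitrary.

```cfml
<!--- Sketch only: collection, file names, and custom fields are hypothetical. --->
<!--- Field boost: title_t gets a higher boost than summary_t. --->
<cfindex collection="boost_demo" action="update" type="file"
    key="#ExpandPath('./guide.txt')#"
    title_t="Solr guide" summary_t="Notes on indexing"
    fieldboost="title_t:3,summary_t:1">

<!--- Document boost: raise the score of this whole document. --->
<cfindex collection="boost_demo" action="update" type="file"
    key="#ExpandPath('./faq.txt')#" docboost="5">
```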
Variations from ColdFusion 9
- ColdFusion 9 had limited support for custom fields namely custom1, custom2, custom3, and custom4. In ColdFusion 10, custom fields are dynamic.
- In ColdFusion 9, all custom fields are displayed. In ColdFusion 10, cfdump yields only fields that have data. That is, if you have specified only custom1 and custom2, only those two fields are displayed.
Consider the following code:
<cfsearch criteria='some_criteria and column_i:[10 TO 20]'...>
Here, the range expression filters the results. For example, column_i:[10 TO 20] means search all items whose values are between 10 and 20. column_i is the custom field provided by the user while indexing. This option was available in ColdFusion 9, but was limited to four custom fields. In ColdFusion 10, the number of custom fields is unlimited.
In ColdFusion 10, you can also specify the order in which search results are returned.
Note: When you search a Solr collection for field type string, the criteria should be within quotes, for example
criteria='string_s:"something missing"'
Solr Search example 1
<cfsearch collection="custom1" criteria="rank_i:[2 TO 4]" name="s1" orderby="value_i" |
Solr Search example 2: Using wild cards
<!------ Searching with wildcard *---------> |
Search limitations
Limitations: Searching custom fields of type string
Strings cannot be searched with wild cards except *. Since strings are not tokenized, you cannot search for an individual word in a string. A string can be searched only as a whole. For example, in the case of str_s="All work and no play", you cannot search for play or work in this string. You have to search using the full string. However, strings can be sorted in search (using the orderby attribute).
Limitations: Searching custom fields of type text
A text type field is tokenized and therefore you can search for any word in the text. You can also search text using wild cards. The only limitation is that a text type field cannot be sorted while searching. Since the text type is tokenized, Solr treats text as a set of tokens, and therefore sorting is not possible.
Limitations: Searching custom fields is case-sensitive
Custom fields can be searched only in lowercase. For example, if the name of the custom field is newDate, you must search for newdate.
Limitations: Using the attribute orderBy
The attribute orderBy must be used with untokenized fields such as strings.
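The limitations above can be illustrated with a short sketch. The collection limits_demo, the file note.txt, and the custom fields str_s and body_t are assumptions made for illustration only.

```cfml
<!--- Sketch only: collection, file name, and custom fields are hypothetical. --->
<!--- Index one file with a string field and a text field. --->
<cfindex collection="limits_demo" action="update" type="file"
    key="#ExpandPath('./note.txt')#"
    str_s="All work and no play" body_t="All work and no play">

<!--- String field: search with the whole value, in quotes. --->
<cfsearch collection="limits_demo" name="byString"
    criteria='str_s:"All work and no play"'>

<!--- Text field: tokenized, so search for a single word; sort on the string field. --->
<cfsearch collection="limits_demo" name="byWord"
    criteria="body_t:play" orderby="str_s">

<cfoutput>#byString.recordCount# / #byWord.recordCount#</cfoutput>
```

Sorting uses str_s rather than body_t because orderBy expects an untokenized field.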
Reload collection
In ColdFusion 9, to reload an individual collection you had to restart Solr, which reloads all the collections. So, whenever you modified schema.xml, for example while adding a language or field type, or when you enabled Data Import Handler, you had to restart Solr so that the changes took effect.
In ColdFusion 10, you can limit the reload to a specific collection, which can significantly improve performance.
To reload a collection:
- In the ColdFusion Administrator, go to Data & Services > ColdFusion Collections.
- For the specific collection, click the Reload icon in Solr Collections > Actions.
Support for additional languages
ColdFusion supports search and indexing for 17 languages in addition to English. If your language is not available, you can add it to the list, provided Solr supports indexing and search for that language.
For details of the supported languages, see http://wiki.apache.org/solr/LanguageAnalysis.
If Solr supports the language, you can add it as follows:
- Add filter class in the schema.xml.
Add the field type as follows:
<fieldtype name="text_th" class="solr.TextField">
<analyzer class="org.apache.lucene.analysis.th.ThaiAnalyzer"/>
</fieldtype>
....
<fieldtype name="text_hi" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.IndicNormalizationFilterFactory"/>
<filter class="solr.HindiNormalizationFilterFactory"/>
<filter class="solr.HindiStemFilterFactory"/>
</analyzer>
</fieldtype>
Add the field name as follows:
<field name="contents_pt" type="text_pt" indexed="true" stored="false"
required="false"/>
<field name="contents_hi" type="text_hi" indexed="true" stored="false"
required="false"/>
- In the ColdFusion Administrator, go to Data & Services > Solr Server.
- In the section Configure Indexing languages, specify the following:
- New language: Specify the language, for example Hindi.
- New language suffix: Specify a suffix for the language, for example hi for Hindi.
Security enhancements in ColdFusion 10
Securing Solr
Solr does not provide security at the document level or the communication level. However, you can add security to your Solr search by ensuring that the application server on which Solr runs is secure. To do this:
- Secure the application server on which Solr runs; the default is jetty.
- In the ColdFusion Administrator, go to Data & Services > Solr Server.
- In Configure Solr server, click Show advanced settings.
- Select Use HTTPS connection and then specify the Solr Admin HTTPS Port.
Note: Using an HTTPS connection is recommended when you use DIH.
Support for authentication
In ColdFusion 9, any user can access and add, update, and delete documents for indexing. This release provides basic authentication in jetty to secure access to collections.
Modify the web.xml of jetty server as follows:
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr authenticated application</web-resource-name>
    <url-pattern>/core1/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>core1-role</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Test Realm</realm-name>
</login-config>
Uncomment the following section in jetty.xml:
<Set name="UserRealms">
  <Array type="org.mortbay.jetty.security.UserRealm">
    <Item>
      <New>
        <Set name="name">Test Realm</Set>
        <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
      </New>
    </Item>
  </Array>
</Set>
- Add your username and password in the /etc/example/realm.properties file as follows:
username:password, core1-role
- In the ColdFusion Administrator, go to Data & Services > Solr Server and click Show Advanced Settings in the Configure Solr Server section.
Specify the username and password and then click Submit.
Note: If you do not specify the credentials, the index operation occurs without authentication.