SQL FULLTEXT Indices

 

Access to this feature requires an active Software Asset Management (SAM) subscription. See Benefits of ‘Software Asset Management” Feature Line.


BBj also makes
Lucene available in the SQL engine to dramatically improve the searching capabilities on tables in a database. Note that only one FULLTEXT index is allowed per table but you may include as many columns as necessary in each index. Keep in mind that the more columns included, the slower it will be to write records to the index, and the larger the index will become. Only include those columns that would be useful in FULLTEXT searches.

Create a FULLTEXT Index

Create a FULLTEXT index using SQL or with the Browser or Eclipse Enterprise Manager table editor (this process may take a while depending on the number of records in the table).

SQL Statement

CREATE FULLTEXT INDEX ON

     customer (first_name, last_name, bill_addr1, bill_addr2)

Enterprise Manager Table Editor

The Enterprise Manager (EM) table editor provides a convenient user interface for managing FULLTEXT indices on tables like the one shown below:


Use Full Text Index

In BBj 16.0 and higher, place a check in the box to create a FULLTEXT index on the table. The index will be created when saving the table definition. Unchecking this box and saving the table definition removes the index.

Lucene Analyzer Class

In BBj 16.0 and higher, from the Lucene Javadocs, "An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text." This advanced setting should rarely be changed from the default. However, if your requirements involve the use of an alternate analyzer, specify the class implementation here. Be sure to add any required JARs to your BBj classpath under BBjServices->Java Settings->Classpath before making any changes.

Stopwords Language Code

In BBj 16.0 and higher, a stopword is a language word that has no significant meaning in a keyword based search system (e.g. Google). Lucene has a number of languages configured with a set of such words. These words are simply ignored while analyzing/tokenizing text. Prior to 17.02, English was the default language while 17.02+ uses a default of no language stopwords. If the contents of the database are in a different language, searches may be more effective if the language is set to the language of the data. Note that this setting cannot be changed later without dropping and recreating the index.

Important Notes Regarding Stopwords

Why does the default stopwords configuration use no language and thus, no stopwords? This is an extremely important decision to make when creating a FULLTEXT index. Stopwords are used to make the index more compact. This is due to the fact that when indexing text content, when a stopword is encountered, it is left out of the index. While the data is still written to the record, the stopword will not be found when searching the index so seemingly incorrect results can occur. For example, if the following sentence was present in a FULLTEXT indexed field using English as the stopwords language, there are some important things to understand:

“The quick brown fox jumped over a lazy dog.”

The entire value is stored in the record as expected. However, during indexing, the only words indexed would be quick, brown, fox, jumped, over, lazy, and dog. “The” and “a” would not be indexed because they are stopwords and generally don’t add anything meaningful when searching for matching data. So, a search for “quick” or “brown” would locate this record as a match, but searching for “the” or “a” would find no matches.

This is generally not an issue if the indexed data consists of whole language sentences. However, if the indexed data also includes product identifiers, abbreviations, or other values that may look like a stopword but in fact need to be searchable, then the best option is to use no stopwords language so that all words are indexed and thus, searchable. If in doubt, the safest option is to choose no stopwords to ensure accurate results in every case.

Create a FULLTEXT Index

  1. Log into the EM.
  2. Double-click on the Databases node.
  3. Double-click on the desired database.
  4. Click on the ‘Tables’ tab and double-click on the table on which to add the FULLTEXT index.
  5. Select the ‘FullText Index’ tab.
  6. Mark the ‘Use Full Text Index’ checkbox.
  7. Select the columns to include in the index.
  8. Save the changes using Ctrl/Cmd-S or File > Save.

Column Selection

In BBj 16.0 and higher, since only one FULLTEXT index may be present on a table, it is important to select any columns that should be available for FULLTEXT searches.  Keep in mind that the more columns present in the index, the longer write operations on the data file will take (this is is milliseconds so it may not be an issue).  Note that if the primary key column or columns are not selected, they will be added to the index automatically.  This is necessary for lookup operations during queries using the results of a FULLTEXT search operation.

Lucene Index Binding

In BBj 16.0 and higher, both the SQL and EM examples create a FULLTEXT file that is tied to the CUSTOMER table, similar to a trigger. This FULLTEXT file consists of a <filename>.sync file which ties the file to the Lucene index, and a Lucene index which is a directory named <filename>.textsearch. The creation process creates these files in the same directory as the table’s data file. When changes are made to the table data file, those changes are also made in the Lucene index so they stay in sync. Note that changes made using WRITE/WRITE RECORD/REMOVE also update the FULLTEXT index appropriately. This way the FULLTEXT index remains in sync regardless of the method used to update the data file.

See Also

Document Indices

Full Text Indexing and Searching

FULLTEXT File Type

FULLTEXT Verb - Create FULLTEXT File

Using FULLTEXT Files/Indices for Searching

 



______________________________________________________________________________________

Copyright BASIS International Ltd. BBj®, Visual PRO/5®, PRO/5®, and BBx® are registered trademarks.