Using FULLTEXT Files/Indices for Searching


Once FULLTEXT files or indices are created, perform searches in any of these three ways:

  • Use BBx READ RECORD calls
  • Execute one of the special SQL system stored procedures (SPROCs)
  • Use the Search user interface in the 'Browser' or 'Eclipse' Enterprise Manager  

Lucene Query Language

Because BBj uses Lucene for its FULLTEXT file/index implementation, the full Lucene query language is available for performing simple or complex searches. Searches can range from one or more case-insensitive words to arbitrarily complex queries that include fuzzy matching, exclusions, and more. For more information on the query language, refer to the online Lucene documentation, such as Apache Lucene - Query Parser Syntax.
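
As a rough illustration of the syntax, the following Python sketch composes a few Lucene query strings. The helper name build_query and its parameters are hypothetical and are not part of BBj or Lucene. Only the strings it produces (bare terms, "*" wildcards, "~" fuzzy matching, "+" and "-" for required and excluded terms, and FIELD:term) follow the Lucene query parser syntax.

```python
def build_query(any_of=(), required=(), excluded=(), field=None):
    """Compose a Lucene query string (hypothetical helper, for illustration only)."""

    def qualify(term):
        # Prefix the term with "FIELD:" to restrict it to a single field.
        return f"{field}:{term}" if field else term

    parts = [qualify(t) for t in any_of]            # optional terms, combined with OR by default
    parts += ["+" + qualify(t) for t in required]   # terms that must be present
    parts += ["-" + qualify(t) for t in excluded]   # terms that must not be present
    return " ".join(parts)


print(build_query(any_of=["chile", "*shirt"]))               # either word, trailing match on "shirt"
print(build_query(required=["chile"], excluded=["shirt"]))   # "chile" but never "shirt"
print(build_query(any_of=["smith~"]))                        # fuzzy match on "smith"
print(build_query(any_of=["*son"], field="LAST_NAME"))       # restrict to one field
```

Any of the resulting strings can be passed wherever this page calls for a Lucene format query.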

Search Using BBx READ RECORD

BBj programs can access FULLTEXT files using direct file access calls or SQL (discussed below). Using READ RECORD on a FULLTEXT file is similar to accessing any other type of BBj data file, but there are some differences.

First, instead of positioning the file pointer at a particular location in the file, the READ RECORD call tells the file channel to temporarily limit the viewable results to those records that match the search query. The records are returned not in a fixed, known order but in order of relevance to the query. The algorithm for determining relevance is part of the Lucene library and cannot be configured by BASIS or the user. Relevance works much as it does in a search engine such as Google, which returns a list of matching results with the "best" matches at the top of the list (according to what Google thinks is a "best" match).

Another difference is in the KNUM and KEY parameters to the READ RECORD call. There are only two valid key numbers on a FULLTEXT file, 0 and 1:

  • KNUM=0: Indicates that the program would like the single record that matches the primary key as specified in the KEY= parameter.

  • KNUM=1: Indicates that the KEY= parameter contains a Lucene format query that limits the viewable records to those that match the query.

For example, assume a FULLTEXT file was created with the template shown in the code below, with CUST_NUM as the key field:

chan = unt
OPEN (chan) "/path/to/my/file/CUSTOMER.text"
DIM rec$:"CUST_NUM:C(6),FIRST_NAME:C(30),LAST_NAME:C(30)"

REM Read record with CUST_NUM = 000001
READ RECORD (chan, KNUM=0, KEY="000001") rec$

REM Read first matching record for a Lucene
REM query returning all records with a last
REM name of Smith
READ RECORD (chan, KNUM=1, KEY="smith") rec$

REM Read first matching record for a Lucene
REM query returning all records with a last
REM name that STARTS with Smith (note asterisk)
READ RECORD (chan, KNUM=1, KEY="smith*") rec$

Searching Using SQL System SPROCs

IMPORTANT NOTE: The Lucene search engine used by the BBj FULLTEXT index feature includes a setting that limits the maximum number of search results returned by any operation using Lucene. This limit reduces the resources Lucene consumes and defaults to 1,000. It overrides the limit provided in the search query. If result sets larger than 1,000 are needed, see the "Max Search Results" setting on the BBjServices: Settings page in the Enterprise Manager.

BBJ_SEARCH_TABLE

The BBJ_SEARCH_TABLE SPROC returns a result set that looks the same as the result of a SELECT * statement from the table and is suitable for use in nested SELECT statements and joining with other tables. BBJ_SEARCH_TABLE takes 3 parameters:

  • Table Name: Single table name to search

  • Query: Lucene format search query

  • Maximum Results: Maximum number of results to return where 0 indicates no limit
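
The Maximum Results parameter interacts with the server-wide limit described in the note above. The following Python sketch shows one plausible reading of that interaction. It assumes that the "Max Search Results" setting acts as a simple cap on whatever the query requests. The function name effective_limit is hypothetical, and the exact behavior inside BBj may differ.

```python
def effective_limit(requested, server_max=1000):
    """Return the largest number of rows a search could return.

    requested  -- the Maximum Results parameter (0 means "no limit")
    server_max -- the "Max Search Results" server setting (default 1,000)

    Assumption: the server setting caps the requested value.
    """
    if requested < 0:
        raise ValueError("requested must be 0 (no limit) or a positive number")
    if requested == 0:
        # "No limit" in the query still leaves the server-wide cap in force.
        return server_max
    return min(requested, server_max)


for requested in (0, 50, 5000):
    print(requested, "->", effective_limit(requested))
```

Under this reading, a query that asks for 5,000 rows only returns more than 1,000 after the server setting has been raised.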

Example 1

The following wildcard example finds all customers who have a FIRST_NAME or LAST_NAME that starts with “Ch” in the ChileCompany demo database:

SELECT * FROM (CALL bbj_search_table('CUSTOMER', 'Ch*', 0))

While the example above is quite simple, Lucene provides a very robust query language. See Apache Lucene - Query Parser Syntax for complete information.

Example 2

This example is more complex. Using the ChileCompany demo database, this query returns all orders whose customer has a LAST_NAME ending with “son”. Specifying the column name in the Lucene query restricts the search to that column's values in the index:

SQL defines specific behavior for WHERE clauses that limit query results, so FULLTEXT index searching must be handled separately from a simple WHERE clause. In addition to BBJ_SEARCH_TABLE, BBj provides two special system SPROCs for this purpose: BBJ_INDEX and BBJ_SEARCH.

BBJ_INDEX

The BBJ_INDEX SPROC returns a list of the records from the FULLTEXT file attached to a particular table that match the specified Lucene query. For example, using the Chile Company demo database that comes with BBj 15.0 and higher (FULLTEXT index on the DESCRIPTION column in the ITEM table), the following image shows a query that returns a list of all the items in the ITEM table’s FULLTEXT index that have the word "chile" and/or a word ending with "shirt" (t-shirt, shirt, TShirt, dress-shirt, etc.):


The SQL performs a SELECT from the result of a CALL to execute the BBJ_INDEX SPROC. This is important as we will see in the next example that performs a join with the table to return all the row data matching our search. Note the three columns in the result. The DESCRIPTION column was the column used to define the FULLTEXT index and the information our search query examines to determine the results. However, the Lucene FULLTEXT file contains the synchronized table’s primary key as well as a unique record ID for the index. The unique record ID can be ignored for the most part. The result includes a column for each primary key segment with its value. The primary key value comes in very handy as shown in this example:

By joining the results of the SPROC call with the actual table, we are able to get all the columns from the table for each record that matches one of the records from the search results. Also note that the order of the results is by relevance and all the records have "chile" or "*shirt" somewhere in the description. However, the first three in the result have "chile" AND "shirt" which means they are more "relevant" than those with only "chile".

BBJ_SEARCH

The BBJ_SEARCH SPROC is similar to the BBJ_INDEX SPROC, but it allows the user to search one or more tables with FULLTEXT indices in a database at the same time (even the entire database). Its results differ from those of the BBJ_INDEX SPROC, so additional steps are needed to process them beyond a quick overview of the matching data.

The following screen shows an example of the results a call to BBJ_SEARCH returned:

The CALL to BBJ_SEARCH does the following:

  • Parameter 1 is an empty string that means "all tables with FULLTEXT indices." Instead, the user can specify a single table, list of comma separated tables, or use wildcards that follow the SQL LIKE syntax such as 'CUS%'.
  • Parameter 2 is the Lucene query, which here matches all records with "street" and/or "red" somewhere in one of their indexed fields.
  • Parameter 3 is the maximum number of results (0 indicates all results).
  • Parameter 4 is 1 to return a row data preview or 0 if no preview is needed (faster).
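
To make the table-name forms accepted by parameter 1 concrete, the following Python sketch mimics how such a value might be expanded into a list of tables. It is only a model of the behavior described above, not BBj's implementation. The function names, the case-insensitive matching, the use of wildcards inside a comma-separated list, and the sample table list (including ORDER_HEADER) are assumptions made for illustration.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern ("%" and "_") into a compiled regex."""
    out = []
    for ch in pattern:
        if ch == "%":
            out.append(".*")           # % matches any run of characters
        elif ch == "_":
            out.append(".")            # _ matches exactly one character
        else:
            out.append(re.escape(ch))  # everything else matches literally
    # Assumption: table names are compared without regard to case.
    return re.compile("".join(out), re.IGNORECASE)


def select_tables(spec, indexed_tables):
    """Return the tables a parameter 1 value would select (illustrative only)."""
    if spec.strip() == "":
        # An empty string means "all tables with FULLTEXT indices".
        return list(indexed_tables)
    patterns = [like_to_regex(p.strip()) for p in spec.split(",") if p.strip()]
    return [t for t in indexed_tables if any(p.fullmatch(t) for p in patterns)]


tables = ["CUSTOMER", "ITEM", "ORDER_HEADER"]  # hypothetical list of indexed tables
print(select_tables("", tables))
print(select_tables("CUS%", tables))
print(select_tables("ITEM, CUSTOMER", tables))
```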

The results show the outcome of executing a call to BBJ_SEARCH. Interpret the result columns as follows:


TABLE

The table where the match was found.

RESULT_NUM

The result number. The matches are returned grouped by table and the matches within the table grouping are in order of relevance for that table.

PK_SEGMENT

The name of the primary key segment that the PK_SEGMENT_VALUE comes from. There can potentially be multiple rows to point to a single record if the primary key has more than one segment, because each row describes one segment of the primary key and its value.

PK_SEGMENT_NUM

The primary key segment number this row describes. For single segment primary keys, there will only be one row for each matching record. For multiple segment primary keys, there will be one row for each segment of the primary key.

PK_SEGMENT_VALUE

Primary key segment value for the matching row.

ROW_DATA

If parameter 4 in the SPROC CALL is 1, then this will contain a list of all the record field values as pipe-separated name=value pairs. This is designed for quick review of the data. The TABLE, PK_SEGMENT, PK_SEGMENT_NUM, and PK_SEGMENT_VALUE results should be used to locate the matching record in the original table if needed.
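
For a quick look at a ROW_DATA value in application code, a preview string can be split into its fields. The Python sketch below assumes the preview has the form NAME=value|NAME=value and that field values contain no pipe characters. The exact format, the function name parse_row_data, and the sample customer values are assumptions, not taken from BBj output.

```python
def parse_row_data(row_data):
    """Split a 'NAME=value|NAME=value' preview into a dict (format assumed)."""
    fields = {}
    for pair in row_data.split("|"):
        if not pair.strip():
            continue  # skip empty pieces, e.g. from a trailing pipe
        # Split on the first "=" only, so values may themselves contain "=".
        name, _, value = pair.partition("=")
        fields[name.strip()] = value.strip()
    return fields


# Made-up preview string for illustration.
preview = "CUST_NUM=000001|FIRST_NAME=John|LAST_NAME=Smith"
print(parse_row_data(preview))
```

Because the preview is meant for review only, the primary key columns remain the reliable way to fetch the actual record.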

This SPROC call is not conducive to joins with anything else, which means that unlike the BBJ_INDEX SPROC, processing of these results must be done with application logic.
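
The application logic typically starts by collapsing the result rows into one entry per matching record, since a multi-segment primary key spans several rows. The Python sketch below shows one way to do that with rows represented as dictionaries keyed by the column names above. It assumes that all rows describing the same record share the same TABLE and RESULT_NUM values. The function name group_matches and the sample ORDER_LINE rows are hypothetical.

```python
def group_matches(rows):
    """Collapse BBJ_SEARCH-style rows into one entry per matching record."""
    matches = {}  # insertion order keeps the order in which results arrived
    for row in rows:
        # Assumption: TABLE plus RESULT_NUM identifies one matching record.
        key = (row["TABLE"], row["RESULT_NUM"])
        match = matches.setdefault(key, {"table": row["TABLE"], "segments": []})
        match["segments"].append(
            (row["PK_SEGMENT_NUM"], row["PK_SEGMENT"], row["PK_SEGMENT_VALUE"])
        )

    result = []
    for match in matches.values():
        # Order the segments by segment number to rebuild the full primary key.
        ordered = sorted(match["segments"])
        result.append({
            "table": match["table"],
            "primary_key": {name: value for _, name, value in ordered},
        })
    return result


# Made-up rows: one record with a two-segment key, one with a single segment.
rows = [
    {"TABLE": "ORDER_LINE", "RESULT_NUM": 1, "PK_SEGMENT": "LINE_NUM",
     "PK_SEGMENT_NUM": 2, "PK_SEGMENT_VALUE": "003"},
    {"TABLE": "ORDER_LINE", "RESULT_NUM": 1, "PK_SEGMENT": "ORDER_NUM",
     "PK_SEGMENT_NUM": 1, "PK_SEGMENT_VALUE": "000042"},
    {"TABLE": "CUSTOMER", "RESULT_NUM": 1, "PK_SEGMENT": "CUST_NUM",
     "PK_SEGMENT_NUM": 1, "PK_SEGMENT_VALUE": "000001"},
]
for match in group_matches(rows):
    print(match["table"], match["primary_key"])
```

Each resulting primary_key mapping holds the column names and values needed to look up the full record in its table.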

Enterprise Manager Search Viewer

The BBJ_SEARCH SPROC is more complex to use than the BBJ_INDEX SPROC. However, for those who do not wish to write application logic to process the results of the BBJ_SEARCH SPROC, the Browser and Eclipse Enterprise Managers provide this powerful Search Viewer:

Execute a search on a database from the EM

  1. Double-click on the 'Databases' node.
  2. Double-click on the database of interest.
  3. Select the ‘Search’ tab.
  4. Enter the search query, any table names, and maximum results.
  5. Execute the search by clicking the magnifier or pressing [Enter].
  6. Click on a search result to see the record details.

This simple but powerful search viewer provides a convenient interface for testing Lucene searches and quickly viewing the results of searches on one or more tables at one time. The grid shows the table and a quick data preview so you can scroll through and see all the matches to your query. Selecting an item from the grid displays the complete record in the details section below the results grid.

See Also

Full Text Indexing and Searching

FULLTEXT File Type

FULLTEXT Verb - Create FULLTEXT File

SQL FULLTEXT Indices



______________________________________________________________________________________

Copyright BASIS International Ltd. BBj®, Visual PRO/5®, PRO/5®, and BBx® are registered trademarks.