Using FULLTEXT Files/Indices for Searching


Once FULLTEXT files or indices are created, perform searches in any of these three ways:

  • Use BBx READ RECORD calls
  • Execute one of the special SQL system stored procedures (SPROCs)
  • Use the Search user interface in the 'Browser' or 'Eclipse' Enterprise Manager  

Lucene Query Language

With BBj using Lucene for its FULLTEXT file/index implementation, the full Lucene query language is available to the user for performing simple or complex searches. Searches can be as simple as one or more case-insensitive words to include in the results to something arbitrarily complex including fuzzy matching, exclusions, and more. For more information on the query language, refer to these online Lucene resources:

Search Using BBx READ RECORD

BBj programs can access FULLTEXT files using direct file access calls or SQL (discussed below). When using READ RECORD, the result is similar to accessing any other type of BBj data file, however, there are some differences.

First, instead of positioning the file pointer at a particular location in the file, the READ RECORD call tells the file channel to temporarily limit the viewable results to those records that match the search query. The order of these records will not be in a known order, but rather, in order of relevance to matching the query. The algorithm for determining relevance is part of the Lucene library and cannot be configured by BASIS or the user. Relevance is similar to how a search engine such as Google returns a list of matching results with the "best" matches at the top of the list (according to what Google thinks is a "best" match).

Another difference is in the KNUM and KEY parameters to the READ RECORD call. There are only two valid key numbers on a FULLTEXT file: 0 and 1:

  • KNUM=0: Indicates that the program would like the single record that matches the primary key as specified in the KEY= parameter.

  • KNUM=1: Indicates that the KEY= parameter contains a Lucene format query to generate a limited list of records that matches the query.

For example, assume we have a FULLTEXT file created with the template defined in the code below where the key field was CUST_NUM as shown here:

chan = unt

OPEN (chan) "/path/to/my/file/CUSTOMER.text"

DIM rec$:"CUST_NUM:C(6),FIRST_NAME:C(30),LAST_NAME:C(30)"

 

REM Read record with CUST_NUM = 000001

READ RECORD (chan, KNUM=0, KEY="000001") rec$

 

REM Read first matching record for a Lucene

REM query returning all records with a last

REM name of Smith

READ RECORD (chan, KNUM=1, KEY="smith") rec$

 

REM Read first matching record for a Lucene

REM query returning all records with a last

REM name that STARTS with Smith (note asterisk)

READ RECORD (chan, KNUM=1, KEY="smith*") rec$

Searching Using SQL System SPROCs

IMPORTANT NOTE: The Lucene search engine used by the BBj FULLTEXT index feature includes a setting used to limit the maximum number of search results by any operation using Lucene. This limit is used to reduce the resources used by the Lucene system, and defaults to 1,000. This limit overrides the limit provided in the search query. If result sets larger than 1,000 are needed, see the "Max Search Results" setting on the BBjServices: Settings page in the Enterprise Manager.

BBJ_SEARCH_TABLE

The BBJ_SEARCH_TABLE SPROC returns a result set that looks the same as the result of a SELECT * statement from the table and is suitable for use in nested SELECT statements and joining with other tables. BBJ_SEARCH_TABLE takes 3 parameters:

  • Table Name: Single table name to search

  • Query: Lucene format search query

  • Maximum Results: Maximum number of results to return where 0 indicates no limit

Example 1

The following wildcard example finds all customers who have a FIRST_NAME or LAST_NAME that starts with “Ch” in the ChileCompany demo database:

SELECT * FROM (CALL bbj_search_table('CUSTOMER', 'Ch*', 0))

While the example above is quite simple, Lucene provides a very robust query language. See Apache Lucene - Query Parser Syntax for complete information.

Example 2

This example uses more complexity. Using the ChileCompany demo database, this query returns all orders with a customer with a LAST_NAME ending with “son”. Specifying the column name in the Lucene query allows the query to examine only part of the values in the index:

SQL has some specific expected behavior when using WHERE clauses to limit search results in queries. As a result, it is necessary to handle FULLTEXT index searching in a special way apart from a simple WHERE clause. To accomplish this, BBj has two special system SPROCs available: BBJ_INDEX and BBJ_SEARCH.

BBJ_INDEX

The BBJ_INDEX SPROC returns a list of the records from the FULLTEXT file attached to a particular table, that match the specified Lucene query. For example, using the Chile Company demo database that comes with BBj 15.0 and higher (FULLTEXT index on the DESCRIPTION column in the ITEM table), the following images shows a query that returns a list of all the items in the ITEM table’s FULLTEXT index that have the word "chile" and/or a word ending with "shirt" (t-shirt, shirt, TShirt, dress-shirt, etc.):


The SQL performs a SELECT from the result of a CALL to execute the BBJ_INDEX SPROC. This is important as we will see in the next example that performs a join with the table to return all the row data matching our search. Note the three columns in the result. The DESCRIPTION column was the column used to define the FULLTEXT index and the information our search query examines to determine the results. However, the Lucene FULLTEXT file contains the synchronized table’s primary key as well as a unique record ID for the index. The unique record ID can be ignored for the most part. The result includes a column for each primary key segment with its value. The primary key value comes in very handy as shown in this example:

By joining the results of the SPROC call with the actual table, we are able to get all the columns from the table for each record that matches one of the records from the search results. Also note that the order of the results is by relevance and all the records have "chile" or "*shirt" somewhere in the description. However, the first three in the result have "chile" AND "shirt" which means they are more "relevant" than those with only "chile".

BBJ_SEARCH

The BBJ_SEARCH SPROC is similar to the BBJ_INDEX SPROC, but it allows the user to search one or more tables with FULLTEXT indices in a database at the same time (even the entire database). The results are a bit different than the BBJ_INDEX SPROC, so there are additional steps that need to occur to process the results beyond a quick overview of the matching data.

The following screen shows an example of the results a call to BBJ_SEARCH returned:

The CALL to BBJ_SEARCH does the following:

  • Parameter 1 is an empty string that means "all tables with FULLTEXT indices." Instead, the user can specify a single table, list of comma separated tables, or use wildcards that follow the SQL LIKE syntax such as 'CUS%'.
  • Parameter 2 is the Lucene query that says all records with "street" and/or "red" somewhere in one of their indexed fields.
  • Parameter 3 is the maximum number of results (0 indicates all results).
  • Parameter 4 is 1 to return a row data preview or 0 if no preview is needed (faster).

The results shows the outcome of executing a call to BBJ_SEARCH. Interpret the result columns as follows:


TABLE

The table where the match was found.

RESULT_NUM

The result number. The matches are returned grouped by table and the matches within the table grouping are in order of relevance for that table.

PK_SEGMENT

The name of the primary key segment that the PK_SEGMENT_VALUE comes from. There can potentially be multiple rows to point to a single record if the primary key has more than one segment, because each row describes one segment of the primary key and its value.

PK_SEGMENT_NUM

The primary key segment number this row describes. For single segment primary keys, there will only be one row for each matching record. For multiple segment primary keys, there will be one row for each segment of the primary key.

PK_SEGMENT_VALUE

Primary key segment value for the matching row.

ROW_DATA

If parameter 4 in the SPROC CALL is 1, then this will contain a list of all the record field values as a pipe separated name=value pair. This is designed for quick review of the data. The TABLE, PK_SEGMENT, PK_SEGMENT_NUM, and PK_SEGMENT_VALUE results should be used to locate the matching record in the original table if needed.

This SPROC call is not conducive to joins with anything else, which means that unlike the BBJ_INDEX SPROC, processing of these results must be done with application logic.

BBJ_SEARCH_RELATED

NOTE: Available only on Enhanced format databases because it relies on the presence of foreign key definitions found only in Enhanced format databases.

The BBJ_SEARCH_RELATED stored procedure is a powerful tool to search the FULLTEXT indexes on multiple related tables of data for information, all in a single call. It is important to note that it only searches the FULLTEXT indexes, not standard indexes or other column values not included in the FULLTEXT index for a table. The result set contains the results of joining the primary table with the related tables on any associated foreign keys, much in the same way an INNER JOIN works when performing a SELECT on multiple tables. However, this SPROC performs all the joining automatically without any need for specifying join criteria.

Parameters

PRIMARY_TABLE

Primary table to search. For example, in a master-detail relationship, this would be the "master" table.

RELATED_TABLES

Secondary table(s) related to the primary table. For example, in a master-detail relationship, this/these would be the "detail" table(s). Each related table should be separated by a comma.

For example: TABLE1,TABLE2,TABLE3

SELECT_ITEMS

SELECT items to be included from the primary and related table(s). A single asterisk (*) will include all columns from all tables. To include only some columns from some tables, include a list of comma-separated columns in the format TABLE.COLUMN. If there is no ambiguity in column names (i.e. column name only appears in one table), feel free to leave off the explicit table reference. This follows the same format as a typical SELECT statement’s SELECT list would follow.

QUERIES

List of pipe-separated (|) Lucene-format search queries to be executed on each table included. The first item references the primary table, and subsequent items reference the related tables. A query must be included for each referenced table. See the example below for details.

MAX_RESULTS

Maximum number of results to return or 0 for no limit.

Example 1

Perform a JOIN between the CRM_CUSTMAST and ARC_SHIPVIACODE tables on the foreign key definition. The result will include records where the CRM_CUSTMAST table has the word “Western” at the beginning of the value in one of the FULLTEXT indexed columns in the CRM_CUSTMAST table OR the word “Ground” anywhere in the value of a FULLTEXT indexed column in the ARC_SHIPVIACODE table:

CALL BBJ_SEARCH_RELATED
   ('CRM_CUSTMAST', 'ARC_SHIPVIACODE', '*', 'Western*|*Ground*', 0)

Example 2

Perform a JOIN between the CRM_CUSTMAST and ARC_SHIPVIACODE tables on the foreign key definition. The result will include records where the CRM_CUSTMAST table has the word “Western” at the beginning of the CUSTOMER_NAME column in the FULLTEXT index on the CRM_CUSTMAST table OR the word “FedEx” anywhere in the value of the DESCRIPTION column in the FULLTEXT index on the ARC_SHIPVIACODE table. The result will only contain the CUSTOMER_NAME and DESCRIPTION column values:

CALL BBJ_SEARCH_RELATED 
    ('CRM_CUSTMAST', 
     'ARC_SHIPVIACODE', 
     'CRM_CUSTMAST.CUSTOMER_NAME, ARC_SHIPVIACODE.DESCRIPTION', 
     'CUSTOMER_NAME:Western*|*DESCRIPTION:FedEx*', 0)

Enterprise Manager Search Viewer

Using the BBJ_SEARCH SPROC is more complex than the simplicity of the BBJ_INDEX SPROC. However, for those who do not wish to write application logic to process the results of the BBJ_SEARCH SPROC, the Browser and Eclipse Enterprise Managers provide this powerful Search Viewer:

Execute a search on a database from the EM

  1. Double-click on the 'Databases' node.
  2. Double-click on the database of interest.
  3. Select the ‘Search’ tab.
  4. Enter the search query, any table names, and maximum results.
  5. Execute the search by clicking the magnifier or pressing [Enter].
  6. Click on a search result to see the record details.

This simple, but powerful search viewer provides a great interface for testing Lucene searches and quickly viewing the results of searches on one or more tables at one time. The grid shows the table and quick data preview so you can quickly scroll through and see all the matches to your query. Selecting an item from the grid displays the complete record in the details section below the results grid.

See Also

Full Text Indexing and Searching

FULLTEXT File Type

FULLTEXT Verb - Create FULLTEXT File

SQL FULLTEXT Indices



______________________________________________________________________________________

Copyright BASIS International Ltd. BBj®, Visual PRO/5®, PRO/5®, and BBx® are registered trademarks.