BBj Metrics

Access to this feature requires an active Software Asset Management (SAM) subscription. See Benefits of “Software Asset Management” Feature Line.

BBjServices is a powerful server consisting of numerous services, each providing different server-based functionality, including BBx interpreters, administration, the SQL engine, servlets, and more. With so many kinds of user interaction, administrators and managers may benefit from deep insight into the activity occurring inside BBjServices. Starting with BBj 18.00, BBjServices can optionally expose a variety of metrics that provide this insight.

How Metrics Work

BBj uses the open-source Prometheus Client Library (see http://prometheus.io for more information) to instrument (collect) metrics and export them for analysis. Because the metrics use this format, they can be stored by Prometheus, an open-source systems monitoring and alerting package, and displayed with an open-source visualization tool such as Grafana (see http://grafana.com for more information) to produce beautiful, browser-based, interactive, and highly customizable visualizations of BBjServices metrics.

The following image shows an example of Prometheus and a Grafana Dashboard utilizing BBj metrics. The graph displays the number of active SQL connections over a 5-minute period. Grafana provides complete customization of the time segment to examine, the colors, the data to include on the graph, and much more.

BBj most often utilizes two types of metrics:

  • Gauges - Values that increase and decrease over time (see the image above).

  • Counters - Values that only increase over time.

NOTE: All metrics are cleared out when BBjServices is restarted.
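To make the difference between the two metric types concrete, the following Java sketch models them in memory. It is a simplified illustration under stated assumptions, not the Prometheus client library's implementation, and the class names are hypothetical: the counter rejects negative increments, the gauge can move in either direction, and because values live only in memory, a new instance (like a restarted BBjServices) starts again from zero.

```java
// Hypothetical, simplified model of the two metric types (not the Prometheus client itself).
final class SimpleCounter {
    private double value; // starts at 0, like a freshly started BBjServices

    // Counters only move up, so a negative amount is rejected.
    void inc(double amount) {
        if (amount < 0) {
            throw new IllegalArgumentException("Counters can only increase");
        }
        value += amount;
    }

    double get() {
        return value;
    }
}

final class SimpleGauge {
    private double value;

    // Gauges can go up, go down, or be set to the current reading.
    void inc(double amount) { value += amount; }
    void dec(double amount) { value -= amount; }
    void set(double newValue) { value = newValue; }

    double get() {
        return value;
    }
}
```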

Prometheus and BBj

While BBjServices generates the metrics, it does not persist them. To use the metrics for analysis, some mechanism must be in place to persist the information. Prometheus is an open-source systems monitoring and alerting package (installed separately from BBj) built around a time-series database designed specifically for this purpose.

Prometheus jobs are configured to request (scrape) an HTTP/HTTPS URL at a set interval. Prometheus parses the response and stores the values in its database along with timestamps recording when those metrics were acquired. According to the Prometheus documentation, Prometheus's main features are:

  • A multi-dimensional data model with time series data identified by metric name and key/value pairs

  • A flexible query language to leverage this dimensionality

  • No reliance on distributed storage; single server nodes are autonomous

  • Time series collection happens via a pull model over HTTP

  • Pushing time series is supported via an intermediary gateway

  • Targets are discovered via service discovery or static configuration

  • Multiple modes of graphing and dashboarding support

In short, BBj generates the metrics, while Prometheus stores that data and makes it available for processing and analysis.

Typical Setup

  • BBjServices is installed and running BBj programs, serving SQL queries, and so on, while gathering various metrics.

  • Prometheus is installed and running on another machine and configured to hit the metrics endpoint on the BBj Jetty server every 15 seconds to acquire updated metrics.

  • Prometheus stores these metrics with updated timestamp information.

  • Grafana (referenced earlier) is configured with numerous dashboards displaying graphs for analysis that are populated by the data stored in the Prometheus database.

Enabling/Disabling Metrics

By default, all metrics are disabled. Upon installation of a version of BBjServices that supports metrics (18.00+), metrics are easily enabled using the Enterprise Manager “Metrics” configuration page opened via the EM navigator at MyServer->BBjServices->Metrics:

The list of available metrics corresponds to those available on the server. A check indicates that the metric will be collected and exported via the “Metrics Jetty Endpoint” defined below the list of metrics. Make sure to save any changes made to the enabled states of the metrics.

While the metric names are reasonably self-explanatory, hovering over the small “i” icon to the left of the metric name displays a more detailed description of each metric.

Jetty Endpoint Context

BBj exports the metrics via the built-in Jetty web server, at the endpoint defined in this field. The value specified is the base context for the endpoint. Note in the image above that a context of “prometheus” is used and that the servlet name “metrics” is appended to it to build the full path (/prometheus/metrics). While the context is configurable, the servlet name is not.

NOTE: The context specified here does not correspond to the contexts configured elsewhere in the Enterprise Manager.

Endpoint Security

Most of the metrics provided by BBjServices pose little security risk (number of connections, process counts, etc.). However, some metrics include usernames (never passwords), which could be considered a security risk. To protect this data, BBj 18.10 introduced mandatory security for the metrics endpoint. The security uses “basic” HTTP authentication, where the credentials must be a valid BBj username and password.

By default, only the “admin” user has access to the metrics endpoint. To grant access to additional users, add the “View Metrics Results” permission to the user or group profile in the EM at Security->Users or Security->Groups, respectively.
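The following Java sketch shows what a client such as Prometheus has to send to the secured endpoint: the full path (context plus the fixed “metrics” servlet name) and a basic authentication header built from a BBj username and password. The host name, port, and credentials in the usage are hypothetical placeholders, and the class name is illustrative only; it uses the standard java.net.http API (Java 11+).

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper that builds a request for the BBj metrics endpoint.
final class MetricsRequest {
    // The context (e.g. "prometheus") is configurable; the "metrics" servlet name is not.
    static URI endpoint(String scheme, String host, int port, String context) {
        return URI.create(scheme + "://" + host + ":" + port + "/" + context + "/metrics");
    }

    // HTTP basic authentication: "Basic " followed by base64("username:password").
    static String basicAuthHeader(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    // A GET request carrying the credentials of a user with "View Metrics Results" access.
    static HttpRequest build(URI uri, String user, String password) {
        return HttpRequest.newBuilder(uri)
                .header("Authorization", basicAuthHeader(user, password))
                .GET()
                .build();
    }
}
```

Sending the request with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) returns the metrics text. In a Prometheus scrape job, the same values correspond to the metrics_path and basic_auth settings.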

Available Metrics

The following is a list of the available metrics. It’s important to keep in mind that the more metrics enabled, the more system resources will be used for calculating and storing those metrics. While this is generally minimal, over time it could accumulate and have a performance impact.

Tools such as Prometheus and Grafana use content assist to list the available metrics, along with a description of each, as an administrator builds a dashboard or query.

BBj Processes

The core of BBjServices is the application server running BBj applications. To import dashboards related to monitoring BBj Processes, see the section “Importing Existing Dashboards” below. Use dashboard ID: 5417 to import the BBj Interpreters and SQL Activity dashboard.

bbj_running_processes

The bbj_running_processes metric is a gauge that increases and decreases as BBj interpreters are started and terminated. When the metrics endpoint is queried, the current number of running processes is returned. This information is useful for monitoring interpreter load at various points in time.
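When the endpoint is queried, each unlabelled gauge appears as a "name value" line in the Prometheus text exposition format, preceded by # HELP and # TYPE comment lines. The Java sketch below reads such a value from a response body; it is a simplified, hypothetical reader that ignores labelled series, timestamps, and special values such as +Inf, and the HELP text in the usage sample is invented for illustration.

```java
import java.util.OptionalDouble;

// Hypothetical reader for one unlabelled sample in the text exposition format.
final class ExpositionReader {
    static OptionalDouble valueOf(String body, String metricName) {
        for (String line : body.split("\n")) {
            String trimmed = line.trim();
            // Skip blank lines and the "# HELP" / "# TYPE" comment lines.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            // A sample line is "<name> <value>" (an optional timestamp may follow).
            String[] parts = trimmed.split("\\s+");
            if (parts.length >= 2 && parts[0].equals(metricName)) {
                return OptionalDouble.of(Double.parseDouble(parts[1]));
            }
        }
        return OptionalDouble.empty();
    }
}
```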

Example 1

Display the number of processes running over the course of the business day:

bbj_running_processes

Example 2

Another useful scenario would be monitoring the change in the number of running processes over the course of time:

delta(bbj_running_processes[5m])

BBj Services Information

The BBjServices System metrics consist of information that does not continuously update, such as the start time, build date, version, etc. This category of metrics is always enabled because its impact on resources is minimal.

bbj_info

Returns various information related to the BBjServices instance. This is a gauge with several labels including:

  • release - The BBj version (e.g. “REV 18.00”).

  • build_date - The date the build of the product occurred.

  • build_epoch - The build epoch for the BBjServices build.

  • build_note - String related to this particular instance of BBjServices.

  • runtime - The Java runtime environment running BBjServices.

  • host - Hostname for the machine running BBjServices.

  • up_since - Date and time BBjServices was started.

  • service - True/false whether BBjServices is running as a service.

  • user - The OS level user running the BBjServices process.

  • timezone - Timezone name for the machine running BBjServices.

bbj_start_seconds

Gauge indicating the BBjServices start time in seconds since the epoch (January 1, 1970).
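Because the value is plain epoch seconds, the start time and uptime can be derived from it; in PromQL, time() - bbj_start_seconds gives the uptime in seconds. The Java sketch below performs the same conversion for a value read from the endpoint; the class name is hypothetical and the caller supplies "now" so the result is deterministic.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper converting bbj_start_seconds into a start instant and an uptime.
final class Uptime {
    static Instant startedAt(double startSeconds) {
        long whole = (long) startSeconds;
        // Keep any fractional part as nanoseconds.
        long nanos = Math.round((startSeconds - whole) * 1_000_000_000L);
        return Instant.ofEpochSecond(whole, nanos);
    }

    static Duration uptime(double startSeconds, Instant now) {
        return Duration.between(startedAt(startSeconds), now);
    }
}
```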

JVM Details

BBjServices includes the option of exporting various JVM metrics. BBj uses the standard JMX client metrics from the Prometheus client library. These metrics begin with the prefix jvm_. A detailed description of each is beyond the scope of this documentation. However, for those familiar with JVMs, most are self-explanatory from the name of the metric and/or its description.

Grafana provides a convenient dashboard, preconfigured to display useful information. The dashboard is easy to install using the import feature. To import the following JVM dashboard, see the section “Importing Existing Dashboards” below. Use dashboard ID: 3066 to import the JVM overview - Prometheus dashboard.

The following shows an example of the kinds of graphs in this dashboard:

Replication

Replication includes a number of metrics for monitoring the state of replication jobs. Not all of these metrics fit well on a single dashboard panel, and most are not necessary or useful in every circumstance. The list below briefly describes each metric to help administrators determine the information necessary for their particular monitoring needs:


bbj_replication_bad_file_count

The number of files currently in a problematic state. Use the Enterprise Manager or Admin API for more details.

bbj_replication_copy_aborted

The number of files the job failed to copy from the source to the target.

bbj_replication_copy_running

The number of files the job is currently copying.

bbj_replication_copy_waiting

The number of files currently waiting to be copied (in the queue).

bbj_replication_disabled

0 or 1 indicating whether the job was disabled at the specific point in time.

bbj_replication_has_error

0 or 1 if the job is currently in an error state. Specific details of the error should be acquired via the Enterprise Manager or Admin API.

bbj_replication_last_confirmed_serial

Last serial number confirmed by a job target.

bbj_replication_last_timestamp_bad_file_count

Last timestamp number of bad files for a job.

bbj_replication_last_timestamp_copying

Last timestamp number of files being copied.

bbj_replication_last_timestamp_interval_seconds

Last timestamp interval from previous timestamp for a job.

bbj_replication_last_timestamp_op_count

Last timestamp number of operations processed by a job.

bbj_replication_last_timestamp_op_rate

Last timestamp operations per second for a job.

bbj_replication_last_timestamp_ratio

Last timestamp ratio of timestamp time to wall time for a job.

bbj_replication_last_timestamp_seconds

Last timestamp processed by a job.

bbj_replication_last_timestamp_wall_interval_seconds

Last timestamp wall-clock interval from the previous timestamp for a job.

bbj_replication_last_timestamp_wall_seconds

Last timestamp wall time processed by a job.

bbj_replication_last_wait_for_log_seconds

The last time a job waited for the operation log.

bbj_replication_recopy_disabled

0 or 1 if the recopy feature is disabled for the job. Recopy checks the state of file(s) on the source and target to determine whether changes have occurred to one or the other that require the entire file to be recopied from the source to the target to return them to a synchronized state. This will typically only be disabled in cases where a problem exists and is causing a file or files to be recopied over and over.

bbj_replication_synchronous

0 or 1 if the replication job is synchronous. This is almost always 0 and is not likely something most users will find interesting.

SQL Connections

To import dashboards related to monitoring SQL connections, see the section “Importing Existing Dashboards” below. Use dashboard ID: 5417 to import the BBj Interpreters and SQL Activity dashboard.

bbj_sql_active_connections

The bbj_active_sql_connections metric is a gauge that is updated each time an SQL connection is opened or closed. The value of the gauge is always the total, current number of active SQL connections on the BBjServices installation.

Example 1

Display the total number of active SQL connections (includes SQLOPEN, JDBC, and ODBC) over time:

bbj_active_sql_connections

This graph shows that there are typically 5-12 active connections at any given moment.

Example 2

Another query of interest would be the change in the number of active connections over each 5-minute window (e.g. 5 minutes ago there were 120, now there are 137, so the value is 17). Because this metric is a gauge, use delta() rather than increase(), which is intended for counters:

delta(bbj_active_sql_connections[5m])

SQL Statements

To import dashboards related to monitoring SQL statements, see the section “Importing Existing Dashboards” below. Use dashboard ID: 5417 to import the BBj Interpreters and SQL Activity dashboard.

bbj_sql_statements

The bbj_sql_statements metric is a counter that is incremented each time an SQL statement is executed, labelling the information with the following for easy grouping purposes:

  • database - Name of the database used by the statement.

  • user - User who executed the statement.

  • type - Type of SQL statement such as SELECT, UPDATE, DELETE, etc.

  • success - true if the statement was successful, false if the statement resulted in an error.

Using these labels, one can generate a dashboard showing the number of statements executed on a particular database over the course of time, the number of UPDATE statements executed on each database, statement execution grouped by user, etc.

Example 1

Display the number of statements executed every 5 minutes on the ChileCompany database. Show a graph line for each unique label combination (e.g. a line for {user=jdoe, type=SELECT, success=true}, another line for {user=jdoe, type=UPDATE, success=true}, and still another for {user=jsmith, type=SELECT, success=true}):

increase(bbj_sql_statements{database='ChileCompany'}[5m])

This example could potentially become unruly depending on the number of users and statement types executed. A more useful way of using this type of query might be to specify a value for additional labels (i.e. display only the data for user jdoe):

increase(bbj_sql_statements{database='ChileCompany',user='jdoe'}[5m])
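Because bbj_sql_statements is a counter and all metrics are cleared when BBjServices restarts, increase() has to cope with the counter dropping back toward zero. The Java sketch below is a simplified model of that behavior, not Prometheus's actual implementation: it sums the rises between consecutive samples in the window and treats any drop as a reset. Real Prometheus additionally extrapolates to the window boundaries, which this sketch omits, and a window with fewer than two samples yields no result (modelled here as NaN).

```java
// Hypothetical, simplified model of PromQL increase() over the samples in one window.
final class CounterIncrease {
    static double increase(double[] samples) {
        if (samples.length < 2) {
            return Double.NaN; // no result without at least two samples
        }
        double total = 0;
        for (int i = 1; i < samples.length; i++) {
            double prev = samples[i - 1];
            double curr = samples[i];
            // A drop means the counter was reset (e.g. BBjServices restarted),
            // so the new value itself is the increase since the reset.
            total += (curr >= prev) ? curr - prev : curr;
        }
        return total;
    }
}
```

For example, samples of 100, 120, 5, and 9 (a restart between the second and third scrape) count as an increase of 29 rather than a negative value.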

Example 2

Display the number of all types of SQL statements run on the ChileCompany database every 5 minutes, grouping all of the users and types together, but include a separate line for success/failure:

sum(increase(bbj_sql_statements{database="ChileCompany"}[5m])) by (database, success)

This is likely more useful as it shows total load on a given database with the only separation being success/failure.
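The sum(...) by (database, success) part of the query collapses every series that shares the same database and success values into one line and drops the other labels (user and type). The Java sketch below models that grouping on already-computed per-series values; the Series class and sample data are hypothetical, and, as in PromQL, a missing label is treated as an empty string.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "sum(...) by (labels)" applied to one evaluation step.
final class SumBy {
    // One labelled series value, e.g. increase(bbj_sql_statements[5m]) for one label set.
    static final class Series {
        final Map<String, String> labels;
        final double value;

        Series(Map<String, String> labels, double value) {
            this.labels = labels;
            this.value = value;
        }
    }

    static Map<List<String>, Double> sumBy(List<Series> series, List<String> byLabels) {
        Map<List<String>, Double> groups = new HashMap<>();
        for (Series s : series) {
            // The group key is the values of the "by" labels, in order; other labels are dropped.
            List<String> key = new ArrayList<>();
            for (String label : byLabels) {
                key.add(s.labels.getOrDefault(label, ""));
            }
            groups.merge(key, s.value, Double::sum);
        }
        return groups;
    }
}
```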

Example 3

A modified version of Example 2 where we view the number of statements executed every 15 seconds for ALL databases, grouping the totals by database and success/failure:

sum(increase(bbj_sql_statements[15s])) by (database, success)

This graph shows that there are typically 80-150 statements executed every 15 seconds on both the ChileCompany and AddonSoftware databases. Note the steep drop at one point which indicates there were no statements executed during that short segment of time.

SQL Stored Procedures

To import dashboards related to monitoring SQL stored procedures, see the section “Importing Existing Dashboards” below. Use dashboard ID: 5417 to import the BBj Interpreters and SQL Activity dashboard.

bbj_sql_stored_procedures

The bbj_sql_stored_procedures metric is a counter that is incremented each time an SQL stored procedure is executed via a CALL statement (whether standalone or embedded in a nested SELECT statement). The information is labelled with the following details for easy grouping purposes:

  • database - Name of the database used by the statement.

  • user - User who executed the statement.

  • sproc - Name of the stored procedure that was executed.

  • success - true if the SPROC execution was successful, false if it resulted in an error.

Using these labels, one can generate a dashboard showing the number of SPROCs executed on a particular database over the course of time, the number of executions of a specific SPROC, execution grouped by user, etc.

Example 1

Display the number of SPROCs executed every 5 minutes on the ChileCompany database. Show a graph line for each unique label combination (e.g. a line for {user=jdoe, sproc=ITEM_DETAIL, success=true}, another line for {user=jdoe, sproc=CUST_DETAIL, success=true}, etc.):

increase(bbj_sql_stored_procedures{database='ChileCompany'}[5m])

This example could potentially become unruly depending on the number of users and SPROCs executed. A more useful way of using this type of query might be to specify a value for additional labels (i.e. display only the data for user jdoe):

increase(bbj_sql_stored_procedures{database='ChileCompany',user='jdoe'}[5m])

Example 2

Display the number of all SPROCs executed on the ChileCompany database every 5 minutes, grouping all of the users and SPROCs together, but include a separate line for success/failure:

sum(increase(bbj_sql_stored_procedures{database="ChileCompany"}[5m])) by (database, success)

Importing Existing Dashboards

Grafana is the most common visualization tool used with Prometheus. This open-source package provides a powerful, highly customizable interface for creating dashboards with one or more graphs, tables, and more. Grafana includes an import/export feature that makes it possible to share dashboards with others using the same metrics. BASIS provides several dashboards to help users get started.

To import a BASIS dashboard:

  1. From the Grafana interface: Add->Create->Import

  2. In the Grafana.com Dashboard field, enter the dashboard ID. Alternatively, visit the BASIS Grafana dashboard site for all available dashboards.

  3. Click in the JSON area which will load the dashboard information.

  4. Select your BBj instance from the dropdown (name is dependent on what was provided during configuration of Grafana).

  5. Click Import.

Note: Grafana.com is a useful (and free) resource to share your dashboards with other BBj developers/users.

Custom Metrics

BBj applications have the ability to export custom metrics through the same mechanism used by the built-in metrics. This makes it possible for programs to provide metrics specific to the features and functionality of the application such as user interactions, feature utilization, logins, etc.

BBj uses the open-source Prometheus JVM Client to instrument the metrics and export them via the BBj Jetty server. This means developers can directly use those classes to instrument their own metrics using embedded Java and/or their own custom Java library. For complete details on using the API, see the Prometheus JVM Client readme.

However, there is an easier way to use custom metrics from BBj 18.00+ code utilizing the BBjAPI.

BBjAPI

For complete details on the types of metrics and how they work, please consult the Prometheus documentation. However, BBj implements support for two types of metrics using the BBjAPI object: counters and gauges.

Concepts

Counters and gauges should be registered only once, by a single BBj process, because all processes should access the same instance of a given counter or gauge. To make this easy to accomplish from BBj programs, the BBjAPI provides registerMetricXXX(), getMetricXXX(), and unregisterMetricXXX() methods for each type of metric.

Once a metric is registered (see examples below), the metric, its description, and its values (including labels) will automatically appear at the Jetty endpoint (see EM configuration information above) defined to export the BBj metrics, no additional configuration necessary.

Naming Metrics

Metrics are identified by a unique name, typically following the convention of all lower-case characters with underscores separating multiple words, e.g. "my_custom_metric". Note that BASIS uses the “bbj_” prefix when naming internal metrics, so avoid that prefix to minimize confusion.
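A name check can catch mistakes before a metric is registered. The Java sketch below is a hypothetical helper: isValid() applies the Prometheus metric name syntax [a-zA-Z_:][a-zA-Z0-9_:]*, and followsConvention() adds the lower-case, underscore-separated convention described above and rejects the reserved-by-convention “bbj_” prefix. It deliberately excludes colons, which Prometheus reserves for recording rules.

```java
import java.util.regex.Pattern;

// Hypothetical naming check for custom metrics.
final class MetricNames {
    // Prometheus metric name syntax.
    private static final Pattern PROMETHEUS_NAME =
            Pattern.compile("[a-zA-Z_:][a-zA-Z0-9_:]*");
    // Convention: lower-case words separated by single underscores.
    private static final Pattern SNAKE_CASE =
            Pattern.compile("[a-z][a-z0-9]*(_[a-z0-9]+)*");

    static boolean isValid(String name) {
        return PROMETHEUS_NAME.matcher(name).matches();
    }

    static boolean followsConvention(String name) {
        return SNAKE_CASE.matcher(name).matches() && !name.startsWith("bbj_");
    }
}
```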

Registering and Acquiring Metrics

An application should provide some mechanism for acquiring the current instance of the metric. While programs can directly ask the BBjAPI for a particular metric by name, it may be easier from a code maintenance standpoint to move this logic into a class or function call. However, that would be something for the developer to decide based on their current programming style and best practices. See the examples below for information on registering and acquiring the instance of a metric.
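As one possible way to centralize that logic, the Java sketch below shows a hypothetical get-or-register wrapper: every caller asks for a metric by name, and the metric is created only on the first request, so all code in that JVM shares one instance. It illustrates the pattern only; it is not the BBjAPI, which already provides registerMetricXXX() and getMetricXXX() for this purpose. The AppMetrics name and the use of a plain factory are assumptions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical application-level wrapper around "register once, acquire everywhere".
final class AppMetrics {
    private static final ConcurrentMap<String, Object> METRICS = new ConcurrentHashMap<>();

    static <T> T getOrRegister(String name, Class<T> type, Supplier<T> factory) {
        // computeIfAbsent creates the metric atomically, at most once per name.
        Object metric = METRICS.computeIfAbsent(name, n -> factory.get());
        // Fails fast if the same name was already used for a different metric type.
        return type.cast(metric);
    }
}
```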

Labels

Labels provide a mechanism for adding additional information to each metric. For example, the bbj_sql_statements counter uses labels to identify the data such as database, user, statement type, and success/failure.

It's important to remember that the more labels, the more system resources are required to manage these metrics. BBjServices keeps all metrics in memory (cleared when restarted) and each additional label means that there will be additional objects required for grouping the various labelled metrics together.
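The cost of labels comes from the fact that each distinct combination of label values becomes its own series. The Java sketch below is a simplified, hypothetical model of a labelled counter (not the Prometheus client's code): label values are matched to label names by position, so they must be supplied in the order the names were declared, and every new combination allocates another child object.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.DoubleAdder;

// Hypothetical model of a counter with labels.
final class LabelledCounter {
    // One child series per distinct combination of label values.
    static final class Child {
        private final DoubleAdder value = new DoubleAdder();
        void inc() { value.add(1); }
        double get() { return value.sum(); }
    }

    private final List<String> labelNames;
    private final Map<List<String>, Child> children = new ConcurrentHashMap<>();

    LabelledCounter(String... labelNames) {
        this.labelNames = List.of(labelNames);
    }

    // Values are positional: ("ChileCompany", "jdoe") means database=ChileCompany, user=jdoe.
    Child labels(String... values) {
        if (values.length != labelNames.size()) {
            throw new IllegalArgumentException("Expected " + labelNames.size() + " label values");
        }
        return children.computeIfAbsent(List.of(values), k -> new Child());
    }

    int seriesCount() {
        return children.size();
    }
}
```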

Counters

A counter is a cumulative metric that represents a numerical value that only increases. A counter is typically used to count requests, interactions, operations, tasks, errors, etc. Counters should not be used to export current counts of items when the value can increase or decrease over time (e.g. number of BBj processes); use gauges in this case. The getMetricCounter() method on BBjAPI returns an io.prometheus.client.Counter instance. See the Prometheus JVM Client readme for complete details on available methods.

Example

use io.prometheus.client.Counter
REM Obtain the instance of the BBjAPI object
myAPI!=BBjAPI()
REM Somewhere in your code (once, in a single process), register the metric
declare Counter counter!
REM Register the counter with its name, description, and label names.
REM The label names are optional and may be omitted.
counter! = myAPI!.registerMetricCounter(
    "my_counter", 
    "My counter counts things.",
    "some_label", "another_label")
REM Somewhere else in your code, acquire the metric instance
counter! = myAPI!.getMetricCounter("my_counter")
REM Increment the counter for a specific set of label values. The values
REM must be in the same order as the label names in the register call.
REM If the counter was registered without labels, call counter!.inc() instead.
counter!.labels("some_value", "another_value").inc()

Gauges

A gauge represents a numerical value that can increase and decrease over time. The getMetricGauge() method on BBjAPI returns an io.prometheus.client.Gauge instance. See the Prometheus JVM Client readme for complete details on available methods.

Example

use io.prometheus.client.Gauge
REM Obtain the instance of the BBjAPI object
myAPI!=BBjAPI()
REM Somewhere in your code (once, in a single process), register the metric
declare Gauge gauge!
REM Register the gauge with its name, description, and label names.
REM The label names are optional and may be omitted.
gauge! = myAPI!.registerMetricGauge(
    "my_gauge", 
    "My gauge monitors the changes in things.",
    "some_label", "another_label")
REM Somewhere else in your code, acquire the metric instance
gauge! = myAPI!.getMetricGauge("my_gauge")
REM Set the gauge to the current value being monitored (numberOfThings stands
REM for your own numeric variable). The label values must be in the same order
REM as the label names in the register call.
gauge!.labels("some_value", "another_value").set(numberOfThings)



______________________________________________________________________________________

Copyright BASIS International Ltd. BBj®, Visual PRO/5®, PRO/5®, and BBx® are registered trademarks.