Description
Discussed in #8490
Originally posted by thomasdarimont September 27, 2021
The book “Distributed Systems Observability” by Cindy Sridharan describes logs, distributed tracing, and metrics as
essential telemetry types for monitoring an application in production, also known as the “three pillars of observability”.
Currently Keycloak does not provide metrics out of the box, and users who want metrics need to use extensions like
the aerogear keycloak-metrics-spi or implement their own metrics collection based on the smallrye-metrics support provided by the WildFly and JBoss EAP runtimes.
It would be very helpful for operations teams if Keycloak had a compelling set of useful metrics built-in.
The goal of this discussion is to shape the metrics part of Keycloak’s observability story with focus on Keycloak.X and to compile the foundation for a new metrics design document.
Metrics
Metrics-based monitoring of a Keycloak system could consist of metrics that are relevant for different
audiences, such as operations and SRE teams as well as product teams.
Some of those metrics provide information about different layers of a system, including:
- Process: CPU, Memory consumption, open file descriptors
- JVM: memory, threads, classloading, metadata (java version)
- Datasources: connection pool stats, metadata (database version)
- HTTP server: request count per path / status code, latency distribution
- JGroups: cluster communication stats
- Infinispan: cache stats
- Integrations: outbound HTTP request count / latency distribution -> consider tracing instead to analyze latencies and errors
- Server: metrics collection duration, metadata (keycloak version)
- Keycloak: authentication stats, authorization stats
- Keycloak: inventory stats
Keycloak Metrics
The application layer of a Keycloak system can provide many different metrics that could be arranged in a set of
logical domains. Some of the following metrics might be coarse grained while others could be broken down further
by additional context data, e.g. realm, error_code, client_id, authenticator_execution, or protocol.
The following list serves as an example of high-level metrics that Keycloak could theoretically provide
at some point in time.
The metrics listed below are based on an earlier discussion about a compilation of metrics for Keycloak.
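To illustrate how such context breakdowns could look with Micrometer, the sketch below tags one logical login counter with realm, client and error context. The metric and tag names are illustrative assumptions, not an agreed naming scheme or an existing Keycloak API.

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

// Sketch: one logical metric broken down by context tags such as realm,
// client_id and error_code. Names are illustrative, not a Keycloak API.
public class DimensionalMetricsSketch {

    static Counter logins(MeterRegistry registry, String realm, String clientId, String errorCode) {
        return Counter.builder("keycloak.logins")
                .tag("realm", realm)
                .tag("client_id", clientId)
                .tag("error_code", errorCode) // "none" for successful logins
                .register(registry); // returns the existing counter for a known tag combination
    }

    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        logins(registry, "master", "account-console", "none").increment();
        logins(registry, "master", "account-console", "invalid_credentials").increment();
        // Each distinct tag combination becomes its own time series in the backend.
        System.out.println(registry.getMeters().size());
    }
}
```

Because every distinct tag combination is a separate time series, high-cardinality values (e.g. user IDs) should not be used as tags.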
Model Metrics
Represents the system inventory, and denotes how many items of a particular type exist in the system.
This helps to keep an eye on the growth of the system.
Example metrics:
- #Realms
- #Users per Realm
- #Clients per Realm
- #Groups per Realm
- #Scopes per Realm
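Inventory counts like these map naturally onto gauges, one series per realm. A minimal sketch, assuming Micrometer and a counts map that a background job would refresh from the datastore; the metric name and refresh mechanism are assumptions, not Keycloak APIs:

```java
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: inventory metrics as gauges reading from a shared map that a
// periodic job would refresh with fresh counts from the datastore.
public class ModelMetricsSketch {

    final Map<String, Integer> usersPerRealm = new ConcurrentHashMap<>();

    void registerRealm(MeterRegistry registry, String realm) {
        // The gauge reads the map lazily, so scrapes always see the latest count.
        Gauge.builder("keycloak.users", usersPerRealm, m -> m.getOrDefault(realm, 0))
                .tag("realm", realm)
                .register(registry);
    }

    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        ModelMetricsSketch sketch = new ModelMetricsSketch();
        sketch.registerRealm(registry, "master");
        sketch.usersPerRealm.put("master", 42); // simulated periodic refresh
    }
}
```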
Authentication Metrics
Represents authentication activity for users and clients.
Example metrics:
- #Logins
- #Login Errors
- #Logouts
- #Logout Errors
- #Login duration histogram
- #Client Login
- #Client Login Errors
- #Required Action Executions
- #Required Action Errors
- #Unique AuthenticationFlowSequence Executions (Username -> Password -> 2FA vs. Username -> Password)
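A sketch of how the login counters and the login-duration histogram above could be recorded with Micrometer; metric names are illustrative assumptions:

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
import java.time.Duration;

// Sketch: success/error login counters plus a duration histogram per realm.
public class AuthenticationMetricsSketch {

    static Timer loginTimer(MeterRegistry registry, String realm) {
        return Timer.builder("keycloak.login.duration")
                .tag("realm", realm)
                .publishPercentileHistogram() // enables latency distribution buckets
                .register(registry);
    }

    static void recordLogin(MeterRegistry registry, String realm, boolean success, Duration duration) {
        registry.counter(success ? "keycloak.logins" : "keycloak.login.errors",
                "realm", realm).increment();
        loginTimer(registry, realm).record(duration);
    }

    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        recordLogin(registry, "master", true, Duration.ofMillis(120));
        recordLogin(registry, "master", false, Duration.ofMillis(30));
    }
}
```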
Authorization Metrics
Represents Authorization activity collected for the authz services.
Example metrics:
- #Access Requested
- #Access Granted
- #Access Denied
User Metrics
Represents information about users and their metadata.
Example metrics:
- #Users by realm
- #Users by status (blocked / locked / disabled)
- #Users with missing information (email, phoneNumber, address)
- #Users with unverified information (email, phoneNumber, address)
- #Distribution of credentials
- #Groups by realm
- #Consents by client / type
- #New Users in interval (yesterday, last week, last month, last year)
Client Metrics
Represents information about clients and their metadata.
Example metrics:
- #Clients by realm / protocol / type / enabled / disabled
OIDC Protocol Usage Metrics
Usage information about the OIDC protocol
Example metrics:
- #Token Requests
- #Token Request Errors
- #Refreshes
- #Refresh Errors
- #UserInfo Requests
- #UserInfo Request Errors
- #Token Exchanges
- #Token Exchanges Errors
- Token generation duration distribution by token type (by protocol mapper?)
- UserInfo generation duration distribution (by protocol mapper?)
SAML Protocol Usage Metrics
Usage information about the SAML protocol
Example metrics:
- #AuthnRequests
- #AuthnRequest Errors
- Assertion generation duration distribution (by protocol mapper?)
Federation Metrics
Information about user federation
Example metrics:
- #User lookups in storage
- #User lookup errors in storage
Identity Brokering Metrics
Information about Identity Brokering
Example metrics:
- #Brokered user logins
- #Brokered user login errors
Inbound / Endpoint Metrics
- #Inbound (HTTP) request/response by status / path / protocol
- #Inbound (HTTP) request/response latency distribution
In Micrometer these are usually captured by the dimensional metric http.server.requests{uri=...,status=...,...}.
Outbound Metrics
- #Outbound request/response by status / path / protocol / destination
- #Outbound request/response latency distribution
In Micrometer these are usually captured by the dimensional metric http.client.requests{uri=...,status=...,...}.
Instance Metrics
Represents general information and metadata about the server.
Some of those “metrics” are just simple gauges with a dummy value that expose the actual metadata via labels.
Example metrics:
- Server Version
- Enabled features
- Metrics Collection duration
- #Exceptions by realm / exception class / cause
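The metadata-as-labels pattern could look like this with Micrometer: a gauge with a constant value of 1 whose labels carry the actual information. Metric name and label names are illustrative assumptions:

```java
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

// Sketch: server metadata exposed as a constant-value gauge; the real
// information travels in the labels, not the value.
public class InstanceMetricsSketch {

    static void registerServerInfo(MeterRegistry registry, String version, String features) {
        Gauge.builder("keycloak.server.info", () -> 1.0)
                .tag("version", version)
                .tag("features", features)
                .description("Constant gauge carrying server metadata in its labels")
                .register(registry);
    }

    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        registerServerInfo(registry, "16.0.0", "token-exchange,scripts");
    }
}
```

Queries can then join other series against this one to correlate behavior with a server version, without the version string inflating every other metric.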
Metrics Infrastructure
The WildFly and JBoss EAP based Keycloak / RH SSO distributions use SmallRye Metrics for their runtime metrics collection.
However, the Quarkus team has been recommending Micrometer for custom metrics collection for a while now. To follow this approach, we will focus on Micrometer-based metrics for the new metrics support in Keycloak.X.
OS, Process and JVM based metrics are usually provided by the base metric libraries.
In our case the micrometer library provides a set of useful JVM and system metrics out of the box: https://micrometer.io/docs/ref/jvm
The micrometer Keycloak metrics SPI provides some additional metrics that could be useful.
Metrics instrumentation
Keycloak provides several ways to collect metrics synchronously, e.g.: event listeners, JAX-RS / container specific filters and HTTP client interceptors. Metrics that are more expensive to compute could be collected
asynchronously by a dedicated metrics service that can execute datastore specific queries.
Collected metrics could either be stored directly in the Micrometer meter registry or buffered in a dedicated data structure that periodically flushes the metrics into an underlying registry.
Explicitly computed metrics could be represented as Gauges that are explicitly updated. Counted metrics like the number of logins or failed logins could be recorded via Counters that are updated by
event listeners or request filters / interceptors.
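The event-listener path could be sketched as follows. The EventType enum and UserEvent class below are simplified stand-ins for Keycloak's org.keycloak.events types, used only to keep the example self-contained; they are not the real SPI classes.

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

// Sketch of a MetricsEventListener updating counters from user events.
public class MetricsEventListenerSketch {

    enum EventType { LOGIN, LOGIN_ERROR, LOGOUT } // stand-in for org.keycloak.events.EventType

    static class UserEvent { // stand-in for org.keycloak.events.Event
        final EventType type;
        final String realmId;
        UserEvent(EventType type, String realmId) { this.type = type; this.realmId = realmId; }
    }

    private final MeterRegistry registry;

    MetricsEventListenerSketch(MeterRegistry registry) { this.registry = registry; }

    // Called for every user event; one counter series per (type, realm) pair.
    void onEvent(UserEvent event) {
        registry.counter("keycloak.events",
                "type", event.type.name(),
                "realm", event.realmId).increment();
    }

    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        MetricsEventListenerSketch listener = new MetricsEventListenerSketch(registry);
        listener.onEvent(new UserEvent(EventType.LOGIN, "master"));
        listener.onEvent(new UserEvent(EventType.LOGIN_ERROR, "master"));
    }
}
```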
Metrics around HTTP request processing should capture information about the request path, status code
and request duration. Additionally, duration recording should make it possible to track latency profiles.
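A sketch of the timing part with Micrometer's Timer.Sample; the metric name follows Micrometer's http.server.requests convention, while the filter wiring itself is omitted and the uri template is illustrative:

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

// Sketch: timing a request and tagging the result with path and status.
public class RequestMetricsSketch {

    static long handleRequest(MeterRegistry registry, String uriTemplate, int status, Runnable handler) {
        Timer.Sample sample = Timer.start(registry);
        handler.run(); // the actual request processing
        return sample.stop(Timer.builder("http.server.requests")
                .tag("uri", uriTemplate) // templated path, not the raw URL, to avoid label explosion
                .tag("status", Integer.toString(status))
                .publishPercentileHistogram() // enables latency profile buckets
                .register(registry)); // returns the measured duration in nanoseconds
    }

    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        handleRequest(registry, "/realms/{realm}/protocol/openid-connect/token", 200, () -> {});
    }
}
```

Tagging with the templated uri rather than the concrete request path keeps the number of time series bounded.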
Keycloak could provide components that enable metrics collection on multiple levels:
- MetricsEventListener: an event listener could update metrics based on user events or admin events
- MetricsFilter: a server request / response filter could update request specific metrics
- MetricsInterceptor: a client request / response filter could update request specific metrics
- MetricsCollector: a metrics service could periodically compute metrics based on registered computation rules
Those metric components should access a shared metric registry, which holds the metadata and state that is eventually exposed by dedicated metric endpoints.
Keycloak Metrics
Initial Metrics Selection
Although many of the metrics mentioned above provide valuable insights about a Keycloak system, we should focus
on a small initial subset of metrics that are provided out of the box.
Built-in Metrics
Some core metrics should be built into Keycloak and provide configuration options, like whether the metric is collected at all, or its granularity, e.g. additional tags / labels to add.
Custom Metrics
Some of the metrics mentioned above could be provided out of the box by Keycloak; however, there will be use cases that cannot be foreseen, which require the ability to contribute custom metrics to the system.
For this Keycloak needs to provide a metrics SPI that enables users to add their own custom metrics.
Metrics Configuration
We should have a way to let users control which metrics are collected / tracked by Keycloak.
Users should be able to control things like:
- Which metrics to enable?
- Which tags to emit alongside the metric?
- Which context information to include? (e.g. request parts or selected parameters)
Exposing Metrics
Metrics need to be accessible to metrics collection tools like Prometheus or InfluxDB, which usually fetch metrics from an HTTP endpoint. We could either provide one global metrics endpoint for the whole server and all realms, or realm-specific endpoints that can be consumed by the collectors. The global model is supported by Quarkus out of the box via the /q/metrics endpoint, which could then contain information about the process, JVM and instance, as well as all the Keycloak application metrics.
However, in environments where a Keycloak system is shared among multiple parties, e.g. a collection-of-realms-per-tenant model, users might only be allowed to access a subset of the metrics via realm-specific endpoints that provide only the metrics for a particular realm. In this case an endpoint like /auth/realms/$myrealm/metrics
could serve as a realm-specific endpoint that provides only the Keycloak application metrics for that realm and perhaps a small subset of server metadata.
Note that it should be possible to protect the endpoints which expose realm metrics.
Metrics SPI
A metrics SPI should allow users to contribute new metrics to the Keycloak metrics collection.
The registered metrics could hook into the metrics collection infrastructure described above.
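One hypothetical shape for such an SPI: each provider receives the shared registry and contributes its meters. The interface name and method below are assumptions for illustration; no such SPI exists in Keycloak today.

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

// Sketch of a possible metrics SPI contract (hypothetical, for illustration).
public class MetricsSpiSketch {

    interface MetricsProvider {
        // Invoked once at startup with the server's shared meter registry.
        void register(MeterRegistry registry);
    }

    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        // A custom provider contributed by a deployment could look like this:
        MetricsProvider custom = r -> r.counter("custom.business.metric", "realm", "master");
        custom.register(registry);
    }
}
```

Because providers only see the registry interface, custom metrics would automatically flow through the same exposition endpoints and configuration as the built-in ones.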
Questions
- Which metrics are important for you? What other metrics would you like to see?
- What questions do you want to solve based on metrics and which metrics would support you here?