Review Metrics Support #9036

@pedroigor

Discussed in #8490

Originally posted by thomasdarimont September 27, 2021
The book “Distributed Systems Observability” by Cindy Sridharan describes logs, distributed tracing and metrics as
essential telemetry types to monitor an application in production, which are also known as the “three pillars of observability”.
Currently Keycloak does not provide metrics out of the box and users who want to have metrics need to use extensions like
the aerogear keycloak-metrics-spi or implement their own metrics collection based on the smallrye-metrics support provided by the Wildfly and JBoss EAP runtimes.
It would be very helpful for operations teams if Keycloak had a compelling set of useful metrics built-in.

The goal of this discussion is to shape the metrics part of Keycloak’s observability story with focus on Keycloak.X and to compile the foundation for a new metrics design document.

Metrics

Metrics based monitoring of a Keycloak system could cover metrics that are relevant for different
audiences, such as operations and SRE teams as well as product teams.

Some of those metrics provide information about different layers of a system, including:

  • Process: CPU, Memory consumption, open file descriptors
  • JVM: memory, threads, classloading, metadata (java version)
  • Datasources: connection pool stats, metadata (database version)
  • HTTP server: request count per path / status code, latency distribution
  • JGroups: cluster communication stats
  • Infinispan: cache stats
  • Integrations: outbound http request count / latency distribution -> consider tracing instead to analyze latencies and errors.
  • Server: metrics collection duration, metadata (keycloak version)
  • Keycloak: authentication stats, authorization stats
  • Keycloak: inventory stats

Keycloak Metrics

The application layer of a Keycloak system can provide many different metrics that could be arranged in a set of
logical domains. Some of the following metrics might be coarse grained while others could be broken down further
by additional context data, e.g. realm, error_code, client_id, authenticator_execution, or protocol.
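To illustrate how a coarse metric can be broken down by context data, here is a minimal sketch of a counter that keeps one time series per tag combination. The class `TaggedCounters` and its methods are hypothetical and use only the JDK; they are not a Keycloak or micrometer API.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration only; not a Keycloak or micrometer API.
final class TaggedCounters {
    // One counter per unique combination of metric name and tags.
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Sorting the tags makes the key independent of the order in which tags are passed.
    static String seriesKey(String name, Map<String, String> tags) {
        return name + new TreeMap<>(tags);
    }

    void increment(String name, Map<String, String> tags) {
        counters.computeIfAbsent(seriesKey(name, tags), k -> new LongAdder()).increment();
    }

    long count(String name, Map<String, String> tags) {
        LongAdder adder = counters.get(seriesKey(name, tags));
        return adder == null ? 0 : adder.sum();
    }
}
```

Every additional tag multiplies the number of series, so high-cardinality tags such as client_id are probably best made opt-in.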

The following list serves as an example for high-level metrics that could theoretically be provided by Keycloak
at some point in time.

The metrics listed below are based on an earlier discussion about a compilation of metrics for Keycloak.

Model Metrics

Represents the system inventory, and denotes how many items of a particular type exist in the system.
This helps to keep an eye on the growth of the system.

Example metrics:

  • #Realms
  • #Users per Realm
  • #Clients per Realm
  • #Groups per Realm
  • #Scopes per Realm

Authentication Metrics

Represents authentication activity for users and clients.

Example metrics:

  • #Logins
  • #Login Errors
  • #Logouts
  • #Logout Errors
  • #Login duration histogram
  • #Client Login
  • #Client Login Errors
  • #Required Action Executions
  • #Required Action Errors
  • #Unique AuthenticationFlowSequence Executions (Username -> Password -> 2FA vs. Username -> Password)

Authorization Metrics

Represents authorization activity collected for the authz services.

Example metrics:

  • #Access Requested
  • #Access Granted
  • #Access Denied

User Metrics

Represents information about users and their metadata.

Example metrics:

  • #Users by realm
  • #Users by status (blocked / locked / disabled)
  • #Users with missing information (email, phoneNumber, address)
  • #Users with unverified information (email, phoneNumber, address)
  • #Distribution of credentials
  • #Groups by realm
  • #Consents by client / type
  • #New Users in interval (yesterday, last week, last month, last year)

Client Metrics

Represents information about clients and their metadata.

Example metrics:

  • #Clients by realm / protocol / type / enabled / disabled

OIDC Protocol Usage Metrics

Usage information about the OIDC protocol

Example metrics:

  • #Token Requests
  • #Token Request Errors
  • #Refreshes
  • #Refresh Errors
  • #UserInfo Requests
  • #UserInfo Request Errors
  • #Token Exchanges
  • #Token Exchange Errors
  • Token generation duration distribution by token type (by protocol mapper?)
  • UserInfo generation duration distribution (by protocol mapper?)

SAML Protocol Usage Metrics

Usage information about the SAML protocol

Example metrics:

  • #AuthnRequests
  • #AuthnRequest Errors
  • Assertion generation duration distribution (by protocol mapper?)

Federation Metrics

Information about user federation

Example metrics:

  • #User lookups in storage
  • #User lookup errors in storage

Identity Brokering Metrics

Information about Identity Brokering

Example metrics:

  • #Brokered user logins
  • #Brokered user login errors

Inbound / Endpoint Metrics

  • #Inbound (HTTP) request/response by status / path / protocol
  • #Inbound (HTTP) request/response latency distribution

In micrometer those are usually captured by the dimensional metric http.server.requests{uri=...,status=...,...}.

Outbound Metrics

  • #Outbound request/response by status / path / protocol / destination
  • #Outbound request/response latency distribution

In micrometer those are usually captured by the dimensional metric http.client.requests{uri=...,status=...,...}.

Instance Metrics

Represents general information and metadata about the server.
Some of those “metrics” are just simple gauges with a dummy value that exposes the actual metadata via labels.

Example metrics:

  • Server Version
  • Enabled features
  • Metrics Collection duration
  • #Exceptions by realm / exception class / cause
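The "gauge with a dummy value" idea for metadata can be sketched as follows: the sample value is always 1 and the actual information travels in the labels. The class `InfoGauge`, the metric name and the label names used with it are hypothetical; the output follows the Prometheus text exposition layout `name{label="value"} value`, and label value escaping is left out for brevity.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper; not part of Keycloak or micrometer.
final class InfoGauge {
    // Renders one sample line with the constant value 1 and the metadata as labels.
    static String render(String name, Map<String, String> labels) {
        String body = new TreeMap<>(labels).entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        return name + "{" + body + "} 1";
    }
}
```

A collector can then join such an info series with other series on shared labels without parsing version strings out of metric values.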

Metrics Infrastructure

The Wildfly and JBoss EAP based Keycloak / RH SSO distributions use SmallRye metrics for their runtime metrics collection.
However, the Quarkus team has recommended micrometer for custom metric collection for a while now. To follow this approach, we will focus on micrometer based metrics for the new metrics support in Keycloak.X.

OS, Process and JVM based metrics are usually provided by the base metric libraries.
In our case the micrometer library provides a set of useful JVM and system metrics out of the box: https://micrometer.io/docs/ref/jvm
The micrometer Keycloak metrics SPI provides some additional metrics that could be useful.

Metrics instrumentation

Keycloak provides several ways to collect metrics synchronously, e.g.: event listeners, JAX-RS / container specific filters and HTTP client interceptors. Metrics that are more expensive to compute could be collected
asynchronously by a dedicated metrics service that can execute datastore specific queries.

Collected metrics could either be stored directly in the micrometer metric registry or buffered in a separate data structure that periodically releases the metrics into an underlying registry.
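The buffering variant could look roughly like the sketch below. It assumes a plain `Map<String, Long>` as a stand-in for the underlying registry, and the class `BufferedCounts` is hypothetical. In a real setup the flush would probably be triggered by a scheduler rather than called by hand.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical buffer; the Map passed to flushInto stands in for a real metric registry.
final class BufferedCounts {
    private final ConcurrentHashMap<String, LongAdder> buffer = new ConcurrentHashMap<>();

    // Cheap, contention-friendly recording on the request path.
    void record(String metric) {
        buffer.computeIfAbsent(metric, k -> new LongAdder()).increment();
    }

    // Moves the buffered deltas into the registry and resets the buffer counters.
    void flushInto(Map<String, Long> registry) {
        buffer.forEach((metric, adder) -> {
            long delta = adder.sumThenReset();
            if (delta > 0) {
                registry.merge(metric, delta, Long::sum);
            }
        });
    }
}
```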

Explicitly computed metrics could be represented as Gauges that are explicitly updated.
Counted metrics like number of logins or failed logins could be recorded via Counters that are updated via
event listeners or request filters / interceptors.
Metrics around HTTP request processing should capture information about the request path, status code
and request durations. Additionally, request duration recording should allow tracking latency profiles.
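One common way to track latency profiles is a histogram with fixed bucket boundaries. The sketch below is a simplified, JDK-only illustration; the class `LatencyHistogram` and the bucket boundaries chosen by the caller are assumptions, and micrometer's own timers would normally take over this job.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical fixed-bucket histogram for request durations in milliseconds.
final class LatencyHistogram {
    private final long[] upperBoundsMillis; // must be sorted ascending
    private final AtomicLongArray counts;   // one slot per bound plus one overflow slot

    LatencyHistogram(long... upperBoundsMillis) {
        this.upperBoundsMillis = upperBoundsMillis.clone();
        this.counts = new AtomicLongArray(upperBoundsMillis.length + 1);
    }

    // Finds the first bucket whose upper bound is >= the duration and increments it.
    void record(long durationMillis) {
        int i = 0;
        while (i < upperBoundsMillis.length && durationMillis > upperBoundsMillis[i]) {
            i++;
        }
        counts.incrementAndGet(i);
    }

    // Number of observations that fell into buckets 0..bucketIndex (inclusive).
    long cumulativeCount(int bucketIndex) {
        long total = 0;
        for (int i = 0; i <= bucketIndex; i++) {
            total += counts.get(i);
        }
        return total;
    }
}
```

Cumulative bucket counts are what allow a backend to estimate percentiles later without storing every single duration.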

Keycloak could provide components that enable metrics collection on multiple levels:

  • MetricsEventListener: an event listener could update metrics based on user events or admin events
  • MetricsFilter: a server request / response filter could update request specific metrics
  • MetricsInterceptor: a client request / response filter could update request specific metrics
  • MetricsCollector: a metrics service could periodically compute metrics based on registered computation rules

Those metric components should access a shared metric registry, which holds the metadata and state that is eventually exposed by dedicated metric endpoints.
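A rough sketch of two such components writing into one shared registry is shown below. All class and method names (`SharedRegistry`, `SketchMetricsEventListener`, `SketchMetricsFilter`) are hypothetical and JDK-only. A real listener would implement Keycloak's event listener SPI and receive event objects instead of plain strings, and the series naming is only an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for the shared metric registry.
final class SharedRegistry {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    void increment(String series) {
        counters.computeIfAbsent(series, k -> new LongAdder()).increment();
    }

    long value(String series) {
        LongAdder adder = counters.get(series);
        return adder == null ? 0 : adder.sum();
    }
}

// Hypothetical listener: counts user events per realm and event type.
final class SketchMetricsEventListener {
    private final SharedRegistry registry;

    SketchMetricsEventListener(SharedRegistry registry) {
        this.registry = registry;
    }

    void onEvent(String eventType, String realm) {
        registry.increment("events{realm=" + realm + ",type=" + eventType + "}");
    }
}

// Hypothetical server filter: counts responses per path and status code.
final class SketchMetricsFilter {
    private final SharedRegistry registry;

    SketchMetricsFilter(SharedRegistry registry) {
        this.registry = registry;
    }

    void onResponse(String path, int status) {
        registry.increment("http_requests{path=" + path + ",status=" + status + "}");
    }
}
```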

Keycloak Metrics

Initial Metrics Selection

Although many of the metrics mentioned above provide valuable insights about a Keycloak system, we should focus
on a small initial subset of metrics that are provided out of the box.

Built-in Metrics

Some core metrics should be built into Keycloak and offer configuration options, such as whether the metric is collected at all, or its granularity, e.g. which additional tags or labels to add.

Custom Metrics

Some of the metrics mentioned above could be provided out of the box by Keycloak. However, there will be use cases that cannot be foreseen and that require the ability to contribute custom metrics to the system.
For this Keycloak needs to provide a metrics SPI that enables users to add their own custom metrics.

Metrics Configuration

We should have a way to let users control which metrics are collected / tracked by Keycloak.
Users should be able to control things like:

  • Which metrics to enable?
  • Which tags to emit alongside the metric?
  • Which context information to include? (e.g. request parts or selected parameters)
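The first two questions could be answered by a small configuration object that is consulted before a metric is recorded. The sketch below is only an assumption about how this might look; `MetricsConfig` and its methods are hypothetical, and where the settings would come from (for example server configuration) is left open.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical configuration holder; names and structure are illustrative only.
final class MetricsConfig {
    private final Set<String> enabledMetrics;
    private final Set<String> allowedTags;

    MetricsConfig(Set<String> enabledMetrics, Set<String> allowedTags) {
        this.enabledMetrics = Set.copyOf(enabledMetrics);
        this.allowedTags = Set.copyOf(allowedTags);
    }

    // Answers "which metrics to enable?"
    boolean isEnabled(String metric) {
        return enabledMetrics.contains(metric);
    }

    // Answers "which tags to emit?" by dropping every tag that was not opted into.
    Map<String, String> filterTags(Map<String, String> tags) {
        Map<String, String> kept = new TreeMap<>();
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            if (allowedTags.contains(tag.getKey())) {
                kept.put(tag.getKey(), tag.getValue());
            }
        }
        return kept;
    }
}
```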

Exposing Metrics

Metrics need to be accessible for metric collection tools like Prometheus or InfluxDB. Those tools usually fetch metrics information from an HTTP endpoint. For this we could either provide one global metrics endpoint for the whole server and all realms, or realm specific endpoints that can be consumed by the collectors. The global endpoint model is supported by Quarkus out of the box via the /q/metrics endpoint. This endpoint could then contain information about the process, JVM and instance, as well as all the Keycloak application metrics.

However, in environments where a Keycloak system is shared among multiple different parties, e.g. a collection of realms per tenant model, users might only be allowed to access a subset of the metrics information via realm specific endpoints that provide only metrics for a particular realm. In this case an endpoint like /auth/realm/$myrealm/metrics could be used as a realm specific endpoint that only provides the Keycloak application metrics and perhaps a small subset of server metadata.

Note that it should be possible to protect the endpoints which expose realm metrics.
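A realm specific endpoint would essentially serve a filtered view of the collected samples. The following sketch assumes that application metrics carry a `realm` tag and that server level metrics do not; `Sample` and `RealmScopedMetrics` are hypothetical names, and access protection is not shown.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical value type for one collected sample.
final class Sample {
    final String name;
    final Map<String, String> tags;
    final double value;

    Sample(String name, Map<String, String> tags, double value) {
        this.name = name;
        this.tags = tags;
        this.value = value;
    }
}

// Hypothetical filter behind a realm specific metrics endpoint.
final class RealmScopedMetrics {
    // Keeps only samples tagged with the requested realm; untagged server metrics are dropped.
    static List<Sample> forRealm(List<Sample> all, String realm) {
        return all.stream()
                .filter(s -> realm.equals(s.tags.get("realm")))
                .collect(Collectors.toList());
    }
}
```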

Metrics SPI

A metrics SPI should allow users to contribute new metrics to the Keycloak metrics collection.
The registered metrics could hook into the metrics collection infrastructure described above.
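As a starting point for discussion, such an SPI could be as small as the sketch below. The interface `CustomMetricProvider` and the class `CustomMetricCollector` are purely hypothetical and do not describe an existing Keycloak SPI. Provider discovery, tags and error handling are deliberately omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical SPI contract: a provider contributes one named gauge-like value.
interface CustomMetricProvider {
    String name();

    double currentValue();
}

// Hypothetical collector that polls all registered providers.
final class CustomMetricCollector {
    private final List<CustomMetricProvider> providers = new ArrayList<>();

    void register(CustomMetricProvider provider) {
        providers.add(provider);
    }

    // Takes a snapshot of all custom metrics, e.g. on each periodic collection run.
    Map<String, Double> collect() {
        Map<String, Double> snapshot = new TreeMap<>();
        for (CustomMetricProvider provider : providers) {
            snapshot.put(provider.name(), provider.currentValue());
        }
        return snapshot;
    }
}
```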

Questions

  • Which metrics are important for you? What other metrics would you like to see?
  • What questions do you want to answer based on metrics, and which metrics would support you here?
