Observability
Observability is the ability to understand the internal state of a distributed system solely from its external outputs: logs, metrics, traces — and, more recently, profiles and events.
It goes beyond simple monitoring, which answers "is the system working?", to address "why is the system behaving this way, particularly in a case I hadn't anticipated?"
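To make the distinction concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not how any particular platform works: the in-memory sinks `LOGS`, `METRICS`, and `SPANS`, the `handle_request` function, and the simulated fault (even-numbered users fail on `/checkout`) are all hypothetical. The aggregate metric answers the monitoring question, while the structured events let you ask a question nobody planned for.

```python
import json
import time
import uuid
from collections import Counter

# Hypothetical in-memory sinks standing in for a real telemetry backend.
LOGS = []            # structured log events (JSON strings)
METRICS = Counter()  # aggregate counters keyed by (route, status)
SPANS = []           # one timing record per request


def handle_request(user_id, route):
    """Handle one simulated request and emit a metric, a log, and a span."""
    trace_id = uuid.uuid4().hex  # shared ID correlating the three signals
    start = time.perf_counter()

    # Simulated, "unanticipated" fault: even-numbered users fail on checkout.
    status = 500 if route == "/checkout" and user_id % 2 == 0 else 200

    duration_ms = (time.perf_counter() - start) * 1000.0

    # Metric: cheap aggregate, good for dashboards and alerts.
    METRICS[(route, status)] += 1

    # Log: structured event carrying per-request context.
    LOGS.append(json.dumps({
        "trace_id": trace_id,
        "route": route,
        "user_id": user_id,
        "status": status,
    }))

    # Span: timing for one unit of work, linked by the same trace_id.
    SPANS.append({
        "trace_id": trace_id,
        "name": f"GET {route}",
        "duration_ms": duration_ms,
    })
    return status


for uid in range(1, 7):
    handle_request(uid, "/checkout")

# Monitoring question: is the system working? (count server errors)
error_count = sum(n for (_, status), n in METRICS.items() if status >= 500)

# Observability question: why is it failing, and for whom?
events = [json.loads(line) for line in LOGS]
failing_users = sorted(e["user_id"] for e in events if e["status"] >= 500)

print(error_count)    # 3
print(failing_users)  # [2, 4, 6]
```

The counter alone shows that three requests failed. Only the per-request context in the events reveals that every failure belongs to an even-numbered user, which is the kind of pattern no predefined dashboard would have looked for.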
Reference platforms include Datadog, Grafana (Loki + Prometheus + Tempo), Honeycomb, New Relic, and Splunk, alongside the open standard OpenTelemetry, which by 2025 had emerged as the de facto standard instrumentation layer.
