
What is Observability?

Observability is the ability to measure the internal states of a system by examining its outputs. Whether through application performance monitoring (APM), telemetry data, log analytics, traces, or metrics, the more real-time insight you have into your system, the more quickly you can pinpoint performance problems and mitigate risks.

What problems does Observability solve?

From Observability: A Manifesto by Charity Majors:

“Observability is about getting the right information at the right time into the hands of the people who have the ability and responsibility to do the right thing. Helping them make better technical and business decisions driven by real data, not guesses, or hunches, or shots in the dark. Time is the most precious resource you have — your own time, your engineering team’s time, your company’s time.”

Observability is important because it allows you to spot bottlenecks, resolve outages, and glean valuable insights about how your software behaves.

Telemetry: Correlating Metrics, Traces, and Logs

It is possible to observe complex distributed systems by correlating telemetry data – traces, metrics, and logs.

Traces

  • A span is an individual unit of work done in a distributed system
  • As a request kicks off a chain of execution across many distributed systems, those spans are collected into a trace
  • Traces work by propagating a unique trace ID throughout the system using headers
  • Apps must be instrumented to send trace information to a backend observability service for analysis
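
The header-based propagation described above can be sketched with the JDK alone. This minimal example assumes the W3C Trace Context `traceparent` header layout (`version-traceId-spanId-flags`), which OpenTelemetry uses by default; the class and method names (`TraceContextSketch`, `inject`, `extract`) are hypothetical and exist only for illustration, and a real application would rely on an OpenTelemetry SDK or agent rather than hand-rolled code.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch of trace-context propagation; not a real OpenTelemetry class. */
final class TraceContextSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    final String traceId; // 32 hex chars, shared by every span in the trace
    final String spanId;  // 16 hex chars, unique to one unit of work

    TraceContextSketch(String traceId, String spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }

    /** Starts a new trace: a fresh trace ID plus a root span ID. */
    static TraceContextSketch newRoot() {
        return new TraceContextSketch(randomHex(16), randomHex(8));
    }

    /** Creates a child span: same trace ID, new span ID. */
    TraceContextSketch newChild() {
        return new TraceContextSketch(traceId, randomHex(8));
    }

    /** Writes the context into outgoing request headers. */
    void inject(Map<String, String> headers) {
        headers.put("traceparent", "00-" + traceId + "-" + spanId + "-01");
    }

    /** Reads the context from incoming headers; returns null if absent or malformed. */
    static TraceContextSketch extract(Map<String, String> headers) {
        String value = headers.get("traceparent");
        if (value == null) {
            return null;
        }
        String[] parts = value.split("-");
        if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16) {
            return null;
        }
        return new TraceContextSketch(parts[1], parts[2]);
    }

    private static String randomHex(int numBytes) {
        byte[] bytes = new byte[numBytes];
        RANDOM.nextBytes(bytes);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Service A starts a trace and attaches the header to its outgoing call.
        TraceContextSketch root = newRoot();
        Map<String, String> headers = new HashMap<>();
        root.inject(headers);

        // Service B reads the header and continues the same trace with its own span.
        TraceContextSketch incoming = extract(headers);
        TraceContextSketch child = incoming.newChild();
        System.out.println("same trace: " + child.traceId.equals(root.traceId));
    }
}
```

Because every span carries the same trace ID, a backend can stitch spans reported by different services into one trace.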

Here is a simple example where you can observe the flow of execution that happens when you call a method called requestStarted, where the entire trace is broken down into its constituent spans:

[Image: trace of requestStarted broken down into its constituent spans]

With this information, it becomes possible to find bugs, isolate performance bottlenecks, or set intelligent alerts.

Metrics

  • Metrics are measurements of application performance (e.g. memory usage)
  • Apps must be instrumented to send metrics information to a backend observability service for analysis
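
As a rough illustration of what a metric is, the sketch below uses only the JDK to model two common kinds: a counter that only increases and a gauge that is sampled on demand. The names (`MetricsSketch`, `requestCount`, `usedHeapBytes`, `handleRequest`) are hypothetical; in an instrumented application these measurements would be registered with a metrics library and exported to a backend rather than printed.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Illustrative sketch of two metric kinds using only the JDK. */
final class MetricsSketch {
    /** Counter: a value that only goes up, such as the number of requests served. */
    static final AtomicLong requestCount = new AtomicLong();

    /** Gauge: a value sampled on demand, here the heap memory currently in use (bytes). */
    static final LongSupplier usedHeapBytes = () -> {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    };

    /** Stand-in for a request handler that records a measurement on each call. */
    static String handleRequest() {
        requestCount.incrementAndGet();
        return "hello";
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            handleRequest();
        }
        System.out.println("requests=" + requestCount.get());
        System.out.println("usedHeapBytes=" + usedHeapBytes.getAsLong());
    }
}
```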

Logs

  • Logs are a history of events for your application
  • Logs give information that is critical for debugging
  • Apps must be instrumented to send logs to a backend observability service for analysis

What is OpenTelemetry?

OpenTelemetry is a Cloud Native Computing Foundation (CNCF) project dedicated to creating an open standard for application telemetry instrumentation. OpenTelemetry is emerging as the preeminent telemetry standard in the observability industry.

  • OpenTelemetry Protocol (OTLP) provides a standard way to communicate about metrics and traces
    • NOTE: As of writing, OTLP's support for logs is considered experimental. See OpenTelemetry Project Status for the current status of the project. It is recommended to use other logging solutions (e.g. Elastic Filebeat or Loki) in conjunction with OpenTelemetry for full observability.
  • API -- a specification for public interfaces to be used by libraries to instrument apps to expose telemetry data
  • SDK -- actual implementations of the API for different programming languages
  • Use SDK libraries to manually instrument your app code to expose metrics and traces over OTLP to different backend analytics systems
  • Collector -- the OTEL collector imports and exports telemetry data with different protocols
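
The split between API and SDK in the list above can be sketched in plain Java. This is only an analogy, not the real OpenTelemetry API: `SketchTracer`, `RecordingTracer`, and `ApiVsSdkSketch` are hypothetical names. The point it illustrates is that instrumented code depends on an interface that defaults to doing nothing, while a separately supplied implementation decides what actually happens to the data.

```java
import java.util.ArrayList;
import java.util.List;

/** "API": the interface that library and application code is written against. */
interface SketchTracer {
    void recordSpan(String name, long durationNanos);

    /** Default no-op implementation, used when no "SDK" has been supplied. */
    SketchTracer NOOP = (name, durationNanos) -> { };
}

/** "SDK": a concrete implementation that actually keeps the data. */
final class RecordingTracer implements SketchTracer {
    final List<String> spans = new ArrayList<>();

    @Override
    public void recordSpan(String name, long durationNanos) {
        spans.add(name + " took " + durationNanos + "ns");
    }
}

final class ApiVsSdkSketch {
    /** Instrumented code depends only on the interface, never on the implementation. */
    static void doWork(SketchTracer tracer) {
        long start = System.nanoTime();
        // ... the actual work would happen here ...
        tracer.recordSpan("doWork", System.nanoTime() - start);
    }

    public static void main(String[] args) {
        doWork(SketchTracer.NOOP); // no implementation supplied: nothing is recorded

        RecordingTracer sdk = new RecordingTracer();
        doWork(sdk);               // implementation supplied: the span is captured
        System.out.println(sdk.spans.size() + " span(s) recorded");
    }
}
```

A design like this lets libraries ship with instrumentation built in without forcing a telemetry backend on every user.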

What is Data Observability?

Logs, metrics, and traces pertain to observability as it relates to application performance monitoring (APM). However, businesses are also highly interested in observing how business data flows end-to-end. This is called data observability and is often spoken about in the context of “data governance”.

For example, Confluent Cloud offers a powerful Stream Lineage interface to observe data as it flows throughout a business.

The purpose of this lab is to explore a working example of how OpenTelemetry enables metrics and traces in Java using the OpenTelemetry Java agent. A nice property of the Java agent is that it sends telemetry data automatically: the application only needs to instantiate Meter and Tracer objects and set some environment variables.

The lab uses a Spring Boot Java application that exposes an endpoint at http://localhost:8888/hello. App metrics and request traces are sent via OTLP (OpenTelemetry Protocol) to an observability backend (Elastic Observability APM in this case).

Launch the lab environment by clicking https://gitpod.io/#https://github.com/riferrei/otel-with-java. On launch, all services are built and started with docker-compose.

Inspect the source code of the HelloApp Java application. Specifically, look at src/main/java/riferrei/otel/java/HelloAppController.java. This is where OpenTelemetry tracing and custom metrics are implemented.

Send GET requests to the Hello app.

[source,bash]
----
curl http://localhost:8888/hello
----

Repeat the previous curl command several times against the /hello endpoint, and also try other paths (other endpoints are expected to return error responses).

Execute the following echo command and Ctrl+Click the resulting URL to open the traces for the hello-app in the Kibana UI.

[source,bash]
----
echo https://5601-${GITPOD_WORKSPACE_URL#https://}/app/apm/services/hello-app/transactions
----

NOTE: The URL will look something like https://5601-aquamarine-python-rsq28cwb.ws-us17.gitpod.io/app/apm/services/hello-app/transactions

Scroll down to the bottom of the page and select the /hello endpoint from the Transactions section.

Scroll down again to see the trace sample, which shows latency measurements at various stages of execution.

TIP: These trace samples are a great tool for understanding what is happening in a transaction. In more complex applications, this trace sample would show the flow of execution across many microservices, helping you to identify bugs and performance bottlenecks much more quickly.

Execute the following echo command and Ctrl+Click the resulting URL to open the "discover" area of the Kibana UI, where OpenTelemetry metrics will be automatically discovered.

[source,bash]
----
echo https://5601-${GITPOD_WORKSPACE_URL#https://}/app/discover
----

NOTE: The URL will look something like https://5601-aquamarine-python-rsq28cwb.ws-us17.gitpod.io/app/discover. Ignore warnings.

Investigate custom.metric.heap.memory and custom.metric.number.of.exec, the custom metrics defined in Constants.java and HelloAppController.java.

NOTE: This lab comes from https://github.com/riferrei/otel-with-java, created by Ricardo Ferreira. There is a sibling repository at https://github.com/riferrei/otel-with-golang. The main difference between the Java and Go implementations is that the OpenTelemetry Java agent creates trace spans automatically, while the Go version requires more manual instrumentation. There is an associated in-depth video walkthrough from SREcon 2021.

In this lab, you explored how to instrument a Java application using OpenTelemetry and send those app metrics and request traces to an observability backend for analysis.
