GRIT Monitoring Setup: Prometheus + Loki + Grafana + Spring Actuator

Don-zo/GRIT: Repository for GRIT, a real-time video study platform

While building GRIT, I realized that implementing features alone was not enough to run the service. GRIT is a real-time video study service, so if something goes wrong while users are in a room, they notice it immediately. I needed at least a basic place to look when traffic increased or errors occurred.

At first, I thought checking logs would be enough. But values like response time, request count, and error rate are easier to understand as metrics than as logs. On the other hand, detailed information about a specific problem still requires logs.

So I set up basic monitoring with Spring Actuator, Prometheus, Loki, and Grafana.

Problem

GRIT has the characteristics of a real-time service. Users enter a room, are granted microphone permission, and share timer and presentation state. These features need more runtime visibility than a simple CRUD service.

The things I especially wanted to check were:

API request count and response time
Error trends
Logs from a specific time range
Application status and JVM metrics
A basic screen to check after deployment when something goes wrong

Without this groundwork, every incident would start with digging through logs one by one. I wanted to avoid that.

Approach

In a Spring Boot application, Actuator can expose basic health and metrics endpoints. So I first enabled the Actuator endpoints and exposed metrics in a format that Prometheus can scrape.

I did not want to expose every endpoint indiscriminately. I decided which endpoints were actually needed for health checks and metric collection and limited exposure to those.
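In Spring Boot, the exposed range is controlled by the `management.endpoints.web.exposure.include` property, with `management.endpoints.web.exposure.exclude` subtracting from it. As a sketch of how that decision could be guarded, the hypothetical `ExposureCheck` below reads those two properties and reports any exposed endpoint ID outside an allowlist. The allowlist of `health` and `prometheus` is an assumption, not GRIT's actual configuration, and the check only looks at properties-file text, not at other configuration sources such as environment variables or YAML.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical guard: reports Actuator endpoints exposed over HTTP that are not on an allowlist.
final class ExposureCheck {
    static final String INCLUDE_KEY = "management.endpoints.web.exposure.include";
    static final String EXCLUDE_KEY = "management.endpoints.web.exposure.exclude";

    private ExposureCheck() {}

    // Loads properties from text in the same key=value format as application.properties.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Splits a comma-separated property value into trimmed, non-empty endpoint IDs.
    private static Set<String> ids(Properties props, String key) {
        return Arrays.stream(props.getProperty(key, "").split(","))
                .map(String::trim)
                .filter(id -> !id.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // Returns exposed IDs outside the allowlist. A wildcard "*" stays in the result
    // because it exposes every endpoint, not just the allowed ones.
    static Set<String> unexpected(Properties props, Set<String> allowed) {
        Set<String> exposed = ids(props, INCLUDE_KEY);
        exposed.removeAll(ids(props, EXCLUDE_KEY));
        exposed.removeAll(allowed);
        return exposed;
    }
}
```

For example, `management.endpoints.web.exposure.include=health,env,*` would be reported as `[env, *]`, while `health,prometheus` produces an empty result. Run as a test in CI, a check like this makes widening the exposed range a deliberate change rather than an accident.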

Implementation

Prometheus was configured to scrape the application's metrics endpoint periodically. Grafana used Prometheus as a data source to show request count, response time, and JVM-related metrics.
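Assuming the Micrometer Prometheus registry is on the classpath (the post does not name the dependency), Spring MVC request timings show up as cumulative counters such as `http_server_requests_seconds_count` and `http_server_requests_seconds_sum`. A Grafana panel usually turns them into a request rate with `rate(http_server_requests_seconds_count[5m])` and an average response time with `rate(http_server_requests_seconds_sum[5m]) / rate(http_server_requests_seconds_count[5m])`. The sketch below reproduces that arithmetic between two hypothetical scrapes; it is simplified and skips the extrapolation and counter-reset handling that PromQL's `rate()` performs.

```java
// Hypothetical snapshot of two cumulative counters taken from one scrape.
record Scrape(double epochSeconds, double requestCount, double latencySumSeconds) {}

final class RequestStats {
    private RequestStats() {}

    // Requests per second between two scrapes (a simplified rate(): no reset handling).
    static double requestsPerSecond(Scrape earlier, Scrape later) {
        double elapsed = later.epochSeconds() - earlier.epochSeconds();
        if (elapsed <= 0) throw new IllegalArgumentException("scrapes must be in time order");
        return (later.requestCount() - earlier.requestCount()) / elapsed;
    }

    // Average response time in seconds for requests completed between the scrapes:
    // growth of the sum divided by growth of the count. NaN if no request completed.
    static double averageLatencySeconds(Scrape earlier, Scrape later) {
        double requests = later.requestCount() - earlier.requestCount();
        if (requests <= 0) return Double.NaN;
        return (later.latencySumSeconds() - earlier.latencySumSeconds()) / requests;
    }
}
```

Because both rates in the PromQL expression share the same window, the elapsed time cancels out of the average. With 600 extra requests and 30 extra seconds of total latency over 60 seconds, the panels would show 10 requests per second and a 50 ms average.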

I did not try to build a perfect dashboard from the start. Instead, I first decided on the minimum set of questions the dashboard had to answer:

Is the service alive?
Are requests coming in?
Is response time increasing suddenly?
Are errors increasing?
Are JVM memory or thread values abnormal?
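These questions can be written down as explicit rules over a single snapshot of values. The sketch below is illustrative only: the `Snapshot` fields and every threshold (latency doubling against a baseline, a 1% error ratio, 90% heap usage, thread count doubling) are hypothetical placeholders rather than values GRIT used. In practice, rules like these would live in Grafana panel thresholds or alert rules rather than in Java code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical snapshot of the values behind the five dashboard questions.
record Snapshot(boolean healthUp, double requestsPerSecond, double avgLatencyMs,
                double baselineAvgLatencyMs, double errorRatio, double heapUsedRatio,
                int liveThreads, int baselineThreads) {}

final class MinimumChecks {
    private MinimumChecks() {}

    // Returns one finding per question whose answer looks abnormal.
    // All thresholds are illustrative placeholders, not tuned values.
    static List<String> findings(Snapshot s) {
        List<String> out = new ArrayList<>();
        if (!s.healthUp()) out.add("service is not UP");
        if (s.requestsPerSecond() == 0) out.add("no requests are coming in");
        if (s.avgLatencyMs() > 2 * s.baselineAvgLatencyMs()) out.add("average latency more than doubled");
        if (s.errorRatio() > 0.01) out.add("error ratio above 1%");
        if (s.heapUsedRatio() > 0.9) out.add("heap usage above 90%");
        if (s.liveThreads() > 2 * s.baselineThreads()) out.add("thread count more than doubled");
        return out;
    }
}
```

A zero request rate can be perfectly normal during quiet hours, so that rule in particular only makes sense alongside the time of day or an expected traffic pattern.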

Even a dashboard this simple gave me a first place to look when something went wrong.

Connecting logs

Metrics show abnormal signals, but logs are often needed to understand why they happened. So I introduced Loki and made application logs searchable from Grafana as well.

When logs can only be read from files or the console, following them by time range or by service is tedious. Being able to jump from a metric graph to the logs for the same time range in Grafana helps narrow down problems.
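Jumping from a metric graph to logs amounts to asking Loki for log lines in the same time window. Grafana does this for you, but the underlying request goes to Loki's HTTP API endpoint `/loki/api/v1/query_range`, which takes a LogQL `query`, `start` and `end` (Unix epoch nanoseconds are accepted), `limit`, and `direction`. The sketch below builds such a URL around a spike seen on a graph; the Loki address, the `app="grit"` label, and the `LokiQuery` helper are assumptions for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper that builds a Loki query_range URL for a time window.
final class LokiQuery {
    private LokiQuery() {}

    // Converts an Instant to Unix epoch nanoseconds, used here for start and end.
    static long epochNanos(Instant t) {
        return t.getEpochSecond() * 1_000_000_000L + t.getNano();
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Builds a URL for log lines matching `logql` within `padding` on either side of `spike`,
    // newest first (direction=backward).
    static URI around(String lokiBaseUrl, String logql, Instant spike, Duration padding, int limit) {
        long start = epochNanos(spike.minus(padding));
        long end = epochNanos(spike.plus(padding));
        String query = "query=" + enc(logql)
                + "&start=" + start
                + "&end=" + end
                + "&limit=" + limit
                + "&direction=backward";
        return URI.create(lokiBaseUrl + "/loki/api/v1/query_range?" + query);
    }
}
```

For example, `LokiQuery.around("http://localhost:3100", "{app=\"grit\"} |= \"error\"", spike, Duration.ofMinutes(5), 100)` asks for up to 100 error lines from five minutes before to five minutes after the spike. "Logs from the same time range" is just this: the graph's start and end bounds applied to a label selector.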

The purpose of this setup was not just storing logs. It was to see metrics and logs together in the same place.

Checkpoints

While doing this work, I realized that monitoring should not be treated as something added only after feature development is finished. Checking whether a feature works and understanding what goes wrong while it is running are different problems.

The points I checked were:

Can I see the service status at a glance?
Can I see request count and response time as numbers?
Can I quickly find logs from the time when a problem happened?
Do I have a minimum check routine after deployment?
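The last checkpoint, a minimum routine after deployment, can start as a handful of HTTP requests. The sketch below uses hypothetical names: it expects `/actuator/health` and `/actuator/prometheus` to answer 200 (Actuator's health endpoint answers 503 by default when the status is DOWN) and expects `/actuator/env`, used here only as an example of an endpoint that should stay hidden, not to answer 200. If Spring Security protects Actuator, a hidden endpoint may answer 401 or 403 instead of 404, which this check also accepts. The status lookup is passed in as a function so the routine can be tried without a running server; `httpStatus()` supplies a real one built on `java.net.http.HttpClient`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical post-deployment check against a running instance's Actuator endpoints.
final class PostDeployCheck {
    private PostDeployCheck() {}

    // Real status lookup: HTTP status of a GET request, or -1 if the request fails.
    static ToIntFunction<String> httpStatus() {
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(3)).build();
        return url -> {
            try {
                HttpRequest req = HttpRequest.newBuilder(URI.create(url))
                        .timeout(Duration.ofSeconds(5)).GET().build();
                return client.send(req, HttpResponse.BodyHandlers.discarding()).statusCode();
            } catch (IOException e) {
                return -1;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        };
    }

    // Returns failed expectations; an empty list means the routine passed.
    static List<String> run(String baseUrl, ToIntFunction<String> status) {
        List<String> failures = new ArrayList<>();
        for (String path : List.of("/actuator/health", "/actuator/prometheus")) {
            int code = status.applyAsInt(baseUrl + path);
            if (code != 200) failures.add(path + " returned " + code + ", expected 200");
        }
        // An endpoint that should not be exposed must not answer with 200.
        if (status.applyAsInt(baseUrl + "/actuator/env") == 200) failures.add("/actuator/env is exposed");
        return failures;
    }
}
```

Against a deployed instance this would be called as `PostDeployCheck.run(baseUrl, PostDeployCheck.httpStatus())`, where `baseUrl` is the instance's address. Keeping the lookup injectable also makes it easy to add the dashboard questions above as further expectations later.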

Takeaway

The GRIT monitoring setup was not a large-scale observability platform, but it got me thinking about operations.

By setting up Spring Actuator, Prometheus, Loki, and Grafana, I became more aware of what should be checked after a feature is implemented. For a real-time service, knowing where to start when something goes wrong proved especially useful.