gRPC Health Checks on Kubernetes with Spring Boot Actuator

Introduction

Over the last few years, we have seen more and more projects and companies adopt gRPC as the communication protocol among internal microservices, or even for customer-facing services. gRPC can be implemented in many languages, Java being one of them, and when it comes to Java web frameworks, Spring Boot is arguably the most popular. On the deployment side of things, Kubernetes has emerged as the undisputed leader. Thus, it is only fair that we talk about all three of them when we are looking to make a production-ready application.

From now on, I will refer to Spring Boot simply as Boot, and Kubernetes as K8S. Also, I will use “gRPC service” and “gRPC server” interchangeably unless there is a specific need for disambiguation.

I deliberately avoided providing supporting information about the popularity claims made above, but if you are the “trust but verify” type, feel free to convince yourself by doing some research.

Problem Statement

We want to deploy a gRPC client-server application in K8S with no web (HTTP/S) interface, and we want to monitor the application health.

Design

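Let’s first take stock of where each member of the trio stands when it comes to monitoring application health: gRPC defines a health checking protocol, Boot exposes a health endpoint through Actuator, and K8S runs liveness and readiness probes against the application.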

So, a reasonable strategy is to implement the gRPC health checking protocol, make it available under the Boot Actuator health endpoint, and then use the K8S probes to query the Actuator health endpoint periodically.

Implementation

gRPC

The good news is that io.grpc:grpc-services ships with an implementation of the Health service defined by the “GRPC Health Checking Protocol”: the class HealthServiceImpl, which can be retrieved by calling HealthStatusManager.getHealthService(). Registering that service with the gRPC server, however, is our responsibility.
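As an illustration, registering the bundled Health service by hand might look like the sketch below. It assumes a recent grpc-java where HealthStatusManager lives in io.grpc.protobuf.services (older releases have it in io.grpc.services); the port and the service name "myapp.MyService" are made up for the example.

```kotlin
import io.grpc.Server
import io.grpc.ServerBuilder
import io.grpc.health.v1.HealthCheckResponse.ServingStatus
import io.grpc.protobuf.services.HealthStatusManager

// Builds and starts a server that exposes only the bundled Health service.
// A real application would also add its own services to the builder.
fun startServerWithHealth(port: Int, health: HealthStatusManager): Server =
    ServerBuilder.forPort(port)
        .addService(health.healthService)
        .build()
        .start()

fun main() {
    val health = HealthStatusManager()
    val server = startServerWithHealth(50051, health) // port chosen for illustration

    // "myapp.MyService" is a hypothetical service name; the empty name
    // (HealthStatusManager.SERVICE_NAME_ALL_SERVICES) stands for the whole server.
    health.setStatus("myapp.MyService", ServingStatus.SERVING)

    Runtime.getRuntime().addShutdownHook(Thread {
        health.enterTerminalState() // mark every service NOT_SERVING before stopping
        server.shutdown()
    })
    server.awaitTermination()
}
```

The application is expected to call setStatus whenever the health of a service changes; the Health service only reports what it has been told.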

To monitor gRPC client readiness, we can watch the ManagedChannel state using the methods ManagedChannel.getState(boolean) and ManagedChannel.notifyWhenStateChanged(). If the state is ConnectivityState.TRANSIENT_FAILURE when checked, there has been some transient failure, and the gRPC client app is alive but not accepting requests (not ready).
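A minimal sketch of such a readiness watcher follows. The class name ChannelReadiness, the helper isReadyState, and the address in main are hypothetical, and treating SHUTDOWN as "not ready" is an assumption that goes beyond what is stated above.

```kotlin
import io.grpc.ConnectivityState
import io.grpc.ManagedChannel
import io.grpc.ManagedChannelBuilder
import java.util.concurrent.atomic.AtomicReference

// TRANSIENT_FAILURE means "not ready", as described above. Treating SHUTDOWN
// the same way is an assumption of this sketch.
fun isReadyState(state: ConnectivityState): Boolean =
    state != ConnectivityState.TRANSIENT_FAILURE && state != ConnectivityState.SHUTDOWN

// Caches the latest channel state so that a readiness check can read it cheaply.
class ChannelReadiness(private val channel: ManagedChannel) {
    private val lastState = AtomicReference(channel.getState(false))

    init {
        watch()
    }

    private fun watch() {
        val current = channel.getState(false) // false: do not trigger a connection attempt
        lastState.set(current)
        if (current == ConnectivityState.SHUTDOWN) return // terminal state, nothing left to watch
        // The callback is one-shot, so it registers itself again after every change.
        channel.notifyWhenStateChanged(current) { watch() }
    }

    fun isReady(): Boolean = isReadyState(lastState.get())
}

fun main() {
    // Address and port are placeholders for illustration.
    val channel = ManagedChannelBuilder.forAddress("localhost", 50051).usePlaintext().build()
    println(ChannelReadiness(channel).isReady())
    channel.shutdownNow()
}
```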

Implementing a gRPC client liveness check is up to the application.

Boot

As of this writing, Spring Boot does not have out-of-the-box support for gRPC (a little surprising, since it seems to support everything else under the sun). A Google search brings up two libraries that aim to fill this gap, yidongnan/grpc-spring-boot-starter being one of them. If the property grpc.server.health-service-enabled is true (the default), it registers the Health service automatically for us. So, in order to make the server health status available under the Actuator health endpoint, we just need to implement a HealthIndicator named GrpcServerHealthIndicator that is a client of the Health service. See Writing Custom HealthIndicators.
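One possible shape for that indicator is sketched below. It assumes a ManagedChannel bean pointing at the local gRPC server is defined elsewhere in the application; the one-second deadline and the mapping of NOT_SERVING to OUT_OF_SERVICE are choices of this sketch, not something prescribed by either library.

```kotlin
import io.grpc.ManagedChannel
import io.grpc.health.v1.HealthCheckRequest
import io.grpc.health.v1.HealthCheckResponse.ServingStatus
import io.grpc.health.v1.HealthGrpc
import org.springframework.boot.actuate.health.Health
import org.springframework.boot.actuate.health.HealthIndicator
import org.springframework.stereotype.Component
import java.util.concurrent.TimeUnit

// Boot derives the component name "grpcServer" from the bean name by
// dropping the HealthIndicator suffix.
@Component
class GrpcServerHealthIndicator(private val channel: ManagedChannel) : HealthIndicator {

    override fun health(): Health =
        try {
            // An empty service name asks about the server as a whole.
            val response = HealthGrpc.newBlockingStub(channel)
                .withDeadlineAfter(1, TimeUnit.SECONDS) // deadline value is an assumption
                .check(HealthCheckRequest.newBuilder().setService("").build())
            when (response.status) {
                ServingStatus.SERVING -> Health.up().build()
                ServingStatus.NOT_SERVING -> Health.outOfService().build()
                else -> Health.unknown().withDetail("grpcStatus", response.status.name).build()
            }
        } catch (e: Exception) {
            // No response, deadline exceeded, connection refused, and so on.
            Health.down(e).build()
        }
}
```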

Starting with version 2.3.0.RELEASE, Spring Boot provides liveness and readiness information under the Actuator health endpoint. See Kubernetes Probes for details. To include our GrpcServerHealthIndicator in the liveness and readiness groups, see Checking external state with Kubernetes Probes.

For example, to add to the readiness health group:

management.endpoint.health.group.readiness.include=readinessState,grpcServer

A Boot application running on Kubernetes will show the following health report:

/actuator/health

{
  "status": "UP",
  "components": {
    "diskSpace": {
      "status": "UP",
      "details": { //...
      }
    },
    "livenessProbe": {
      "status": "UP"
    },
    "ping": {
      "status": "UP"
    },
    "readinessProbe": {
      "status": "UP"
    }
  },
  "groups": [
    "liveness",
    "readiness"
  ]
}

and

/actuator/health/liveness

{
  "status": "UP",
  "components": {
    "livenessProbe": {
      "status": "UP"
    }
  }
}

The problem, though, is that our application does not have any web interface. So, what do we do?

Pretend you are thinking hard about a solution before reading on.

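The solution is to use Actuator monitoring over JMX. Note that since Boot 2.2, JMX is disabled by default and has to be turned on with the property spring.jmx.enabled=true.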

Following is the JMX counterpart to the HTTP /actuator/health output above (line breaks and indentation for legibility only):

{
  status=UP,
  components={
    diskSpace={status=UP, details={...}},
    livenessState={status=UP},
    ping={status=UP},
    readinessState={status=UP}
  },
  groups=[liveness, readiness]
}

Note that the K8S probes are not mandatory, so if you are using an older version of Boot that does not support them, fret not. The GrpcServerHealthIndicator can still contribute to the overall health status available in the top-level status; it just will not contribute to the liveness and readiness health groups.

K8S

K8S defines two distinct checks: liveness and readiness. However, gRPC only defines a single health checking protocol and does not have a native concept of a readiness check. A reasonable way to map gRPC responses to Kubernetes checks is to interpret a SERVING response as the service being alive and ready to accept more requests, a NOT_SERVING response as the service being alive but not accepting requests, and UNKNOWN or failure to respond as the service not being alive.
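The mapping can be written down as a small function. The sketch below uses its own ServingStatus enum to stay free of dependencies; ProbeResult and toProbeResult are hypothetical names, and treating SERVICE_UNKNOWN like UNKNOWN is an assumption, since the mapping above does not mention it.

```kotlin
// Mirrors the statuses of the gRPC health checking protocol. A null status
// below stands for "no response at all" (timeout, connection refused, ...).
enum class ServingStatus { UNKNOWN, SERVING, NOT_SERVING, SERVICE_UNKNOWN }

data class ProbeResult(val alive: Boolean, val ready: Boolean)

fun toProbeResult(status: ServingStatus?): ProbeResult = when (status) {
    ServingStatus.SERVING -> ProbeResult(alive = true, ready = true)
    ServingStatus.NOT_SERVING -> ProbeResult(alive = true, ready = false)
    // UNKNOWN, SERVICE_UNKNOWN (assumption), or no response.
    else -> ProbeResult(alive = false, ready = false)
}

fun main() {
    println(toProbeResult(ServingStatus.SERVING))     // ProbeResult(alive=true, ready=true)
    println(toProbeResult(ServingStatus.NOT_SERVING)) // ProbeResult(alive=true, ready=false)
    println(toProbeResult(null))                      // ProbeResult(alive=false, ready=false)
}
```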

We have already discussed mapping gRPC client health to Kubernetes checks in the gRPC section above.

JMX

To monitor a JVM using the JMX API, we must enable the JMX agent when starting the JVM. The agent can be enabled for local monitoring, where the client management application runs on the same system, or for remote monitoring, where it runs on a remote system. The steps vary and can be quite involved. See Monitoring and Management Using JMX Technology for details. Judging by the plethora of Stack Overflow questions on this topic, it is not very well understood.

For the problem at hand, we will use a much simpler approach. K8S exec probes run inside the application's container, so we do not need to connect to the app remotely. Instead, we will find the process we want to attach to, just as JConsole does for local processes, using the jcmd tool introduced in Java SE 7. It is available under $JAVA_HOME/bin, the same place as the java binary.

Assuming we have started our gRPC app using the following command:

java -cp /myapp.jar com.mycompany.myapp.MainClass

The output of running jcmd will show (with different PIDs) the following:

87864 jdk.jcmd/sun.tools.jcmd.JCmd
87785 com.mycompany.myapp.MainClass

We use the ProcessBuilder class to run the jcmd command. We then parse the output as shown above and find the PID for the JVM process matching the given main class name. Following is a Kotlin code snippet showing how to do it:

// output holds the lines printed by jcmd, e.g. "87785 com.mycompany.myapp.MainClass"
val pid = output
  .map { it.trim().split("\\s+".toRegex()) }
  .firstOrNull { it.size > 1 && it[1].endsWith(mainClass) }
  ?.first()
  ?: throw IllegalArgumentException("Couldn't find process matching: $mainClass")
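The ProcessBuilder step that produces the output is not shown above. A minimal sketch of it, together with the PID lookup wrapped in a function, could look like this; listJvms and findPid are hypothetical names, and the sketch assumes jcmd is on the PATH.

```kotlin
import java.util.concurrent.TimeUnit

// Runs jcmd without arguments, which lists the local JVMs, one per line,
// as "<pid> <main class> [arguments]".
fun listJvms(): List<String> {
    val process = ProcessBuilder("jcmd")
        .redirectErrorStream(true) // merge stderr into stdout so nothing is left unread
        .start()
    val lines = process.inputStream.bufferedReader().use { it.readLines() }
    check(process.waitFor(10, TimeUnit.SECONDS)) { "jcmd did not exit in time" }
    return lines
}

// Returns the PID of the first JVM whose main class ends with the given name.
fun findPid(output: List<String>, mainClass: String): String =
    output
        .map { it.trim().split("\\s+".toRegex()) }
        .firstOrNull { it.size > 1 && it[1].endsWith(mainClass) }
        ?.first()
        ?: throw IllegalArgumentException("Couldn't find process matching: $mainClass")

fun main() {
    // Sample output copied from the text; a real probe would pass listJvms() instead.
    val sample = listOf(
        "87864 jdk.jcmd/sun.tools.jcmd.JCmd",
        "87785 com.mycompany.myapp.MainClass"
    )
    println(findPid(sample, "MainClass")) // prints 87785
}
```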

That is all we need to attach to that process using the Attach API that was introduced in Java SE 6. Our JMX client attaches to the VirtualMachine corresponding to the PID, and starts the JMX management agent in the target process if not already running. Kotlin code snippet again:

// VirtualMachine comes from com.sun.tools.attach (module jdk.attach)
val vm = VirtualMachine.attach(pid)
val connectorAddress = vm.agentProperties.getProperty("com.sun.management.jmxremote.localConnectorAddress") ?: vm.startLocalManagementAgent()
val url = JMXServiceURL(connectorAddress)
val connection = JMXConnectorFactory.connect(url).mBeanServerConnection

It then invokes the health operation on the Spring Boot Health MBean.

// invoke() expects a javax.management.ObjectName, not a plain String
val health = connection
  .invoke(ObjectName("org.springframework.boot:type=Endpoint,name=Health"), "health", null, null)
  .toString()

Finally, it parses the result, and checks the status of the livenessState or readinessState. If Kubernetes probes are not available/enabled, it checks the top-level status. If the status is not UP, it throws an exception causing the client to exit with an error. The parsing logic can be implemented in various ways, and I leave it to the reader to choose their own.
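As one possible take on the parsing, the sketch below pulls the statuses out of the string with regular expressions. The function names are hypothetical, and the approach assumes the text has the same shape as the JMX output shown earlier, with the top-level status as the first entry.

```kotlin
// Extracts the status of one component, e.g. "readinessState", from the
// map-style text returned by the health operation; null if it is absent.
fun componentStatus(health: String, component: String): String? =
    Regex("""\b${Regex.escape(component)}=\{status=(\w+)""")
        .find(health)?.groupValues?.get(1)

// Reads the top-level status, assumed to be the first entry of the outer map.
fun overallStatus(health: String): String? =
    Regex("""^\{\s*status=(\w+)""").find(health.trim())?.groupValues?.get(1)

// Falls back to the top-level status when the requested component is missing,
// and throws if the resulting status is anything other than UP.
fun checkHealth(health: String, component: String?) {
    val status = component?.let { componentStatus(health, it) } ?: overallStatus(health)
    check(status == "UP") { "Health check failed: status=$status" }
}

fun main() {
    val sample = "{status=UP, components={livenessState={status=UP}, " +
        "readinessState={status=OUT_OF_SERVICE}}, groups=[liveness, readiness]}"
    println(componentStatus(sample, "livenessState"))  // UP
    println(componentStatus(sample, "readinessState")) // OUT_OF_SERVICE
    println(overallStatus(sample))                     // UP
    checkHealth(sample, "livenessState")               // passes silently
}
```

An exception that escapes main makes the JVM exit with a non-zero code, which is what an exec probe interprets as a failure.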

Assuming the JMX client is a command line app that accepts option -m for the main class, and options -l and -r for running the liveness and readiness checks, respectively, we can set up the K8S liveness probe as follows:

livenessProbe:
  exec:
    command:
    - health-probe
    - -l
    - -m
    - MainClass
  initialDelaySeconds: 5
  periodSeconds: 5

The readiness probe is almost identical.
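The command line interface assumed by the probe can be sketched as follows. Probe, Options, and parseOptions are hypothetical names, and letting the last of -l and -r win when both are given is a choice made for brevity.

```kotlin
enum class Probe { LIVENESS, READINESS }

data class Options(val mainClass: String, val probe: Probe)

// Parses "-m <main class>" plus "-l" (liveness) or "-r" (readiness).
// If both -l and -r are given, the last one wins.
fun parseOptions(args: Array<String>): Options {
    var mainClass: String? = null
    var probe: Probe? = null
    var i = 0
    while (i < args.size) {
        when (args[i]) {
            "-m" -> mainClass = args.getOrNull(++i)
                ?: throw IllegalArgumentException("-m needs a value")
            "-l" -> probe = Probe.LIVENESS
            "-r" -> probe = Probe.READINESS
            else -> throw IllegalArgumentException("Unknown option: ${args[i]}")
        }
        i++
    }
    return Options(
        mainClass ?: throw IllegalArgumentException("Missing -m <main class>"),
        probe ?: throw IllegalArgumentException("Specify -l or -r")
    )
}

fun main() {
    // Same arguments as in the liveness probe definition above.
    println(parseOptions(arrayOf("-l", "-m", "MainClass"))) // Options(mainClass=MainClass, probe=LIVENESS)
}
```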

We assume above that health-probe is a binary executable available on the server PATH. There are various ways to distribute a Java app with binaries that we will not discuss here; if using Gradle as a build tool, the Application plugin can do it.

Obviously, the JMX client distribution has to be included in the server Docker image for the K8S probes to work.

Conclusion

And there we have it, a complete recipe for gRPC health checks on Kubernetes with Spring Boot Actuator.
