Non Compos Mentis

Scala Curated

2023-12-18T00:00:00+00:00

This is a curated list of various Scala resources.

Blogs & Resources

Scala Courses & Projects

Language

New in Scala 3

Type Classes

Testing

Python Curated

2023-08-23T00:00:00+00:00

This is a curated list of various Python resources.

Language

Parser

System Design Curated

2023-08-18T00:00:00+00:00

This is a curated list of various system design interview resources.

General

Specific

Visibility

Monitoring and Alerting Best Practices

JVM

Haskell Curated

2023-01-20T00:00:00+00:00

This is a curated list of various Haskell resources.

Haskell Courses & Projects

Tools

Language

General

Math

Monads

Monad Transformers

Monadic Parsing

Arrows

Arrays

Arrays in Haskell

Lists

Zippers

Lenses

Trees & Graphs

Logic & Continuations

Date/Time

Concurrency/Parallelism

Testing

Signing Artifacts for Publishing to Maven Central Repository

2021-04-04T00:00:00+00:00

Introduction

If you are the author or maintainer of a JVM-based open source project, chances are you publish it to the Maven Central repository so that automated build tools can download it. With the recent decommissioning of JCenter and Bintray, Maven Central is pretty much the only game in town.

Problem Statement

Artifacts to be published to Maven Central must meet certain requirements, one of which is that they are accompanied by PGP signatures. Users of your library might want to verify these artifacts’ PGP signatures against a public key server. Several plugins are available for popular build tools like Maven, Gradle and SBT that attempt to make the publishing process less painful, but as far as I am aware, they all leave the signing part up to the user. In this article, I will lay out the exact steps needed for generating a PGP key pair, along with a Gradle example of using the keys to sign the artifacts. Once generated, the keys can be used with any build tool.

A PGP signature can also be used for signing commits and tags on GitHub.

Requirements

The only thing you will need other than a computer and an internet connection is Docker Engine. On Windows and Mac, you can install Docker Desktop.

Steps

Run an Ubuntu container.

docker run --rm -it -v $(pwd)/.gnupg:/root/.gnupg -e GPG_TTY=/dev/console ubuntu bash

Install GnuPG, which is an implementation of the OpenPGP standard.
```
apt update && apt install -y gnupg
```
Generate a GPG key pair. Because of the Docker volume mapping, the generated keys are stored in the .gnupg directory under the directory from which you started the container.
```
gpg --full-generate-key
```
At the prompt, specify the kind of key you want, or press Enter to accept the default RSA and RSA.
Enter the desired key size. I recommend at least 4096 bits.
Enter the length of time the key should be valid. Press Enter to accept the default selection, indicating that the key does not expire.
Confirm that the key does not expire by entering ‘y’.
Enter your full name.
Enter your email address.
Enter a comment, or press Enter to skip.
Verify that your selections are correct. Enter ‘O’ (Upper case O) to proceed.
Type a secure passphrase. If you enter a weak passphrase, you will be asked to confirm it.
Retype the passphrase.
Use the following command to list GPG keys for which you have both a public and private key.
```
gpg --list-secret-keys --keyid-format SHORT
```
The last few lines will be similar to the following. The key fingerprint is the 40-character string on the second line of the sec entry. Its last 8 characters (shown on the first line after rsa3072/, 9D397642 in the example below) are sufficient to identify the key, and are referred to as the key ID in the following steps.
```
sec   rsa3072/9D397642 2021-04-04 [SC]
  C49D1269413357A989292F0BBED9F87D9D397642
uid         [ultimate] John Doe 
ssb   rsa3072/AD77D767 2021-04-04 [E]
```
Upload the public key to a key server; Maven Central uses hkp://keyserver.ubuntu.com.
```
gpg --keyserver hkp://keyserver.ubuntu.com --send-keys <key ID>
```

Optionally, convert the secret key to a Base64 encoded format with no line breaks. This is required because a lot of public CI servers will allow the secret key to be set as an environment variable, and the line breaks and special characters in the secret key are not compatible with an environment variable.

gpg --armor --export-secret-keys <key ID> | base64 -w0
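To make the effect of this encoding concrete, here is a minimal Java sketch of the round trip. The class name KeyEncodingSketch and the dummy armored text are made up for illustration; the text is not a real key. It shows that the basic Base64 encoder yields a single line, and that decoding restores the line breaks.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class KeyEncodingSketch {
    // Encodes multi-line text as a single Base64 line, similar to `base64 -w0`.
    static String encode(String armored) {
        return Base64.getEncoder().encodeToString(armored.getBytes(StandardCharsets.UTF_8));
    }

    // Restores the original multi-line text from the single-line value.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Dummy stand-in for an ASCII-armored key; not a real key.
        String armored = "-----BEGIN PGP PRIVATE KEY BLOCK-----\n\nabc123\n"
                + "-----END PGP PRIVATE KEY BLOCK-----\n";
        String encoded = encode(armored);
        System.out.println(encoded.contains("\n"));          // false
        System.out.println(decode(encoded).equals(armored)); // true
    }
}
```

The single-line value is what would be stored in the CI environment variable; the build decodes it before handing it to the signing step.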

If you are using Gradle, you can use the Signing plugin along with the keys to sign the artifacts, as shown below (using Kotlin DSL):

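import java.util.Base64

// Reads a Base64-encoded project property and returns the decoded text, or null if the property is not set.
fun base64Decode(prop: String): String? {
    return project.findProperty(prop)?.let {
        String(Base64.getDecoder().decode(it.toString())).trim()
    }
}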

signing {
    useInMemoryPgpKeys(base64Decode("signingKey"), base64Decode("signingPassword"))
    sign(*publishing.publications.toTypedArray())
}

signingKey and signingPassword are project properties that are set from the corresponding environment variables.

Conclusion

And there we have it, a complete recipe for generating a PGP key pair, and a Gradle example of using them to sign the artifacts.

gRPC Health Checks on Kubernetes with Spring Boot Actuator

2020-09-27T00:00:00+00:00

Introduction

Over the last few years, we have seen more and more projects and companies adopt gRPC as the communication protocol among internal microservices, or even for customer-facing services. gRPC can be implemented in many languages, Java being one of them, and when it comes to Java web frameworks, Spring Boot is arguably the most popular. On the deployment side of things, Kubernetes has emerged as the undisputed leader. Thus, it is only fair that we talk about all three of them when we are looking to make a production-ready application.

From now on, I will refer to Spring Boot simply as Boot, and Kubernetes as K8S. Also, I will use “gRPC service” and “gRPC server” interchangeably unless there is a specific need for disambiguation.

I deliberately avoided providing supporting information about the popularity claims made above, but if you are the “trust but verify” type, feel free to convince yourself by doing some research.

Problem Statement

We want to deploy a gRPC client-server application in K8S with no web (HTTP/S) interface, and we want to monitor the application health.

Design

Let’s first take stock of where each member of the trio stands when it comes to monitoring application health:

gRPC defines a health checking protocol for the server. It does not define a corresponding protocol for the client.
K8S defines liveness and readiness probes.
Boot provides Actuator, and Actuator provides a Health endpoint.

So, a reasonable strategy is to implement the gRPC health checking protocol, make it available under the Boot Actuator health endpoint, and then use the K8S probes to query the Actuator health endpoint periodically.

Implementation

gRPC

The good news is that io.grpc:grpc-services comes with an implementation of the Health service that implements the “GRPC Health Checking Protocol”; it is the class HealthServiceImpl, and an instance can be retrieved by calling HealthStatusManager.getHealthService(). However, registering that service with the gRPC server is our responsibility.

For monitoring gRPC client readiness, we can watch the ManagedChannel state using the ManagedChannel.getState() and ManagedChannel.notifyWhenStateChanged() methods. If the state is ConnectivityState.TRANSIENT_FAILURE when checked, there has been some transient failure, and the gRPC client app is alive but not accepting requests (not ready).

Implementing a gRPC client liveness check is up to the application.

Boot

As of this writing, Spring Boot does not have out-of-the-box support for gRPC (a little surprising, since they seem to support everything else under the sun). A Google search brings up two libraries that aim to fill this gap, yidongnan/grpc-spring-boot-starter being one of them. If the property grpc.server.health-service-enabled is true (the default), the library registers the Health service automatically for us. So, in order to make the server health status available under the Actuator health endpoint, we just need to implement a HealthIndicator named GrpcServerHealthIndicator that is a client of the Health service. See Writing Custom HealthIndicators.

Starting with version 2.3.0.RELEASE, Spring Boot provides liveness and readiness information under Actuator health endpoint. See Kubernetes Probes for details. In order to include our GrpcServerHealthIndicator in liveness and readiness groups, see Checking external state with Kubernetes Probes.

For example, to add to the readiness health group:

management.endpoint.health.group.readiness.include=readinessState,grpcServer

A Boot application running on Kubernetes will show the following health report:

/actuator/health

{
  "status": "UP",
  "components": {
    "diskSpace": {
      "status": "UP",
      "details": { //...
      }
    },
    "livenessState": {
      "status": "UP"
    },
    "ping": {
      "status": "UP"
    },
    "readinessState": {
      "status": "UP"
    }
  },
  "groups": [
    "liveness",
    "readiness"
  ]
}

and

/actuator/health/liveness

{
  "status": "UP",
  "components": {
    "livenessState": {
      "status": "UP"
    }
  }
}

The problem, though, is that our application does not have any web interface. So, what do we do?

Pretend you are thinking hard about a solution before reading on.

The solution is to use Actuator monitoring over JMX.

Following is the JMX counterpart to the HTTP /actuator/health output above (line breaks and indentation for legibility only):

{
  status=UP,
  components={
    diskSpace={status=UP, details={...}},
    livenessState={status=UP},
    ping={status=UP},
    readinessState={status=UP}
  },
  groups=[liveness, readiness]
}

Note that the K8S probes are not mandatory, so if you are using an older version of Boot that does not support them, fret not. The GrpcServerHealthIndicator can still contribute to the overall health status available in the top-level status; it just will not contribute to the liveness and readiness health groups.

K8S

K8S defines two distinct checks: liveness and readiness. However, gRPC only defines a single health checking protocol and does not have a native concept of a readiness check. A reasonable way to map gRPC responses to Kubernetes checks is to interpret a SERVING response as the service being alive and ready to accept more requests, a NOT_SERVING response as the service being alive but not accepting requests, and UNKNOWN or a failure to respond as the service not being alive.
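The mapping described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the ServingStatus enum below is a local stand-in that contains just the three values named in the text (the real enum ships with grpc-services), a null status models a failure to respond, and the class and method names are hypothetical.

```java
public class ProbeMappingSketch {
    // Local stand-in for the gRPC serving status; not the real gRPC type.
    enum ServingStatus { SERVING, NOT_SERVING, UNKNOWN }

    // Alive when the service answered SERVING or NOT_SERVING.
    // UNKNOWN or no response (null) is treated as not alive.
    static boolean isLive(ServingStatus status) {
        return status == ServingStatus.SERVING || status == ServingStatus.NOT_SERVING;
    }

    // Ready only when the service is actually serving.
    static boolean isReady(ServingStatus status) {
        return status == ServingStatus.SERVING;
    }

    public static void main(String[] args) {
        for (ServingStatus status : ServingStatus.values()) {
            System.out.println(status + ": live=" + isLive(status) + ", ready=" + isReady(status));
        }
        System.out.println("no response: live=" + isLive(null) + ", ready=" + isReady(null));
    }
}
```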

We have already discussed mapping gRPC client health to Kubernetes checks in the gRPC section above.

JMX

To monitor a JVM using the JMX API, we must enable the JMX agent when starting the JVM. We can enable the JMX agent for local monitoring, for a client management application running on the local system, or for remote monitoring, for a client management application running on a remote system. The steps vary, and can be quite involved. See Monitoring and Management Using JMX Technology for details. Apparently it is not very well understood, as indicated by the plethora of Stack Overflow questions on this topic.

For the problem at hand, we will use a much simpler approach. Note that the K8S probes execute on the same host as the application, so we do not need to connect to the app remotely. Instead, we will find the process we want to attach to, just like we do when running JConsole, using the jcmd tool introduced in Java SE 7. It is available under $JAVA_HOME/bin, same place as the java binary.

Assuming we have started our gRPC app using the following command:

java -cp /myapp.jar com.mycompany.myapp.MainClass

The output of running jcmd will show (with different PIDs) the following:

87864 jdk.jcmd/sun.tools.jcmd.JCmd
87785 com.mycompany.myapp.MainClass

We use the ProcessBuilder class to run the jcmd command. We then parse the output as shown above and find the PID for the JVM process matching the given main class name. Following is a Kotlin code snippet showing how to do it:

val pid = output
  .map { it.split("\\s+".toRegex()) }
  .filter { it.size > 1 && it[1].endsWith(mainClass) }
  .map { it.first() }
  .firstOrNull() ?: throw IllegalArgumentException("Couldn't find process matching: $mainClass")
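For readers who prefer Java, the same idea can be sketched as follows, this time including the ProcessBuilder part. The class and method names are hypothetical. The main method parses the sample output shown earlier instead of calling runJcmd(), so the sketch also runs on a machine where jcmd is not on the PATH.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class JcmdPidSketch {
    // Runs `jcmd` with no arguments and returns its output lines.
    static List<String> runJcmd() throws IOException {
        Process process = new ProcessBuilder("jcmd").redirectErrorStream(true).start();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    // Finds the PID of the first JVM whose main class ends with the given name.
    static Optional<String> findPid(List<String> lines, String mainClass) {
        return lines.stream()
                .map(line -> line.trim().split("\\s+"))
                .filter(parts -> parts.length > 1 && parts[1].endsWith(mainClass))
                .map(parts -> parts[0])
                .findFirst();
    }

    public static void main(String[] args) {
        // Sample output from the article; replace with runJcmd() on a real host.
        List<String> lines = Arrays.asList(
                "87864 jdk.jcmd/sun.tools.jcmd.JCmd",
                "87785 com.mycompany.myapp.MainClass");
        String mainClass = "MainClass";
        String pid = findPid(lines, mainClass).orElseThrow(
                () -> new IllegalArgumentException("Couldn't find process matching: " + mainClass));
        System.out.println(pid); // 87785
    }
}
```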

That is all we need to attach to that process using the Attach API that was introduced in Java SE 6. Our JMX client attaches to the VirtualMachine corresponding to the PID, and starts the JMX management agent in the target process if not already running. Kotlin code snippet again:

val connectorAddress = vm.agentProperties.getProperty("com.sun.management.jmxremote.localConnectorAddress") ?: vm.startLocalManagementAgent()
val url = JMXServiceURL(connectorAddress)
val connection = JMXConnectorFactory.connect(url).mBeanServerConnection

It then invokes the health operation on the Spring Boot Health MBean.

val health = connection
  .invoke("org.springframework.boot:type=Endpoint,name=Health", "health", null, null)
  .toString()

Finally, it parses the result, and checks the status of the livenessState or readinessState. If Kubernetes probes are not available/enabled, it checks the top-level status. If the status is not UP, it throws an exception causing the client to exit with an error. The parsing logic can be implemented in various ways, and I leave it to the reader to choose their own.
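As one possible take on the parsing step, here is a small regex-based Java sketch. It assumes the health string has the shape shown in the JMX output earlier, with status as the first key of each map; that ordering is an assumption, not something the MBean guarantees. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HealthParseSketch {
    private static final Pattern TOP_LEVEL = Pattern.compile("^\\{status=(\\w+)");

    // Returns the status of the named component, or the top-level status
    // when the component is absent (for example, probes are not enabled).
    static String statusOf(String health, String component) {
        Matcher m = Pattern.compile(Pattern.quote(component) + "=\\{status=(\\w+)").matcher(health);
        if (m.find()) {
            return m.group(1);
        }
        Matcher top = TOP_LEVEL.matcher(health.trim());
        if (top.find()) {
            return top.group(1);
        }
        throw new IllegalArgumentException("No status found in: " + health);
    }

    // Throws if the status is anything other than UP, so the probe exits with an error.
    static void check(String health, String component) {
        String status = statusOf(health, component);
        if (!"UP".equals(status)) {
            throw new IllegalStateException(component + " is " + status);
        }
    }

    public static void main(String[] args) {
        // Sample modeled on the JMX output in the article.
        String health = "{status=UP, components={diskSpace={status=UP, details={}}, "
                + "livenessState={status=UP}, ping={status=UP}, readinessState={status=UP}}, "
                + "groups=[liveness, readiness]}";
        check(health, "livenessState");
        System.out.println(statusOf(health, "readinessState")); // UP
    }
}
```

A stricter implementation would parse the nested structure properly; the regex is enough for a probe that only needs one status value.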

Assuming the JMX client is a command line app that accepts option -m for the main class, and options -l and -r for running the liveness and readiness checks, respectively, we can set up the K8S liveness probe as follows:

livenessProbe:
  exec:
    command:
    - health-probe
    - -l
    - -m
    - MainClass
  initialDelaySeconds: 5
  periodSeconds: 5

The readiness probe is almost identical.

We assume above that health-probe is a binary executable available on the server PATH. There are various ways to distribute a Java app with binaries that we will not discuss here; if using Gradle as a build tool, the Application plugin can do it.

Obviously, the JMX client distribution has to be included in the server Docker image for the K8S probes to work.

Conclusion

And there we have it, a complete recipe for gRPC health checks on Kubernetes with Spring Boot Actuator.

Algorithms Curated

2019-01-01T00:00:00+00:00

This is a curated list of various algorithms and coding interview resources.

General

Blogs & Resources

Lectures & Teaching Material

Since summer of 2013, the course is taught from the book Algorithm Design: Parallel and Sequential.

Quizzes and homeworks are not available publicly after 2008 Fall.

YouTube

Solutions by Coding Platforms

Solutions by Topics

Practice

Other Curated Lists

Asymptotic Analysis

Bits

Math & Numerics

Arrays

Sorting & Searching

Combinatorial

Divide-and-Conquer

Linked Lists

Hashing

Strings

Recursion

Trees

Graphs

General

Traversal

Range Queries

Connectivity

Matroids

Circuits

Shortest Paths

Network Flow

Dynamic Programming

NP-Complete

Lectures 24 through 27 are all related to TSP.

Heuristic Approaches to Solve TSP - Abid+Iqbal

Task Scheduling

2018-06-08T00:00:00+00:00

Introduction

I recently came across this blog post where the author talks about a solution for scheduling interdependent tasks. Coincidentally, I have been brushing up on Data Structures and Algorithms in preparation for job interviews, and realized that the solution discussed in the aforementioned blog could be simplified. That is what I am going to discuss in this post.

Problem Statement

Given a set of tasks that have dependencies on one another, find an execution order such that the dependencies of each task are executed before the task itself. This belongs to a general class of problems known as scheduling problems, in which a set of constraints must be honored; the most important are the precedence constraints, which specify that certain tasks must be performed before certain others. For example, consider a college student planning a course schedule, under the constraint that certain courses are prerequisites for certain other courses. In the image below, an arrow from course A to course B means A needs to be taken before B.

This problem can be solved by sorting the vertices of the directed graph in topological order, such that all its edges point from a vertex earlier in the order to a vertex later in the order. A topological order for our example model is shown above.

An obvious prerequisite is that no cycles exist in the graph. Our solution is going to check for cycles though (Trust, but verify).
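For reference, a topological sort with a built-in cycle check can be sketched in Java as below. This uses Kahn's algorithm, which is my choice for illustration and not the approach of the original blog post or of the algs4 library used later. The class name is hypothetical; the two course pairs come from the example discussed in this post.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopologicalSortSketch {
    // prerequisites maps each node to the nodes that must come before it.
    static List<String> sort(Map<String, List<String>> prerequisites) {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        Map<String, List<String>> dependents = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : prerequisites.entrySet()) {
            inDegree.putIfAbsent(entry.getKey(), 0);
            for (String pre : entry.getValue()) {
                inDegree.putIfAbsent(pre, 0);
                inDegree.merge(entry.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(pre, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        // Start with the nodes that have no prerequisites.
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((node, degree) -> {
            if (degree == 0) {
                ready.add(node);
            }
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.remove();
            order.add(node);
            // Completing a node releases the dependents whose prerequisites are all done.
            for (String dependent : dependents.getOrDefault(node, Collections.emptyList())) {
                if (inDegree.merge(dependent, -1, Integer::sum) == 0) {
                    ready.add(dependent);
                }
            }
        }
        // Nodes on a cycle never reach in-degree zero, so they are missing from the order.
        if (order.size() != inDegree.size()) {
            throw new IllegalArgumentException("Cycle detected");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> courses = new LinkedHashMap<>();
        courses.put("Artificial Intelligence", Arrays.asList("Scientific Computing"));
        courses.put("Cryptography", Arrays.asList("Complexity Theory"));
        System.out.println(sort(courses));
    }
}
```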

Design

It occurred to me that a Topological sort is actually not necessary, because the dependencies of a task are fixed at its construction time. The original solution needed the sort because the task scheduler attempted to figure out which tasks needed to run before others; instead, we can simplify the design such that each task notifies its dependents on its completion. Think of it like a Domino effect; once started, the process would progress on its own. Referring to the previous example, Scientific Computing would notify Artificial Intelligence when it is completed, Complexity Theory would notify Cryptography, and so on and so forth.

That is where the Observer Pattern comes in.

Implementation

I make heavy use of Java 8 Stream and concurrency constructs, so if you are not familiar with those, you may want to brush up on them before reading further.

Task

Java supports the Observer Pattern in two ways:

Using Observer and Observable.
Using PropertyChangeSupport and PropertyChangeListener.

I chose the latter because the former has been deprecated in Java 9; see the Observable Javadoc for details.
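Before looking at the real class, here is a minimal, stripped-down sketch of the PropertyChangeSupport mechanics. The Job class is hypothetical and exists only for this illustration. As in the Task class, the property name carries the id of the finished job and the new value carries its result.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;

public class ObserverSketch {
    // Hypothetical subject that announces its completion to registered listeners.
    static class Job {
        private final String id;
        private final PropertyChangeSupport support = new PropertyChangeSupport(this);

        Job(String id) {
            this.id = id;
        }

        void addListener(PropertyChangeListener listener) {
            support.addPropertyChangeListener(listener);
        }

        void complete(Object result) {
            // The old value is null, so the event is always delivered.
            support.firePropertyChange(id, null, result);
        }
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        Job job = new Job("f1");
        job.addListener(evt -> received.add(evt.getPropertyName() + "=" + evt.getNewValue()));
        job.complete(42);
        System.out.println(received); // [f1=42]
    }
}
```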

All we need to do in a task is keep track of the completion of its dependencies. When all the tasks it depends on have completed, the task starts execution, and on completion, notifies its dependents. Here’s the Task class snippet:

public final class Task<V> implements PropertyChangeListener {
    private final String id;
    private final PropertyChangeSupport support;
    private final Collection<Task<V>> dependsOn;
    private final Function<Map<String, V>, V> action;
    private final Executor executor;
    private final Map<String, V> resultMap;

    private Task(...) {
        this.id = id;
        this.dependsOn = dependsOn;
        this.action = action;
        this.executor = executor;

        this.support = new PropertyChangeSupport(this);
        this.resultMap = new ConcurrentHashMap<>();
        this.dependsOn.forEach(d -> d.addDependent(this));
    }

    public static <V> TaskBuilder<V> builder() {
        return new TaskBuilder<>();
    }

    @SuppressWarnings("unchecked")
    void addDependent(PropertyChangeListener dependent) {
        support.addPropertyChangeListener(dependent);
        if (dependent instanceof Task) {
            Collection<Task<V>> dependsOn = ((Task<V>) dependent).dependsOn;
            if (!dependsOn.contains(this)) {
                dependsOn.add(this);
            }
        }
    }

    ...

    public V result() {
        return resultMap.get(id);
    }

    @Override
    @SuppressWarnings("unchecked")
    public void propertyChange(PropertyChangeEvent evt) {
        String taskId = evt.getPropertyName();
        Set<String> ids = dependsOn
                .stream()
                .map(task -> task.id)
                .collect(toSet());

        if (!ids.contains(taskId)) {
            throw new IllegalStateException(...);
        }
        resultMap.put(taskId, (V) evt.getNewValue());
        if (ids.equals(resultMap.keySet())) {
            execute();
        }
    }

    public void execute() {
        try {
            CompletableFuture.supplyAsync(
                      () -> action.apply(unmodifiableMap(resultMap)), executor)
                    .thenAccept(result -> {
                        resultMap.put(id, result);
                        support.firePropertyChange(id, null, result);
                    });
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    ...

    public static final class TaskBuilder<V> {
        ...

        public Task<V> build() {
            ...
            return new Task<>(id, dependsOn, action, executor);
        }
    }
}

I use the Builder Pattern to provide an elegant means of constructing a task.

The task scheduler simply needs to check if there is a cycle, and if not, submit the tasks that do not depend on others. In order to do that, I use the excellent algs4 library.

public final class TaskScheduler<V> {
    private final Map<String, V> resultMap;
    private final CountDownLatch latch;
    private final List<Task<V>> rootTasks;
    ...

    @SuppressWarnings("unchecked")
    public TaskScheduler(List<Task<V>> tasks) {
        int v = tasks.size();
        Map<String, Integer> taskMap = IntStream.range(0, v)
                .mapToObj(i -> new SimpleImmutableEntry<>(tasks.get(i).id(), i))
                .collect(toMap(Map.Entry::getKey, Map.Entry::getValue));

        Digraph graph = new Digraph(v);
        taskMap.forEach((name, idx) -> {
            Task<V> task = tasks.get(idx);
            task.dependsOn()
                    .forEach(dependency ->
                        graph.addEdge(idx, taskMap.get(dependency)));
        });
        DirectedCycle directedCycle = new DirectedCycle(graph);
        if (directedCycle.hasCycle()) {
            ...
            throw new IllegalArgumentException(...);
        }

        rootTasks = IntStream.range(0, v)
                .filter(i -> graph.outdegree(i) == 0)
                .mapToObj(tasks::get)
                .collect(toList());

        resultMap = new ConcurrentHashMap<>();
        latch = new CountDownLatch(v);
        PropertyChangeListener accumulator = evt -> {
            // Record the result before counting down, so that await() never returns with an entry missing.
            resultMap.put(evt.getPropertyName(), (V) evt.getNewValue());
            latch.countDown();
        };
        tasks.forEach(task -> task.addDependent(accumulator));
    }

    public Map<String, V> await(long timeout, TimeUnit unit) throws InterruptedException {
        rootTasks.forEach(Task::execute);
        latch.await(timeout, unit);

        return unmodifiableMap(resultMap);
    }
}

Lastly, since we should never write code without writing unit tests (you do not, right?), here is a test:

void testTask() throws InterruptedException {
    Random random = new Random();
    ExecutorService executor = Executors.newSingleThreadExecutor();

    Task<Integer> f1 = Task.<Integer>builder()
            .id("f1")
            .executor(executor)
            .action(resultMap -> {
                try {
                    Thread.sleep(random.nextInt(5));
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                return 0;
            })
            .build();
    Task<Integer> f2 = Task.<Integer>builder()
            .id("f2")
            .executor(executor)
            .action(resultMap -> {
                try {
                    Thread.sleep(random.nextInt(5));
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                return 1;
            })
            .build();
    Task<Integer> f3 = Task.<Integer>builder()
            .id("f3")
            .executor(executor)
            .dependsOn(Arrays.asList(f1, f2))
            .action(resultMap -> {
                try {
                    Thread.sleep(random.nextInt(5));
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                return resultMap.get("f1") + resultMap.get("f2");
            })
            .build();
    Task<Integer> f4 = Task.<Integer>builder()
            .id("f4")
            .executor(executor)
            .dependsOn(Arrays.asList(f2, f3))
            .action(resultMap -> {
                try {
                    Thread.sleep(random.nextInt(5));
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                return resultMap.get("f2") + resultMap.get("f3");
            })
            .build();

    Integer result = new TaskScheduler<>(f1, f2, f3, f4)
            .await(30, TimeUnit.SECONDS)
            .get("f4");

    assertNotNull(result);
    assertEquals(2, result.intValue());
}

To finish off, here is a sample execution log:

Task: f1 depends on: []
Task: f2 depends on: []
Task: f3 depends on: [f1, f2]
Task: f4 depends on: [f2, f3]
Task: f1 is executing on thread: pool-1-thread-1
Received notification from task: f1 inside task: f3
Task: f3 is not ready
Task: f2 is executing on thread: pool-1-thread-1
Received notification from task: f2 inside task: f3
Task: f3 is ready to be executed
Received notification from task: f2 inside task: f4
Task: f4 is not ready
Task: f3 is executing on thread: pool-1-thread-1
Received notification from task: f3 inside task: f4
Task: f4 is ready to be executed
Task: f4 is executing on thread: pool-1-thread-1

Conclusion

I believe a good solution is deceptively simple, and therein lies its elegance. For a production-grade solution, I would have robust exception handling, but for a proof-of-concept like this, I did not spend much time with that.

Couchbase on Kubernetes

2017-10-24T00:00:00+00:00

Introduction

With the paradigm shift towards Microservices and Cloud-Native architecture (whether everyone is doing it right is another topic though), containers have become almost synonymous with those terms. Given that Kubernetes is a “Production-Grade Container Orchestration” (their words), and with features like horizontal scaling, self-healing, and storage orchestration, it is only natural that we are discussing running a database as a container on Kubernetes. In this post, I discuss running a Couchbase cluster on Kubernetes. I assume familiarity with both Couchbase and Kubernetes, as well as RxJava and Hystrix, all of which are required for the rest of this write-up.

From now on, I will refer to the Couchbase server as CB, the Couchbase client application as the client, and Kubernetes as K8S.

This is not meant to be a recipe for running a highly-available, fully-redundant, multi-node CB cluster on K8S in Production. As of this writing, CB does not officially support running on K8S. This is a narrative of my experiences and learnings; YMMV and most likely will.

The term connection is used loosely here to indicate a logical connection, and is not to be taken as a physical TCP/IP connection.

Problem Statement

We want to run a CB cluster on K8S, and must support the following use cases:

The client startup must be resilient to CB failure/unavailability.
The client must not fail the request, but return a degraded response instead, if CB is not available.
The client must reconnect should a CB failover happen.
The CB server must be able to restart without human intervention; while this may sound trivial, I will later discuss why it is not.

We will not discuss the following:

Multi-node CB cluster.
Failover.

Architecture

Now that we have laid out the basics, let’s analyze each item of the problem statement in detail.

Client Startup

Our client is a Spring Boot app, which used to use Spring Data Couchbase for CB integration. Spring Data Couchbase initializes various CB related beans at startup, and a failure to connect to CB is fatal. I tried to work my way around that limitation, but soon realized it wouldn’t work; I needed to roll my own CB client code. At this point, let me ask you this:

What do you think is most needed to handle CB connection failure at startup?

The answer is flippantly simple: Do not initialize CB connection at startup. When do we do it then? Umm, perhaps on the first request? That could work but the CB cluster and bucket opening are time-consuming operations, so unless we could tell the client “hey, you are the lucky one to make the first request; please wait while we get our s**t together”, we needed to find another way. It is the right idea, but we need a better implementation.

We want to decouple the CB initialization from the request thread, and if at any time during the request we find CB connection not initialized, we start the initialization process on a separate thread while immediately failing the CB request. We also want to attempt initialization during the application startup, on a separate thread of course, which if successful, means that the first request will find a CB connection ready to be used. However, if the initialization fails during the startup, we do not want hundreds or thousands of subsequent requests to flood the CB server; we need to throttle the traffic, as well as put a sleep time between failures, should there be any. We need Hystrix.
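The core of this idea, background initialization combined with fail-fast requests, can be sketched with plain java.util.concurrent. To be clear, this is not the RxJava and Hystrix implementation used in this post: it leaves out throttling and the sleep time between failures, and the class and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class LazyConnectionSketch<T> {
    private final Supplier<T> connector;
    private final Executor executor;
    // Holds the current initialization attempt, or null if none is in flight or done.
    private final AtomicReference<CompletableFuture<T>> pending = new AtomicReference<>();

    LazyConnectionSketch(Supplier<T> connector, Executor executor) {
        this.connector = connector;
        this.executor = executor;
    }

    // Starts initialization on the executor unless an attempt is in flight or has succeeded.
    void initialize() {
        CompletableFuture<T> attempt = new CompletableFuture<>();
        if (pending.compareAndSet(null, attempt)) {
            CompletableFuture.supplyAsync(connector, executor).whenComplete((value, error) -> {
                if (error != null) {
                    pending.compareAndSet(attempt, null); // allow a later retry
                    attempt.completeExceptionally(error);
                } else {
                    attempt.complete(value); // cache the successful result
                }
            });
        }
    }

    // Returns the connection if it is ready; otherwise triggers initialization and fails fast.
    T get() {
        CompletableFuture<T> current = pending.get();
        if (current != null && current.isDone() && !current.isCompletedExceptionally()) {
            return current.join();
        }
        initialize();
        throw new IllegalStateException("Connection not initialized");
    }

    public static void main(String[] args) {
        // Runnable::run executes the connector on the calling thread, which keeps the demo deterministic.
        LazyConnectionSketch<String> connection =
                new LazyConnectionSketch<>(() -> "bucket", Runnable::run);
        try {
            connection.get();
        } catch (IllegalStateException e) {
            System.out.println("first request: " + e.getMessage());
        }
        System.out.println("second request: " + connection.get());
    }
}
```

In a real application, initialize() would also be called once at startup with a proper thread pool, so that the first request usually finds the connection ready.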

The good thing is that the CB Java client SDK fully supports RxJava, and so does Hystrix, so they fit together like two peas in a pod.

Last but not least, since cluster and bucket opening are expensive operations, we cache the values once successful.

Degraded Response

I already touched upon this in the previous section. If at any time during the request we find the CB connection not initialized, we immediately fail the CB request. The client handles the exception, and returns a successful response without CB data. If, however, the CB connection had been initialized but later dropped, then the CB request blocks until it times out. As long as the CB request timeout is less than the HTTP request timeout, the caller receives a slightly delayed, but successful response.

Client Reconnection

Connection attempts when none has succeeded so far were discussed in the startup section. Continuing from the previous section, reconnection after a connection had been successfully established but later dropped is handled by the CB SDK. This is also related to the following section, because when a CB node is restarted, its IP may very well change. The default CB client (and server) behavior is to use IPs for nodes, but in K8S, that does not work for the aforementioned reason. We need to use DNS names that do not change with node restarts. We will see later how to implement this in K8S.

CB Server Node Restart

Discussed in the previous section.

Implementation

CB Client

As previously mentioned, we use RxJava all the way. The crucial difference between this approach and any other is that it introduces delayed execution. I dare say that, as of this writing, no other publicly available solution employs this technique and is thus robust enough to handle CB connection failure at startup. My code uses a Single<AsyncCluster> and a Single<AsyncBucket>, so we can delay the execution until subscription. I use the factory pattern to encapsulate the gory details of bootstrap and bucket opening. Using RxJava also allows me to declaratively spawn new threads, and handle failures gracefully.

“Talk is cheap. Show me the code” - Linus Torvalds.

The linchpins of this solution are three classes: CouchbaseAsyncClusterFactory, CouchbaseAsyncBucketFactory, and AsyncBucketHystrixObservableCommand. Technically, the first two are interfaces, with default implementations provided in the same Java files. The factory classes are singleton Spring beans that store references to a Single<AsyncCluster> and a Single<AsyncBucket>, respectively. The Hystrix command is responsible for opening the bucket, optionally creating it if it does not already exist as well as creating a primary index, and controlling the access to the bucket creation/opening logic through a semaphore. I strongly encourage you to take a look before proceeding: the code speaks for itself, I hope, and there are ample comments in the Hystrix command.

I also called upon the Repository design pattern, and created a CouchbaseRepository interface, and a BaseCouchbaseRepository abstract class extending from it. Client code is usually expected to extend BaseCouchbaseRepository, and simply supply the generic type required. Of course, ambitious clients are free to implement CouchbaseRepository, or even use the factory classes directly. A sample Repository implementation is as follows, and it’s beyond trivial.

@Repository
public class CouchbaseBeerRepository extends BaseCouchbaseRepository {
}

That’s it! The code using the CouchbaseBeerRepository looks like the following:

beerRepository.findOne(id)
    .map(ResponseEntity::ok)
    .onErrorReturn(t ->  ResponseEntity.status(INTERNAL_SERVER_ERROR).build())
    .timeout(TIMEOUT_MILLIS, MILLISECONDS)
    .toBlocking()
    .value()

The complete CB client library code is on my GitHub, as well as the client app that uses it.

It appears that with CB server 5, the client holds on to the previously established but now invalid connections longer. To counter this, we set the “Socket Keepalive ErrorThreshold” = 1 for the CB client.

If you are using Spring 5 WebFlux, you can go completely non-blocking and return a Reactive Streams Publisher. That is a topic for another day.

Couchbase Java client SDK has a CouchbaseAsyncRepository, but since it requires an AsyncBucket for instantiation, it was not useful to me.

CB Server

The official Couchbase Docker image requires manual setup, so I created my own image that initializes a one-node cluster out of the box. It is available on Docker Hub as asarkar/couchbase. In order to make it work on K8S, we must set the node hostname to a DNS name, not an IP. By doing so, when the node is restarted, the hostname does not change even though the IP may. To achieve this, we use a StatefulSet, along with a Headless service. StatefulSet gives us predetermined Pod names, and when used with a Headless service, each node gets a network identity in the form of $(statefulset name)-$(ordinal).$(service name).$(namespace).svc.cluster.local, which is what we use for hostname. The service itself gets a DNS name $(service name).$(namespace).svc.cluster.local that resolves to a list of all the nodes. Refer to the StatefulSet and Headless service docs, as well as the K8S DNS docs for further details.
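The naming scheme is easy to get wrong by a dot or a dash, so here is a small sketch that assembles both names from their parts. K8sDnsNames is a hypothetical helper, the values cb, cb-svc and my-namespace are illustrative only, and the cluster.local suffix is assumed to be the cluster domain, as in the patterns above.

```java
/** Hypothetical helper that builds the DNS names described above. */
final class K8sDnsNames {
    private K8sDnsNames() {
    }

    /** DNS name of the headless service itself; resolves to all backing Pods. */
    static String serviceDns(String service, String namespace) {
        return service + "." + namespace + ".svc.cluster.local";
    }

    /** Stable DNS name of one StatefulSet Pod behind a headless service. */
    static String podDns(String statefulSet, int ordinal, String service, String namespace) {
        return statefulSet + "-" + ordinal + "." + serviceDns(service, namespace);
    }

    public static void main(String[] args) {
        // Illustrative values, not taken from any real manifest.
        System.out.println(podDns("cb", 0, "cb-svc", "my-namespace"));
        System.out.println(serviceDns("cb-svc", "my-namespace"));
    }
}
```

The Pod name stays the same across restarts because it depends only on the StatefulSet name and the ordinal, never on the IP.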

However, the above is not all. If we stop here, the CB client would be given the service DNS name, which in turn would resolve to the Pods during bootstrap. That does not work for two reasons:

For high availability, the CB client SDK usually expects a list of nodes, not a single node.
The “smart” client, as they call it, establishes a fully-connected mesh network with the CB server nodes. Depending on what services are running on which nodes, and how the data is replicated, the client makes the decision which node to talk to. Putting a service in front of the nodes completely breaks this process.

Luckily, there is something called DNS SRV Record. Quoting K8S docs:

For each named port, the SRV record would have the form _my-port-name._my-port-protocol.my-svc.my-namespace.svc.cluster.local … For a headless service, this resolves to multiple answers, one for each pod that is backing the service

And from CB docs for Managing Connections, we come to know that the CB client can be configured to bootstrap with a DNS SRV Record in the form of _couchbase._tcp.example.com. Connecting the dots, if we name one of the exposed ports on the service couchbase, and provide the client with a name cb-svc.my-namespace.svc.cluster.local, we are golden!
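If you want to check what such a record resolves to before pointing the client at it, the JDK's JNDI DNS provider can query SRV records directly. The following is a sketch for debugging, not what the CB client does internally: CouchbaseSrvLookup is a hypothetical class, and it assumes the provider returns each SRV value as the string "priority weight port target", which is how I have seen it behave.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

/** Hypothetical debugging helper for the _couchbase._tcp SRV record. */
final class CouchbaseSrvLookup {
    private CouchbaseSrvLookup() {
    }

    /** SRV name derived from the host name handed to the client. */
    static String srvName(String host) {
        return "_couchbase._tcp." + host;
    }

    /** Extracts the target host from an SRV value of the form "priority weight port target". */
    static String targetOf(String srvValue) {
        String[] parts = srvValue.trim().split("\\s+");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Unexpected SRV value: " + srvValue);
        }
        String target = parts[3];
        // DNS answers are fully qualified and usually end with a trailing dot.
        return target.endsWith(".") ? target.substring(0, target.length() - 1) : target;
    }

    /** Queries DNS for the SRV record and returns one target host per answer. */
    static List<String> resolveNodes(String host) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);
        try {
            Attribute srv = ctx.getAttributes(srvName(host), new String[] {"SRV"}).get("SRV");
            List<String> nodes = new ArrayList<>();
            if (srv != null) {
                NamingEnumeration<?> values = srv.getAll();
                while (values.hasMore()) {
                    nodes.add(targetOf(values.next().toString()));
                }
            }
            return nodes;
        } finally {
            ctx.close();
        }
    }
}
```

Run from a Pod inside the cluster, resolveNodes("cb-svc.my-namespace.svc.cluster.local") should list one host per CB Pod, provided the service exposes a port named couchbase.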

I initially assumed that the prefix couchbase was only an example, and could be any string as long as the FQN is a DNS SRV name. That is not the case; it is not at all difficult to make it configurable in the CouchbaseEnvironment, but the CB guys decided to hard code the couchbase prefix instead.

If the CB client is running in the same K8S namespace, cb-svc alone can be used, without requiring the FQN.

Last but not least, data persistence. After all, what good is a database that cannot persist data? Luckily, StatefulSet has first-class support for Stable Storage. Since we are running a single-node CB cluster on a dedicated K8S node, we chose to go with a hostPath. We did try GlusterFS once, but it did not perform well under load, and we did not see the return on investment in fine-tuning it. In the future, if we loosen the restriction to run CB on a single K8S node, we can easily repopulate the data in a short time. For the period the data would not be available, the client would continue to return a degraded response.

My GitHub CB project contains the K8S manifests. You can use those to run locally on Minikube, or on any other cluster.

There is a gotcha with configuring the K8S Liveness and Readiness Probes. We implemented the former as a simple probe of the 8091 port (here is a list of all CB ports). Implementing a smart probe for readiness, like checking the cluster status or something similar, ran into a problem: CB tries to contact all the nodes to determine cluster status, and until the readiness probe succeeds, K8S does not create a DNS entry for the node, resulting in a catch-22 situation. Thus, we did not implement a custom readiness probe.
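For the curious, a plain port probe boils down to "can a TCP connection be established within a timeout". The sketch below shows that check in a few lines of JDK code; TcpProbe is a hypothetical name, and this is only an approximation of what the kubelet does for a TCP probe, useful for testing reachability of port 8091 by hand.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper that mimics a simple TCP port probe. */
final class TcpProbe {
    private TcpProbe() {
    }

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, unreachable host, or timeout all count as "not alive".
            return false;
        }
    }
}
```

Note that such a check says nothing about cluster health; it only tells you that something is listening, which is exactly why it does not run into the catch-22 described above.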

Conclusion

Like I mentioned in the beginning, CB does not officially support running on K8S. They say they are working on it, but not many details have been made available. There also exists an official blog, but it falls short of addressing the issues discussed in this article. In order to be a first-class K8S citizen, CB has to support effortless scaling up and down, which means adding and removing nodes without the need for manual intervention, and step up its failover game. Data replication/migration when nodes are added or removed also needs to be handled transparently to the clients. While time will tell the future of CB server on K8S, I do not see why, with some effort, the client solution here cannot be incorporated in the CB Java client SDK; I intend to approach them with that proposal, so that other people can also benefit from my effort and learning.

Spring Cloud Config Server - The Hidden Manual

2017-02-09T00:00:00+00:00

Introduction

In 2015-2016, we redesigned a monolithic application into Microservices and chose the Spring Cloud Config for configuration management.

Quoting the docs:

Spring Cloud Config provides server and client-side support for externalized configuration in a distributed system.

To their credit, the Spring Cloud Config documentation is fairly good, but it doesn’t go into the level of detail I was looking for. This post attempts to bridge that gap. I assume some basic familiarity with Spring Cloud Config, so if you are new to it, come back after having read the official documentation. If working with Spring Cloud, you may also find my post Spring Cloud Netflix Eureka - The Hidden Manual useful.

Basic Architecture

The Config server is a Spring Boot application that is aware of the Spring PropertySource and Environment abstractions. It serves properties in response to HTTP requests. It can also serve plain text files, thus acting like a simple Web Server. I’m not going to repeat what’s already specified in the official docs, so let’s move on to what’s not mentioned there.

When reading config files from sub-directories, if more than one directory has files with identical names, the last one wins. I haven’t checked but would assume similar behavior for multiple Git repos as well. The question I have, and haven’t had time to look into, is what makes up a PropertySource for the Config server. Is it an application, a Spring profile, a Git repo, a label, or some combination of these?

Failover

In spite of the built-in intelligence, Config server is pretty dumb in some cases. It doesn’t have any caching mechanism, although properties are fairly easy to cache, and as of this writing, it doesn’t have first-class support for failover either. Depending on the backing property store, it either goes to the file system for every request, or pulls from Git. The former is not that bad, even though caching would obviously help, but the latter could be an issue if the connection to Git is slow or the repo is somewhat large. Ticket #566 complains about the last point.

Lack of failover is a bigger concern. Assuming the backing property store is a Git repo, if Config server loses connectivity to Git, or for some other reason fails to clone the repo, it fails the request. This is a problem for Singleton beans that attempt to fetch properties at startup, because the application fails to start up. There are a few tickets open to address this, notably #617 and #631. While Spring works towards providing first-class support for failover, I submitted a pull request to make the JGitEnvironmentRepository#refresh method public, which is the one that does the cloning. If refresh is made public, then Config server could be enhanced using Spring AOP to handle failover. How depends on your use case and design. One option is to return stale properties from the local repo if refresh fails. Another option is to return cached properties. Multiple Git repos could be configured as well, although in that case, keeping them in sync would become another challenge.
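The "return stale properties if refresh fails" option can be sketched independently of Spring. The class below is a hypothetical illustration, not Config server code: StaleOnFailure wraps whatever performs the refresh (modelled here as a Supplier of a property map), remembers the last good result, and falls back to it when the refresh throws. In a real Config server this logic would sit in an aspect around the repository.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical wrapper that serves the last good properties when a refresh fails. */
final class StaleOnFailure {
    private final Supplier<Map<String, String>> refresh;   // e.g. pull from Git and read the files
    private final AtomicReference<Map<String, String>> lastGood = new AtomicReference<>();

    StaleOnFailure(Supplier<Map<String, String>> refresh) {
        this.refresh = refresh;
    }

    /** Returns fresh properties when possible, otherwise the most recent successful result. */
    Map<String, String> load() {
        try {
            Map<String, String> fresh = Collections.unmodifiableMap(new HashMap<>(refresh.get()));
            lastGood.set(fresh);
            return fresh;
        } catch (RuntimeException e) {
            Map<String, String> stale = lastGood.get();
            if (stale == null) {
                throw e;   // nothing to fall back to, so the failure has to surface
            }
            return stale;
        }
    }
}
```

The very first request still fails if Git is unreachable, so this helps with transient outages after startup, not with a cold start against a dead repo.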

High Availability

Unlike Eureka, Config server doesn’t have a concept of peers. Thus, if you are serving properties from the native file system, the simplest option is to use a shared file system. You can then employ the usual techniques against hard disk failures to provide high availability. If you are using Git, the simplest solution is to put Config server behind a load balancer, or if using a container orchestration solution like Kubernetes, set up a Kubernetes service. The actual number of Config server instances is then abstracted from the clients, who only see the load balancer (physical or Kubernetes) URL.

If you expect that the config server may be temporarily unavailable when your client app starts, you can ask it to keep trying after a failure. First you need to set spring.cloud.config.failFast=true, and then you need to add spring-retry and spring-boot-starter-aop to your classpath. See the official docs for details. As of this writing, there is no server-side retry, although it wouldn’t be very difficult to add that using AOP or by submitting a pull request.
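To show what "keep trying after a failure" amounts to, here is a JDK-only sketch of retry with exponential backoff. It is not how spring-retry is implemented, and RetryWithBackoff with its parameters is a hypothetical name of my choosing; the same shape could be wrapped around a server-side call if you wanted to add retries there.

```java
import java.util.function.Supplier;

/** Hypothetical retry helper with exponential backoff between attempts. */
final class RetryWithBackoff {
    private RetryWithBackoff() {
    }

    /** Calls the action until it succeeds or maxAttempts is reached, then rethrows the last failure. */
    static <T> T call(Supplier<T> action, int maxAttempts, long initialDelayMillis,
                      double multiplier, long maxDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    throw e;                       // out of attempts, give up
                }
            }
            sleep(delay);
            // Grow the delay, but never beyond the configured ceiling.
            delay = Math.min((long) (delay * multiplier), maxDelayMillis);
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

A call such as RetryWithBackoff.call(fetchConfig, 6, 1000, 1.5, 5000) would try up to six times, where fetchConfig is whatever Supplier performs the fetch in your code.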

Dynamic Update

It’s possible to dynamically update the properties without having to restart Config server or the client apps using Git webhooks and Spring Cloud Bus. It requires a broker as the transport, and as of this writing, only RabbitMQ and Kafka are supported. You need to add spring-cloud-config-monitor to the Config server, and a broker implementation like spring-cloud-starter-bus-kafka to both the server and the client. Then you need to set up Git webhooks for the /monitor endpoint that’s brought in by spring-cloud-config-monitor. I’ve tried this implementation using Kafka, and it worked, but in our case, deploying and maintaining a queue for dynamic updates that are going to be quite infrequent couldn’t be justified. Unfortunately, the monitor endpoint is tightly coupled with Spring Cloud Bus, so it’s not possible to reuse the controller and the parsing of the events while choosing for yourself how to react to the push notifications. I’ve opened ticket #628 to address this, although I don’t expect them to fix it anytime soon.

I ended up copying the monitor endpoint implementation from Spring. Then I pimped the Config server to fetch the list of all pods from Kubernetes, and send an HTTP POST to each one on the /refresh endpoint provided by Spring Boot Actuator. I used the KubernetesClient from the io.fabric8:kubernetes-client project for retrieving the list of pods. The actual code handles some advanced cases like rolling updates and retries, but the theory behind it is as explained above and not complicated. The technique I used is similar to this example, and yes, the solution is fully reactive within the limits of the current Spring framework.
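The fan-out step is simple enough to sketch with the JDK alone (Java 11 or later for java.net.http). This is a simplified, blocking illustration and not my actual reactive code: RefreshNotifier is a hypothetical class, the list of Pod IPs is assumed to have been obtained already (for example from the Kubernetes client), and the port is whatever your client apps listen on.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that asks every Pod to reload its configuration. */
final class RefreshNotifier {
    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();

    /** Builds an empty-bodied POST to the /refresh endpoint of one Pod. */
    static HttpRequest refreshRequest(String podIp, int port) {
        return HttpRequest.newBuilder(URI.create("http://" + podIp + ":" + port + "/refresh"))
                .timeout(Duration.ofSeconds(5))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** POSTs to every Pod in turn and returns the IPs that could not be refreshed. */
    List<String> notifyPods(List<String> podIps, int port) {
        List<String> failed = new ArrayList<>();
        for (String ip : podIps) {
            try {
                HttpResponse<Void> response =
                        client.send(refreshRequest(ip, port), HttpResponse.BodyHandlers.discarding());
                if (response.statusCode() / 100 != 2) {
                    failed.add(ip);                 // the Pod answered, but not with a 2xx
                }
            } catch (IOException e) {
                failed.add(ip);                     // unreachable or timed out
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                failed.add(ip);
                break;                              // stop early if the caller interrupted us
            }
        }
        return failed;
    }
}
```

Returning the failed IPs leaves the retry decision to the caller, which matters during a rolling update when some Pods are expected to be gone by the time the POST arrives.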

Config server is not the only choice though: Netflix Archaius, Apache ZooKeeper and Kubernetes ConfigMap also provide configuration management, although none is aware of Spring PropertySource and Environment abstractions out of the box. In the end, Config server is a good choice if you are willing to put in some time to make it more robust.