When the Cloud Gets Expensive: Lessons Learned with Kubernetes and Java

Why traditional Java architectures often reach their limits and how modern approaches with Rust, Go, and native binaries offer cost advantages.

Cloud Illusions

Many companies, especially in IT, are increasingly focused on sustainability and monitor their costs closely, sometimes negotiating prices down to four decimal places. Today, Kubernetes and/or the major hyperscalers serve as the main operational platforms, having established themselves as the standard choice for running systems.

Kubernetes was originally designed to run stateless workloads and scale them on demand. Over time, support for stateful workloads was added, making it possible to run databases on Kubernetes as well. Horizontal scaling means that, depending on load, services must be started and stopped quickly. Over the past ten years, a comprehensive ecosystem has grown around Kubernetes (see the Cloud Native Landscape).

A key promise of the hyperscalers was that running services in the cloud would be cheaper: companies could save on personnel, since the hyperscaler would take over managing and maintaining the servers. Additionally, firms would only pay for the resources they actually used ("pay per use").

However, a certain reality check has set in. Some companies are moving their operations back to their own data centers; the most prominent example is 37signals. The sometimes significantly higher costs are not solely the fault of the hyperscalers but also of the companies themselves, and the choice of technology stack plays a major role here. For new projects, companies often reflexively reach for Java with Spring Boot and Kafka, citing reasons such as "Java developers are everywhere," "Java developers are cheap," or "we decided to be a Java company."

A Bit of History

Java (developed 1991–1995) emerged long before Kubernetes became widespread and solved the problems of its time: covering many different processor architectures with a single programming language via a virtual machine (Write Once, Run Anywhere). Although Java is statically typed, its design is highly dynamic, which led to ecosystems like the Spring Framework, where applications are assembled at runtime. Some companies no longer look for Java developers but directly for Spring Boot developers.

The diversity of processors declined rapidly in the mid-90s, with Intel dominating the market through its x86 architecture, a state that lasted for many years. Only with the release of Apple Silicon did another processor architecture (ARM) gain wider adoption. The major hyperscalers have also developed their own ARM processors, used exclusively in their own data centers. The main reason is the high efficiency of the ARM architecture and the associated cost savings.

What Does This Mean in the Long Term?

Java, with its virtual machine, caches, and garbage collector, is very resource-hungry. This applies not only to memory usage but also to the size of Docker images. The JVM itself ranges from 80 MB to 120 MB, depending on whether a Docker-optimized JVM is used, and a Java application usually requires a similar amount of space. On top of that, a minimal Linux or distroless base image is still needed, so a Java Docker image typically starts at around 200 MB.

With continuous integration (CI), large amounts of data accumulate quickly, even if only release images are stored in the Docker registry. Care must be taken when cleaning up images: if an image used by a rarely updated service is deleted, that service may fail to start on a node that does not hold the image locally, since the required image is no longer available in the registry. Whether this happens depends on the deployment mechanism in use.
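Whether a deleted registry image actually breaks a restart depends largely on the pod's image pull policy. A minimal sketch of the relevant Kubernetes setting (the deployment and image names here are hypothetical):

```yaml
# Hypothetical deployment fragment; the pull policy is the point of interest.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: legacy-service                # hypothetical name
spec:
  template:
    spec:
      containers:
        - name: legacy-service
          image: registry.example.com/legacy-service:1.4.2   # hypothetical image
          # "IfNotPresent" lets a node keep starting the pod from its local
          # image cache even after the registry copy is deleted; "Always"
          # re-pulls on every start and fails once the image is gone.
          imagePullPolicy: IfNotPresent
```

With `Always` (the default for `:latest` tags), registry cleanup of an in-use image turns every pod reschedule into a potential outage.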

As mentioned earlier, a Spring Boot application is assembled at runtime, which leads to long startup times. During startup, the service is not yet fully available, which can affect overall availability. Frameworks like Spring Native and Quarkus were developed to address these challenges.
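Slow-starting services can at least be shielded from traffic with Kubernetes probes. A hedged sketch (service name, image, and timings are made up; the health paths are standard Spring Boot Actuator endpoints):

```yaml
# Hypothetical container fragment: give a slow-starting JVM service time
# to boot before readiness checks, and thus traffic, kick in.
containers:
  - name: spring-service                # hypothetical name
    image: registry.example.com/spring-service:2.0.0   # hypothetical image
    startupProbe:
      httpGet:
        path: /actuator/health          # Spring Boot Actuator health endpoint
        port: 8080
      failureThreshold: 30              # up to 30 * 5 s = 150 s to start
      periodSeconds: 5
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8080
      periodSeconds: 10
```

This hides the long startup from clients but does not remove it; the capacity simply arrives later, which matters when scaling under load.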

These frameworks are optimized to run on GraalVM with ahead-of-time (AOT) compilation, adapting Java to the changed requirements that Kubernetes brings. Attempts to use Spring Native for services in a Java-based Kafka streaming platform proved impractical: not all classes required at runtime could be captured, manual configuration was too complex, and the resource usage of the services could not be reduced. Kafka itself (implemented in Java) also requires significant resources.

Every component in a Docker image is a potential attack surface and thus a risk. Java's dynamic behavior can also be critical; a well-known example is the Log4Shell vulnerability in Log4j.

Only Java?

Much of what is observed in practice does not apply only to Java. Issues such as just-in-time compilation in a VM and large Docker images also occur with other technologies.

Time for a Change

Companies that use Kubernetes as their operational platform should also adopt technologies designed specifically for its requirements; these are better suited than older solutions developed long before Kubernetes existed.

Requirements:

  • Fast startup time
  • Low resource consumption
  • Scalability

At the forefront are programming languages like Rust and Go, known for their efficiency and low resource usage, which compile to native binaries. Because different processor architectures can be targeted at compile time, the principle of Write Once, Run Anywhere still applies. Ahead-of-time compilation ensures that the application will start in the cluster and cannot be modified at runtime, and compilation itself acts as a first code review. In practice, by the time a comparable Java service was merely deployed and not yet under load, a Rust service was already running and doing its work.
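Targeting different architectures at compile time is a one-liner in Go, for example. A sketch, assuming a Go project in the current directory (the output binary names are made up):

```shell
# Build static binaries for both common cloud node architectures.
# CGO_ENABLED=0 produces a fully static binary; -ldflags "-s -w"
# strips symbol tables and debug info to shrink it further.
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags "-s -w" -o service-amd64 .
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -ldflags "-s -w" -o service-arm64 .
```

Rust offers the same via `cargo build --target`, e.g. `aarch64-unknown-linux-musl` for a static ARM binary, which makes it easy to run the same service on the hyperscalers' cheaper ARM nodes.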

The resulting Docker images are also very small, since only the binary (around 5 MB to 20 MB) is copied into the image; no additional components such as a VM or a Linux distribution are needed. These binaries cannot change their behavior at runtime, which minimizes the attack surface.
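Such an image can even be built from an empty base. A minimal sketch, assuming a statically linked binary (the binary name is hypothetical):

```dockerfile
# Minimal image for a statically linked Go or Rust binary.
# "scratch" is an empty base image: no shell, no package manager, no libc,
# so there is almost nothing left to attack besides the binary itself.
FROM scratch
COPY service /service
ENTRYPOINT ["/service"]
```

The image size then equals the binary size, and there is no base image to patch or rebuild for CVEs.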

The use of Kafka should also be carefully considered. It may be necessary when event sourcing and log compaction are truly required; in many other cases, NATS is a good, resource-efficient alternative, as is Redpanda, which claims Kafka API compatibility. In one case, a 30% reduction in operating costs was demonstrated through downsized services and replacements implemented in Rust.

These experiences are echoed in other blog articles, e.g., codecentric's "Für mehr Nachhaltigkeit – CO2 sparen mit Rust," and in the talk "Building Nanoservices With Rust" at the Rust in Paris conference (at minute 22:23), where a reduction in operating costs from $15,000 per month to just over $100 per month is reported. This example clearly demonstrates the savings achievable with good software architecture and technology choices.

More articles by Stefan Lauer