Kubernetes CPU limits - When JVM sees more than it should

#java Mar 12, 2026 9 min Mike Kowalski

To limit, or not to limit - this is the question that Java developers ask themselves when configuring CPU resources for their Kubernetes pods. Avoiding a strict pod CPU limit can be beneficial for latency-sensitive workloads, as it prevents pauses caused by CPU throttling. At the same time, running without a CPU limit removes an important signal the JVM uses to size its internal thread pools.

When there’s no CPU limit set, even modern Java versions will “see” all CPUs available on the Kubernetes node. In these cases, it’s often beneficial to explicitly override the ActiveProcessorCount to prevent the JVM and your frameworks/libraries from using too many threads, and to make performance more predictable.

In this article, we will look at JVM CPU detection when running in Kubernetes, and discuss its implications for real-life applications.

Disclaimer

All the scenarios described in the article have been tested using the latest Java 25 Amazon Corretto build. Older, but still modern releases of Java (17+) should behave more or less the same.

Different Kubernetes pod configurations have been tested against the local k8s cluster (version 1.34) provisioned by Docker Desktop, running locally on my Apple Silicon MacBook Pro (2021).

While the number of CPUs, CPU cores, and CPU vCores are completely different things, I will use the term ‘number of CPUs’ to refer to what Java detects and what’s visible in /proc/cpuinfo on Linux systems, just to keep things simple.

When using the term number of threads, I’ll be referring to traditional platform threads, not virtual ones.

Kubernetes resource limits

Let’s start with a recap of how Kubernetes resource limits work. If you’re not familiar with the subject, I’d suggest starting with the official documentation - Resource Management for Pods and Containers. Keep in mind that in this article, we’re interested in CPU, not memory.

Kubernetes exposes two CPU-related configuration “knobs”: requests and limits. The requests define the minimum CPU resources required for scheduling, while the limits define the maximum CPU time the container is allowed to consume. Both values are measured in so-called cpu units. According to the official k8s documentation:

In Kubernetes, CPU unit is equivalent to 1 physical CPU core, or 1 virtual core, depending on whether the node is a physical host or a virtual machine running inside a physical machine.

Pod CPU limits are enforced via throttling. When a container exhausts its CPU quota for a given period, its runnable threads are paused until the next scheduling window. From the application’s perspective, this can look like a stop-the-world pause, even though it’s enforced by the Linux scheduler rather than the JVM.

Because throttling can significantly hurt tail latency, some developers prefer to avoid defining CPU limits. While this can improve latency under load, it also removes an upper bound on CPU usage from the JVM’s point of view.

Java container awareness

The JVM has been container-aware since Java 11 (with partial backports to Java 8). When running inside a container, Java relies on Linux control groups (cgroups) to detect CPU and memory constraints. Then, the JVM uses these values to size internal resources accordingly.

Historically, most container runtimes used cgroups v1, where CPU limits were expressed using a time-based model: a CPU quota and a corresponding scheduling period. When a CPU limit was defined, the JVM derived the number of available processors by dividing the quota by the period. While this approach is still used today when a hard limit is present, early implementations suffered from multiple issues, including rounding errors, inconsistent updates, and differences between container runtimes.
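The quota-to-processors derivation can be sketched as follows. This is a simplified illustration, not the actual HotSpot code (which lives in the OS container support layer); the file names cpu.cfs_quota_us and cpu.cfs_period_us are the real cgroups v1 ones, but the rounding and fallback logic here is my own reading of the behavior:

```java
// Sketch: deriving an available-processor count from a cgroups v1 style
// CPU quota and period. Assumption: the JVM rounds the quotient up.
public class CgroupCpuMath {

    // quotaUs <= 0 means "no limit" (cpu.cfs_quota_us reports -1)
    static int availableProcessors(long quotaUs, long periodUs, int hostCpus) {
        if (quotaUs <= 0 || periodUs <= 0) {
            return hostCpus; // no limit: fall back to all host CPUs
        }
        // Round up: a quota of 150ms per 100ms period counts as 2 CPUs
        long cpus = (quotaUs + periodUs - 1) / periodUs;
        return (int) Math.min(cpus, hostCpus);
    }

    public static void main(String[] args) {
        // limits.cpu: 2 -> quota of 200ms per 100ms period
        System.out.println(availableProcessors(200_000, 100_000, 32)); // 2
        // limits.cpu: 1.5 -> rounded up to 2
        System.out.println(availableProcessors(150_000, 100_000, 32)); // 2
        // no limit -> all 32 host CPUs
        System.out.println(availableProcessors(-1, 100_000, 32));      // 32
    }
}
```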

Modern Linux distributions and most recent Kubernetes setups rely on cgroups v2, which introduced a unified hierarchy and a different CPU controller model. In cgroups v2, CPU limits are exposed via the cpu.max setting, while CPU shares are controlled separately using cpu.weight (see CPU bandwidth and CPU weight). When cpu.max is set, the JVM can easily derive a concrete processor count and size its internal thread pools accordingly.

However, when no CPU limit is defined, cgroups v2 explicitly reports unlimited CPU (cpu.max = max). While CPU weight influences how CPU time is distributed under contention, it can’t be translated into a definitive number of processors. For this reason, the JVM ignores the CPU weight when computing the number of available processors.

CPU requests are not visible to the JVM, regardless of whether cgroups v1 or v2 are used. When no CPU limit is defined, Java assumes it can use all CPUs available on the host node. While this behavior is intentional and consistent across modern Java versions, it also has important implications.

The detected number of available processors, exposed via Runtime.getRuntime().availableProcessors(), is used to size several key JVM components, including:

  • JIT compiler threads,
  • Garbage Collector (GC) worker threads,
  • the common ForkJoinPool.

Many frameworks and libraries rely on exactly the same value when configuring their own thread pools.
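For instance, sizing a worker pool from the detected processor count is a very common framework pattern (a minimal sketch; the pool itself is made up for illustration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CpuSizedPool {
    public static void main(String[] args) {
        // The same value the JVM uses for its own internal sizing decisions
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("Detected CPUs: " + cpus);

        // Typical library pattern: one worker thread per detected CPU.
        // On an unlimited pod scheduled on a 32-CPU node, that's 32 threads.
        ExecutorService workers = Executors.newFixedThreadPool(cpus);
        workers.shutdown();
    }
}
```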

Available processors vs JVM threads

The number of threads used by the JVM has a direct impact on application performance. Too few threads may underutilize available CPU resources and miss opportunities for work parallelization. Too many threads increase context switching, synchronization overhead, and GC pressure. These effects will be especially visible in latency-sensitive and high-throughput systems.

When a pod runs without a CPU limit, the number of threads created by the JVM may vary significantly depending on the size of the Kubernetes node it is scheduled on. This can lead to unpredictable behavior changes in production.

Let’s say that our application requires at least 3 CPUs to run (requests.cpu: 3, no limit). It could be scheduled on a 4-CPU k8s node as well as on a 32-CPU one. When running on the larger node, it would use 12 more JIT compiler threads, 24 more G1 GC threads, and have a ForkJoinPool with parallelism greater by 28 threads! That’s 64 extra threads just for these 3 aspects of JVM operations, without taking frameworks and libraries into account.

Details on how the JVM sizes these resources (as well as where the above numbers come from) can be found in the appendix at the end of the article. You’ll also find a small bonus there, with yet another important aspect of the JVM, which is influenced by the number of detected processors.

Setting ActiveProcessorCount for predictability

Running a containerized Java application with no pod CPU limit leads to a lack of predictability. Depending on the k8s node the pod gets scheduled on, the internal JVM thread pools mentioned above might be smaller or bigger, influencing performance and throughput.

Luckily, the JVM allows us to override the number of detected CPUs with -XX:ActiveProcessorCount. For example, with -XX:ActiveProcessorCount=2, the app will always “see” only 2 CPUs, no matter how beefy the underlying k8s node it is running on actually is. This gives us an easy way of increasing the predictability when there’s no pod CPU limit defined.
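The override is easy to verify with a trivial probe program (run it with and without the flag):

```java
public class CpuProbe {
    public static void main(String[] args) {
        // Reflects -XX:ActiveProcessorCount if set,
        // otherwise the cgroup-derived (or host) CPU count
        System.out.println(Runtime.getRuntime().availableProcessors());
    }
}
```

Running `java CpuProbe` prints the detected CPU count, while `java -XX:ActiveProcessorCount=2 CpuProbe` prints 2 regardless of the underlying node.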

What value should be picked for the ActiveProcessorCount when no CPU pod limit is set? Unfortunately, there is no one-size-fits-all answer. A lot depends on how our app uses available CPU. Is it CPU computation-intensive? Does it perform a lot of blocking IO operations? Does it use available threads efficiently? Does it have to deal with a high, interactive load? What’s the priority - latency or throughput?

The existing requests.cpu value could be a good starting point - assuming it’s set and based on real-life measurements. However, I’d probably avoid doing a single-step, radical change. If the app is already running with no limits.cpu set, this effectively translates to -XX:ActiveProcessorCount=nCPUs where nCPUs is the total number of CPUs available on the underlying Kubernetes node.

For performance-critical systems, I strongly recommend gradual tuning combined with load testing. Taking our example application, going from 32 detected CPUs to just 3 (requests value) would be a rather radical change. Instead of improved latency we might observe the complete opposite, together with increased GC pressure due to lower parallelism.

What metrics to observe when adjusting the ActiveProcessorCount value? My suggestion is to look at:

  • tail latency of your critical path(s),
  • GC time/pressure,
  • JVM threads usage.

If you’re using Micrometer, some of these metrics should be already there.

Once you determine and set an ActiveProcessorCount that fits your application’s needs, you can feel much safer about the performance predictability of your workload. There are still other aspects that might affect it, like the type of node that Kubernetes uses (e.g. x86 vs ARM) - but that’s a story for a separate post.

Summary

Defining CPU limits for Java applications running in Kubernetes is a trade-off between predictability and latency. While CPU limits provide stable JVM sizing, they may lead to throttling that negatively affects tail latency in performance-sensitive workloads.

When no CPU limit is set, the JVM has no reliable way of estimating the actual number of available CPUs. Even modern Java versions will “see” all CPUs available on the underlying k8s node and size their internal thread pools accordingly. On large nodes with many CPUs available, this can lead to using too many threads and unpredictable performance.

Explicitly setting -XX:ActiveProcessorCount can help stabilize JVM behavior when no pod CPU limit is set. Picking the right value will usually require careful tuning and observation. Overly radical changes may degrade performance and increase GC pressure rather than improve latency. CPU configuration should be approached iteratively, guided by measurements such as tail latency, GC activity, and CPU utilization. The optimal setup depends on many aspects, including workload characteristics and latency vs throughput priorities.

Appendix: How the JVM sizes its resources

JIT compiler threads

Java bytecode takes a long journey from being interpreted to being compiled by the JVM. The JIT (Just-In-Time) compiler can compile frequently used parts of the code into native code at runtime, improving overall performance.

There are two configuration flags related to the so-called CI compiler threads that we will look at. CICompilerCountPerCPU (enabled by default) makes the count depend on the number of detected CPUs. CICompilerCount exposes the exact value calculated by the JVM (or allows us to override it manually). The formula used by the JVM is based on a logarithm of the number of CPUs.

Running the following command:

java -XX:+PrintFlagsFinal -version 2>&1 | grep CICompilerCount

produced the following values:

  • CICompilerCount=2 for 1 available CPU,
  • CICompilerCount=2 for 2 available CPUs,
  • CICompilerCount=3 for 4 available CPUs,
  • CICompilerCount=4 for 8 available CPUs,
  • CICompilerCount=15 for 32 available CPUs.
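The measured values above are consistent with the following sketch of the heuristic (my reading of the HotSpot sources; note the integer log2 and the integer division, so evaluation order matters):

```java
public class CompilerThreads {

    // floor(log2(n)) for n >= 1
    static int log2i(int n) {
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    // Approximation of HotSpot's CICompilerCount heuristic
    // when CICompilerCountPerCPU is enabled (assumes tiered compilation)
    static int ciCompilerCount(int cpus) {
        int log = log2i(cpus);
        int loglog = log2i(Math.max(log, 1));
        return Math.max(log * loglog * 3 / 2, 2);
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {1, 2, 4, 8, 32}) {
            System.out.println(cpus + " CPUs -> CICompilerCount=" + ciCompilerCount(cpus));
        }
    }
}
```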

GC threads (G1 GC)

G1 GC is probably the most popular GC algorithm in use these days. Similarly to other GCs, it can utilize parallelism to speed up its operations. There are two main configuration flags to look at here.

The ParallelGCThreads flag indicates the stop-the-world parallel worker count used for evacuation, remark, mixed collections, etc. It is equal to the number of available CPUs when there are 8 or fewer. For nodes with more than 8 CPUs available, it’s calculated as 8 + floor(5/8 * (availableCPUs - 8)).

The second configuration flag is ConcGCThreads, indicating the number of threads used for concurrent marking. It is derived from the previous one, calculated (with integer division) as max(1, (ParallelGCThreads + 2) / 4).

Running:

java -XX:+UseG1GC -XX:+PrintFlagsFinal -version 2>&1 | grep -e ParallelGCThreads -e ConcGCThreads

produced the following values:

  • ParallelGCThreads=1, ConcGCThreads=1 for 1 CPU,
  • ParallelGCThreads=2, ConcGCThreads=1 for 2 CPUs,
  • ParallelGCThreads=4, ConcGCThreads=1 for 4 CPUs,
  • ParallelGCThreads=8, ConcGCThreads=2 for 8 CPUs,
  • ParallelGCThreads=23, ConcGCThreads=6 for 32 CPUs.
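The measured values above can be reproduced with this sketch of the default heuristics (my reading of HotSpot’s defaults; all arithmetic is integer division):

```java
public class G1Threads {

    // Default ParallelGCThreads: all CPUs up to 8, then 5/8 of the excess
    static int parallelGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    // Default ConcGCThreads, derived from ParallelGCThreads
    static int concGcThreads(int cpus) {
        return Math.max(1, (parallelGcThreads(cpus) + 2) / 4);
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {1, 2, 4, 8, 32}) {
            System.out.println(cpus + " CPUs -> ParallelGCThreads="
                    + parallelGcThreads(cpus) + ", ConcGCThreads=" + concGcThreads(cpus));
        }
    }
}
```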

ForkJoinPool parallelism level

Java’s ForkJoinPool is a special ExecutorService using a work-stealing algorithm for workload distribution. Most of its usages (including parallel Stream operations) rely on the static commonPool, with parallelism set to max(availableCPUs - 1, 1). This parallelism level defines how many threads can execute submitted tasks in parallel.

Depending on the number of CPUs detected by the JVM, the parallelism level of the commonPool would be:

  • parallelism=1 for 1 CPU,
  • parallelism=1 for 2 CPUs,
  • parallelism=3 for 4 CPUs,
  • parallelism=7 for 8 CPUs,
  • parallelism=31 for 32 CPUs.

Bonus: Virtual Threads ForkJoinPool parallelism level

As you may know, Java Virtual Threads are scheduled on a dedicated pool of platform threads. This pool is in fact a specialized ForkJoinPool, with a default parallelism equal to the number of detected available CPUs. This behavior can be changed by setting the jdk.virtualThreadScheduler.parallelism system property.
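A tiny probe for the scheduler’s effective configuration (a sketch; it only reads the system property, so it runs fine even when the property is unset):

```java
public class VtSchedulerProbe {
    public static void main(String[] args) {
        String overridden = System.getProperty("jdk.virtualThreadScheduler.parallelism");
        int cpus = Runtime.getRuntime().availableProcessors();
        // Without the property, the carrier pool parallelism defaults to the
        // detected CPU count - the same value discussed throughout this article.
        System.out.println(overridden != null ? overridden : String.valueOf(cpus));
    }
}
```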

That’s yet another reason why the JVM’s view of available CPUs matters to our Java applications!

Mike Kowalski

Software engineer believing in craftsmanship and the power of fresh espresso. Writing in & about Java, distributed systems, and beyond. Mikes his own opinions and bytes.