Kong API Gateway Behind the Scenes: Overcoming Reliability Challenges

Suren Raju
8 min read · Apr 30, 2023

Kong is a widely used open-source API gateway that helps organizations manage and secure their APIs at scale. However, operating Kong at scale can present a number of reliability challenges. In this blog, I’ll discuss some of these challenges and how to address them.

This blog is based on my experience operating Kong 3.x in declarative mode (also known as DBless mode) on Kubernetes.

In Kong’s DBless mode, the gateway configuration is not stored in a database. Instead, it is defined in a YAML file, which is then loaded into Kong using the /config Admin API endpoint.

When Kong is started in DBless mode, it reads the YAML file and holds the configuration entirely in memory. Any changes to the configuration can be made directly in the YAML file and then applied to Kong through the /config endpoint.
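As a minimal sketch (the service name, upstream URL, and Admin API address below are assumptions, not part of any particular setup), loading a declarative configuration into a DBless Kong node looks roughly like this:

```bash
# Write a minimal declarative configuration to disk.
cat > kong.yml <<'EOF'
_format_version: "3.0"
services:
- name: example-service
  url: http://example.internal:8080
  routes:
  - name: example-route
    paths:
    - /example
EOF

# POST the YAML to the /config Admin API endpoint; Kong validates the
# file and swaps the new configuration into memory.
curl -sS -X POST http://localhost:8001/config \
  --form config=@kong.yml
```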

Three Major Categories of Kong’s Reliability Issues

In my experience, I’ve noticed that Kong’s reliability issues can be broadly classified into three major categories: timer system exhaustion, high P99 latency, and high memory usage by Kong workers. These challenges can have significant impacts on the overall performance, stability, and scalability of an API infrastructure.

DNS Errors caused by Kong timer system exhaustion

Kong uses timers internally for various functions such as connection pooling, rate-limiting, and timeouts. In a high-traffic scenario, Kong can quickly exhaust the available timers, resulting in DNS resolution errors. To mitigate this, Kong’s timer limits can be raised through configuration.
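One hedged option, assuming the default limits are the bottleneck, is to raise OpenResty’s timer limits through Kong’s injected Nginx directives. The values below are purely illustrative, not tuned recommendations:

```bash
# Raise the limits on pending and running OpenResty timers via Kong's
# injected Nginx directives, then restart Kong to apply them.
export KONG_NGINX_HTTP_LUA_MAX_PENDING_TIMERS=16384
export KONG_NGINX_HTTP_LUA_MAX_RUNNING_TIMERS=4096
kong restart
```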

Consider a scenario where Kong is used as an API gateway for a microservices architecture. As the number of microservices grows, the number of DNS resolutions required to reach the various services increases, leading to more requests to the DNS server. This increased workload can exhaust the timers Kong uses for DNS resolution, resulting in timeouts or degraded performance.

In Kubernetes, the ndots parameter controls how DNS resolution handles partially qualified names: it specifies the minimum number of dots a name must contain before the resolver tries it as-is, rather than first appending the configured search domains. If ndots is not set appropriately, it can lead to DNS resolution issues, which in turn can add stress to Kong’s timer system for DNS resolution.

For example, Kubernetes sets ndots to 5 by default. With such a high value, names that are effectively fully qualified but contain fewer than five dots are first tried with each search domain appended, generating several extra DNS lookups per resolution and increasing stress on Kong’s timer system. This can lead to degraded performance or even timeouts if the timer system is exhausted.
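One common mitigation is to lower ndots for the Kong pods via dnsConfig. A sketch, assuming a Deployment named kong in the kong namespace:

```bash
# With ndots=1, a dotted name such as "example.internal" is tried as-is
# before the resolver walks the search-domain list.
kubectl patch deployment kong -n kong --patch '
spec:
  template:
    spec:
      dnsConfig:
        options:
        - name: ndots
          value: "1"
'
```

Using fully qualified names with a trailing dot in your upstream configuration achieves a similar effect, since the resolver then skips the search domains entirely.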

For more details on Kong’s timer system, see: https://medium.com/@surenraju/tick-tock-woes-tackling-timer-troubles-in-kong-production-86e7e2094d13

High P99 Kong-added latency

As the scale of a Kong deployment grows, several factors can contribute to an increase in Kong’s P99 added latency. One of the main factors is the addition of more plugins and routing configuration: as the number of plugins and routes increases, Kong’s processing time per request grows, leading to higher added latency.

Another factor that can contribute to increased added latency is the usage of plugins that perform heavy computation or external API calls. These plugins can add significant overhead to request processing, leading to slower response times and higher added latency.

In addition, frequent use of the Admin API to fetch or update declarative configuration can also increase added latency when the configuration is large. These calls can be resource-intensive and may impact Kong’s ability to process incoming requests in a timely manner.

High memory usage

Reading the declarative configuration through Kong’s Admin API too often can cause memory leaks in Kong 3.0. The impact can be reduced by avoiding unnecessary reloads: when the check_hash parameter is set to 1 on the /config endpoint, Kong compares the SHA256 hash of the submitted configuration against the hash of the currently loaded one, and skips re-applying the configuration if nothing has changed.
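A minimal sketch (the Admin API address is an assumption):

```bash
# With check_hash=1, Kong compares the hash of the posted configuration
# to the currently loaded one and skips the reload when they match.
curl -sS -X POST "http://localhost:8001/config?check_hash=1" \
  --form config=@kong.yml
```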

Challenges in addressing Kong issues

  • The dynamic and unpredictable nature of event-driven systems makes it challenging to understand and reproduce Kong-related issues. Developers and SREs must understand Kong’s architecture and behavior in order to address issues effectively.
  • While Kong exposes a large number of metrics for requests, errors, and latency, it provides limited visibility into its internals, such as where time is being spent and what is held in Kong’s memory. This insufficient visibility can make it difficult to identify and resolve issues, further complicating the resolution process.
  • Although the Kong community is great for getting started, its support can be limited for issues encountered when operating Kong at large scale.

Understanding the event-driven nature of Nginx/Kong

Understanding the differences between the two flavors of proxy server architecture, synchronous and multi-threaded versus asynchronous, event-driven, and single-threaded, is crucial for large-scale Kong operations. It also helps in implementing best practices and avoiding common pitfalls that can affect the reliability and stability of the Kong infrastructure.

Synchronous, Multi-Threaded Proxy Server Architecture

Apache is a popular example of a synchronous, multi-threaded proxy server architecture. In this architecture, each incoming connection is handled by a separate thread: when the server receives a request, a thread is assigned to it from the server’s thread pool, which has a fixed size limit.

One of the main advantages of this architecture is that it is relatively simple to implement and understand. Each thread handles a single request, and the server can easily track the status of each thread. This architecture is also well-suited for handling long-lived connections, such as those used in HTTP/1.1.

However, there are also some drawbacks to this architecture. One is that it can be resource-intensive, as each thread requires a certain amount of memory to operate. Additionally, the thread pool limit can become a bottleneck under heavy load, leading to performance degradation or even system crashes.

Asynchronous, Event-Driven, Single-Threaded Proxy Server Architecture

Nginx and Kong are examples of an asynchronous, event-driven, single-threaded proxy server architecture. In this architecture, each worker process runs a single thread that handles all of its incoming requests, with I/O operations performed asynchronously through an event loop.

In this architecture, the server does not create a new thread for each incoming connection. Instead, the server handles multiple connections within a single thread by using non-blocking I/O operations. The event loop is responsible for monitoring all I/O operations and determining which ones are ready to be processed. This approach can be highly efficient because it eliminates the need to create and manage a large number of threads.

One of the main advantages of this architecture is its scalability. Because the server can handle a large number of connections within a single thread, it can easily scale to handle high traffic loads. Additionally, the event-driven approach can be highly efficient, especially when dealing with I/O-bound operations.

To read more on this, see https://www.nginx.com/blog/inside-nginx-how-we-designed-for-performance-scale/

Challenges with Event-Driven Nature of Kong

Event-driven architecture, which Kong is built upon, can suffer from performance issues in certain scenarios.

Event-driven architecture and CPU-bound tasks

One of the key problems with event-driven architecture in Kong is that it does not perform well with CPU-bound tasks. When one operation takes up too much CPU time, the entire event loop is delayed, even for unrelated operations.

Imagine a Kong plugin that performs a complex data-processing operation on each API request. Since event-driven architectures are optimized for handling I/O-bound tasks, heavy CPU-bound computation can significantly impact the event loop’s performance, causing increased response times and reduced throughput.

For another example, let’s consider the use case where you have a large declarative configuration with a size of a few MBs. If you try to read this configuration using the Kong admin API GET /config every 10 seconds, it might cause slowness in the overall system. This is because the /config endpoint has to write a large amount of data from Kong’s memory to the network, which can be time-consuming and slow down the event loop. As a result, other operations that are waiting in the event loop may also be delayed, leading to poor performance and slow response times.
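A cheaper pattern than repeatedly reading the full configuration out of Kong is to keep the source of truth outside Kong and only touch the Admin API when it actually changes. A sketch, assuming kong.yml is the source of truth and the Admin API listens on localhost:8001:

```bash
# Poll the local file's hash instead of Kong's memory; only POST when
# the configuration actually changes, with check_hash=1 as a safety net.
last_hash=""
while sleep 10; do
  current_hash=$(sha256sum kong.yml | awk '{print $1}')
  if [ "$current_hash" != "$last_hash" ]; then
    curl -sS -X POST "http://localhost:8001/config?check_hash=1" \
      --form config=@kong.yml && last_hash="$current_hash"
  fi
done
```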

Similarly, a plugin timer that performs a heavy CPU operation, or a Kong system operation such as a routing-table rebuild, can affect the performance of other requests being processed by the same worker. This can happen without any noticeable indication that the slowdown was caused by the heavy timer operation.

High traffic or a large number of connections overwhelming the event loop

During a sudden spike in traffic, such as a promotional campaign or a viral event, the event loop in Kong may face an overwhelming number of concurrent requests. If the event loop becomes saturated, it may struggle to process incoming connections in a timely manner, resulting in increased latency, dropped requests, or even server timeouts.

Single-threaded nature and limited multi-core CPU utilization

Suppose a high-traffic API using Kong runs on a server with multiple CPU cores. Due to the single-threaded nature of the event-driven architecture, each Kong worker process can only utilize a single CPU core for request processing; scaling across cores requires running multiple workers.

To enhance the performance of Kong workers, it’s important to choose a processor that provides high single-core performance. This is because Kong’s event-driven architecture relies heavily on the performance of a single CPU core. Choosing a processor with a higher clock speed and optimized for single-threaded performance can significantly improve the overall performance of Kong.

I recommend choosing a processor with high single-core performance and using CPU-optimized instances in the cloud. These instances are designed to provide the best possible performance for compute-intensive workloads, such as those required by Kong workers. Using CPU-optimized instances can help ensure that Kong workers have access to the resources needed to handle high levels of traffic and requests.

Another important consideration is to allocate one CPU per worker. By doing so, you can ensure that each worker has a dedicated CPU core to process requests, which can help to prevent resource contention and improve overall performance. In Kubernetes, you can implement CPU affinity using CPU manager policies, which allow you to dedicate CPUs to Kong pods and allocate a dedicated CPU core per worker.
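A sketch of this on Kubernetes, assuming the kubelet runs with the static CPU manager policy enabled (the pod name, image tag, and sizes below are assumptions): a Guaranteed-QoS pod with integer CPU requests receives exclusive cores, and matching nginx_worker_processes to that count gives each worker its own core.

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: kong
spec:
  containers:
  - name: kong
    image: kong:3.3
    env:
    - name: KONG_NGINX_WORKER_PROCESSES   # one worker per exclusive core
      value: "4"
    resources:
      requests:
        cpu: "4"        # integer CPUs + requests == limits => Guaranteed QoS
        memory: 4Gi
      limits:
        cpu: "4"
        memory: 4Gi
EOF
```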

Recommendations for Reliable Large-Scale Kong Operations

Based on my experience operating large scale Kong API gateway operations, here are my recommendations for handling reliability issues:

  • Avoid CPU-bound operations within Kong, as Nginx’s event loop was not designed for heavy computation.
  • Keep Kong installation simple to prevent worker overload.
  • Add new plugins to Kong only if necessary.
  • Offload complex logic from Kong to other system components.
  • Use dedicated Kong deployments for special plugins.
  • Maximize worker performance with hardware that offers high single-core performance.
  • Keep Kong updated with the latest stable release.
  • Regularly test Kong’s performance and scalability.
  • Regularly review and optimize Kong’s configuration for better performance.

As we wrap up our discussion on reliable large-scale Kong operations, let me leave you with a Kong-quer question to ponder.

Which configuration reigns supreme for scaling woes?

Will 2 Kong pods with 4 workers perform better on a VM with 8 CPU cores, or is it best to opt for 1 Kong pod with 8 workers?

And when it comes to processor performance, is a 3.1 GHz or 3.6 GHz Intel Xeon Processor the way to go for Kong?

Consider these factors carefully and may the best configuration win!

My Other Kong Blogs

Tick Tock Woes — Tackling Timer Troubles in Kong Production

Kong API Gateway Behind the Scenes: Overcoming Reliability Challenges

Optimizing Health Checks and Load Balancing in Kong API Gateway: Best Practices for Upstreams, Targets, and Active/Passive Health Checks

GitOps Approach to Configuration Management in Kong DBless Mode
