Kong API Gateway: Tackling Timer Troubles in Production

Suren Raju
8 min read · Mar 13, 2023
Image courtesy Kong and sobyte

This article explores the timer-related challenges that Kong production environments commonly face and provides effective strategies to mitigate these issues, ensuring reliable and efficient API gateway operation.

Introduction to OpenResty timer

OpenResty is a web platform that bundles the Nginx web server with additional modules, most notably LuaJIT and the ngx_lua module, that let developers build scalable, high-performance web applications. OpenResty exposes its own timer API to Lua code, built on top of Nginx's internal event timers.

OpenResty timers are used to schedule work on the Nginx event loop, such as running Lua code, making HTTP requests, or performing other non-blocking I/O. OpenResty provides timer-related functions that allow developers to create and manage timers from their Lua scripts, most notably ngx.timer.at for one-shot timers and ngx.timer.every for recurring ones. Both take a delay or interval and a callback function that is executed when the timer fires.
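
Here is a minimal sketch of how these two functions are typically used from Lua inside an OpenResty worker; the delays, messages, and log levels are arbitrary placeholders.

```lua
-- Minimal sketch of the OpenResty timer API.
-- The callback receives a "premature" flag that is true when the worker is
-- shutting down before the timer gets a chance to run normally.
local function handler(premature, message)
    if premature then
        return
    end
    ngx.log(ngx.INFO, "timer fired: ", message)
end

-- One-shot timer: run the callback once, 3 seconds from now.
local ok, err = ngx.timer.at(3, handler, "one-shot")
if not ok then
    ngx.log(ngx.ERR, "failed to create timer: ", err)
end

-- Recurring timer: run the callback every 5 seconds.
ok, err = ngx.timer.every(5, handler, "recurring")
if not ok then
    ngx.log(ngx.ERR, "failed to create timer: ", err)
end
```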

Overall, OpenResty timers are an important building block for Kong API Gateway: they are crucial for efficient background task handling, good performance, and scalability.

How are timers used in Kong?

Timers are used in Kong in several ways to support various features, such as:

  1. Caching: Kong uses timers to determine the expiry time for cached responses. This allows Kong to serve cached responses without having to query the upstream service repeatedly.
  2. DNS resolution: Kong uses timers to perform periodic DNS lookups to resolve hostnames. This ensures that Kong always has up-to-date IP addresses for the upstream services it’s proxying requests to.
  3. Health checks: Kong uses timers to periodically check the health of upstream services. This helps Kong to route traffic only to healthy services.
  4. Rate limiting: Kong uses timers to track the number of requests made by clients within a specified time interval. This allows Kong to enforce rate limits and prevent clients from overwhelming the system with too many requests.

Timers are also used by some plugins in Kong to perform asynchronous tasks, such as connecting to databases, making HTTP requests, or processing data. By using timers, these plugins can execute such tasks in the background without blocking the main Kong event loop.
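
For example, a plugin that needs to flush buffered data in the background typically schedules a recurring timer from its init_worker phase instead of doing the work on the request path. The sketch below is illustrative only; the plugin name, buffering logic, and intervals are made up for this example.

```lua
-- Hypothetical plugin handler: background work is driven by a timer so the
-- request-processing phases never block on it.
local MetricsHandler = {
    PRIORITY = 10,
    VERSION = "0.1.0",
}

local buffer = {}

local function flush(premature)
    if premature then
        return -- the worker is shutting down
    end
    -- Illustrative only: push buffered entries to some backend here.
    ngx.log(ngx.INFO, "flushing ", #buffer, " buffered entries")
    buffer = {}
end

function MetricsHandler:init_worker()
    -- One recurring timer per worker, created once at startup.
    local ok, err = ngx.timer.every(10, flush)
    if not ok then
        ngx.log(ngx.ERR, "failed to schedule flush timer: ", err)
    end
end

function MetricsHandler:log(conf)
    -- Cheap in-memory buffering on the request path; the timer does the I/O.
    buffer[#buffer + 1] = ngx.var.request_uri
end

return MetricsHandler
```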

Challenges with OpenResty’s Timer

In OpenResty, a red-black tree keeps track of the timers that are scheduled to fire at a given time. The time complexity of inserting or finding a timer in a red-black tree is O(log n), which becomes a bottleneck for high-throughput systems: as the number of timers grows, the tree becomes very large and these per-timer operations add noticeable overhead.

From the NGINX perspective, every timer is equivalent to a request since OpenResty generates a fake request for each timer. This means that as the number of timers increases, so does the number of requests, which can impact system performance.

OpenResty’s timer also uses a linked list of coroutines that many APIs need to read and write. If this list becomes too long, it can cause delays and negatively impact performance.

Finally, OpenResty’s timer provides few statistics, which makes it difficult to analyze failures and diagnose issues.

Kong’s next generation lua-resty-timer-ng Library

The Kong API Gateway team implemented lua-resty-timer-ng, the next generation of lua-resty-timer / OpenResty's built-in timers. It is a Lua library that builds a scalable timer system using the timer wheel algorithm and a pool of worker timers; it can efficiently schedule 100,000 or more timers while consuming less memory.

lua-resty-timer-ng has been integrated into Kong Gateway since version 3.0.

Some of the key features of the lua-resty-timer-ng library include:

  • High-resolution timers (100-millisecond resolution by default)
  • Efficient implementation using the timer wheel algorithm and a timer pool for minimal overhead
  • Support for both one-shot and recurring timers
  • Ability to cancel timers before they expire
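
A rough usage sketch of the library follows. The constructor options and method names shown here (new, start, at, every, named_every, cancel, and the min_threads/max_threads options) are assumptions based on the library's documented design; verify them against the lua-resty-timer-ng repository before relying on them.

```lua
-- Hedged sketch of lua-resty-timer-ng usage; option and method names below
-- are assumptions and should be checked against the library's README.
local timerng = require("resty.timerng")

-- Create a timer system backed by an auto-scaling pool of worker timers.
local timer_sys = timerng.new({
    min_threads = 32,  -- assumed option name
    max_threads = 256, -- assumed option name
})
timer_sys:start()

-- One-shot timer.
timer_sys:at(0.5, function ()
    ngx.log(ngx.INFO, "one-shot fired")
end)

-- Recurring timer.
timer_sys:every(5, function ()
    ngx.log(ngx.INFO, "recurring fired")
end)

-- Named timers can be cancelled before they expire (assumed interface).
timer_sys:named_every("sync-job", 10, function ()
    ngx.log(ngx.INFO, "sync")
end)
timer_sys:cancel("sync-job")
```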

Timer Wheel Algorithm

The timer wheel algorithm is a technique for efficiently managing a large number of timers in a system. It uses a circular buffer, or “wheel,” of fixed size, with each slot in the buffer representing a time interval. The wheel rotates continuously, and when the pointer reaches a slot, all timers scheduled to expire in that slot are executed.

Timer wheel algorithm illustration

The timer wheel algorithm is an improvement over using a tree data structure to manage timers because it offers constant time insertion and deletion of timers, regardless of the number of timers being managed. In contrast, the cost of these operations in a tree data structure can be logarithmic, leading to performance issues as the number of timers grows.
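
To make the idea concrete, here is a minimal single-level timer wheel in plain Lua. It is an illustration of the algorithm, not lua-resty-timer-ng's implementation: inserting a timer is a constant-time append into one slot, and each tick only touches the slot the pointer currently points at.

```lua
-- Minimal single-level timer wheel (illustration only).
local TimerWheel = {}
TimerWheel.__index = TimerWheel

function TimerWheel.new(num_slots, resolution)
    local slots = {}
    for i = 1, num_slots do slots[i] = {} end
    return setmetatable({
        slots = slots,           -- circular array of callback lists
        num_slots = num_slots,
        resolution = resolution, -- seconds represented by one slot
        pointer = 1,             -- slot that the next tick will fire
    }, TimerWheel)
end

-- O(1) insertion: compute the target slot from the delay and append.
function TimerWheel:add(delay, callback)
    local ticks = math.floor(delay / self.resolution)
    assert(ticks < self.num_slots, "delay exceeds the range of this wheel")
    local slot = ((self.pointer - 1 + ticks) % self.num_slots) + 1
    table.insert(self.slots[slot], callback)
end

-- Called once per resolution interval: fire everything in the current slot,
-- then advance the pointer. Cost is proportional to the expired timers only.
function TimerWheel:tick()
    local expired = self.slots[self.pointer]
    self.slots[self.pointer] = {}
    self.pointer = (self.pointer % self.num_slots) + 1
    for _, cb in ipairs(expired) do cb() end
end

-- Example: a 60-slot wheel with 1-second resolution.
local wheel = TimerWheel.new(60, 1)
wheel:add(3, function () print("timer fired") end)
for _ = 1, 5 do wheel:tick() end
```

Delays longer than the wheel's range trigger the assert above; a hierarchical wheel removes that limitation by chaining coarser wheels, as described next.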

The hierarchical timer wheel with resolutions of 100 milliseconds, seconds, minutes, and hours is a specific implementation of the algorithm designed to handle timers with expiration times ranging from a fraction of a second to almost a day. Instead of a single wheel, it uses several fixed-size wheels organized into four levels, one per resolution; a timer that does not fit into a finer wheel is placed in a coarser one and cascades down to finer wheels as its expiration approaches.

Hierarchical Timer Wheel (Image courtesy https://slideplayer.com/slide/13145705/)

Kong uses the timer wheel algorithm to fix performance issues caused by OpenResty timers. Kong was experiencing performance degradation when handling a large number of concurrent requests due to the use of timers for rate-limiting and DNS timeouts. To address this, Kong implemented a timer wheel algorithm to manage these timers. By using the timer wheel algorithm, Kong was able to efficiently manage a large number of timers with constant-time operations, leading to improved performance and scalability.

Additionally, the use of the timer wheel algorithm reduced memory fragmentation and avoided the need for dynamic memory allocation, further improving performance.

Architecture

Kong with lua-resty-timer-ng library

The lua-resty-timer-ng library uses two threads to manage timer-related tasks: the super thread and the worker thread.

The super thread is responsible for handling incoming timer requests. If the timer is a recurring timer, it is added to the timer wheel, while a one-time timer is directly placed in the pending queue.

By default, the timer wheel has four levels: 10 slots at 100-millisecond resolution, 60 slots at one-second resolution, 60 slots at one-minute resolution, and 24 slots at one-hour resolution. The shortest delay the wheel can represent is therefore 100 milliseconds, and the longest is 23 hours, 59 minutes, 59 seconds, and 900 milliseconds (23:59:59.900), one 100-millisecond slot short of a full day.

In addition to accepting timer requests, the super thread also scans the timer wheel for expired timers and moves them to the pending queue. This ensures that the timer wheel is always up to date and ready to execute the next set of timers.

The worker thread, on the other hand, manages a pool of worker timers. The pool ranges from a minimum of 32 to a maximum of 256 threads, and the actual number of threads is adjusted automatically based on system load. This scaling decision is re-evaluated every 10 seconds, so the system keeps roughly the optimal number of threads for the current timer workload.

Additionally, Kong provides a timer Admin API endpoint that exposes information about the timers currently active in Kong's underlying timer system. The endpoint is /timers and is accessed with an HTTP GET request against the Admin API (which listens on port 8001 by default).

https://docs.konghq.com/gateway/latest/admin-api/#retrieve-runtime-debugging-info-of-kongs-timers
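
If you want to poll this endpoint from Lua, for example from a small monitoring job, a sketch using the lua-resty-http client could look like the following; the Admin API address and the use of lua-resty-http are assumptions about your setup.

```lua
-- Sketch: fetch Kong's timer statistics from the Admin API.
-- Assumes lua-resty-http is installed and the Admin API listens on 127.0.0.1:8001.
local http = require "resty.http"

local httpc = http.new()
local res, err = httpc:request_uri("http://127.0.0.1:8001/timers", {
    method = "GET",
})
if not res then
    ngx.log(ngx.ERR, "failed to query /timers: ", err)
    return
end

-- The response body is a JSON document describing the active timers; its exact
-- structure depends on the Kong version, so we just log it here.
ngx.log(ngx.INFO, "timer stats (HTTP ", res.status, "): ", res.body)
```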

DNS Resolution Timeouts

If you are managing large Kong deployments, you may come across the error message “timeout/dns lookup pool exceeded retries (1): timeout” in the Kong logs. This issue has been reported by several users and is documented in the Kong GitHub repository.

The following are some of the related issues reported to Kong:

https://github.com/Kong/kong/issues/9959

https://github.com/Kong/kong/issues/9964

https://github.com/Kong/kong/issues/10167

https://github.com/Kong/kong/issues/10107

https://github.com/Kong/kong/issues/10093

DNS Client in Kong

Kong optimizes network performance by reducing the number of DNS queries needed to resolve hostnames. When a client requests an API endpoint, the Kong gateway needs to resolve the hostname of the upstream server to an IP address using a DNS query. However, when handling a large number of concurrent requests, the number of DNS queries can quickly become a bottleneck, leading to increased latency and reduced throughput.

To address this issue, Kong batches DNS lookups, so that multiple concurrent requests for the same hostname are served by a single DNS query. Kong's DNS client is built on the resty.dns.resolver library. When Kong receives a request, it adds the requested hostname to a queue; subsequent requests for the same hostname do not trigger new DNS queries but are attached to the query already in flight, so Kong sends a single DNS query per hostname. This reduces the number of DNS queries and improves network performance.

However, Kong also needs to be mindful of DNS query timeouts. Kong uses the lua-resty-timer-ng library to run these batched DNS queries, placing each DNS query timer in the pending queue. If the pending queue grows too large and a DNS query timer cannot run within 2 seconds, the query times out. This can break name resolution in Kong and cause disruptions.
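
To illustrate the idea behind this batching, here is a conceptual sketch (not Kong's actual DNS client): the first caller for a hostname performs the query while later callers wait on a semaphore and reuse its result. The nameserver address and the 2-second wait are placeholders.

```lua
-- Conceptual sketch of de-duplicated DNS lookups (not Kong's implementation).
local resolver = require "resty.dns.resolver"
local semaphore = require "ngx.semaphore"

local inflight = {} -- hostname -> { sema = ..., answers = ..., err = ... }

local function lookup(hostname)
    local entry = inflight[hostname]
    if entry then
        -- A query for this hostname is already in flight: wait for its result.
        local ok, err = entry.sema:wait(2) -- give up after 2 seconds
        if not ok then
            return nil, "shared dns lookup timed out: " .. err
        end
        return entry.answers, entry.err
    end

    -- First caller for this hostname: perform the actual query.
    entry = { sema = semaphore.new() }
    inflight[hostname] = entry

    local r, err = resolver:new({
        nameservers = { "8.8.8.8" }, -- placeholder nameserver
        timeout = 2000,              -- in milliseconds
    })
    if r then
        entry.answers, entry.err = r:query(hostname, { qtype = r.TYPE_A })
    else
        entry.err = err
    end

    inflight[hostname] = nil
    entry.sema:post(1000) -- wake up any coroutines waiting on this lookup
    return entry.answers, entry.err
end
```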

Summary

Efficient management of timers is essential to achieve optimal performance in the Kong API Gateway, especially in a Kubernetes environment. To accomplish this objective, several best practices can be implemented.

The first best practice is to upgrade to Kong 3.0 or a later version if you are running an older release, since 3.0 is where lua-resty-timer-ng and its timer optimizations were introduced.

The second best practice is to use Fully Qualified Domain Names (FQDNs), ideally with a trailing dot (for example, my-service.my-namespace.svc.cluster.local.), to avoid the ndots search-domain expansion problem in Kubernetes. FQDNs help minimize DNS queries and reduce timer usage, thereby enhancing the efficiency of Kong API Gateway.

Thirdly, it is recommended to carefully consider the frequency of health checks when using timers. Frequent health checks can adversely impact performance as they require more timers. Therefore, a reasonable period should be set for health checks, and passive health checks should be utilized where possible.

Fourthly, it is advisable to avoid plugins that rely heavily on timers, as they can negatively affect performance, and to review any custom plugins that use timers to ensure they do not degrade performance.

Fifthly, it is recommended to avoid creating timers in a plugin's access phase: since the access phase runs on every request, doing so can lead to an overwhelming number of timer instances.

Finally, sharding Kong instances to optimize the number of route configurations per Kong node/pod is recommended. This approach distributes the load and minimizes the impact of timers on performance.

To monitor timer usage, Kong's Admin API /timers endpoint can be used. If necessary, any of the optimizations above can be applied to reduce timer usage and improve the efficiency of Kong API Gateway.

In summary, following these best practices enables efficient timer management in Kong API Gateway in a Kubernetes environment and ensures optimal performance.

My Other Kong Blogs

Tick Tock Woes — Tackling Timer Troubles in Kong Production

Kong API Gateway Behind the Scenes: Overcoming Reliability Challenges

Optimizing Health Checks and Load Balancing in Kong API Gateway: Best Practices for Upstreams, Targets, and Active/Passive Health Checks

GitOps Approach to Configuration Management In Kong DBless Mode
