
adr-01: Timer-based discovery-cache caching

Status

Implemented in #150.

Context

As of 2023-03-05, Kele uses a naive set of two separate caches:

  • The “kubeconfig cache,” which brokers reads from the user’s configured kubeconfig (kele-kubeconfig-path);
  • The “discovery cache,” which brokers reads from the user’s filesystem discovery cache (see the note below).

Note

The latter, confusingly, shares the same name with the discovery cache that lives in the user’s filesystem, typically under ~/.kube/cache/discovery. We use “discovery cache” to refer specifically to the Kele data structure, and use “filesystem discovery cache” to refer to the “actual” discovery cache.

When kele-mode is enabled, Kele initializes both the kubeconfig cache and the discovery cache. The kubeconfig cache loads the contents of the user’s kubeconfig file into memory; likewise, the discovery cache loads the contents of the user’s filesystem discovery cache into memory. Both caches set up file watchers that “auto-refresh” the respective cache contents whenever the underlying file(s) change.

The combined use of these two caches enables near-instant completion of:

  • A user’s contexts and other cluster configurations;
  • The available API groups, versions, and kinds on a given cluster.

Problems

The aforementioned “all-at-once” approach has proven to have several shortcomings.

Gratuitous File-Watching

The most fundamental problem is that the discovery cache does not scale. Emacs file-watching uses file descriptors under the hood, of which Emacs has a finite number available for use at any given time. Exceeding this limit results in an error to the following effect:

File watching not possible, no file descriptor left: 975

This limit is, to my knowledge, not user-configurable. Even if it were, asking users to raise it would be unreasonably invasive, for an Emacs package of all things. Most notably, this limit is, like all things, shared globally within Emacs, e.g. with LSP-Mode and auto-revert-mode, making the “real” limit for Kele much lower. In order to be a “good Emacs citizen,” Kele needs to be much more conservative and strategic in its use of file watchers.

We note that it is very easy to hit this limit. Anecdotally, I maintain kubectl access to a handful of production-scale clusters as part of my day job, and simply adding a couple more ad-hoc Kind clusters as part of integration-testing for this very package is enough for me to hit the limit. Recovering from this requires me to manually delete the filesystem discovery cache directories corresponding to these transient Kind clusters. On top of being annoying, this is also a fundamentally unreasonable workaround; what happens if a user simply has that many clusters to maintain?

Decision

The simplest and “stupidest” solution to this problem is to make caching of the filesystem discovery cache timer-based rather than filewatch-based.
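To make the idea concrete, here is a minimal, language-neutral sketch of a timer-refreshed cache, written in Python for brevity. It is not Kele's implementation: the class name `TimerRefreshedCache`, its methods, and the choice of interval are all hypothetical. The point it illustrates is that nothing is watched between refreshes, so the number of cached files has no bearing on how many file descriptors are held.

```python
import os
import threading


class TimerRefreshedCache:
    """In-memory copy of a directory tree, reloaded on a fixed interval.

    Hypothetical illustration only. No file watchers are registered, so
    the cache holds no descriptors between refreshes.
    """

    def __init__(self, root, interval_seconds):
        self.root = root
        self.interval_seconds = interval_seconds  # assumed, caller-chosen
        self.contents = {}
        self._timer = None
        self._lock = threading.Lock()

    def refresh(self):
        """Re-read every file under `root` into memory."""
        fresh = {}
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8") as handle:
                    fresh[os.path.relpath(path, self.root)] = handle.read()
        with self._lock:
            self.contents = fresh

    def start(self):
        """Load once, then schedule the next reload."""
        self.refresh()
        self._timer = threading.Timer(self.interval_seconds, self.start)
        self._timer.daemon = True
        self._timer.start()

    def stop(self):
        """Cancel the pending reload, if any."""
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
```

In this sketch, a change on disk becomes visible only after the next call to `refresh`, whether triggered by the timer or made explicitly.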

Consequences

This approach represents, to an extent, a “regression” of sorts in the functionality of the discovery cache, as its contents are no longer guaranteed to be fully up-to-date with those of the filesystem discovery cache. We decide that this is safe, as the set of group-version-kinds present in a cluster is unlikely to change that frequently. This is consistent with kubectl’s own refresh policy for the filesystem discovery cache, which refills the cache every ten minutes, and lazily at that, i.e. on the next user invocation of kubectl after the ten-minute mark.
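The lazy, time-bounded refresh described above can be sketched as follows. This is an illustrative Python model of the policy only, not kubectl's or Kele's code; `LazyTTLCache`, `loader`, and the injectable `clock` are hypothetical names, and the 600-second default simply mirrors the ten-minute figure mentioned above.

```python
import time


class LazyTTLCache:
    """Reload a value only when it is requested after its TTL has elapsed."""

    def __init__(self, loader, ttl_seconds=600.0, clock=time.monotonic):
        self._loader = loader      # hypothetical: callable that rebuilds the value
        self._ttl = ttl_seconds    # 600 s mirrors the ten-minute mark
        self._clock = clock        # injectable so the policy can be tested
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        expired = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if expired:
            # Nothing happens at the ten-minute mark itself; the reload is
            # deferred until the next caller actually asks for the value.
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

Unlike a repeating timer, this variant does no work at all while the cache is idle; the cost is that the first request after expiry pays for the reload.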