In-memory weights caching

New in version 0.5.0.

Note

This caching only applies to the precomputed and precomputed-local backends of regrid().

Purpose

earthkit-regrid provides an in-memory cache for pre-computed interpolation weights. When the cache is enabled, weights loaded from disk are stored in memory, so subsequent regrid() calls with the same input and output grids do not have to load them from disk again. The cache can be configured with a maximum size and an eviction policy.

Note

The earthkit-regrid in-memory cache is configured through the Configuration.

In-memory cache policies

The primary config option to control the in-memory cache is weights-memory-cache-policy, which can take the following values:

Largest cache policy

When weights-memory-cache-policy is “largest” (the default), the largest matrices are evicted first from the in-memory cache. The eviction policy is applied before the weights are loaded, to ensure that they will fit into the cache. When this is not possible, the behaviour depends on the weights-memory-cache-strict-mode option. The maximum memory size of the in-memory cache is defined by the maximum-weights-memory-cache-size option; the default is 500 MB.

>>> from earthkit.regrid import cache, config
>>> config.set("weights-memory-cache-policy", "largest")
>>> config.get("weights-memory-cache-policy")
'largest'
>>> config.get("maximum-weights-memory-cache-size")
524288000
>>> config.get("weights-memory-cache-strict-mode")
False
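To make the "largest first" rule concrete, here is a minimal sketch of one eviction step, written in plain Python under stated assumptions: the function evict_largest_first and the entries dictionary (cache key to size in bytes) are hypothetical illustrations, not earthkit-regrid's actual implementation or API.

```python
# Hypothetical sketch of a "largest first" eviction step, for illustration
# only; it is not earthkit-regrid's implementation. `entries` maps a cache
# key to the size of the stored weights in bytes.


def evict_largest_first(entries, new_size, max_size):
    """Evict the largest entries until new_size more bytes fit in max_size.

    Mutates `entries` and returns the evicted keys, largest first. Returns
    None without evicting anything if the new weights alone exceed max_size,
    since no amount of eviction could make them fit.
    """
    if new_size > max_size:
        return None
    evicted = []
    # sorted() returns a new list, so deleting from `entries` is safe here
    for key, _ in sorted(entries.items(), key=lambda kv: kv[1], reverse=True):
        if sum(entries.values()) + new_size <= max_size:
            break
        del entries[key]
        evicted.append(key)
    return evicted


cache = {"a": 40, "b": 30, "c": 20}  # 90 bytes in use
print(evict_largest_first(cache, new_size=25, max_size=100))  # ['a']
print(cache)  # {'b': 30, 'c': 20}
```

Evicting the largest matrices first frees the most memory per eviction, at the cost of possibly discarding weights that are used often.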

LRU cache policy

When weights-memory-cache-policy is “lru”, the least recently used matrices are evicted first from the in-memory cache. The eviction policy is applied before the weights are loaded, to ensure that they will fit into the cache. When this is not possible, the behaviour depends on the weights-memory-cache-strict-mode option. The maximum memory size of the in-memory cache is defined by the maximum-weights-memory-cache-size option; the default is 500 MB.

>>> from earthkit.regrid import cache, config
>>> config.set("weights-memory-cache-policy", "lru")
>>> config.get("weights-memory-cache-policy")
'lru'
>>> config.get("maximum-weights-memory-cache-size")
524288000
>>> config.get("weights-memory-cache-strict-mode")
False
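For comparison with the previous policy, the following is a minimal sketch of a size-bounded LRU cache built on collections.OrderedDict. The class SizedLRUCache and its methods are hypothetical and only illustrate the eviction order; they are not earthkit-regrid's internals.

```python
from collections import OrderedDict


class SizedLRUCache:
    """Hypothetical size-bounded LRU cache; not earthkit-regrid's code."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.items = OrderedDict()  # key -> (value, size), least recent first
        self.curr_size = 0

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key][0]

    def put(self, key, value, size):
        if size > self.max_size:
            return False  # cannot fit even into an empty cache
        if key in self.items:
            self.curr_size -= self.items.pop(key)[1]
        # evict least recently used entries until the new item fits
        while self.curr_size + size > self.max_size:
            _, (_, old_size) = self.items.popitem(last=False)
            self.curr_size -= old_size
        self.items[key] = (value, size)
        self.curr_size += size
        return True


cache = SizedLRUCache(max_size=100)
cache.put("A", "weights-1", 40)
cache.put("B", "weights-2", 40)
cache.get("A")                   # "A" becomes the most recently used entry
cache.put("C", "weights-3", 40)  # evicts "B", the least recently used
print(list(cache.items))         # ['A', 'C']
```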

Unlimited cache policy

When weights-memory-cache-policy is “unlimited”, all matrices are kept in memory.

>>> from earthkit.regrid import cache, config
>>> config.set("weights-memory-cache-policy", "unlimited")
>>> config.get("weights-memory-cache-policy")
'unlimited'

Off cache policy

When weights-memory-cache-policy is “off”, there is no cache and the matrices are always loaded from disk.

>>> from earthkit.regrid import cache, config
>>> config.set("weights-memory-cache-policy", "off")
>>> config.get("weights-memory-cache-policy")
'off'

Getting the state of the in-memory cache

The current status of the in-memory cache can be retrieved using the memory_cache_info() function. It returns a namedtuple with fields hits, misses, maxsize, currsize, count and policy.

>>> from earthkit.regrid import memory_cache_info
>>> memory_cache_info()
CacheInfo(hits=9, misses=1, maxsize=524288000, currsize=259170724, count=1, policy='largest')
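The fields of the returned namedtuple can be combined into simple statistics. The sketch below assumes only the field names shown above; it defines a local stand-in CacheInfo namedtuple and a hypothetical summarize() helper so that it runs without earthkit-regrid. With the library installed you would pass the result of memory_cache_info() instead.

```python
from collections import namedtuple

# Local stand-in with the same fields as the CacheInfo shown above; with
# earthkit-regrid installed, memory_cache_info() returns the real object.
CacheInfo = namedtuple(
    "CacheInfo", ["hits", "misses", "maxsize", "currsize", "count", "policy"]
)


def summarize(info):
    """Return (hit ratio, fraction of maxsize in use) for a CacheInfo.

    Both values fall back to 0.0 when there were no lookups or when
    maxsize is 0 or None (for example, with no size limit).
    """
    lookups = info.hits + info.misses
    hit_ratio = info.hits / lookups if lookups else 0.0
    fill = info.currsize / info.maxsize if info.maxsize else 0.0
    return hit_ratio, fill


info = CacheInfo(hits=9, misses=1, maxsize=524288000,
                 currsize=259170724, count=1, policy="largest")
hit_ratio, fill = summarize(info)
print(f"hit ratio {hit_ratio:.0%}, cache {fill:.1%} full")
# hit ratio 90%, cache 49.4% full
```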

Clearing the in-memory cache

The in-memory cache can be cleared using the clear_memory_cache() function.

>>> from earthkit.regrid import clear_memory_cache
>>> clear_memory_cache()
>>> memory_cache_info()
CacheInfo(hits=0, misses=0, maxsize=524288000, currsize=0, count=0, policy='largest')

In-memory cache limits

Warning

These config options are only used when weights-memory-cache-policy is largest or lru.

maximum-weights-memory-cache-size

The maximum-weights-memory-cache-size option defines the maximum memory size of the in-memory cache in bytes. The default is 500 MB.
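The 500 MB default is reported by config.get() as 524288000 bytes, which equals 500 * 1024**2, so "MB" appears to mean 1024**2 bytes here. The helper below is a hypothetical sketch of that conversion, assuming binary units. It is not part of earthkit-regrid, and the library's own parsing of size strings may accept other forms.

```python
import re

# Hypothetical helper, not part of earthkit-regrid: converts a size string
# such as "500MB" to bytes, assuming binary units (1 MB = 1024**2 bytes),
# which matches 500 MB being reported as 524288000 bytes.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def to_bytes(size):
    if isinstance(size, int):
        return size  # already a number of bytes
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?B)\s*", size.upper())
    if match is None:
        raise ValueError(f"cannot parse size: {size!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


print(to_bytes("500MB"))  # 524288000
print(to_bytes("100MB"))  # 104857600, the same as 100 * 1024**2
```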

weights-memory-cache-strict-mode

When weights-memory-cache-strict-mode is True, a ValueError is raised if the weights cannot be fitted into the cache. When it is False and the weights cannot be fitted into the cache, the weights are simply not stored in the cache. The default is False.
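The strict-mode decision can be sketched as follows. This sketch assumes that "cannot be fitted" means that the weights are larger than the whole cache, even after every other entry has been evicted. The function admit() and its arguments are hypothetical illustrations, not earthkit-regrid API.

```python
# Hypothetical sketch of the strict-mode decision described above; the
# function name and arguments are illustrative, not earthkit-regrid API.


def admit(weights_size, max_size, strict):
    """Return True if weights of weights_size bytes may be cached.

    Weights larger than the whole cache can never fit: in strict mode this
    raises ValueError, otherwise the weights are simply not cached.
    """
    if weights_size <= max_size:
        return True
    if strict:
        raise ValueError(
            f"weights ({weights_size} bytes) do not fit into the "
            f"in-memory cache ({max_size} bytes)"
        )
    return False


print(admit(300, 1000, strict=False))   # True
print(admit(3000, 1000, strict=False))  # False: weights are not cached
try:
    admit(3000, 1000, strict=True)
except ValueError as err:
    print("strict mode:", err)
```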

In-memory cache config parameters

maximum-weights-memory-cache-size (default: '500MB')
The maximum memory size of the in-memory precomputed weight cache in bytes. Only used when weights-memory-cache-policy is "largest" or "lru". Can be set to None. See In-memory weights caching for more information.

weights-memory-cache-policy (default: 'largest')
The in-memory precomputed weights cache policy. Valid values: off, unlimited, largest and lru. See In-memory weights caching for more information.

weights-memory-cache-strict-mode (default: False)
Raise an exception if the weights cannot be fitted into the in-memory cache. Only used when weights-memory-cache-policy is "largest" or "lru". See In-memory weights caching for more information.

Other earthkit-regrid config options are described in the Configuration documentation.

Notebooks

Examples

import numpy as np
from earthkit.regrid import config, memory_cache_info, regrid

# set memory cache with a maximum size of 100 MB to evict the largest matrices first
config.set(
    weights_memory_cache_policy="largest",
    maximum_weights_memory_cache_size=100 * 1024**2,
)
print(memory_cache_info())

# create a random data array on the O32 grid (5248 points) and regrid it;
# the weights are loaded from disk and stored in the in-memory cache
data = np.random.rand(5248)
interpolated_data = regrid(data, in_grid={"grid": "O32"}, out_grid={"grid": [5, 5]})
print(memory_cache_info())

# repeat the interpolation; this time the weights are taken from the in-memory cache
data = np.random.rand(5248)
interpolated_data = regrid(data, in_grid={"grid": "O32"}, out_grid={"grid": [5, 5]})
print(memory_cache_info())

output:

CacheInfo(hits=0, misses=0, maxsize=104857600, currsize=0, count=0, policy='largest')
CacheInfo(hits=0, misses=1, maxsize=104857600, currsize=102340, count=1, policy='largest')
CacheInfo(hits=1, misses=1, maxsize=104857600, currsize=102340, count=1, policy='largest')