Performance#

Koda Validate is reasonably fast (for Python). It tends to be significantly faster than Pydantic, for instance. There are several known things you can do if you really need to eke out every ounce of performance.

Use asyncio for IO#

Use asyncio-based validation wherever you need to do IO during validation. Even if this is not a common need, it merits mentioning first because:

  • switching to async validation in Koda Validate is relatively simple

  • the performance gains can be orders of magnitude in some cases
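
To illustrate where the gains come from, here is a minimal sketch that does not use Koda Validate's API at all. `check_username` is a hypothetical stand-in for an IO-bound check (such as a database lookup), with its latency simulated by `asyncio.sleep`. Run one after another, the checks cost the sum of their latencies; run concurrently with `asyncio.gather`, they cost roughly the longest single latency.

```python
import asyncio
import time
from typing import Any, Coroutine, List, Tuple

KNOWN_USERNAMES = {"michael", "gob", "lindsay", "buster"}  # made-up data


async def check_username(name: str) -> bool:
    """Hypothetical IO-bound check; the sleep simulates a database round trip."""
    await asyncio.sleep(0.05)
    return name in KNOWN_USERNAMES


async def check_sequentially(names: List[str]) -> List[bool]:
    # each check starts only after the previous one has finished
    return [await check_username(name) for name in names]


async def check_concurrently(names: List[str]) -> List[bool]:
    # all checks wait on their (simulated) IO at the same time
    return list(await asyncio.gather(*(check_username(name) for name in names)))


def timed(coro: Coroutine[Any, Any, List[bool]]) -> Tuple[List[bool], float]:
    # run a coroutine to completion and report how long it took
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


names = ["michael", "tobias", "gob", "lindsay", "george", "buster", "maeby", "lucille"]
sequential_result, sequential_seconds = timed(check_sequentially(names))
concurrent_result, concurrent_seconds = timed(check_concurrently(names))
```

Both versions produce the same results; only the elapsed time differs, and the gap widens as the number of concurrent checks grows.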

Initialize Validators in Outer Scopes#

Ideally, Validators should be initialized at the module level. If that’s not possible, initializing them either lazily (once), or as few times as possible (i.e. not for every validated value) is advantageous because:

  • often there’s no need to initialize a Validator for each value being validated, and initialization is not always cheap

  • many of the Validators in Koda Validate are optimized on the assumption they will be initialized less often than they’ll be called

Slower#

from typing import Any, TypedDict

from koda_validate import TypedDictValidator, ValidationResult


class Book(TypedDict):
    title: str
    author: str


def some_request_handler(data: Any) -> ValidationResult[Book]:
    # the validator is initialized every time `some_request_handler` is called
    return TypedDictValidator(Book)(data)

Faster#

from typing import Any, TypedDict

from koda_validate import TypedDictValidator, ValidationResult


class Book(TypedDict):
    title: str
    author: str

# the validator is initialized once
book_validator = TypedDictValidator(Book)


def some_request_handler(data: Any) -> ValidationResult[Book]:
    return book_validator(data)
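
When a validator can't live at module level (for example, because it depends on configuration that is loaded later), lazy one-time initialization gives a similar benefit. Below is a minimal sketch using `functools.lru_cache`. `ExpensiveValidator` is a hypothetical stand-in with an instance counter, used here only so the number of initializations is visible; in real code the factory would return something like `TypedDictValidator(Book)`.

```python
from functools import lru_cache
from typing import Any


class ExpensiveValidator:
    """Hypothetical stand-in for a validator whose setup is costly."""

    init_count = 0  # counts how many instances have been created

    def __init__(self) -> None:
        ExpensiveValidator.init_count += 1

    def __call__(self, data: Any) -> bool:
        # placeholder check, not Koda Validate's actual behavior
        return isinstance(data, dict) and {"title", "author"}.issubset(data)


@lru_cache(maxsize=None)
def get_book_validator() -> ExpensiveValidator:
    # the body runs on the first call only; later calls return the cached instance
    return ExpensiveValidator()


def some_request_handler(data: Any) -> bool:
    return get_book_validator()(data)


results = [
    some_request_handler({"title": "Dune", "author": "Frank Herbert"}),
    some_request_handler({"title": "Missing author"}),
    some_request_handler("not a dict"),
]
```

After three handler calls, `ExpensiveValidator.init_count` is still 1, because the factory body ran only once.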

Use a Cache#

Koda Validate provides CacheValidatorBase, a caching layer you can wrap Validators with. You will need to subclass CacheValidatorBase to work with whatever caching backend you have.

In this example, we’ll use a basic dict to act as a cache.

from dataclasses import dataclass, field
from typing import Dict, Any, TypeVar
from koda import Maybe, Just, nothing
from koda_validate import (CacheValidatorBase, ValidationResult, ListValidator,
                           StringValidator, IntValidator)

A = TypeVar('A')

@dataclass
class DictCacheValidator(CacheValidatorBase[A]):
    _dict_cache: Dict[Any, ValidationResult[A]] = field(default_factory=dict)

    def cache_get_sync(self, val: Any) -> Maybe[ValidationResult[A]]:
        if val in self._dict_cache:
            return Just(self._dict_cache[val])
        else:
            return nothing

    def cache_set_sync(self, val: Any, cache_val: ValidationResult[A]) -> None:
        self._dict_cache[val] = cache_val

Warning

It is generally unwise to use a boundlessly expanding dict as we have in our example – it will continuously increase its memory footprint. Please don’t reuse this code for anything in production!

The validator should behave as the wrapped Validator normally would:

>>> cached_int_validator = DictCacheValidator(IntValidator())
>>> cached_int_validator(5)  # cache miss
Valid(val=5)
>>> cached_int_validator(5)  # cache hit
Valid(val=5)
>>> cached_int_validator("a string")  # cache miss
Invalid(err_type=TypeErr(expected_type=<class 'int'>), ...)
>>> cached_int_validator("a string")  # cache hit
Invalid(err_type=TypeErr(expected_type=<class 'int'>), ...)

Note

This example uses a simple IntValidator in synchronous mode for simplicity. Caching will not offer big gains in all cases. It is probably most useful in async contexts, or where validators are performing a lot of computation.
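
One way to avoid the unbounded growth mentioned in the warning above is to cap the cache size and evict the least recently used entry. The following is a minimal sketch of such a store built on `collections.OrderedDict`. `BoundedLRUStore` and its `get`/`set` methods are hypothetical names, not part of Koda Validate; a `CacheValidatorBase` subclass could delegate `cache_get_sync` and `cache_set_sync` to something like it. To stay dependency-free, it returns a `(found, value)` pair where the example above uses `Maybe`.

```python
from collections import OrderedDict
from typing import Generic, Hashable, Optional, Tuple, TypeVar

V = TypeVar("V")


class BoundedLRUStore(Generic[V]):
    """Hypothetical size-capped store that evicts the least recently used key."""

    def __init__(self, max_size: int) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._max_size = max_size
        self._entries: "OrderedDict[Hashable, V]" = OrderedDict()

    def get(self, key: Hashable) -> Tuple[bool, Optional[V]]:
        if key not in self._entries:
            return False, None
        # a hit marks the key as most recently used
        self._entries.move_to_end(key)
        return True, self._entries[key]

    def set(self, key: Hashable, value: V) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._max_size:
            # drop the oldest (least recently used) entry
            self._entries.popitem(last=False)

    def __len__(self) -> int:
        return len(self._entries)


store: BoundedLRUStore[str] = BoundedLRUStore(max_size=2)
store.set(1, "one")
store.set(2, "two")
store.get(1)  # touch 1 so that 2 becomes the least recently used
store.set(3, "three")  # exceeds the cap, so 2 is evicted
```

As with the plain dict in the example above, keys must be hashable.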

Because we can compose Validators, caching can be done with as much granularity as you need. Here we’ll cache validation of the individual list items, but not of the list as a whole.

validator = ListValidator(DictCacheValidator(StringValidator()))

Note

Of course, if you want a different API for caching, you’re free to write your own caching wrapper. It’s probably worth taking a look at the CacheValidatorBase source code. It’s not complicated.


Look at koda_validate._internals#

There are a few classes in _internals.py that are optimized for speed. For instance, most of the built-in Validators subclass _ToTupleValidator.

The contents of koda_validate._internals may change without notice. You can use some of the base classes in there at your own risk, or just mimic some of the patterns.


Compile Parts of Koda Validate#

Koda Validate is not compiled. mypyc can trivially compile parts of the code. It would probably not be incredibly difficult to alter the source code in a way that facilitates greater speedups from mypyc. Significant speedups are definitely possible.

Note

Compiling Koda Validate is not in any immediate plans, for a few reasons:

  • Koda Validate is already generally faster than competing libraries

  • Compilation requires a strategy – especially since some kinds of compilation can complicate extension

  • It’s easier to add new features – and to refactor – without an extra compilation step

  • CPython itself is getting faster. 3.11 is significantly faster than 3.10. 3.12 is meant to be faster still.

Depending on how things evolve, this may change.