Mavai — Testing Non-Deterministic Systems — Projects

mavai-R

mavai-R is the statistical oracle for the mavai project family. It uses R — the gold standard for statistical computing — to generate language-agnostic reference datasets against which all mavai framework implementations verify their statistics engines.

Why it exists

The mavai family includes multiple independent implementations of the same statistical methodology: punit (Java), feotest (Rust), and future frameworks in other languages. Each implements Wilson score confidence intervals, threshold derivation, power analysis, feasibility checking, and verdict evaluation independently, in its own language and idiom.
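
For orientation, the standard textbook form of the Wilson score interval is given below. This is a general reference formula, not a statement of how any mavai implementation computes it: the source does not say whether the frameworks use one-sided bounds, a continuity correction, or a particular choice of z. Here x is the number of successes in n trials and z is the normal quantile for the chosen confidence level.

```latex
\hat{p} = \frac{x}{n}, \qquad
\text{CI}_{\text{Wilson}} =
\frac{1}{1 + \frac{z^2}{n}}
\left(
  \hat{p} + \frac{z^2}{2n}
  \;\pm\;
  z \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n} + \frac{z^2}{4n^2}}
\right)
```

As an illustration, x = 90, n = 100 and z = 1.96 give an interval of roughly 0.826 to 0.945. Small floating-point differences in exactly this kind of calculation are what a shared reference dataset lets independent implementations detect.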

feotest examples

feotest examples will be a companion repository of worked examples that demonstrate the feotest framework across its major capabilities. It will mirror, in idiomatic Rust, the scenarios covered by punit examples.

Coming soon

feotest examples is in active development alongside feotest. The repository will be available at github.com/mavai-org/feotest-examples once published.

feotest

feotest is a probabilistic testing framework for Rust. It brings the same statistical methodology as punit — repeated trials, confidence intervals, threshold-based verdicts — to the Rust ecosystem, built from the ground up as idiomatic Rust rather than a port.

Coming soon

feotest is in active development and its public release is imminent. The repository will be available at github.com/mavai-org/feotest once published.

Why Rust

Rust’s ownership model, zero-cost abstractions, and strong type system make it a natural fit for infrastructure and safety-critical systems — exactly the kind of services where probabilistic testing matters most. feotest is designed to feel native to Rust developers, following the conventions and idioms of the Rust testing ecosystem.

outcome

outcome is a Java framework that provides a formal boundary between deterministic application code and fallible, non-deterministic operations such as network calls, database queries, and external API requests.

The problem

Java’s exception model conflates three fundamentally different failure categories: operational failures (network timeouts, service unavailability), defects (null pointers, logic errors), and terminal errors (out of memory). This leads to inconsistent error handling, ad-hoc retry loops, and swallowed exceptions across codebases.
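
The three categories can be made explicit in code. The sketch below is only an illustration of the idea and is not outcome's API: the names `FailureBoundary`, `Category`, `Result`, `classify`, and `attempt` are hypothetical, and the mapping from exception types to categories (for example, treating `IOException` and `TimeoutException` as operational) is an assumption made for the example.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: none of these names come from the outcome framework.
final class FailureBoundary {

    enum Category { OPERATIONAL, DEFECT, TERMINAL }

    // Assumed mapping: JVM Errors are terminal, I/O and timeout problems are
    // operational, and everything else is treated as a defect.
    static Category classify(Throwable t) {
        if (t instanceof Error) {
            return Category.TERMINAL;
        }
        if (t instanceof IOException || t instanceof TimeoutException) {
            return Category.OPERATIONAL;
        }
        return Category.DEFECT;
    }

    // Holds either a value or the operational failure that prevented it.
    static final class Result<T> {
        final T value;
        final Exception failure;

        private Result(T value, Exception failure) {
            this.value = value;
            this.failure = failure;
        }

        boolean isSuccess() {
            return failure == null;
        }
    }

    // Runs a fallible operation at the boundary. Operational failures come
    // back as values; defects are rethrown; Errors are never caught here.
    static <T> Result<T> attempt(Callable<T> operation) {
        try {
            return new Result<>(operation.call(), null);
        } catch (Exception e) {
            if (classify(e) == Category.OPERATIONAL) {
                return new Result<>(null, e);
            }
            if (e instanceof RuntimeException) {
                throw (RuntimeException) e;
            }
            throw new IllegalStateException("Unclassified checked exception", e);
        }
    }
}
```

With a boundary like this, a caller decides what to do with an operational failure (retry, fall back, report) by inspecting the returned value, while a `NullPointerException` still surfaces as the bug it is.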

punit examples

punit examples is a companion repository containing a fully worked example application that demonstrates the punit framework across all its major capabilities.

Two example domains

Shopping Basket (empirical approach)

An LLM translates natural language instructions (e.g. “Add 2 apples”) into structured JSON actions for a shopping basket API. Because LLM behaviour is inherently probabilistic (it may hallucinate fields, produce malformed JSON, or invent invalid actions), success rates are established empirically through measurement experiments rather than fixed in advance.

punit

punit is a JUnit 5 extension framework for probabilistic testing. It is designed for systems where behaviour is non-deterministic by nature — LLM integrations, ML model inference, distributed systems, and randomised algorithms.

How it works

Instead of the traditional binary pass/fail model, punit executes a test multiple times and treats each run as a Bernoulli trial. It then applies statistical inference to determine whether the observed success rate meets a defined threshold at a given confidence level.
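
The general idea can be sketched in plain Java without any test framework. This is not punit's API or its actual decision procedure: the class and method names are hypothetical, and the rule used here (pass when the Wilson score lower bound is at or above the threshold) is one plausible choice assumed for illustration. The simulated trial with a 90% success probability is likewise made up.

```java
import java.util.Random;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of repeated-trial testing; not punit's implementation.
final class ProbabilisticCheck {

    // Runs the trial the given number of times and counts the successes.
    static int countSuccesses(BooleanSupplier trial, int trials) {
        int successes = 0;
        for (int i = 0; i < trials; i++) {
            if (trial.getAsBoolean()) {
                successes++;
            }
        }
        return successes;
    }

    // Lower end of the Wilson score interval for the success probability,
    // where z is the normal quantile for the desired confidence level.
    static double wilsonLowerBound(int successes, int trials, double z) {
        if (trials <= 0) {
            throw new IllegalArgumentException("trials must be positive");
        }
        double pHat = (double) successes / trials;
        double z2 = z * z;
        double centre = pHat + z2 / (2.0 * trials);
        double margin = z * Math.sqrt(
                pHat * (1.0 - pHat) / trials + z2 / (4.0 * trials * trials));
        return (centre - margin) / (1.0 + z2 / trials);
    }

    // Assumed decision rule: pass only if the lower bound clears the threshold.
    static boolean passes(int successes, int trials, double threshold, double z) {
        return wilsonLowerBound(successes, trials, z) >= threshold;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        // Stand-in for a non-deterministic system under test.
        BooleanSupplier simulatedTrial = () -> random.nextDouble() < 0.9;

        int trials = 200;
        int successes = countSuccesses(simulatedTrial, trials);
        double lowerBound = wilsonLowerBound(successes, trials, 1.96);
        boolean verdict = passes(successes, trials, 0.80, 1.96);

        System.out.printf("%d/%d successes, lower bound %.4f, pass: %b%n",
                successes, trials, lowerBound, verdict);
    }
}
```

Under this rule, 90 successes in 100 trials would pass a threshold of 0.80 at z = 1.96 but fail a threshold of 0.85, because the lower bound is about 0.826. A single failed run no longer decides the result; the sample as a whole does.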