feotest: Verifiable Evidence for Rust Medical Devices

10 June 2026

By Michael Mannion

A diagnostic device's output is stochastic: the same specimen can yield different outcomes because of variability in the biological sample, the reagent chemistry, or the electronics, or because of a firmware change. Conventional pass/fail testing misses this. feotest, an open-source, Rust-native framework, makes device performance testable and monitorable: establish a statistically defined baseline, then continuously verify functional and temporal conformance with confidence intervals and auditable verdicts. A runnable medical-device example shows the whole loop, set against the 2026 QMSR and EUDAMED evidence environment.

[Image: a cheerful orange Rustacean crab wearing EEG scalp electrodes on its shell, wired to a small portable monitor showing brain-wave traces.]

Ferris gets wired in: Rust is taking root in safety-critical medical software, and feotest gives it a way to measure and verify the device too.

If you build medical devices in Rust, you already live with a truth that has nothing to do with your language: the same specimen, measured twice, need not give the same answer. Your software may be deterministic, but the device runs in a context where variability is inevitable — the biological sample, the reagent chemistry, the electronics — and a firmware change can shift the result again. Conventional pass/fail testing assumes a deterministic system, so it cannot honestly answer the question that matters: has performance drifted from the version you validated?

feotest is built to close that gap. It is an open-source, Rust-native framework for exactly these questions. If you have not heard of it, that is expected: the approach itself is not yet widely known. It treats a device as a stochastic service, one whose output varies even for an identical input, and turns that variability into a repeatable measure → verify loop: first establish a statistically defined baseline, then continuously test whether the current device still conforms to it. Functional outcomes are modelled with Bernoulli/binomial reasoning and reported with Wilson confidence intervals; latency is treated as a distribution rather than an average; and the overall verdict is a conjunction of contract criteria rather than a single deterministic assertion. The underlying methodology is documented at r.mavai.org, whose Statistical Companion distinguishes functional stochasticity from temporal stochasticity and specifies every formula the framework ships.
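To give a feel for the functional side, here is a minimal sketch of the textbook Wilson score interval for a binomial proportion. It is not feotest's API: the function name `wilson_interval`, its signature, and the example counts are assumptions made for illustration, and the authoritative formulas are the ones in the Statistical Companion.

```rust
/// Wilson score interval for a binomial proportion (textbook form).
/// `z` is the standard-normal quantile, e.g. 1.96 for roughly 95% two-sided.
/// Hypothetical helper for illustration only; not part of feotest.
fn wilson_interval(successes: u64, trials: u64, z: f64) -> Option<(f64, f64)> {
    if trials == 0 || successes > trials {
        return None; // no data, or inconsistent counts
    }
    let n = trials as f64;
    let p_hat = successes as f64 / n;
    let z2 = z * z;
    let denom = 1.0 + z2 / n;
    // Centre of the interval is pulled from p_hat towards 0.5.
    let centre = (p_hat + z2 / (2.0 * n)) / denom;
    let half_width = z * (p_hat * (1.0 - p_hat) / n + z2 / (4.0 * n * n)).sqrt() / denom;
    // Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    Some(((centre - half_width).max(0.0), (centre + half_width).min(1.0)))
}

fn main() {
    // Assumed example: 96 concordant results on a 100-specimen reference panel.
    let (lower, upper) = wilson_interval(96, 100, 1.96).expect("valid counts");
    println!("observed 0.96, interval [{lower:.4}, {upper:.4}]");
}
```

Unlike the naive normal approximation, the Wilson construction stays inside [0, 1] and behaves sensibly when the observed rate is close to 0 or 1, which is the usual situation for a device that mostly works.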

You don’t have to take feotest’s statistics on trust. Its own output is verified, release by release, against an independent reference implementation of the methodology — exactly the evidence of tool correctness that software-tool validation under IEC 62304 expects.

To make this concrete, feotest ships a runnable medical-device example. The demonstration models a diagnostic device whose results vary from specimen to specimen, then characterises a baseline over a reference panel, records covariates such as software version and reagent lot, verifies later runs against the committed baseline, and raises a covariate mismatch when the comparator is no longer valid. That maps directly onto real lifecycle events: firmware releases, assay or reagent-lot changes, instrument drift, and post-market performance monitoring.
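The covariate check can be pictured with a small sketch. This is not taken from the feotest example repository: the `Covariates` alias, the `Comparability` enum, the function names, and the covariate values are all hypothetical, chosen only to show the idea that a baseline is comparable to a later run only when their recorded covariates agree.

```rust
use std::collections::BTreeMap;

/// Hypothetical covariate set: name -> value, e.g. "reagent_lot" -> "L-0042".
type Covariates = BTreeMap<String, String>;

#[derive(Debug, PartialEq)]
enum Comparability {
    /// Baseline and current run share the same covariate identity.
    Comparable,
    /// Names of covariates that differ or exist on one side only.
    CovariateMismatch(Vec<String>),
}

/// Compare the covariates recorded with the baseline against the current run.
fn check_comparability(baseline: &Covariates, current: &Covariates) -> Comparability {
    let mut changed: Vec<String> = baseline
        .keys()
        .chain(current.keys())
        .filter(|name| baseline.get(*name) != current.get(*name))
        .cloned()
        .collect();
    changed.sort();
    changed.dedup(); // a differing key present on both sides was seen twice
    if changed.is_empty() {
        Comparability::Comparable
    } else {
        Comparability::CovariateMismatch(changed)
    }
}

/// Small convenience constructor for the example.
fn covariates(pairs: &[(&str, &str)]) -> Covariates {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

fn main() {
    // Assumed values: same firmware, new reagent lot.
    let baseline = covariates(&[("software_version", "2.4.1"), ("reagent_lot", "L-0042")]);
    let current = covariates(&[("software_version", "2.4.1"), ("reagent_lot", "L-0057")]);

    match check_comparability(&baseline, &current) {
        Comparability::Comparable => println!("baseline is a valid comparator"),
        Comparability::CovariateMismatch(names) => {
            println!("covariate mismatch on {names:?}: re-characterise the baseline")
        }
    }
}
```

In this sketch a mismatch is a distinct outcome from a performance failure: it says the comparison itself is no longer meaningful, so the sensible next step is a new baseline rather than a FAIL.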

The timing is no accident. In the United States, the FDA’s Quality Management System Regulation became effective on 2 February 2026 and incorporates ISO 13485:2016 into the U.S. device quality-system framework, with FDA inspections now aligned to the new QMSR process. For software-heavy devices, FDA guidance also continues to emphasise documented evidence for device software functions, AI-enabled device lifecycle management, predetermined change-control plans, and cybersecurity risk management in premarket submissions.

Each of those maps onto something feotest produces. The measure → verify loop is documented design verification for a software device function — objective evidence rather than an anecdote. A FAIL verdict carrying a stated minimum detectable degradation is nonconforming-product detection with quantified confidence, not a judgement call. The distributional, confidence-interval treatment of functional and temporal performance is what AI-enabled lifecycle management needs when a single deterministic assertion is meaningless. And the tightest fit is change control: because covariates such as software version and reagent lot are part of baseline identity, a change invalidates the comparator and triggers re-verification automatically — exactly the trigger a predetermined change-control plan is built around. The one premarket emphasis feotest does not touch is cybersecurity, which sits outside its scope.

The European context points the same way. EUDAMED, the European Database on Medical Devices, has moved from long-running rollout to mandatory operational relevance: as of 28 May 2026, the Actor registration, UDI/Device registration, Notified Bodies & Certificates, and Market Surveillance modules are mandatory, while Clinical Investigations/performance studies and Vigilance/post-market surveillance remain under analysis or development. Recent MDCG guidance also sharpens the evidence environment, including 2025 guidance on IVD performance studies and the interplay between MDR/IVDR and the EU AI Act for medical-device AI.

What feotest produces is the evidence EUDAMED’s surveillance side consumes. The continuous verify loop is post-market performance monitoring; its covariate-mismatch and FAIL signals are the trended functional and temporal data that feed the Market Surveillance module and the forthcoming vigilance and post-market modules. The confidence-interval performance figures speak to the substance MDCG’s 2025 IVD performance-study guidance expects, and the distributional treatment suits the AI-enabled performance claims that MDR/IVDR and the EU AI Act jointly scrutinise.

The Rust angle is what makes this immediately usable rather than aspirational. Rust is moving into the safety-critical conversation rather than remaining a general-purpose systems language: the Rust Foundation created the Safety-Critical Rust Consortium in June 2024 to support responsible Rust use in safety-critical software, and the Ferrocene toolchain is positioned as a qualified Rust toolchain for safety- and mission-critical systems, including IEC 62304 Class C medical-device software development. For manufacturers already shipping Rust in device software, a Rust-native probabilistic testing framework is not just convenient; it keeps the evidence-generation layer in the same language as the implementation and inside the same test harness, CI system, and release process.

For manufacturers, the offer is practical: define the claim, encode the contract, measure the baseline, version the evidence, and re-run verification whenever the device or its operating context changes. The artifacts feotest produces (baselines, verdicts, confidence floors, latency percentiles, covariate identities, and auditable reports) feed directly into clinical evaluation, analytical validation, risk management, and the wider quality-management system. They supply the durable, ongoing performance evidence those processes depend on as the device and its operating context evolve.
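As a rough illustration of what "encoding the contract" could look like, the sketch below evaluates a verdict as a conjunction of one functional and one temporal criterion. Everything here is an assumption made for the example: the `Contract`, `Evidence`, and `Verdict` types, the choice of a nearest-rank 95th percentile, and the thresholds. feotest's real contract model may differ.

```rust
/// Hypothetical contract: both criteria must hold for a PASS.
struct Contract {
    min_pass_rate: f64, // floor for the lower confidence bound on the pass rate
    p95_limit_ms: u64,  // ceiling for the 95th-percentile latency
}

/// Hypothetical evidence gathered from one verification run.
struct Evidence {
    pass_rate_lower_bound: f64, // e.g. the lower end of a confidence interval
    latencies_ms: Vec<u64>,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    Fail(Vec<&'static str>), // names of the criteria that were not met
}

/// Nearest-rank percentile using integer arithmetic; `percent` must be in 1..=100.
fn percentile_nearest_rank(samples: &[u64], percent: usize) -> Option<u64> {
    if samples.is_empty() || percent == 0 || percent > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let rank = (percent * sorted.len() + 99) / 100; // ceil(percent * n / 100), at least 1
    Some(sorted[rank - 1])
}

/// The verdict is a conjunction: every criterion is checked and all must hold.
fn evaluate(contract: &Contract, evidence: &Evidence) -> Verdict {
    let mut failed = Vec::new();
    if evidence.pass_rate_lower_bound < contract.min_pass_rate {
        failed.push("functional");
    }
    match percentile_nearest_rank(&evidence.latencies_ms, 95) {
        Some(p95) if p95 <= contract.p95_limit_ms => {}
        _ => failed.push("temporal"), // too slow, or no latency data at all
    }
    if failed.is_empty() { Verdict::Pass } else { Verdict::Fail(failed) }
}

fn main() {
    let contract = Contract { min_pass_rate: 0.90, p95_limit_ms: 200 };
    let evidence = Evidence {
        pass_rate_lower_bound: 0.9016,
        latencies_ms: (1..=20u64).map(|i| i * 10).collect(), // 10, 20, ..., 200 ms
    };
    println!("{:?}", evaluate(&contract, &evidence));
}
```

Reporting which criteria failed, rather than a bare boolean, keeps the verdict auditable: a reviewer can see whether a FAIL came from the functional claim, the latency claim, or both.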

For auditors, the value is complementary. A conventional spreadsheet of pass/fail results often hides the assumptions behind sampling, thresholds, confidence, and environmental comparability. Auditors already ask the questions that matter — What claim is being tested? What is the reference population? What confidence level is used? What is the minimum detectable degradation? Which covariates define baseline identity? What happens when the reagent lot changes? With feotest, those questions have inspectable answers. Each is carried by a generated evidence artifact, produced by statistical machinery that is open-source and built on a publicly accessible model. An auditor can trace a verdict back through the method that produced it, rather than taking a vendor’s numbers on trust.

Stochastic behaviour in a device is nothing new — but the demands around it are growing. Authorities and regulators increasingly expect continuous, demonstrable evidence that a device still performs, and feotest meets that head on: an executable evidence loop that keeps proving a validated device still behaves like the device it claims to be. And as more of those devices are written in Rust — a language earning its place in safety-critical work — feotest is a natural fit for the toolchain and CI the team already runs.

References

feotest framework: https://github.com/mavai-org/feotest
feotest medical-device example: https://github.com/mavai-org/feotest-showcase-medical
feotest methodology site: https://r.mavai.org/
feotest Statistical Companion: https://r.mavai.org/statistical-companion.pdf
FDA Quality Management System Regulation: https://www.fda.gov/medical-devices/postmarket-requirements-devices/quality-management-system-regulation-qmsr
FDA guidance on premarket submissions for device software functions: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/content-premarket-submissions-device-software-functions
European Commission EUDAMED overview: https://health.ec.europa.eu/medical-devices-eudamed/overview_en
European Commission MDCG guidance page: https://health.ec.europa.eu/medical-devices-sector/new-regulations/guidance-mdcg-endorsed-documents-and-other-guidance_en
Rust Foundation Safety-Critical Rust Consortium: https://rustfoundation.org/safety-critical-rust-consortium/