Your local AI workstation.

Chat, images, video, benchmarks, and an OpenAI-compatible API — local, in one desktop app.

Download ChaosEngineAI Explore the product

Speculative decoding MTPLX + DFlash + DDTree

Native MTP on Apple Silicon, draft-model DFlash, tree-aware DDTree. Backend picks the fastest path per model.

Cache control 7 + 2 strategies

Native, TurboQuant, TriAttention, FBCache, TeaCache, TaylorSeer, PAB. Two more — MagCache, FasterCache — reachable via CLI / API for power users.

Evaluation 3 benchmark modes

Throughput, perplexity, and task accuracy. Saved, diffable, comparable.

macOS Linux Windows signed in-app updates

Why it feels different

Not a launcher. A control plane.

Catalog, runtime tuning, tool calling, image and video pipelines, server, benchmarks, prompt library, and plugins — one workspace, one mental model.

Runtime decisions in one place

Native f16, compressed KV cache, speculative decoding. Pick from the UI, not a shell flag.

Proof, not vibes

Run a benchmark, save it, diff it. Know whether a cache strategy or draft model is actually worth it.

Outputs that stay together

Threads, images, videos, templates, logs, adapters, and API snippets all live in one desktop surface.

Product tour

The way a power user actually moves through it.

Discovery, chat, benchmarks, image and video studios, and OpenAI-compatible serving — one runtime-aware desktop.

Model catalogs + launch controls

Find the right model, not just a file.

Curated families, capability filters, direct Hugging Face inspection, and launch-ready variants help you move from browsing to running without context switching.

Runtime Lab

The knobs that actually matter.

Speculative decoding for speed, cache compression for memory. Real telemetry, not hand-waving.

Speculative decoding

MTPLX, DFlash, DDTree

Native MTP heads on Apple Silicon, draft-model DFlash, tree-aware DDTree. Backend auto-picks based on the model.

Tree budget 24 nodes

Estimated speedup 1.0x

Acceptance Baseline

Runtime note Native generation

Standard decoding keeps the simplest possible path. Turn on DFlash or DDTree when you want faster generation with native f16 cache.

KV cache compression

Estimate the memory win

Each strategy is a first-class runtime choice — not a buried config flag.

Context window 32K

Estimated cache 9.0 GB

Compression 1.0x

Best fit All runtimes

Native f16 keeps maximum fidelity. Compression backends appear in the app automatically once installed into the local runtime.

Feature spread

Broader than a launcher. Tighter than a notebook.

Local inference, agent tooling, image and video workflows, evaluation, and developer integrations — one surface.

Chat and prompting

Threads that remember their runtime.

Multi-thread chat with pinned sessions and per-thread model + runtime memory.
Document uploads with chunked retrieval and inline citations.
Vision attachments, reasoning mode, and collapsible thinking traces.
Tool calling: web search, calculator, file reader, code executor.
Prompt Library with searchable templates, tags, and one-click apply.

Models and runtimes

Discovery to serving without leaving the app.

Curated text, image, and video catalogs plus Hugging Face browsing.
Multiple model directories — custom stores, shared folders, ad-hoc paths.
Apple Silicon conversion from Hugging Face checkpoints to MLX with live progress.
Warm-pool runtime reuse for instant reactivation of recent profiles.
OpenAI-compatible local server with LAN mode, auto-start, and live request stats.

Image Studio

A full local image pipeline.

Image Discover for curated diffusion models and compatibility-aware installs.
Prompt-driven generation with aspect ratios, seeds, negatives, and quality presets.
Gallery with reruns, seed variation, and one-click reveal on disk.
Preload and unload controls so diffusion behaves like the text runtime.
Saved metadata so you can reproduce what actually worked.

Video Studio

Local text-to-video, same workspace.

Native MLX text-to-video on Apple Silicon for LTX-2 and Wan2.1 / Wan2.2.
One-click Wan install: downloads checkpoints and converts to MLX in the background.
Cross-platform fallback via diffusers and stable-diffusion.cpp on CUDA (HunyuanVideo, CogVideoX, Mochi).
Cache strategies — FBCache, TeaCache, TaylorSeer, MagCache, PAB, FasterCache — apply to video DiTs too.
Live per-step thumbnail previews so you can kill bad seeds early.

Performance and proof

Measure the tradeoff. Don't guess.

Throughput, perplexity, and task-accuracy benchmarks in the same UI.
Persistent history with chart and table views, delta tables, scatter plots.
DFlash and DDTree speculative decoding with draft-model auto-resolution.
Runtime controls for cache bits, FP16 layers, context, fused attention.
Live telemetry, backend health, logs, and orphaned-worker cleanup.

Extensibility

Built for people who wire tools together.

Plugin system for cache strategies, engines, tools, sources, and post-processors.
LoRA adapter discovery plus fine-tuning hooks for local workflows.
Remote provider configs for OpenAI-compatible APIs, stored locally with masked keys.
Integration snippets for Continue.dev, Goose, Cursor, and Claude Code via the local API.
Signed desktop releases with in-app updates across macOS, Linux, and Windows.

Drop-in local backend

Point your other tools at it.

The built-in OpenAI-compatible server turns the app into a local backend for editors, agents, and CLIs.

Continue.dev

{
  "provider": "openai",
  "model": "current-model",
  "apiBase": "http://127.0.0.1:8876/v1"
}

Cursor

Override OpenAI Base URL
http://127.0.0.1:8876/v1

Add your local model id.

Goose

GOOSE_PROVIDER=openai
GOOSE_MODEL=current-model
OPENAI_BASE_URL=http://127.0.0.1:8876/v1

Claude Code proxy

export ANTHROPIC_BASE_URL=http://127.0.0.1:8876/v1
export ANTHROPIC_AUTH_TOKEN=not-needed

Headless automation

chaosengine-cli drives the whole backend.

A typed CLI that reaches every backend route. Script chat, image, video, benchmarks, setup, and diagnostics.

Full-surface coverage

95 shortcuts plus a generic dispatcher.

95 typed shortcuts: load, prompt, bench, image, video, server, setup.
Generic call <METHOD> <PATH> reaches the remaining routes — 100% of the 125 endpoints.
Python stdlib only. No extra dependencies.
JSON to stdout — composes cleanly with jq and CI jobs.
Streaming and non-streaming modes with optional inline metrics.

Quick start

chaosengine-cli load mlx-community/Qwen3.6-35B-A3B-4bit --spec
chaosengine-cli prompt "Write a Rust quicksort" --stream --metrics
chaosengine-cli video "Aurora over a glacier, drone shot" --model Wan-AI/Wan2.1-T2V-1.3B

Download and build

Install fast. Or build from source.

Signed release builds for macOS, Linux, and Windows. Source stack: Tauri, React, TypeScript, Rust, Python.

macOS