Build private, fast, and affordable AI at the edge
Kernl lets you deploy state‑of‑the‑art models on phones, browsers, and embedded devices with a single SDK. Offline by default. Cloud optional.
See Kernl in Action
Live simulation of AI running locally on your device — no cloud required
import Kernl
let kernl = Kernl(apiKey: "YOUR_LIFETIME_API_KEY")
let model = try kernl.loadModel(url: "tiny-llm.krnl")
for try await token in model.generateStream(prompt: "Why edge?") {
print(token, terminator: "")
}
OpenAI's Stack vs Kernl: Unlimited AI for $0.30/user
Compare OpenAI's pay-per-use pricing (GPT-5 + Whisper + TTS) with Kernl's unlimited on-device inference. Kernl also includes many more AI models: OCR, speaker diarization, image generation, video generation, embeddings, and more.
* Comparison based on OpenAI's official pricing: GPT-5 ($1.25/$10 per 1M tokens), Whisper ($0.006/min), GPT-4o-mini-TTS ($0.015/min). Kernl includes these plus OCR, speaker diarization, image generation, video generation, embeddings, and more for unlimited usage.
Product
One SDK for CPU, GPU, and NPUs across mobile, desktop, and web. Quantization, streaming I/O, and offline caching built‑in.
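As a sketch of how these capabilities might be configured together (the option names below are illustrative assumptions, not Kernl's documented API):

```javascript
// Illustrative runtime options; field names are assumptions, not Kernl's real API.
const runtimeOptions = {
  backend: "auto",     // select CPU, GPU, or NPU based on what the device exposes
  quantization: "q4",  // 4-bit weights to shrink the memory footprint
  streaming: true,     // emit tokens as they are generated instead of waiting
  cache: {
    offline: true,     // reuse previously downloaded models with no network
    dir: "/models/cache",
  },
};

// On a device with no GPU or NPU, "auto" would fall back to CPU.
console.log(runtimeOptions.backend);
```

The point of an "auto" backend is that application code stays identical across devices; only the runtime's dispatch decision changes.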
Models
Bring your own ONNX, GGUF, or TorchScript models. Or choose from our curated library of small, fast models for chat, vision, and speech.
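One way an app might route a model file to the right loader is by its extension. This helper is purely an illustration and not part of the SDK; the `.krnl` extension for Kernl's packaged format is an assumption based on the snippet above:

```javascript
// Map a model file path to a format tag; throws for unsupported extensions.
// Hypothetical helper, not part of the Kernl SDK.
function detectFormat(path) {
  const ext = path.split(".").pop().toLowerCase();
  const formats = { onnx: "onnx", gguf: "gguf", pt: "torchscript", krnl: "kernl" };
  if (!(ext in formats)) {
    throw new Error(`Unsupported model format: .${ext}`);
  }
  return formats[ext];
}

console.log(detectFormat("/models/tiny-llm.gguf")); // "gguf"
```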
Languages & Platforms
One SDK for multi-language apps across every major architecture. Build once, run anywhere.
- Windows: ARM64, x86-64
- macOS: Apple silicon (arm64), Intel (x86-64)
- iOS: arm64
- Android: arm64, x86-64
- Linux: arm64, x86-64
All combinations supported: Windows (ARM, x86), macOS (Apple silicon, Intel), iOS, Android, and Linux.
Docs
Install with one command. Initialize the runtime. Load a model. Run inference — on any device.
// JavaScript (Web, Node, Electron)
import { createRuntime, loadModel } from "@kernl/sdk";

const runtime = await createRuntime();
const model = await loadModel(runtime, {
  source: "gguf",
  url: "/models/tiny-llm.gguf",
});
const output = await model.generate({ prompt: "Hello, edge!" });
console.log(output.text);
Playground
Try models in your browser with on‑device inference.
Pricing
FAQ
- Why on‑device?
- For privacy, latency, and cost. Keep data local, reduce cloud round‑trips, and control spend.
- Which devices are supported?
- iOS, Android, WebAssembly, Linux, macOS, and Windows with CPU/GPU/NPU acceleration.
- What model formats work?
- Bring ONNX, GGUF, or TorchScript — or use our curated catalog.
- Can I mix local and cloud?
- Yes. Kernl supports hybrid flows with optional cloud fallbacks.
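A hybrid flow can be sketched as "try local first, fall back to cloud on failure." Both backends below are stubs for illustration; in a real app the local side would be a Kernl model and the cloud side a hosted API call:

```javascript
// Hybrid inference: prefer the on-device model, fall back to a cloud endpoint.
// Hypothetical sketch; generateHybrid is not a Kernl API.
async function generateHybrid(prompt, localModel, cloudModel) {
  try {
    return await localModel.generate(prompt);
  } catch (err) {
    // e.g. model not downloaded yet, or the device ran out of memory
    return await cloudModel.generate(prompt);
  }
}

// Stub backends demonstrating the fallback path.
const local = {
  generate: async () => { throw new Error("model not loaded"); },
};
const cloud = {
  generate: async (prompt) => `cloud: ${prompt}`,
};

generateHybrid("Why edge?", local, cloud).then(console.log); // "cloud: Why edge?"
```

Keeping the fallback decision in application code, rather than inside the SDK, lets the app decide which prompts are too sensitive to ever leave the device.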