On‑device AI inferencing SDK
Run anything, anywhere

Build private, fast, and affordable AI at the edge

Kernl lets you deploy state‑of‑the‑art models on phones, browsers, and embedded devices with a single SDK. Offline by default. Cloud optional.

Trusted by privacy‑conscious teams and budget‑conscious developers

See Kernl in Action

Live simulation of AI running locally on your device — no cloud required

On‑device inference — the model runs entirely on your device; no data is sent to cloud servers.
NPU • 4‑bit • Cached — specialized AI silicon, 4‑bit quantized weights, and on‑device caching.

tokens/s: 22.0 — generation throughput (higher is better)
p95 latency: 28 ms (edge) — time per token (lower is better)
memory: 512 MB (model + KV cache) — RAM used by the model and conversation history
power: 1.8 W (device) — far below cloud data‑center energy draw

Latency comparison
On‑device: ~30 ms
Cloud: ~450 ms
With no network round‑trip, on‑device AI responds roughly 10–15x faster than cloud AI.
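The memory figure can be sanity‑checked with quick arithmetic. The sketch below is a back‑of‑envelope estimate, not Kernl internals: the parameter count, layer count, context length, and hidden dimension are illustrative assumptions chosen to land near a 512 MB "model + KV cache" budget.

```javascript
// Rough memory estimate for a 4-bit quantized LLM plus its KV cache.
// All model dimensions below are illustrative assumptions.
const params = 0.8e9;        // ~0.8B parameters (assumed)
const bytesPerWeight = 0.5;  // 4-bit quantization = 0.5 bytes per parameter
const weightsMB = (params * bytesPerWeight) / 1024 ** 2;

// KV cache: 2 tensors (K and V) * layers * context length * hidden dim * fp16
const layers = 16, contextLen = 2048, hiddenDim = 1024, kvBytes = 2;
const kvMB = (2 * layers * contextLen * hiddenDim * kvBytes) / 1024 ** 2;

console.log(`weights ≈ ${weightsMB.toFixed(0)} MB, KV cache ≈ ${kvMB.toFixed(0)} MB`);
// → weights ≈ 381 MB, KV cache ≈ 128 MB — about 510 MB total
```

Under these assumptions the total comes out near the 512 MB shown in the demo, which is why 4‑bit quantization is what makes phone‑class RAM budgets workable.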
main.swift
local • wasm-simd • npu
import Kernl
let kernl = Kernl(apiKey: "YOUR_LIFETIME_API_KEY")

let model = try kernl.loadModel(url: "tiny-llm.krnl")
for try await token in model.generateStream(prompt: "Why edge?") {
  print(token, terminator: "")
}
Initialize with a lifetime API key. It is validated locally and works offline; the SDK needs Wi‑Fi only once every 30 days for automatic renewal, with no developer action required.
output
Live text generation happening on your device - no internet required after initial setup

OpenAI's Stack vs Kernl: Unlimited AI for $0.30/user

Compare OpenAI's pay-per-use pricing (GPT-5 + Whisper + TTS) with Kernl's unlimited inference. Kernl also bundles additional AI models: OCR, speaker diarization, image generation, video generation, embeddings, and more.

AI-Powered App: OpenAI's core AI stack

💬 GPT-5 (LLM) — cloud: $1.25/1M input + $10/1M output tokens
Daily usage: 1,000,000 input tokens → $98/month
🎤 Whisper (ASR) — cloud: $0.006/minute
Daily usage: 1,000 minutes → $180/month
🔊 GPT-4o-mini-TTS — cloud: $0.015/minute
Daily usage: 500 minutes → $225/month

Total Monthly Cost

OpenAI APIs (3 services): $503 — pay-per-use pricing
Kernl (unlimited): $300 — 10+ AI models included

You Save

$203
per month
Annual savings: $2.4K
Cost reduction: 40%

Why Kernl Wins

~1.7x
cheaper than cloud
unlimited inference
0ms
network latency
100%
privacy
10+
AI models included

* Comparison based on OpenAI's official pricing: GPT-5 ($1.25/$10 per 1M tokens), Whisper ($0.006/min), GPT-4o-mini-TTS ($0.015/min). Kernl includes these plus OCR, speaker diarization, image generation, video generation, embeddings, and more for unlimited usage.
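The totals in the comparison follow directly from the listed rates and daily volumes. A quick check, as a sketch using only figures from the table and assuming a 30‑day billing month (the GPT‑5 line is taken as its stated $98/month, since input alone at $1.25 × 1M tokens/day comes to $37.50/month and the output‑token volume behind the remainder isn't listed):

```javascript
// Recompute the comparison table from the listed per-unit rates.
const days = 30;
const whisper = 0.006 * 1000 * days; // $0.006/min * 1,000 min/day
const tts     = 0.015 * 500 * days;  // $0.015/min * 500 min/day
const gpt5    = 98;                  // stated total; input + unlisted output tokens
const openai  = gpt5 + whisper + tts;
const kernl   = 300;

console.log(`OpenAI: $${openai}/mo, Kernl: $${kernl}/mo`);
// → OpenAI: $503/mo, Kernl: $300/mo
console.log(`savings: $${openai - kernl}/mo, ${Math.round((1 - kernl / openai) * 100)}% reduction`);
// → savings: $203/mo, 40% reduction
```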

Product

One SDK for CPU, GPU, and NPUs across mobile, desktop, and web. Quantization, streaming I/O, and offline caching built‑in.

Run anywhere
iOS, Android, WebAssembly, Linux, macOS, Windows — same API, same models.
Private by default
No data leaves the device unless you say so. Perfect for regulated environments.
Cost‑efficient
Reduce cloud bills by moving inference to the edge with caching and quantization.
Fast startup
Binary weights, model prewarming, and lazy‑loading keep cold starts low.
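The streaming‑I/O pattern mentioned above looks like the sketch below: tokens arrive one at a time through an iterator instead of as a single blocking response. `fakeStream` is a stand‑in stub, not the Kernl API, and a real stream would be asynchronous (consumed with `for await`).

```javascript
// Stub generator standing in for a model's token stream.
function* fakeStream(prompt) {
  for (const token of ["Edge ", "inference ", "is ", "fast."]) {
    yield token; // a real runtime yields tokens as the accelerator emits them
  }
}

let text = "";
for (const token of fakeStream("Why edge?")) {
  text += token; // append to the UI as tokens land, for low perceived latency
}
console.log(text); // → "Edge inference is fast."
```

Rendering tokens as they land is what makes a 28 ms per‑token latency feel instant, even before the full completion exists.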

Models

Bring your own ONNX, GGUF, or TorchScript models. Or choose from our curated library of small, fast models for chat, vision, and speech.

Language
LLMs, instruction‑tuned, small footprint
Vision
OCR, detection, segmentation
Speech
ASR and TTS with streaming
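When mixing your own models with catalog ones, you typically dispatch on the file format. The helper below is hypothetical, not part of `@kernl/sdk`; it just maps file extensions to the three formats listed above.

```javascript
// Hypothetical helper: infer a model's format from its file extension.
// The extension-to-format mapping mirrors the formats listed above.
function detectFormat(path) {
  const ext = path.split(".").pop().toLowerCase();
  const formats = { onnx: "onnx", gguf: "gguf", pt: "torchscript" };
  if (!(ext in formats)) throw new Error(`unsupported model format: .${ext}`);
  return formats[ext];
}

console.log(detectFormat("/models/tiny-llm.gguf")); // → "gguf"
console.log(detectFormat("ocr-small.onnx"));        // → "onnx"
```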

Languages & Platforms

One SDK for multi-language apps across every major architecture. Build once, run anywhere.

Languages
C++ · Rust · C · Go · Python · Java · JavaScript · TypeScript · Swift
Platforms & Architectures
Windows
  • ARM64
  • x86-64
macOS
  • Apple silicon (arm64)
  • Intel (x86-64)
iOS
  • arm64
Android
  • arm64
  • x86-64
Linux
  • arm64
  • x86-64

All combinations supported: Windows (ARM, x86), macOS (Apple silicon, Intel), iOS, Android, and Linux.

Docs

Install with one command. Initialize the runtime. Load a model. Run inference — on any device.

// JavaScript (Web, Node, Electron)
import { createRuntime, loadModel } from "@kernl/sdk";

const runtime = await createRuntime();
const model = await loadModel(runtime, {
  source: "gguf",
  url: "/models/tiny-llm.gguf",
});

const output = await model.generate({ prompt: "Hello, edge!" });
console.log(output.text);

Playground

Try models in your browser with on‑device inference.

Open playground

Pricing

Community
$0
  • MIT license
  • Local inference
  • Starter models
Get started
Pro
$29
  • Optimized runtimes
  • Edge caching
  • Priority updates
Start trial
Enterprise
Custom
  • SLA
  • On‑prem support
  • Custom operators
Contact sales

FAQ

Why on‑device?
For privacy, latency, and cost. Keep data local, reduce cloud round‑trips, and control spend.
Which devices are supported?
iOS, Android, WebAssembly, Linux, macOS, and Windows with CPU/GPU/NPU acceleration.
What model formats work?
Bring ONNX, GGUF, or TorchScript — or use our curated catalog.
Can I mix local and cloud?
Yes. Kernl supports hybrid flows with optional cloud fallbacks.
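One way such a hybrid flow can be wired: try on‑device inference first and fall back to the cloud only when the local path fails. This is a sketch, not the Kernl API; `runLocal` and `runCloud` are hypothetical stubs, and a real implementation would be asynchronous (kept synchronous here for brevity).

```javascript
// Hybrid flow sketch: prefer the device, fall back to the cloud on failure.
function generate(prompt, { runLocal, runCloud }) {
  try {
    return { source: "device", text: runLocal(prompt) };
  } catch (err) {
    // e.g. the model isn't downloaded yet, or a low-end device ran out of RAM
    return { source: "cloud", text: runCloud(prompt) };
  }
}

// Usage with stubs: the local path fails, so the cloud fallback answers.
const result = generate("Why edge?", {
  runLocal: () => { throw new Error("model not cached"); },
  runCloud: (p) => `cloud answer to: ${p}`,
});
console.log(result.source); // → "cloud"
```

Keeping the fallback explicit at the call site preserves the privacy default: nothing reaches the cloud unless this branch is taken.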