Nayab is the hosted demo of llmizeOFF, an open-source LLM runtime designed to run on VPS hosts, cPanel hosting, Android apps, and local machines without subscriptions or external dependencies.
Try it free · 4 prompts without sign-up · unlimited after free account
The Runtime
llmizeOFF is the self-hosted inference engine behind Nayab. It is designed to be deployed anywhere — from a $5/month VPS to a cPanel shared host to an Android device — without cloud dependencies.
Run on any Linux VPS. Designed to work even on shared cPanel environments with limited resources.
Embed llmizeOFF in Android apps via JNI bindings. Run private AI on-device — no server needed.
Use as a local inference API for scripts, agents, and developer tooling. Zero cloud round-trips.
Tokens stream to the client as they are generated, so the first words appear within seconds rather than after the whole response has been produced.
All inference happens on your machine. No conversation ever leaves your server. No telemetry.
Self-host once. Run forever. No monthly fees, no API quotas, no vendor lock-in of any kind.
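The local-API and streaming points above can be sketched as a small client. llmizeOFF's actual HTTP interface is not described on this page, so everything specific here is an assumption for illustration: the URL `http://localhost:8080/generate`, the request body, the `data: {"token": "..."}` line format, and the `[DONE]` sentinel are all hypothetical and should be swapped for whatever the runtime really exposes.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield tokens from SSE-style lines.

    Assumed (hypothetical) wire format: 'data: {"token": "..."}' per token,
    terminated by 'data: [DONE]'.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        yield json.loads(payload)["token"]


def stream_completion(
    prompt: str,
    url: str = "http://localhost:8080/generate",  # hypothetical endpoint
) -> Iterator[str]:
    """POST a prompt to a local server and yield tokens as they arrive."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"prompt": prompt, "stream": True}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # The response object is iterable line by line as data arrives.
        yield from iter_tokens(line.decode("utf-8") for line in response)
```

With a server running, `for token in stream_completion("Hello"): print(token, end="", flush=True)` would print text incrementally. Keeping the parser separate from the network call means it can be checked against canned lines without a server.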
Model recommendation
The self-hosted free tier runs Meta's Llama 3.2 1B (Q4_K_M) via llmizeOFF. It is a capable, lightweight open model that handles general questions, writing, and light coding while staying fast on CPU, and it streams the first token in ~2-3 seconds. When you need answers in under a second, switch the model picker to the free hosted Groq (Llama 3.1 8B) option. llmizeOFF can also swap in Gemma 2 2B (higher quality) or SmolLM2 (lighter) with one config change.
Model size: 770 MB
First token: ~2-3 s
Quality: Good
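To check the first-token figure on your own hardware, one rough approach is to time how long a token stream takes to yield its first item. The helper below is a generic sketch, not part of llmizeOFF: `time_to_first_token` is a hypothetical name, and it works with any lazy iterator of tokens, such as a generator wrapping a streaming response.

```python
import time
from typing import Iterable, Optional, Tuple


def time_to_first_token(tokens: Iterable[str]) -> Tuple[Optional[str], float]:
    """Return (first_token, seconds_elapsed) for a lazy token stream.

    The stream should be lazy (for example a generator), so that the work of
    producing the first token happens inside the timed region.
    """
    start = time.perf_counter()
    for token in tokens:
        return token, time.perf_counter() - start
    # The stream ended without producing any token.
    return None, time.perf_counter() - start
```

Results will vary with CPU, model, quantization, and prompt length, so a few repeated runs give a more useful picture than a single measurement.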
Support the project
The core runtime stays open-source and free. The upcoming paid edition adds a visual dashboard, one-click model management, multi-user support, Android SDK, and priority support — built for teams and indie devs who want a fully managed self-hosted AI stack.
Pre-orders and Ko-fi supporters shape what gets built first.
Free forever
Pro (coming soon)
How to support
4 free prompts. Then sign up free for unlimited access.