Live demo · powered by llmizeOFF

Run your own AI.
No cloud required.

Nayab is the hosted demo of llmizeOFF, an open-source LLM runtime designed to run on VPS instances, cPanel hosting, Android apps, and local machines without subscriptions or external dependencies.

Try it free · 4 prompts without sign-up · unlimited with a free account

Try Nayab Live · llmizeOFF on GitHub

Self-hosted · VPS & cPanel ready · Android compatible · No subscriptions · Offline-first

The Runtime

What is llmizeOFF?

llmizeOFF is the self-hosted inference engine behind Nayab. It is designed to be deployed anywhere — from a $5/month VPS to a cPanel shared host to an Android device — without cloud dependencies.

VPS & cPanel hosting

Run on any Linux VPS. Designed to work even on shared cPanel environments with limited resources.

Android & mobile apps

Embed llmizeOFF in Android apps via JNI bindings. Run private AI on-device — no server needed.

Local tools & CLI

Use it as a local inference API for scripts, agents, and developer tooling. Zero cloud round-trips.
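As a rough illustration of what calling a local inference API from a script could look like, here is a minimal Python sketch using only the standard library. The host, port, path (`/v1/generate`), and JSON field names (`prompt`, `max_tokens`) are assumptions made for this example, not llmizeOFF's documented API; check the project's GitHub README for the real endpoint.

```python
import json
import urllib.request

# Hypothetical values: the real host, port, path, and field names may differ.
BASE_URL = "http://127.0.0.1:8080"
GENERATE_PATH = "/v1/generate"


def build_request(prompt: str, max_tokens: int = 128) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a local inference server."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + GENERATE_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 60.0) -> dict:
    """Send the request to the local server and decode its JSON reply."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because the request targets `127.0.0.1`, the prompt never leaves the machine. Keeping `build_request` separate from `generate` makes the payload easy to inspect or adapt once the actual API shape is known.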

Real token streaming

Tokens stream to the client as they are generated, so the first word appears within seconds instead of after the whole response has been produced.
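To show what consuming a token stream can look like on the client side, here is a small Python sketch. It assumes the server sends one `data: {...}` line per token, with a `token` field and a final `data: [DONE]` marker. That framing is a common convention borrowed from other streaming APIs and is an assumption here, not llmizeOFF's confirmed wire format.

```python
import json
from typing import Iterable, Iterator


def iter_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield token text from 'data: {...}' lines until a '[DONE]' marker."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # hypothetical end-of-stream sentinel
            break
        yield json.loads(payload)["token"]  # "token" is an assumed field name


if __name__ == "__main__":
    # Simulated stream; a real client would iterate over the HTTP response lines.
    demo = ['data: {"token": "Hel"}', "", 'data: {"token": "lo"}', "data: [DONE]"]
    for tok in iter_tokens(demo):
        print(tok, end="", flush=True)  # render each token as soon as it arrives
    print()
```

Because `iter_tokens` is a generator, each token can be displayed the moment its line arrives rather than after the full reply is buffered.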

Private by design

All inference happens on your machine. No conversation ever leaves your server. No telemetry.

No subscriptions

Self-host once. Run forever. No monthly fees, no API quotas, no vendor lock-in of any kind.

Model recommendation

Llama 3.2 1B · Balanced for a CPU VPS

The self-hosted free tier runs Meta's Llama 3.2 1B (Q4_K_M) via llmizeOFF — a capable, lightweight open model that handles general questions, writing, and light coding while staying fast on CPU. It streams the first token in ~2-3 seconds. When you need answers in under a second, switch the model picker to the free Groq (Llama 3.1 8B) option. llmizeOFF can also drop in Gemma 2 2B (higher quality) or SmolLM2 (lighter) with one config change.

Model size: 770 MB
First token: ~2-3 s
Quality: Good

Support the project

llmizeOFF Pro is coming

The core runtime stays open-source and free. The upcoming paid edition adds a visual dashboard, one-click model management, multi-user support, an Android SDK, and priority support. It is built for teams and indie devs who want a fully managed self-hosted AI stack.

Pre-orders and Ko-fi supporters shape what gets built first.

Support on Ko-fi · Star on GitHub

Free forever

Core runtime (open-source)
VPS self-hosting
GGUF model support
REST API

Pro (coming soon)

Visual dashboard
Model manager
Multi-user auth
Android SDK
Priority support
