Local AI

AUTOMATIC1111's Stable Diffusion web UI is the long-standing default interface for generating images locally with Stable Diffusion models. It runs in your browser against your own GPU, with a huge extension ecosystem and simpler controls than node-based tools like ComfyUI. Your prompts and images never leave your machine. Development has slowed, but it remains the easiest local image-generation on-ramp.

ComfyUI

ComfyUI is an open-source, node-based editor for running image-generation models like Stable Diffusion and Flux on your own GPU. You wire models, prompts, and processing steps into reusable workflows, which makes it the professional's tool for local image work. Prompts and outputs never leave your machine. The node interface has a real learning curve compared to simpler front ends.

Rung 0 · Default/ Compute Trusted third party

Family AI Chat

freemium

Family AI Chat is a closed-source mobile app offering push-to-talk voice conversations with an AI companion, where an adult configures allowed topics, voices, and languages. Speech is transcribed on the device and conversation text is kept locally, but the text itself goes to an unnamed third-party AI service for processing, so you cannot evaluate who actually handles it.

GPT4All

GPT4All is a free, open-source desktop application from Nomic that runs open-weight chat models on ordinary consumer hardware with essentially no setup. Install it, pick a model, and everything — prompts, documents, answers — stays on your machine, including a local-document chat feature. It will not match frontier cloud models for hard reasoning, but private everyday tasks cost nothing per use.

HomeLearnAI

Rung 2 · Custody/ Compute Hybrid

HomeLearnAI is an MIT-licensed Laravel application for homeschool management, with spaced-repetition review scheduling, multi-child support, calendar integration, and separate adult and learner interfaces. You can run the hosted version free or deploy it on your own server. The published documentation never states where its AI inference runs, so self-hosting the app does not confirm you are self-hosting the model behind it.

Hugging Face

freemium · open source

Hugging Face is the central registry for open-weight AI models — the place where models you can download and run on your own hardware are actually published. Every local tool on this list pulls from it directly or indirectly. It is a venture-backed platform, not sovereign infrastructure itself, but once the weights are on your disk, the platform's fate no longer affects you.

Rung 5 · Stack/ Compute Hybrid

Jan

Jan is an open-source desktop app that gives you a ChatGPT-style chat interface running entirely on your own computer, with no account and no telemetry. Download a model, start chatting; conversations never leave the machine. It suits people who want the familiar chat experience without a platform watching, and accepts the usual local-model tradeoff: less reasoning power than frontier cloud models.

llama.cpp

llama.cpp is the open-source C++ inference engine underneath much of the local-AI ecosystem, including Ollama and LM Studio. It runs open-weight models efficiently on ordinary CPUs and GPUs and gives you direct control over quantization, context size, and performance. It is a power tool: most people should start with the friendlier layers above it and drop down only when they need that control.

llamafile

llamafile, a Mozilla project, packs a language model and its runtime into a single executable file that runs on macOS, Windows, Linux, and the BSDs with no installation at all. Copy the file to any machine — even one with no internet — and you have working private AI. It is sovereignty by way of portability, better as a demo and utility than a daily driver.

Rung 4 · Operate/ Compute Hybrid

LM Studio

free

LM Studio is a polished desktop app for running open-weight language models locally, with a built-in model browser and one-click downloads from Hugging Face. It is the easiest path for non-developers: no terminal required, and your prompts stay on your machine. The app itself is proprietary, which is a real if modest tradeoff against fully open alternatives like Jan or Ollama.

LocalAI

LocalAI is an open-source, drop-in replacement for the OpenAI API that runs entirely on your own hardware. One self-hosted endpoint serves chat models, embeddings, speech, and image generation, so software written for cloud AI can point at your server instead with no code changes. It is aimed at people running a home server or homelab rather than a single desktop app.

Rung 0 · Default/ Compute Trusted third party

MaxHome AI

freemium

MaxHome AI is a closed-source household assistant app for iOS and Android with per-person profiles, voice commands, calendars, reminders, grocery lists, and budget tracking. It states conversations are never sold to advertisers or used to train models. As shipped it is a cloud service; a fully local install exists only through a paid in-home consultation, so the default configuration sends your household data off-device.

MLX

MLX is Apple's open-source machine-learning framework built specifically for M-series Macs, using unified memory to run open-weight models that would not fit in a typical graphics card's memory. For a Mac owner, it is the native fast path to serious local inference on hardware you already own. It is developer-oriented; non-coders will reach it indirectly through apps like LM Studio.

Ollama

Ollama is the simplest way to run open-weight language models on your own computer: one command to install, one to run a model. It exposes an OpenAI-compatible API on localhost, so tools built for cloud AI can work against your own hardware instead. Prompts never leave your machine. Local models trail the frontier in reasoning, so match them to the right jobs.

Open WebUI

Open WebUI is a self-hosted, ChatGPT-style web interface that sits on top of Ollama or any OpenAI-compatible endpoint. It adds the product layer local AI usually lacks — chat history, multiple users, document chat — while logging nothing to anyone else's servers. You run it yourself, typically in Docker, so it suits people one step past the beginner stage who want local AI to feel finished.

vLLM

vLLM is an open-source serving engine for running language models at high throughput on real GPU hardware, using techniques like PagedAttention and continuous batching to squeeze maximum performance from each card. It is the production tier of self-hosting: the right tool when you are serving a team, an app, or paying customers, and overkill for one person chatting on a laptop.