Local AI Lab Setup: macOS (Ollama)
This series treats your local LLM as a real service, which means the runtime must be stable before you measure anything. macOS is a great lab environment because you can install Ollama quickly and expose the local API without extra networking. The goal here is a clean setup and a verified endpoint, not a deep dive into platform quirks. Once the service is alive, everything else in the series becomes repeatable. Get the runtime up, then return to the main article.

Requirements
- macOS 14 Sonoma or later.1
- Disk space for local models (a small model is a few GB; larger ones run to tens of GB).

Install Ollama
Use the official macOS download so the app and CLI stay in sync.2 The app runs the local server for you, and the CLI makes it easy to pull models and verify the install. If you prefer Homebrew, the Ollama docs call out the supported paths so you can choose the one that fits your workflow. The point is not the installer; it is the service that comes out the other end. Install the app, then verify the CLI exists.
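If you take the Homebrew route, a minimal sketch of install-and-verify looks like this (the cask name ollama is an assumption based on current Homebrew naming; the direct download from ollama.com ends in the same place):

# Install the Ollama app via Homebrew (cask name assumed to be "ollama";
# the direct download from ollama.com is equivalent).
brew install --cask ollama

# Confirm the CLI is on your PATH and reports a version.
which ollama
ollama --version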

Start the local service
Launch the app once to start the local server, then confirm the CLI responds. The app runs in the background, so you only need to keep it open while you work. If you want a terminal-first workflow, you can run ollama serve manually, but the app is the fastest path. Once the server is running, the API is live. Now we verify it.
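If you want the terminal-first path, a minimal sketch looks like this; 11434 is the default port, and the root endpoint answering "Ollama is running" is the simplest liveness check:

# In one terminal (skip this if the app is already running):
ollama serve

# In another terminal, probe the default port; a healthy server replies
# with the plain-text message "Ollama is running".
curl http://localhost:11434/
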
Pull a model and verify
Pull a small model so you can iterate quickly, then run it once to confirm inference works. A first run can be slower because weights are loading into memory, which is normal. The real check is whether the API responds with a streaming result. Use the curl request below to confirm the endpoint is alive.3 Once this works, you're done with setup.
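As a sketch of the pull-and-run step (llama3.2 is just an example tag; any small model from the Ollama library works):

# Pull a small model, then run a single prompt to confirm inference works.
ollama pull llama3.2
ollama run llama3.2 "Say hello in one sentence."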
curl http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "prompt": "Say hello to my little friend."
  }'
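By default this endpoint streams newline-delimited JSON objects, one per chunk of generated text. If you would rather see a single response object while verifying, the API accepts a stream flag; a minimal variant:

curl http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "prompt": "Say hello to my little friend.",
    "stream": false
  }'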

Common issues
- Connection refused: ensure the Ollama app is running, or start ollama serve (see the quick check below).
- Slow responses: start with a smaller model while you build your streaming harness.
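A quick way to tell whether anything is actually listening on the default port (assuming you have not changed OLLAMA_HOST from its default of 127.0.0.1:11434):

# Is something listening on the default Ollama port?
lsof -i :11434

# Or probe the API directly; the echo fires only if the server is down.
curl -sf http://localhost:11434/ || echo "Ollama server is not running"
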
Next
Return to the main article for streaming clients and TTFT/throughput measurement: