Local AI Lab Setup: Docker (Ollama)

If you want Ollama available across machines on your network, Docker is the cleanest way to package the runtime. The container exposes the same local API on port 11434, so the rest of the series stays identical. You still pull models, stream output, and measure TTFT the same way. The only difference is where the runtime lives. This page gives you a quickstart and points you to the official Ollama Docker guide for the full matrix of options [1].

(Figure: Docker setup overview for running Ollama as a local LLM service; Docker gives you a clean service setup with a simple server. Credit: MethodicalFunction.com.)

CPU-only container

Start the container with a named volume so models persist across restarts [1].

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
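
Before moving on, you can confirm the container is up and the model volume exists. These are plain Docker CLI checks, nothing Ollama-specific:

# Container should show as running and publishing port 11434
docker ps --filter "name=ollama"

# Named volume that holds downloaded models
docker volume inspect ollama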

GPU options (NVIDIA, AMD, Vulkan)

If you need GPU acceleration, follow the Ollama Docker guide for the full setup matrix and supported tags [1]. For NVIDIA GPUs, install the NVIDIA Container Toolkit first so Docker can see the GPU [2]. For AMD GPUs, the rocm image tag is the official path, and Vulkan support can be enabled with an environment flag [1]. Use the guide for exact flags and Jetson-specific requirements, then return here to verify the endpoint. The goal is the same either way: a local API that responds on port 11434.
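
For orientation, the GPU invocations follow the same shape as the CPU command. The flags below are the commonly documented ones; treat them as a sketch and confirm against the current guide before relying on them [1][2].

# NVIDIA: requires the NVIDIA Container Toolkit on the host
# (remove an existing CPU container first: docker rm -f ollama)
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# AMD: use the rocm image and pass the GPU device nodes through
docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm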

Run a model inside the container

Run a model to verify inference works end-to-end [1].

docker exec -it ollama ollama run llama3.2
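
The same exec pattern works for non-interactive model management, for example pulling a model ahead of time and listing what is installed (llama3.2 is just the tag used above):

docker exec -it ollama ollama pull llama3.2
docker exec -it ollama ollama list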

Verify the HTTP endpoint

Confirm the local API responds on port 11434 before you move on [4].

curl http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "prompt": "Say hello from a container."
  }'
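
By default, /api/generate streams the response as a series of JSON objects. If you want a single JSON body for a quick check, set the stream field to false; see the API introduction for the full request schema [4].

curl http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "prompt": "Say hello from a container.",
    "stream": false
  }'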

Try different models

Browse available models in the Ollama library [3].
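
Swapping models is just a different tag in the same exec command; the model name here is only an example, so substitute any tag from the library [3]:

docker exec -it ollama ollama run mistral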

Next

Return to the main article for streaming clients and TTFT/throughput measurement:

Sources

[1] Ollama Docker reference

[2] NVIDIA Container Toolkit install guide

[3] Ollama model library

[4] Ollama API introduction