Local AI Lab Setup: Docker (Ollama)
If you want Ollama available across machines on your network, Docker is the cleanest way to package the runtime. The container exposes the same local API on port 11434, so the rest of the series stays identical: you still pull models, stream output, and measure time to first token (TTFT) the same way. The only difference is where the runtime lives. This page gives you a quickstart and points you to the official Docker guide for the full matrix of options.1
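As a quick sanity check of that network reachability once the container below is running, you can hit the root endpoint from any machine that can reach the Docker host (the placeholder IP is yours to fill in); a healthy instance replies with "Ollama is running".
curl http://<docker-host-ip>:11434/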

CPU-only container
Start the container with a named volume so models persist across restarts.1
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
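To confirm the volume actually persists your models, you can remove the container, start a fresh one against the same volume, and list what survived. This is a quick sketch using standard Docker and Ollama commands; the container name and volume name match the run command above.
docker stop ollama && docker rm ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama list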
GPU options (NVIDIA, AMD, Vulkan)
If you need GPU acceleration, follow the Ollama Docker guide for the full setup matrix and supported tags.1 For NVIDIA GPUs, you must install the NVIDIA Container Toolkit before Docker can see the GPU.2 For AMD GPUs, the rocm tag is the official path, and Vulkan can be enabled with an environment flag.1 Use the guide for exact flags and Jetson-specific requirements, then return here to verify the endpoint. The goal is still the same: a local API that responds on port 11434.
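As a rough sketch of what the guide covers (treat the exact flags as assumptions and defer to the guide if your setup differs): with the NVIDIA Container Toolkit installed, the container is typically started with --gpus=all, and the AMD path swaps in the rocm image tag and passes through the kernel device nodes.
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm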
Run a model inside the container
Run a model to verify inference works end-to-end.1
docker exec -it ollama ollama run llama3.2
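If you would rather pre-download the model without dropping into an interactive session, pulling it separately inside the container works the same way.
docker exec ollama ollama pull llama3.2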
Verify the HTTP endpoint
Confirm the local API responds on port 11434 before you move on.4
curl http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "prompt": "Say hello from a container."
  }'
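By default /api/generate streams newline-delimited JSON chunks. If you just want a single response object for a quick check, the API accepts a stream flag; the request below is a minimal sketch of that variant.
curl http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "prompt": "Say hello from a container.",
    "stream": false
  }'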
Try different models
Browse available models in the Ollama library.3
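Any model in the library runs with the same pattern; the model name below is illustrative, so check the library for current names and tags.
docker exec -it ollama ollama run mistral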
Next
Return to the main article for streaming clients and TTFT/throughput measurement: