Add a .versions file as the single source of truth for version
variables (Go, vLLM, vLLM upstream, SGLang, llama-server, vllm-metal,
diffusers, base image), replacing values scattered across
the Makefile, Dockerfile, CI workflows, and scripts.
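As a hedged sketch of what such a file could look like (the variable names and values below are illustrative assumptions, not taken from the actual `.versions` file), it might hold plain `KEY=VALUE` pairs that both the Makefile and shell scripts can consume:

```sh
# .versions — illustrative sketch only; names and values are assumptions.
GO_VERSION=1.24.0
VLLM_VERSION=0.17.0
LLAMA_SERVER_VERSION=v1.0.0
```

Because `KEY=VALUE` lines are valid both as GNU Make assignments and as POSIX shell assignments, the Makefile could pick them up with `include .versions` while scripts source the same file, keeping every consumer on a single definition.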
README.md (8 additions, 5 deletions)
````diff
@@ -157,6 +157,7 @@ MODEL_RUNNER_HOST=http://localhost:13434 ./model-cli list
 ## Using the Makefile
 
 This project includes a Makefile to simplify common development tasks. Docker targets require Docker Desktop >= 4.41.0.
+
 Run `make help` for a full list, but the key targets are:
 
 - `build` - Build the Go application
````
````diff
@@ -194,6 +195,8 @@ This will:
 - Start the service on port 8080 (or the specified port)
 - All models downloaded will be stored in the host's `models` directory and will persist between container runs
 
+> NOTE: The [`.versions`](.versions) file is the single source of truth for all version variables (Go, vLLM, SGLang, llama-server, etc.).
+
 ### llama.cpp integration
 
 The Docker image includes the llama.cpp server binary from the `docker/docker-model-backend-llamacpp` image. You can specify the version of the image to use by setting the `LLAMA_SERVER_VERSION` variable. Additionally, you can configure the target OS, architecture, and acceleration type:
````
````diff
@@ -228,7 +231,7 @@ The Docker image also supports vLLM as an alternative inference backend.
 To build a Docker image with vLLM support:
 
 ```sh
-# Build with default settings (vLLM 0.12.0)
+# Build with default settings (vLLM 0.17.0)
 make docker-build DOCKER_TARGET=final-vllm BASE_IMAGE=nvidia/cuda:13.0.2-runtime-ubuntu24.04 LLAMA_SERVER_VARIANT=cuda
@@ -247,5 +250,5 @@
 The vLLM variant supports the following build arguments:
 
-- **VLLM_VERSION**: The vLLM version to install (default: `0.12.0`)
+- **VLLM_VERSION**: The vLLM version to install (default: `0.17.0`)
 - **VLLM_CUDA_VERSION**: The CUDA version suffix for the wheel (default: `cu130`)
 - **VLLM_PYTHON_TAG**: The Python compatibility tag (default: `cp38-abi3`, compatible with Python 3.8+)
````
````diff
@@ -274,8 +277,8 @@ To update to a new vLLM version:
 ```sh
 docker buildx build \
   --target final-vllm \
-  --build-arg VLLM_VERSION=0.11.1 \
-  -t docker/model-runner:vllm-0.11.1 .
+  --build-arg VLLM_VERSION=0.17.0 \
+  -t docker/model-runner:vllm-0.17.0 .
 ```
 
 The vLLM wheels are sourced from the official vLLM GitHub Releases at `https://github.com/vllm-project/vllm/releases`, which provides prebuilt wheels for each release version.
````
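Tying the two together, a version bump like the one above could be driven from `.versions` rather than typed by hand. The sketch below is a dry run only (it prints the command instead of running it) and assumes `.versions` holds shell-compatible `KEY=VALUE` lines; it writes a throwaway `.versions` purely to keep the example self-contained:

```sh
# Dry-run sketch: resolve VLLM_VERSION from .versions and print the
# corresponding buildx invocation, instead of hard-coding the version.
# The heredoc below exists only to make this snippet self-contained;
# in the repository the real .versions file would already be present.
cat > .versions <<'EOF'
VLLM_VERSION=0.17.0
EOF

set -a          # export every variable the sourced file assigns
. ./.versions
set +a

# Remove the leading `echo` to actually run the build.
echo docker buildx build \
  --target final-vllm \
  --build-arg "VLLM_VERSION=${VLLM_VERSION}" \
  -t "docker/model-runner:vllm-${VLLM_VERSION}" .
```

Keeping the version lookup in one file means the Makefile, CI workflows, and ad-hoc commands like this one cannot drift apart.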