Thanks for your interest! Here's how to get started.
```bash
# Clone and install in dev mode
git clone https://github.com/raullenchai/Rapid-MLX.git
cd Rapid-MLX
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
pip install pytest ruff  # dev tools for testing and linting

# Start a dev server
rapid-mlx serve qwen3.5-4b --port 8000
```

Requirements: Python 3.11+, macOS with Apple Silicon (M1/M2/M3/M4).
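To sanity-check the server, here's a quick request (a sketch: it assumes the usual OpenAI-compatible `/v1/chat/completions` route, so adjust if your build exposes something different):

```bash
# Assumed endpoint: OpenAI-compatible /v1/chat/completions
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3.5-4b", "messages": [{"role": "user", "content": "Say hello"}]}'
```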
```bash
# Run all unit tests (no model needed)
python3 -m pytest tests/ -x -q

# Run a specific test file
python3 -m pytest tests/test_tool_calling.py -v

# Lint and format
ruff check .
ruff format --check .
```

Most tests run without a model. Tests in `tests/test_event_loop.py` require a running server.
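For those, one workable pattern (a sketch; the wait time is a placeholder you'll want to tune) is to boot a dev server in the background first:

```bash
# Sketch: start a dev server, give the model time to load, then run the tests
rapid-mlx serve qwen3.5-4b --port 8000 &
SERVER_PID=$!
sleep 30   # crude wait for the model to load; tune for your machine
python3 -m pytest tests/test_event_loop.py -v
kill $SERVER_PID
```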
- Fork the repo and create a branch prefixed with `feat/`, `fix/`, `docs/`, or `refactor/` (example below)
- Make your changes with tests if applicable
- Run `ruff check` and `ruff format` before committing
- Self-validate your PR (see below) — saves a round trip with maintainers
- Open a PR against `main` with a clear description
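For example, with a hypothetical branch name:

```bash
# Branch name follows the feat/ convention
git checkout -b feat/add-glm-alias
```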
Before opening (or after pushing fixes to) your PR, run our validation pipeline against it. The same script is what maintainers run before merging — running it yourself catches the easy stuff before review and signals you've done your homework.
```bash
python3 -m scripts.pr_validate.pr_validate <PR#>
```

The script grades your PR through 7 steps and prints a strict markdown scorecard. Exit code 0 = MERGE-SAFE, exit code 1 = at least one step failed.
| step | what it does | when |
|---|---|---|
| `fetch` | pulls your PR + diff, classifies blast radius | always |
| `deepseek_review` | adversarial code review (skipped if no API key) | when `DEEPSEEK_API_KEY` is set and `PR_VALIDATE_NO_DEEPSEEK` is unset |
| `supply_chain` | flags new deps, install hooks, `eval`/`exec`/`shell=True`, hardcoded URLs | always |
| `lint` | `ruff check` + `ruff format --check` | when diff has `.py` |
| `targeted_tests` | runs tests touching the files you changed; negative-control filters pre-existing flakes | when diff has `.py` |
| `full_unit` | full pytest suite minus integrations | medium/high blast |
| `stress_e2e_bench` | boots a server, runs stress + agent integrations + bench vs baseline | high blast (engine/scheduler/memory_cache) |
You don't need every step to pass for a clean PR, but the more green checks you have, the faster review goes. In particular:
- `lint` and `targeted_tests` are non-negotiable — run these locally even without the full pipeline.
- `supply_chain` warnings mean a maintainer will read your changes carefully (especially if you touched `setup.py`, `.github/workflows/`, `Makefile`, or added a new dep). That's not a problem — just be ready to explain the why.
- `stress_e2e_bench` requires Apple Silicon + enough RAM to load a small model (≥6GB free). If you don't have the hardware, opt out with `PR_VALIDATE_NO_STRESS=1` — maintainers will run it for you on merge.
- `deepseek_review` needs an API key — opt out with `PR_VALIDATE_NO_DEEPSEEK=1` if you don't have one. Maintainers will run it for you.
```bash
# Quick local check (no DeepSeek, no stress) — covers the "did I break anything obvious" case in <1 minute for most PRs:
PR_VALIDATE_NO_DEEPSEEK=1 PR_VALIDATE_NO_STRESS=1 \
  python3 -m scripts.pr_validate.pr_validate <PR#>
```

Full step list, gating logic, and how to add steps: `scripts/pr_validate/README.md`.
If a test was already failing before your change, `targeted_tests` handles this — it re-runs failures on your PR's base commit and reclassifies "fails on main too" as pre-existing (not a regression). For `full_unit` you'll currently see the failure surfaced; mention it in the PR comment ("test_X is failing on main too — see issue #123") and a maintainer will confirm.
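To check locally whether a failure is pre-existing, one quick approach if your changes are still uncommitted (a sketch; `tests/test_X.py` stands in for the failing file):

```bash
# Re-run the failing test without your changes
git stash                                  # set your edits aside
python3 -m pytest tests/test_X.py -x -q    # fails here too? pre-existing, not yours
git stash pop                              # restore your edits
```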
If the validation script itself misbehaves — it's still new — file an issue with `[pr_validate]` in the title and attach the artifacts under `/tmp/pr_validate/pr-<N>/`.
- **Add a model alias** — Add a short name to `vllm_mlx/aliases.json` so users can `rapid-mlx serve <alias>` instead of typing a full HuggingFace path. See open model-support issues.
- **Fix a `good first issue`** — Check the good first issue label.
- **Test a model and report results** — Download a model, run benchmarks, report what works. Use the "Model Support Request" issue template.
- **Add parser auto-detection** — Add a regex pattern to `vllm_mlx/model_auto_config.py` so a new model family gets the right tool/reasoning parser automatically.
- **Classify a model into a SuffixDecoding tier** — After adding a `ModelConfig` entry, run `python3.12 scripts/bench_suffix_decoding_integrated.py --model <id>` (10-20 min). Paste the resulting `suffix_decoding_tier=` and `suffix_bench_speedup=` into the entry. Reference the bench output in your PR (see the sketch after this list). See `docs/suffix_decoding_eligibility.md`.
- **Verify client integrations** — Test Rapid-MLX with your favorite AI tool (Cursor, Continue, Aider, LangChain, etc.) and report results.
- **Write a new tool call parser** — Add support for a new tool call format in `vllm_mlx/tool_parsers/`.
- **Performance optimization** — Profiling, kernel improvements, caching strategies.
- **BatchedEngine / continuous batching** — Multi-user serving improvements.
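For the SuffixDecoding item, the flow looks roughly like this (a sketch; the model id is hypothetical):

```bash
# Benchmark the model you just added (takes 10-20 min)
python3.12 scripts/bench_suffix_decoding_integrated.py \
  --model mlx-community/My-Model-7B-Instruct-4bit
# Paste the reported suffix_decoding_tier= and suffix_bench_speedup= values
# into the model's ModelConfig entry, and cite the bench output in your PR.
```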
Adding a model alias is the easiest contribution — no model download needed!
File: `vllm_mlx/aliases.json`

```json
{
  "my-model-7b": "mlx-community/My-Model-7B-Instruct-4bit"
}
```

That's it. Find the MLX model on HuggingFace `mlx-community` and add the mapping. Convention: `<family>-<size>` in lowercase (e.g., `qwen3.5-9b`, `gemma-4-26b`).
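With that mapping in place, the short name works directly:

```bash
rapid-mlx serve my-model-7b
```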
When users serve a model without `--tool-call-parser`, Rapid-MLX auto-detects the right parser from the model name.

File: `vllm_mlx/model_auto_config.py`
```python
# Add your pattern (order matters — more specific first):
(re.compile(r"my-model", re.IGNORECASE), ModelConfig(
    tool_call_parser="hermes",  # most common format
    reasoning_parser=None,      # set if model has thinking tags
)),
```

Common tool parsers: `hermes`, `llama`, `deepseek`, `gemma4`, `glm47`, `minimax`, `kimi`.
Common reasoning parsers: `qwen3`, `deepseek_r1`, `gemma4`, `minimax`.
**How to figure out the right parser:** check the model's chat template for the tool call format. Most models use Hermes-style `<tool_call>` tags. If unsure, try `hermes` first.
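One way to peek at a model's chat template (a sketch: it assumes the model is on HuggingFace, `transformers` is installed, and the model id is hypothetical):

```bash
# Dump the chat template and look for tool-call markers
python3 -c \
  "from transformers import AutoTokenizer; print(AutoTokenizer.from_pretrained('mlx-community/My-Model-7B-Instruct-4bit').chat_template)" \
  | grep -i -C 2 tool
```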
- We use `ruff` for linting and formatting
- Type hints are encouraged but not required
- Keep changes focused — one feature/fix per PR
The release pipeline is fully automated from a single commit on `main`. Push a commit with subject `chore: bump version to X.Y.Z` (matching the new `pyproject.toml` version) and the rest happens on its own: tag → GitHub Release → PyPI → Homebrew formula PR.
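For example, with a hypothetical version number:

```bash
# After editing pyproject.toml's version to 0.4.2:
git commit -am "chore: bump version to 0.4.2"
git push origin main   # tag → GitHub Release → PyPI → Homebrew follow automatically
```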
If your PR adds a model alias, capability profile, or CLI flag, the `version-check.yml` workflow requires you to bump `pyproject.toml` in the same PR (or set the `skip-version-bump` label for pure refactors). This is the safety net that prevents stale `rapid-mlx models` output after a merge.
Full details, escape hatches, and rationale: `docs/development/releasing.md`.