
feat: recipe system — hardware-aware model recommendations#158

Open
raullenchai wants to merge 3 commits into main from feat/recipes

Conversation

@raullenchai
Owner

Summary

  • Add a rapid-mlx recipe <model> CLI command: auto-detects Apple Silicon hardware, computes the max context length and estimated tok/s, and generates the optimal rapid-mlx serve command
  • Enhance rapid-mlx models with a hardware-aware table showing fits/tight/OOM status per model
  • 40 Apple Silicon hardware profiles (M1→M4, all memory tiers) with auto-detection via sysctl
  • 16 model recipes built from existing benchmark data (Qwen3.5, Qwen3-Coder, GLM, Phi-4, Mistral, Gemma, etc.)
  • Bandwidth-scaling estimation: estimated_tps = measured_tps × (target_bandwidth / reference_bandwidth)
  • JSON output mode for agent consumption (--json)
  • --run flag to start server directly with recommended config
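The bandwidth-scaling estimate above can be sketched as follows. This is illustrative only: `HardwareProfile`, `estimate_tps`, and the bandwidth figures are assumptions, not the PR's actual API or data.

```python
# Sketch of the bandwidth-scaling estimate described above.
# Names (HardwareProfile, estimate_tps) and numbers are illustrative,
# not taken from this PR.
from dataclasses import dataclass


@dataclass
class HardwareProfile:
    name: str
    memory_gb: int
    bandwidth_gbps: float  # unified-memory bandwidth


def estimate_tps(measured_tps: float,
                 reference: HardwareProfile,
                 target: HardwareProfile) -> float:
    """Scale a measured decode speed by the ratio of memory bandwidths,
    since decode throughput is roughly bandwidth-bound on Apple Silicon."""
    return measured_tps * (target.bandwidth_gbps / reference.bandwidth_gbps)


m4_max = HardwareProfile("m4-max-64", 64, 546.0)
m1_pro = HardwareProfile("m1-pro-16", 16, 200.0)
print(round(estimate_tps(92.0, m4_max, m1_pro), 1))  # 92 × (200/546) ≈ 33.7
```

The assumption is that decode speed scales linearly with memory bandwidth, which holds reasonably well for memory-bound LLM decoding but ignores compute-bound prefill.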

Usage

rapid-mlx models                          # auto-detect hardware, show all models
rapid-mlx models --hardware m4-max-64     # specific hardware
rapid-mlx recipe qwen3.5-35b             # detailed recipe
rapid-mlx recipe qwen3.5-35b --json      # for agents
rapid-mlx recipe qwen3.5-35b --run       # start server directly
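The fits/tight/OOM status in the models table could be computed along these lines. The 80% usable-memory fraction, the 85% "tight" threshold, and the function name are assumptions for illustration, not values taken from this PR.

```python
# Sketch of a hardware-fit classifier for the models table.
# The usable_fraction (0.8) and tight threshold (0.85) are assumptions.
def fit_status(model_gb: float, hardware_gb: float,
               usable_fraction: float = 0.8) -> str:
    """Classify whether a model's weights fit in unified memory.

    macOS reserves part of unified memory for the system, so only a
    fraction of it is safely usable by Metal/MLX.
    """
    usable = hardware_gb * usable_fraction
    if model_gb > usable:
        return "OOM"
    if model_gb > usable * 0.85:
        return "tight"
    return "fits"


print(fit_status(19.0, 64))  # fits  (19 GB well under 51.2 GB usable)
print(fit_status(28.0, 36))  # tight (28 GB > 24.48 GB but <= 28.8 GB)
print(fit_status(19.0, 16))  # OOM   (19 GB > 12.8 GB usable)
```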

Test plan

  • 37 new tests (hardware profiles, model recipes, engine computation, CLI integration)
  • All 37 new tests pass
  • All 1279 existing tests pass; the single failure in test_reasoning_parsers is pre-existing and unrelated
  • Lint clean (ruff check + format)
  • Manually tested the models and recipe commands with auto-detect, explicit hardware, OOM cases, and JSON output

🤖 Generated with Claude Code

Add `rapid-mlx recipe <model>` and enhance `rapid-mlx models` with
auto-detected Apple Silicon hardware profiles. Recipes compute max
context, estimated tok/s, and generate optimal serve commands from
16 benchmarked models × 40 hardware SKUs (M1→M4).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Your Name and others added 2 commits April 22, 2026 09:41
Qwen3.6-27B: 36.5 tok/s, 14.9GB, dense hybrid (DeltaNet), 262K context
Qwen3.6-35B: 92 tok/s, 19GB, MoE-hybrid, 12% faster than Qwen3.5-35B
Qwen3.6-35B-6bit: estimated 72 tok/s, 28GB, higher quality

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Qwen3.6 uses XML tool format (<function=name><parameter=key>value</parameter>),
not Hermes JSON format. With correct parser: 63% tool calling (vs 0% with hermes).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
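The Qwen3.6 XML tool format described above could be parsed roughly like this. This is a regex sketch under the assumption that calls look like `<function=name><parameter=key>value</parameter>...</function>`; it is not the parser added in this PR.

```python
# Minimal sketch of parsing Qwen3.6-style XML tool calls.
# Not the actual parser from this PR; regex and output shape are assumptions.
import re

FUNC_RE = re.compile(r"<function=(\w+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=(\w+)>(.*?)</parameter>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract tool calls of the form
    <function=name><parameter=key>value</parameter>...</function>
    into a list of {"name": ..., "arguments": {...}} dicts."""
    calls = []
    for name, body in FUNC_RE.findall(text):
        args = {key: value.strip() for key, value in PARAM_RE.findall(body)}
        calls.append({"name": name, "arguments": args})
    return calls


out = parse_tool_calls(
    "<function=get_weather><parameter=city>Tokyo</parameter></function>"
)
print(out)  # [{'name': 'get_weather', 'arguments': {'city': 'Tokyo'}}]
```

Feeding this format to a Hermes-style JSON parser would find no tool calls at all, which is consistent with the 0% → 63% jump reported in the commit message once the correct format is parsed.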