
feat: recipe system — hardware-aware model recommendations#158

Open
raullenchai wants to merge 3 commits into main from feat/recipes

Conversation

@raullenchai
Owner

Summary

  • Add a rapid-mlx recipe <model> CLI command: auto-detects Apple Silicon hardware, computes the max context length and estimated tok/s, and generates the optimal rapid-mlx serve command
  • Enhance rapid-mlx models with a hardware-aware table showing fits/tight/OOM status per model
  • 40 Apple Silicon hardware profiles (M1→M4, all memory tiers) with auto-detection via sysctl
  • 16 model recipes built from existing benchmark data (Qwen3.5, Qwen3-Coder, GLM, Phi-4, Mistral, Gemma, etc.)
  • Bandwidth-scaling estimation: estimated_tps = measured_tps × (target_bandwidth / reference_bandwidth)
  • JSON output mode for agent consumption (--json)
  • --run flag to start server directly with recommended config
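The bandwidth-scaling estimate above can be sketched as follows. This is illustrative only: `HardwareProfile`, `estimate_tps`, and the bandwidth figures are assumptions, not the PR's actual API or data.

```python
# Sketch of the bandwidth-scaling estimate described above.
# Names (HardwareProfile, estimate_tps) and numbers are illustrative,
# not taken from this PR.
from dataclasses import dataclass


@dataclass
class HardwareProfile:
    name: str
    memory_gb: int
    bandwidth_gbps: float  # unified-memory bandwidth


def estimate_tps(measured_tps: float,
                 reference: HardwareProfile,
                 target: HardwareProfile) -> float:
    """Scale a measured decode speed by the ratio of memory bandwidths,
    since decode throughput is roughly bandwidth-bound on Apple Silicon."""
    return measured_tps * (target.bandwidth_gbps / reference.bandwidth_gbps)


m4_max = HardwareProfile("m4-max-64", 64, 546.0)
m1_pro = HardwareProfile("m1-pro-16", 16, 200.0)
print(round(estimate_tps(92.0, m4_max, m1_pro), 1))  # 92 × (200/546) ≈ 33.7
```

The assumption is that decode speed scales linearly with memory bandwidth, which holds reasonably well for memory-bound LLM decoding but ignores compute-bound prefill.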

Usage

rapid-mlx models                          # auto-detect hardware, show all models
rapid-mlx models --hardware m4-max-64     # specific hardware
rapid-mlx recipe qwen3.5-35b             # detailed recipe
rapid-mlx recipe qwen3.5-35b --json      # for agents
rapid-mlx recipe qwen3.5-35b --run       # start server directly
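The fits/tight/OOM status in the models table could be computed along these lines. The 80% usable-memory fraction, the 85% "tight" threshold, and the function name are assumptions for illustration, not values taken from this PR.

```python
# Sketch of a hardware-fit classifier for the models table.
# The usable_fraction (0.8) and tight threshold (0.85) are assumptions.
def fit_status(model_gb: float, hardware_gb: float,
               usable_fraction: float = 0.8) -> str:
    """Classify whether a model's weights fit in unified memory.

    macOS reserves part of unified memory for the system, so only a
    fraction of it is safely usable by Metal/MLX.
    """
    usable = hardware_gb * usable_fraction
    if model_gb > usable:
        return "OOM"
    if model_gb > usable * 0.85:
        return "tight"
    return "fits"


print(fit_status(19.0, 64))  # fits  (19 GB well under 51.2 GB usable)
print(fit_status(28.0, 36))  # tight (28 GB > 24.48 GB but <= 28.8 GB)
print(fit_status(19.0, 16))  # OOM   (19 GB > 12.8 GB usable)
```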

Test plan

  • 37 new tests (hardware profiles, model recipes, engine computation, CLI integration)
  • All 37 new tests pass
  • All 1279 existing tests pass; the single failure in test_reasoning_parsers is pre-existing and unrelated
  • Lint clean (ruff check + format)
  • Manually tested the models and recipe commands with auto-detect, explicit hardware, OOM cases, and JSON output

🤖 Generated with Claude Code

Add `rapid-mlx recipe <model>` and enhance `rapid-mlx models` with
auto-detected Apple Silicon hardware profiles. Recipes compute max
context, estimated tok/s, and generate optimal serve commands from
16 benchmarked models × 40 hardware SKUs (M1→M4).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Your Name and others added 2 commits April 22, 2026 09:41
Qwen3.6-27B: 36.5 tok/s, 14.9GB, dense hybrid (DeltaNet), 262K context
Qwen3.6-35B: 92 tok/s, 19GB, MoE-hybrid, 12% faster than Qwen3.5-35B
Qwen3.6-35B-6bit: estimated 72 tok/s, 28GB, higher quality

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Qwen3.6 uses XML tool format (<function=name><parameter=key>value</parameter>),
not Hermes JSON format. With correct parser: 63% tool calling (vs 0% with hermes).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
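The Qwen3.6 XML tool format described above could be parsed roughly like this. This is a regex sketch under the assumption that calls look like `<function=name><parameter=key>value</parameter>...</function>`; it is not the parser added in this PR.

```python
# Minimal sketch of parsing Qwen3.6-style XML tool calls.
# Not the actual parser from this PR; regex and output shape are assumptions.
import re

FUNC_RE = re.compile(r"<function=(\w+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=(\w+)>(.*?)</parameter>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract tool calls of the form
    <function=name><parameter=key>value</parameter>...</function>
    into a list of {"name": ..., "arguments": {...}} dicts."""
    calls = []
    for name, body in FUNC_RE.findall(text):
        args = {key: value.strip() for key, value in PARAM_RE.findall(body)}
        calls.append({"name": name, "arguments": args})
    return calls


out = parse_tool_calls(
    "<function=get_weather><parameter=city>Tokyo</parameter></function>"
)
print(out)  # [{'name': 'get_weather', 'arguments': {'city': 'Tokyo'}}]
```

Feeding this format to a Hermes-style JSON parser would find no tool calls at all, which is consistent with the 0% → 63% jump reported in the commit message once the correct format is parsed.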