Currently, the only available DFlash draft model is Qwen3.5-4B-DFlash, which is paired with the Qwen3.5-4B target model. However, when deploying on consumer-grade GPUs (e.g., 2× 16GB), the Mamba cache required by the DFlash draft model consumes significant VRAM, making it difficult to use the recommended block_size=16 without running into OOM errors. A smaller DFlash draft checkpoint (e.g., trained from Qwen3.5-2B or Qwen3.5-0.8B) would be highly beneficial for memory-constrained deployments, while still enabling speculative decoding acceleration.