
Add custom Qwen3 model with configurable attention and latentMoE.#3613

Merged
copybara-service[bot] merged 1 commit into main from test_896749554 on Apr 17, 2026

Conversation

Contributor

@copybara-service copybara-service bot commented Apr 8, 2026

Add custom Qwen3 model with configurable attention and latentMoE.

Specifically, this introduces:

  • attention_output_dim and moe_expert_input_dim to allow the attention
    block output and the MoE expert input to have different dimensionalities
    than the base embedding dimension.
  • A dense_init_scale config to allow configuring the initialization scale
    for dense layers across all models (replacing the hardcoded 1.0).
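A minimal sketch (not the MaxText implementation) of what decoupling these dimensions means for layer shapes: when `attention_output_dim` and `moe_expert_input_dim` differ from the base embedding dimension, the attention output projection and the MoE expert input no longer share the embedding width. All weight names and dimension values below are illustrative assumptions.

```python
import numpy as np

emb_dim = 64               # base embedding dimension
attention_output_dim = 96  # attention block output may differ from emb_dim
moe_expert_input_dim = 48  # MoE expert input may also differ from emb_dim

rng = np.random.default_rng(0)
# Attention output projection targets attention_output_dim instead of
# being forced back to emb_dim.
w_attn_out = rng.normal(0.0, 0.02, size=(emb_dim, attention_output_dim))
# Experts consume moe_expert_input_dim features, so the stream feeding the
# MoE block is projected to that width.
w_moe_in = rng.normal(0.0, 0.02, size=(attention_output_dim, moe_expert_input_dim))

x = np.ones((2, 8, emb_dim))     # (batch, seq, emb)
attn_out = x @ w_attn_out        # (2, 8, attention_output_dim)
expert_in = attn_out @ w_moe_in  # (2, 8, moe_expert_input_dim)
print(attn_out.shape, expert_in.shape)  # (2, 8, 96) (2, 8, 48)
```

Setting both new dimensions equal to `emb_dim` recovers the conventional layout where every block reads and writes the embedding width.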


codecov bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 38.75% with 49 lines in your changes missing coverage. Please review.

Files with missing lines              Patch %   Lines
src/maxtext/models/qwen3_custom.py    37.68%    43 Missing ⚠️
src/maxtext/layers/moe.py             50.00%    4 Missing ⚠️
src/maxtext/layers/decoders.py        0.00%     1 Missing and 1 partial ⚠️


@copybara-service copybara-service bot force-pushed the test_896749554 branch 3 times, most recently from 86b6433 to 4d67a7d Compare April 15, 2026 01:39
@copybara-service copybara-service bot force-pushed the test_896749554 branch 2 times, most recently from 8216eff to a1a0123 Compare April 15, 2026 20:56
@copybara-service copybara-service bot force-pushed the test_896749554 branch 2 times, most recently from b21b20a to f679f0a Compare April 15, 2026 22:05
@copybara-service copybara-service bot changed the title Custom Qwen 30B-A3B Add custom Qwen3 model with configurable attention and latentMoE. Apr 15, 2026
@copybara-service copybara-service bot force-pushed the test_896749554 branch 15 times, most recently from 9350e8a to 3cf64af Compare April 17, 2026 01:52
Specifically, this introduces:
* `attention_output_dim` and `moe_expert_input_dim` to allow the attention
  block output and the MoE expert input to have different dimensionalities
  than the base embedding dimension.
* A `dense_init_scale` config to allow configuring the initialization scale
  for dense layers across all models (replacing the hardcoded 1.0).
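A hedged sketch of what a configurable `dense_init_scale` does in a fan-in variance-scaling initializer: the scale multiplies the variance, so the kernel's standard deviation becomes `sqrt(dense_init_scale / fan_in)`. The helper name and defaults below are illustrative, not the project's API.

```python
import numpy as np

def dense_kernel_init(rng, fan_in, fan_out, dense_init_scale=1.0):
    # Fan-in variance scaling: var = dense_init_scale / fan_in.
    # dense_init_scale=1.0 reproduces the previously hardcoded behavior.
    std = np.sqrt(dense_init_scale / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
k_default = dense_kernel_init(rng, 1024, 1024)                        # scale 1.0
k_scaled = dense_kernel_init(rng, 1024, 1024, dense_init_scale=0.25)  # smaller init
print(k_default.std(), k_scaled.std())
```

Scaling the variance by 0.25 halves the standard deviation, which is a common way to damp early-training activations in deep residual stacks.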

PiperOrigin-RevId: 901021328
@copybara-service copybara-service bot merged commit 7f78228 into main Apr 17, 2026
@copybara-service copybara-service bot deleted the test_896749554 branch April 17, 2026 02:25
