
Fix LongCat AudioDiT dtype, VAE, and attention API issues #13694

Open
hlky wants to merge 1 commit into huggingface:main from hlky:codex/longcat-audio-dit-review



hlky commented May 7, 2026

Summary

Fixes the verified longcat_audio_dit review findings from #13580.

  • use the transformer dtype for prompt embeddings, latents, latent conditioning, and normalized timesteps
  • remove the internal torch.no_grad() from encode_prompt
  • validate LongCat AudioDiT VAE downsampling_ratio against normalized strides and align defaults with [2, 4, 4, 8, 8]
  • wire AudioDiTAttention / LongCatAudioDiTTransformer into standard diffusers attention processor APIs
  • fix the slow real-checkpoint test to pass an instantiated tokenizer, not a tokenizer path
  • add regression coverage for dtype plumbing, gradient-capable prompt encoding, VAE ratio validation, and attention APIs
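The VAE bullet is easier to see with a small sketch of the invariant being checked. This is a hypothetical helper, not the diffusers implementation. It assumes that "normalized strides" means the per-block strides coerced to a flat list of positive integers, and that `downsampling_ratio` must equal their product. With the default strides `[2, 4, 4, 8, 8]`, that product is 2048.

```python
import math
from typing import Sequence

# Hypothetical helper illustrating the check described in the PR; the name,
# signature, and error messages are assumptions, not the diffusers API.
def validate_downsampling_ratio(downsampling_ratio: int, strides: Sequence[int]) -> int:
    """Return the total downsampling factor implied by `strides`.

    Raises ValueError if any stride is not a positive integer or if
    `downsampling_ratio` disagrees with the product of the strides.
    """
    # "Normalization" here is assumed to mean coercing to a flat list of ints.
    normalized = [int(s) for s in strides]
    if any(s <= 0 for s in normalized):
        raise ValueError(f"strides must be positive integers, got {list(strides)}")
    expected = math.prod(normalized)
    if downsampling_ratio != expected:
        raise ValueError(
            f"downsampling_ratio={downsampling_ratio} does not match the product "
            f"of strides {normalized} ({expected})"
        )
    return expected


DEFAULT_STRIDES = [2, 4, 4, 8, 8]  # default strides quoted in the PR summary
print(validate_downsampling_ratio(2048, DEFAULT_STRIDES))  # prints 2048
```

Failing at construction time means a mismatched config surfaces as a clear error, rather than as latents whose length silently disagrees with the audio length the pipeline expects.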

Closes #13580.

Tests

  • ruff check ...
  • git diff --check
  • make modified_only_fixup
  • make fix-copies
  • python -m pytest tests/pipelines/longcat_audio_dit/test_longcat_audio_dit.py tests/models/autoencoders/test_models_autoencoder_longcat_audio_dit.py tests/models/transformers/test_models_transformer_longcat_audio_dit.py -q
    • 67 passed, 17 skipped
  • python -m pytest --collect-only tests/pipelines/longcat_audio_dit/test_longcat_audio_dit.py::LongCatAudioDiTPipelineSlowTests::test_longcat_audio_pipeline_from_pretrained_real_local_weights -q

The real-checkpoint slow test was collected but not run because it requires LONGCAT_AUDIO_DIT_MODEL_PATH, LONGCAT_AUDIO_DIT_TOKENIZER_PATH, and an accelerator.
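A minimal sketch of how this kind of environment gating can be expressed with the standard library. The two environment variable names come from this PR. The class name, the `accelerator_available` helper (CUDA-only here), and the test body are hypothetical and do not reflect the actual diffusers test utilities. The body mirrors the tokenizer fix: it loads a tokenizer object instead of passing the path string.

```python
import os
import unittest

# Environment variable names taken from the PR description.
REQUIRED_ENV_VARS = ("LONGCAT_AUDIO_DIT_MODEL_PATH", "LONGCAT_AUDIO_DIT_TOKENIZER_PATH")


def missing_env_vars(environ=os.environ):
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]


def accelerator_available() -> bool:
    """Hypothetical placeholder: only checks CUDA via torch, if installed."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


@unittest.skipUnless(
    not missing_env_vars() and accelerator_available(),
    "requires LongCat AudioDiT weights, tokenizer, and an accelerator",
)
class RealCheckpointSmokeTest(unittest.TestCase):  # hypothetical test class
    def test_tokenizer_is_instantiated(self):
        from transformers import AutoTokenizer

        # Pass a tokenizer object to the pipeline, not the path string.
        tokenizer = AutoTokenizer.from_pretrained(os.environ["LONGCAT_AUDIO_DIT_TOKENIZER_PATH"])
        self.assertNotIsInstance(tokenizer, str)
```

Because `skipUnless` is evaluated at class definition, collection still succeeds on machines without the weights or an accelerator. This matches the `--collect-only` check above.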



Development: successfully merging this pull request may close longcat_audio_dit model/pipeline review (#13580).
