Add LTX2 Video VAE implementation

prishajain1 · csgoogle · commit 1c50bcf275aa · 2026-03-27T14:13:06.000+05:30
diff --git a/Gemini.md b/Gemini.md
@@ -0,0 +1,56 @@
+# Gemini Development Notes
+
+This file contains notes and lessons learned during development to avoid repeating mistakes.
+
+## Code Formatting
+
+- **Guideline**: Always format the code after making changes.
+- **Tools**:
+  - Use `make style` to format using `black` and `ruff`.
+  - Use `./code_style.sh` (if configured to use the correct `pyink`) to format using `pyink`.
+  - Use `make quality` to check formatting without modifying files.
+- **Manual Pyink**: To check formatting with `pyink` using specific settings (e.g. indentation=2, line-length=125) and print a diff without modifying files:
+  ```bash
+  pyink src/maxdiffusion --check --diff --color --pyink-indentation=2 --line-length=125
+  ```
+- **Ruff Cleanup**: To apply `ruff`'s automatic fixes (e.g. removing unused imports) to a file:
+  ```bash
+  ruff check src/maxdiffusion/tests/wan_animate/test_transformer_wan_animate.py --fix
+  ```
+
+
+## Virtual Environment and Python Version
+
+- **Issue**: The virtual environment `.venv` keeps its dependencies under a `python3.13` directory, but `python3` may resolve to the system Python 3.9 if the environment is activated incorrectly or if the `bin/python` symlink points to the system Python.
+- **Solution**: Always use the absolute path to the intended Python interpreter inside the virtual environment when running tests or scripts that depend on specific packages like `diffusers`.
+  - Example: `/Users/sagarchapara/repos/maxdiffusion/.venv/bin/python3.13`
+
+## Artifact Paths
+
+- **Issue**: Artifacts (implementation plans, tasks, walkthroughs) were being written into the codebase.
+- **Solution**: Always write artifacts to the artifact directory specific to the conversation.
+  - Path: `<appDataDir>/brain/<conversation-id>/`
+
+## Wan Animate Face Encoder
+
+### Shape Expectations
+
+- The test `test_wan_animate_face_encoder_shape` previously had an incorrect expectation `(2, 5, 5, 512)`.
+- The correct expectation for `WanAnimateFaceEncoder` output is `(2, 3, 5, 512)` (due to concatenation at the end adding 1 to the 3rd dimension after reshaping).
+
+### Weight Mapping
+
+- **Convolutions**: PyTorch 1D convolutional weights of shape `(out_channels, in_channels, kernel_size)` (e.g., `(4096, 512, 3)`) must be transposed to `(kernel_size, in_channels, out_channels)` (e.g., `(3, 512, 4096)`) using `transpose(2, 1, 0)` for `nnx.Conv`.
+- **Linears**: PyTorch linear weights `(out_features, in_features)` must be transposed to `(in_features, out_features)` for `nnx.Linear`.
+- **LayerNorm**: The PyTorch LayerNorm layers in `WanAnimateFaceEncoder` typically have no learnable bias or scale (or these are fixed), which maps to `use_bias=False` and `use_scale=False` in JAX.
+
+## Wan Animate Face Block Cross Attention
+
+### Equivalence Test
+
+- Added `test_equivalence_face_block_cross_attention` to `src/maxdiffusion/tests/wan_animate/test_transformer_wan_animate.py` to verify equivalence between `FlaxWanAnimateFaceBlockCrossAttention` and `WanAnimateFaceBlockCrossAttention` in `diffusers`.
+- The test transfers weights, runs both implementations on the same random inputs, and asserts that the outputs match within a tolerance of `1e-4`.
+
+### Temp Files Removal
+
+- Removed temporary inspection files created during the development process.
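
The Weight Mapping notes in the diff above can be illustrated with a short NumPy sketch. This is not code from the commit: the helper names `conv1d_weight_to_nnx` and `linear_weight_to_nnx` are hypothetical, the tensor shapes are small stand-ins for the `(4096, 512, 3)` example, and plain NumPy arrays are used in place of real PyTorch and `nnx` parameters. It assumes channels-last inputs on the JAX side and no padding, and reuses the `1e-4` tolerance mentioned for the equivalence test.

```python
import numpy as np


def conv1d_weight_to_nnx(w_pt):
  """Hypothetical helper: (out_channels, in_channels, kernel_size) -> (kernel_size, in_channels, out_channels)."""
  return np.transpose(w_pt, (2, 1, 0))


def linear_weight_to_nnx(w_pt):
  """Hypothetical helper: (out_features, in_features) -> (in_features, out_features)."""
  return np.transpose(w_pt, (1, 0))


rng = np.random.default_rng(0)

# Linear: PyTorch computes x @ W.T, while an (in, out) kernel is applied as x @ K.
lin_pt = rng.standard_normal((6, 5)).astype(np.float32)
lin_nnx = linear_weight_to_nnx(lin_pt)
x = rng.standard_normal((2, 5)).astype(np.float32)
y_pt = x @ lin_pt.T
y_nnx = x @ lin_nnx
np.testing.assert_allclose(y_pt, y_nnx, atol=1e-4)

# Conv1d: small stand-in for the (4096, 512, 3) -> (3, 512, 4096) example.
conv_pt = rng.standard_normal((8, 4, 3)).astype(np.float32)
conv_nnx = conv1d_weight_to_nnx(conv_pt)

# Channels-last input of shape (length, in_channels); build the sliding
# windows by hand so the check needs neither PyTorch nor JAX.
x_cl = rng.standard_normal((7, 4)).astype(np.float32)
windows = np.stack([x_cl[t : t + 3] for t in range(5)])  # (positions, kernel, in)

# Both layouts describe the same cross-correlation once the axes are matched.
y_conv_pt = np.einsum("tki,oik->to", windows, conv_pt)
y_conv_nnx = np.einsum("tki,kio->to", windows, conv_nnx)
np.testing.assert_allclose(y_conv_pt, y_conv_nnx, atol=1e-4)
```

In a real port the transposed arrays would be assigned to the corresponding `nnx.Conv` and `nnx.Linear` kernels; the sketch only checks that the axis permutations preserve the computation.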