| 1 | +# Gemini Development Notes |
| 2 | + |
| 3 | +This file contains notes and lessons learned during development to avoid repeating mistakes. |
| 4 | + |
| 5 | +## Code Formatting |
| 6 | + |
| 7 | +- **Guideline**: Always format the code after making changes. |
| 8 | +- **Tools**: |
| 9 | + - Use `make style` to format using `black` and `ruff`. |
| 10 | + - Use `./code_style.sh` (if configured to use the correct `pyink`) to format using `pyink`. |
| 11 | + - Use `make quality` to check formatting without modifying files. |
| 12 | +- **Manual Pyink**: To check formatting with `pyink` using specific settings (e.g. indentation=2, line-length=125) and print a colored diff without modifying files: |
| 13 | + ```bash |
| 14 | + pyink src/maxdiffusion --check --diff --color --pyink-indentation=2 --line-length=125 |
| 15 | + ``` |
| 16 | +- **Ruff Cleanup**: To remove unused imports using `ruff`: |
| 17 | + ```bash |
| 18 | + ruff check src/maxdiffusion/tests/wan_animate/test_transformer_wan_animate.py --fix |
| 19 | + ``` |
| 20 | + |
| 21 | + |
| 22 | +## Virtual Environment and Python Version |
| 23 | + |
| 24 | +- **Issue**: The virtual environment `.venv` stores its dependencies under a `python3.13` directory, but a bare `python3` invocation may pick up the system Python 3.9 if the environment is activated incorrectly or if the `bin/python` symlink points to the system Python. |
| 25 | +- **Solution**: Always use the absolute path to the intended Python interpreter inside the virtual environment when running tests or scripts that depend on specific packages such as `diffusers`. |
| 26 | + - Example: `/Users/sagarchapara/repos/maxdiffusion/.venv/bin/python3.13` |
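A quick way to confirm which interpreter actually ran a script is to print it from inside Python. The snippet below is a minimal sketch, not part of the repository; the helper name `interpreter_report` is hypothetical, and it relies only on the standard `sys` module.

```python
import sys


def interpreter_report():
  """Summarize which Python interpreter is executing this script."""
  return {
      # Absolute path of the running interpreter binary.
      "executable": sys.executable,
      # Major.minor version, e.g. "3.13".
      "version": "{}.{}".format(*sys.version_info[:2]),
      # Inside a venv, sys.prefix points at the environment while
      # sys.base_prefix points at the base installation.
      "in_virtualenv": sys.prefix != sys.base_prefix,
  }


report = interpreter_report()
print(report)
```

If `executable` does not point into `.venv`, or `version` is not the expected one, the script was likely started with the wrong interpreter.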
| 27 | + |
| 28 | +## Artifact Paths |
| 29 | + |
| 30 | +- **Issue**: Artifacts (implementation plans, tasks, walkthroughs) were being written into the codebase. |
| 31 | +- **Solution**: Always write artifacts to the artifact directory specific to the conversation. |
| 32 | + - Path: `<appDataDir>/brain/<conversation-id>/` |
| 33 | + |
| 34 | +## Wan Animate Face Encoder |
| 35 | + |
| 36 | +### Shape Expectations |
| 37 | + |
| 38 | +- The test `test_wan_animate_face_encoder_shape` previously had an incorrect expectation `(2, 5, 5, 512)`. |
| 39 | +- The correct expected output shape for `WanAnimateFaceEncoder` is `(2, 3, 5, 512)`: the final concatenation, applied after reshaping, adds 1 to the third dimension. |
| 40 | + |
| 41 | +### Weight Mapping |
| 42 | + |
| 43 | +- **Convolutions**: PyTorch 1D convolutional weights of shape `(out_channels, in_channels, kernel_size)` (e.g., `(4096, 512, 3)`) must be transposed to `(kernel_size, in_channels, out_channels)` (e.g., `(3, 512, 4096)`) using `transpose(2, 1, 0)` for `nnx.Conv`. |
| 44 | +- **Linears**: PyTorch linear weights `(out_features, in_features)` must be transposed to `(in_features, out_features)` for `nnx.Linear`. |
| 45 | +- **LayerNorm**: The LayerNorm layers in the PyTorch `WanAnimateFaceEncoder` typically have no learnable scale or bias (or these are fixed), which maps to `use_bias=False` and `use_scale=False` in JAX. |
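The convolution and linear transpositions described above can be sketched with plain NumPy arrays standing in for the PyTorch weights. This is an illustrative sketch rather than the conversion code used in the repository; the helper names `conv1d_weight_to_nnx` and `linear_weight_to_nnx` and the small shapes are assumptions chosen for the example.

```python
import numpy as np


def conv1d_weight_to_nnx(weight):
  """(out_channels, in_channels, kernel_size) -> (kernel_size, in_channels, out_channels)."""
  return np.transpose(weight, (2, 1, 0))


def linear_weight_to_nnx(weight):
  """(out_features, in_features) -> (in_features, out_features)."""
  return np.transpose(weight, (1, 0))


rng = np.random.default_rng(0)
# Stand-ins for PyTorch weights; the real layers are much larger, e.g. (4096, 512, 3).
torch_conv_w = rng.standard_normal((8, 4, 3)).astype(np.float32)
torch_linear_w = rng.standard_normal((6, 4)).astype(np.float32)

conv_kernel = conv1d_weight_to_nnx(torch_conv_w)
linear_kernel = linear_weight_to_nnx(torch_linear_w)

print(conv_kernel.shape)    # (3, 4, 8)
print(linear_kernel.shape)  # (4, 6)
```

After the transpose, element `[k, i, o]` of the converted kernel equals element `[o, i, k]` of the original weight, which is a convenient spot check when a converted layer produces unexpected outputs.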
| 46 | + |
| 47 | +## Wan Animate Face Block Cross Attention |
| 48 | + |
| 49 | +### Equivalence Test |
| 50 | + |
| 51 | +- Added `test_equivalence_face_block_cross_attention` to `src/maxdiffusion/tests/wan_animate/test_transformer_wan_animate.py` to verify equivalence between `FlaxWanAnimateFaceBlockCrossAttention` and `WanAnimateFaceBlockCrossAttention` in `diffusers`. |
| 52 | +- The test transfers the weights, runs both implementations on the same random inputs, and asserts that the outputs match within a tolerance of `1e-4`. |
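The comparison step of such an equivalence test might look like the sketch below. It is not the actual test: the helper `assert_outputs_match` is hypothetical, the arrays are synthetic stand-ins for the two models' outputs, and treating `1e-4` as an absolute tolerance is an assumption, since the notes do not say whether the tolerance is absolute or relative.

```python
import numpy as np


def assert_outputs_match(reference, candidate, atol=1e-4):
  """Fail if two outputs differ in shape or by more than `atol` anywhere."""
  reference = np.asarray(reference)
  candidate = np.asarray(candidate)
  assert reference.shape == candidate.shape, (
      f"shape mismatch: {reference.shape} vs {candidate.shape}"
  )
  max_abs_diff = float(np.max(np.abs(reference - candidate)))
  assert max_abs_diff <= atol, f"max abs diff {max_abs_diff} exceeds {atol}"
  return max_abs_diff


# Synthetic stand-ins for the PyTorch and Flax outputs.
reference_out = np.linspace(-1.0, 1.0, 2 * 5 * 8, dtype=np.float32).reshape(2, 5, 8)
candidate_out = reference_out + np.float32(1e-5)

max_diff = assert_outputs_match(reference_out, candidate_out)
print(f"max abs diff: {max_diff:.2e}")
```

Reporting the maximum absolute difference on failure makes it easier to tell a small numerical drift from a weight-mapping mistake, which usually produces differences orders of magnitude above the tolerance.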
| 53 | + |
| 54 | +### Temp Files Removal |
| 55 | + |
| 56 | +- Removed temporary inspection files created during the development process. |