Commit a6cfdf7
fix: disable non-blocking tensor copies to MPS during model loading

When loading model weights with `device_map="mps"`, the `non_blocking=True`
parameter in `set_module_tensor_to_device` causes asynchronous CPU-to-MPS
copies. With mmap-backed safetensors the source CPU memory can be released
before the MPS copy completes, silently corrupting the destination weights.
The corruption is non-deterministic and dtype-dependent (float32 corrupts
weights but not biases; float16 corrupts biases but not weights), resulting
in extreme values (~1e37), LayerNorm overflow, and NaN outputs.

Force synchronous copies (`non_blocking=False`) when the target device is
MPS. Other devices (CUDA, CPU) continue to use non-blocking transfers.

Fixes #13227
Made-with: Cursor

1 parent c02c17c commit a6cfdf7
1 file changed
Lines changed: 8 additions & 1 deletion
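
The decision described in the commit message can be sketched as a small helper. This is an illustrative sketch only, not the code from this commit: the function name `should_use_non_blocking` is hypothetical, and it assumes the target device is given as a string such as `"mps"` or `"cuda:0"`, a `torch.device` (whose string form has the same shape), or an integer CUDA index.

```python
def should_use_non_blocking(device) -> bool:
    """Hypothetical helper: decide whether a host-to-device copy may be asynchronous.

    Returns False when the target is MPS, so the copy is forced to complete
    before the (possibly mmap-backed) source buffer can be released.
    Returns True for every other target, matching the behaviour the commit
    message describes for CUDA and CPU.
    """
    # "mps:0" -> "mps", "cuda:1" -> "cuda", 0 -> "0" (an integer CUDA index)
    device_type = str(device).split(":", 1)[0].lower()
    return device_type != "mps"


# Intended use at the copy site (assumes PyTorch; not executed here):
#   new_value = value.to(device, non_blocking=should_use_non_blocking(device))
```

Keeping the check in one place means only MPS loads pay for the synchronous copy, while other targets keep the existing non-blocking path.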
[Diff content not preserved. From the surviving line numbers: original line 257 was removed and new lines 257-264 were added; the surrounding context lines (original 254-256 and 258-260) are unchanged.]