Incorrect LLM output when using pipeline parallelism #44945

@tasinislam21

Description

System Info

transformers==4.57.1
Python==3.12.12
Kaggle env

Who can help?

@Cyrilvallez
@3outeille

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I am developing a notebook that runs Molmo2, an action-recognition and video-understanding LLM, on Kaggle, so that users with limited computational resources can run a demo for free on Kaggle's GPUs. Kaggle provides an environment with 2 NVIDIA T4 GPUs. I manually mapped the model's layers across the two GPUs so that they fit within the VRAM constraints. However, the model performs extremely poorly: the output looks as though the checkpoint weights were not loaded correctly.
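
Roughly, the manual mapping looks like the sketch below (module names, the checkpoint id, and the 32-layer depth are illustrative, not Molmo2's confirmed names; the real ones can be read off model.named_parameters() and config.num_hidden_layers):

```python
# Minimal sketch of a manual two-GPU device_map in float16.
# Assumptions: the checkpoint id, module names, and 32-layer depth are
# placeholders; a dict device_map must cover every module, otherwise
# loading fails with an error.
import torch
from transformers import AutoModelForCausalLM

num_layers = 32  # assumed; check config.num_hidden_layers
device_map = {
    "model.embed_tokens": 0,
    **{f"model.layers.{i}": 0 for i in range(num_layers // 2)},              # first half on GPU 0
    **{f"model.layers.{i}": 1 for i in range(num_layers // 2, num_layers)},  # second half on GPU 1
    "model.norm": 1,
    "lm_head": 1,
}

model = AutoModelForCausalLM.from_pretrained(
    "allenai/Molmo2-placeholder",  # hypothetical checkpoint id
    torch_dtype=torch.float16,
    device_map=device_map,
    trust_remote_code=True,
)
```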

On a single GPU or on CPU, the model works properly and produces the expected results. Could someone please review my notebook and suggest a fix? Your help would be greatly appreciated.

Link to my notebook.

What I have already tried:

  • Used the load_in_8bit parameter, but calling generate raised a NotImplementedError, so I reverted to torch.float16.

  • Couldn't use torch.float32 because the T4 GPU does not have enough memory.

  • Tried using the argument device_map="auto", but the resulting mapping was problematic: half of a decoder block stayed on one device while the other half ended up on the other, which breaks the block's residual connections (see the sketch after this list for one way to keep whole blocks together).
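
One way I could imagine keeping whole blocks together while still auto-mapping is accelerate's infer_auto_device_map with no_split_module_classes. A hedged sketch follows; the decoder-block class name "MolmoDecoderLayer" and the checkpoint id are guesses, not confirmed names:

```python
# Hedged sketch: auto-map across the two T4s while forbidding splits
# inside a decoder block. Assumptions: the checkpoint id and the block
# class name "MolmoDecoderLayer" are placeholders -- check
# type(model.model.layers[0]).__name__ for the real class, and leave
# headroom below each T4's 16 GB.
import torch
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

checkpoint = "allenai/Molmo2-placeholder"  # hypothetical checkpoint id
config = AutoConfig.from_pretrained(checkpoint, trust_remote_code=True)

# Build the model skeleton without allocating weights, just to plan the map.
with init_empty_weights():
    empty_model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

device_map = infer_auto_device_map(
    empty_model,
    max_memory={0: "13GiB", 1: "13GiB"},
    no_split_module_classes=["MolmoDecoderLayer"],  # keep each block whole
    dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    device_map=device_map,
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
```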

Expected behavior

The model should say that there are penguins in the video.
