Information
Tasks
Reproduction
Any official scripts.
Expected behavior
During distributed training with multiple GPUs, all ranks produce identical retain loss values while forget loss values differ as expected. This indicates that all ranks are sampling the same retain data samples, which breaks the randomness assumption in the unlearning process.
The issue is located in src/data/unlearn.py in the ForgetRetainDataset.__getitem__() method:
def __getitem__(self, idx):
    item = {}
    if self.anchor == "forget":
        item["forget"] = self.forget[idx]  # Sequential access - different across ranks ✓
        if self.retain:
            retain_idx = torch.randint(0, len(self.retain), (1,)).item()  # ❌ Problem here
            item["retain"] = self.retain[retain_idx]
The problem: torch.randint() draws from the global PyTorch random number generator, which is seeded identically on every rank. As a result, all ranks generate the same sequence of random indices and therefore select identical retain samples.
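One way to decorrelate the retain sampling is to give each rank its own torch.Generator, seeded from the process rank, and pass it to torch.randint via the generator argument. The sketch below is illustrative, not the repository's actual code: the class skeleton and the _rank() helper are assumptions, and reading the rank from the RANK environment variable assumes a torchrun-style launcher.

```python
import os
import torch


def _rank() -> int:
    # Process rank from a torchrun-style launcher; 0 if not set.
    return int(os.environ.get("RANK", 0))


class ForgetRetainDataset:  # minimal sketch of the relevant parts only
    def __init__(self, forget, retain, anchor="forget"):
        self.forget = forget
        self.retain = retain
        self.anchor = anchor
        # Per-rank generator: offsetting the seed by the rank makes the
        # random retain indices differ across ranks without touching the
        # global RNG state used elsewhere in training.
        self.generator = torch.Generator()
        self.generator.manual_seed(torch.initial_seed() + _rank())

    def __getitem__(self, idx):
        item = {}
        if self.anchor == "forget":
            item["forget"] = self.forget[idx]
            if self.retain:
                retain_idx = torch.randint(
                    0, len(self.retain), (1,), generator=self.generator
                ).item()
                item["retain"] = self.retain[retain_idx]
        return item
```

Because the draw goes through a dedicated generator, it also no longer advances the global RNG, so forget-side shuffling and dropout remain unaffected.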