
Data Loading Optimization guide #3875

Merged
divyanshk merged 14 commits into pytorch:main from divyanshk:data-loading-guide on May 8, 2026

Conversation

divyanshk (Contributor)

This is a new data loading tutorial covering best practices and common optimizations.

pytorch-bot Bot commented May 6, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3875

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 2 Pending

As of commit 6bbcd4e with merge base 9e8a306:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

svekars (Contributor) commented May 7, 2026

@claude

claude Bot commented May 7, 2026

Claude finished @svekars's task in 1m 49s


Review: PR #3875 — Data Loading Optimization guide

  • Gather context and read tutorial file
  • Review tutorial content, code quality, and structure
  • Check against repo guidelines (CLAUDE.md)
  • Post review feedback

Overall this is a well-structured, useful tutorial with clear progressive benchmarking. Below are the issues I found, ranging from missing repo requirements to content/code issues.


Missing Repo Requirements

Per CLAUDE.md, new tutorials require:

  1. customcarditem entry in index.rst — not present
  2. toctree entry in index.rst — not present
  3. Thumbnail image in _static/img/thumbnails/cropped/ — not present

These are required for the tutorial to appear on the website.


Content Issues

  1. Typo on line 805: "Prefetcing" → "Prefetching"

  2. Missing newline at end of file (line 839): The file ends without a trailing newline, which may cause lint failures.

  3. Numbering gap (line 737): The solutions section jumps from "3." directly to "5." — item 4 is missing.

  4. Line 31: "best practises" → "best practices" (American English is standard in PyTorch docs)

  5. Summary table values are hardcoded (lines 769–791): The table shows static numbers (~32s, ~12s, etc.) that won't match actual execution output. Consider noting these are approximate reference values from a specific hardware configuration, or removing the exact timings in favor of just the multipliers.

  6. Line 825: "Profile your pipeline with to identify..." — incomplete sentence, missing the tool name (e.g., "with PyTorch Profiler" or "with torch.profiler")
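For item 6, a minimal sketch of what that sentence might point to, using torch.profiler — `loader` and `model` are placeholders, not the tutorial's actual objects:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Profile a handful of steps to see where the time goes; `loader` and
# `model` stand in for the tutorial's own DataLoader and model.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for step, (data, target) in enumerate(loader):
        if step >= 10:
            break
        data = data.to("cuda", non_blocking=True)
        model(data)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```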


Code Issues

  1. DataPrefetcher has no CPU fallback (lines 485–528): When torch.cuda.is_available() is False, self.stream is None but preload() never moves data to any device. The data stays on CPU, which works but means the prefetcher adds overhead with no benefit. The guard at line 532 handles this at the call site, but the class itself could be clearer about this contract; see the first sketch after this list.

  2. benchmark_batch_size doesn't use non_blocking (line 289): Unlike the main training loop which uses non_blocking=True, this benchmark uses plain .to(device). This is fine for the batch size comparison (since you want to measure total time including transfer), but worth a brief note so readers don't think it's an oversight.

  3. No torch.cuda.synchronize() before start_time in train_and_benchmark (line 192): If there's prior GPU work from a previous benchmark, it could bleed into the timing. Adding a sync before the timer starts would make measurements more accurate; see the second sketch after this list.
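For item 1, a rough sketch of how the class could make the CPU contract explicit. This assumes an explicit device argument and batches that are tuples of tensors; it is not the tutorial's actual DataPrefetcher:

```python
import torch

class DataPrefetcher:
    """Overlap host-to-device copies with compute using a side CUDA stream.

    On CPU-only machines this degrades to a plain iterator instead of
    adding prefetch bookkeeping for no benefit.
    """

    def __init__(self, loader, device):
        self.loader = iter(loader)
        self.device = device
        self.use_cuda = torch.cuda.is_available() and device.type == "cuda"
        self.stream = torch.cuda.Stream() if self.use_cuda else None
        self.next_batch = None
        self._preload()

    def _preload(self):
        try:
            batch = next(self.loader)
        except StopIteration:
            self.next_batch = None
            return
        if self.use_cuda:
            # Copy the next batch on the side stream while the main
            # stream is still busy with the current step.
            with torch.cuda.stream(self.stream):
                batch = tuple(t.to(self.device, non_blocking=True) for t in batch)
        self.next_batch = batch

    def __iter__(self):
        return self

    def __next__(self):
        if self.next_batch is None:
            raise StopIteration
        batch = self.next_batch
        if self.use_cuda:
            # Make the main stream wait for the side-stream copy, and tell
            # the caching allocator the tensors are now used on this stream.
            torch.cuda.current_stream().wait_stream(self.stream)
            for t in batch:
                t.record_stream(torch.cuda.current_stream())
        self._preload()
        return batch
```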
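And for items 2–3, a hypothetical timing harness showing both the up-front synchronize and the non_blocking transfer pattern (function and argument names are illustrative, not the tutorial's):

```python
import time
import torch

def timed_epoch(loader, model, device):
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # drain GPU work left over from earlier benchmarks
    start = time.perf_counter()
    for data, target in loader:
        data = data.to(device, non_blocking=True)
        target = target.to(device, non_blocking=True)
        model(data)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # include still-running kernels in the measurement
    return time.perf_counter() - start
```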


Style / Minor

  1. Line 64 docstring: """A synthetic dataset that simulates expensive data transformations.""" — per repo guidelines, minimize comments. The class name and constructor parameters are self-explanatory. Same for line 84.

  2. Line 139 docstring: Similarly verbose for a benchmark utility class.

  3. Line 48: # Check if CUDA is available — comment restates the code.

  4. Lines 134-135: # Dataset for progressive optimization benchmarks. — the variable name already says this.


Structural Suggestion

The in_order and snapshot_every_n_steps sections (lines 606–640) are purely prose with no benchmarks. This is fine but feels like a slight shift from the tutorial's progressive-benchmark structure. Consider either:

  • Adding a brief benchmark for in_order=False (a sketch follows this list), or
  • Moving these to the "Summary and Best Practices" section as bullet points
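If the first option is taken, such a benchmark might look like the sketch below — assuming a PyTorch version whose DataLoader accepts in_order, and with `dataset` standing in for the tutorial's synthetic dataset with uneven per-sample cost:

```python
import time
from torch.utils.data import DataLoader

# Compare batch arrival-time variance with and without ordered results;
# `dataset` is a placeholder for the tutorial's synthetic dataset.
for in_order in (True, False):
    loader = DataLoader(dataset, batch_size=64, num_workers=4, in_order=in_order)
    gaps = []
    prev = time.perf_counter()
    for batch in loader:
        now = time.perf_counter()
        gaps.append(now - prev)
        prev = now
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    print(f"in_order={in_order}: mean batch gap {mean:.4f}s, variance {var:.6f}")
```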

Summary

The tutorial is well-organized and covers an important topic. The main blockers are:

  • Missing index.rst entries and thumbnail (required by repo)
  • Typo/grammar fixes (lines 31, 737, 805, 825)
  • Missing trailing newline

The code is functionally correct and the progressive benchmarking approach makes the optimizations easy to follow.


Comment thread on intermediate_source/intermediate_data_loading_tutorial.py
divyanshk and others added 12 commits May 7, 2026 14:41
…mples

- Generate synthetic data lazily to avoid large upfront memory allocation
- Add GPU transfer and synchronization to batch size benchmark
- Add in_order benchmark comparing batch timing variance
- Add snapshot_every_n_steps code example for stateful DataLoader
- Replace runtime py-spy check with static RST documentation note
- Fix parameter shadowing in create_optimized_dataloader
- Add proper timing to prefetcher demo with cuda.synchronize
- Add end-to-end training loop combining all optimizations
divyanshk force-pushed the data-loading-guide branch from 46e4ab7 to 7408887 on May 7, 2026 21:41
svekars (Contributor) left a comment

divyanshk merged commit e5afb46 into pytorch:main on May 8, 2026.
52 of 53 checks passed.