
Data Loading Optimization guide #3875

Merged
divyanshk merged 14 commits into pytorch:main from divyanshk:data-loading-guide on May 8, 2026

Conversation

divyanshk (Contributor)

This is a new data loading tutorial covering best practices and common optimizations.

pytorch-bot Bot commented May 6, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3875

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 2 Pending

As of commit 6bbcd4e with merge base 9e8a306:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

svekars (Contributor) commented May 7, 2026

@claude

claude Bot commented May 7, 2026

Claude finished @svekars's task in 1m 49s


Review: PR #3875 — Data Loading Optimization guide

  • Gather context and read tutorial file
  • Review tutorial content, code quality, and structure
  • Check against repo guidelines (CLAUDE.md)
  • Post review feedback

Overall this is a well-structured, useful tutorial with clear progressive benchmarking. Below are the issues I found, ranging from missing repo requirements to content/code issues.


Missing Repo Requirements

Per CLAUDE.md, new tutorials require:

  1. customcarditem entry in index.rst — not present
  2. toctree entry in index.rst — not present
  3. Thumbnail image in _static/img/thumbnails/cropped/ — not present

These are required for the tutorial to appear on the website.


Content Issues

  1. Typo on line 805: "Prefetcing" → "Prefetching"

  2. Missing newline at end of file (line 839): The file ends without a trailing newline, which may cause lint failures.

  3. Numbering gap (line 737): The solutions section jumps from "3." directly to "5." — item 4 is missing.

  4. Line 31: "best practises" → "best practices" (American English is standard in PyTorch docs)

  5. Summary table values are hardcoded (lines 769–791): The table shows static numbers (~32s, ~12s, etc.) that won't match actual execution output. Consider noting these are approximate reference values from a specific hardware configuration, or removing the exact timings in favor of just the multipliers.

  6. Line 825: "Profile your pipeline with to identify..." — incomplete sentence, missing the tool name (e.g., "with PyTorch Profiler" or "with torch.profiler")
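For item 6, a minimal sketch of what that sentence might point to, using torch.profiler — `loader` and `model` are placeholders, not the tutorial's actual objects:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Profile a handful of steps to see where the time goes; `loader` and
# `model` stand in for the tutorial's own DataLoader and model.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for step, (data, target) in enumerate(loader):
        if step >= 10:
            break
        data = data.to("cuda", non_blocking=True)
        model(data)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```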


Code Issues

  1. DataPrefetcher has no CPU fallback (lines 485–528): When torch.cuda.is_available() is False, self.stream is None but preload() never moves data to any device. The data stays on CPU, which works but means the prefetcher adds overhead with no benefit. The guard at line 532 handles this at the call site, but the class itself could be clearer about this contract; see the first sketch after this list.

  2. benchmark_batch_size doesn't use non_blocking (line 289): Unlike the main training loop which uses non_blocking=True, this benchmark uses plain .to(device). This is fine for the batch size comparison (since you want to measure total time including transfer), but worth a brief note so readers don't think it's an oversight.

  3. No torch.cuda.synchronize() before start_time in train_and_benchmark (line 192): If there's prior GPU work from a previous benchmark, it could bleed into the timing. Adding a sync before the timer starts would make measurements more accurate; see the second sketch after this list.
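For item 1, a rough sketch of how the class could make the CPU contract explicit. This assumes an explicit device argument and batches that are tuples of tensors; it is not the tutorial's actual DataPrefetcher:

```python
import torch

class DataPrefetcher:
    """Overlap host-to-device copies with compute using a side CUDA stream.

    On CPU-only machines this degrades to a plain iterator instead of
    adding prefetch bookkeeping for no benefit.
    """

    def __init__(self, loader, device):
        self.loader = iter(loader)
        self.device = device
        self.use_cuda = torch.cuda.is_available() and device.type == "cuda"
        self.stream = torch.cuda.Stream() if self.use_cuda else None
        self.next_batch = None
        self._preload()

    def _preload(self):
        try:
            batch = next(self.loader)
        except StopIteration:
            self.next_batch = None
            return
        if self.use_cuda:
            # Copy the next batch on the side stream while the main
            # stream is still busy with the current step.
            with torch.cuda.stream(self.stream):
                batch = tuple(t.to(self.device, non_blocking=True) for t in batch)
        self.next_batch = batch

    def __iter__(self):
        return self

    def __next__(self):
        if self.next_batch is None:
            raise StopIteration
        batch = self.next_batch
        if self.use_cuda:
            # Make the main stream wait for the side-stream copy, and tell
            # the caching allocator the tensors are now used on this stream.
            torch.cuda.current_stream().wait_stream(self.stream)
            for t in batch:
                t.record_stream(torch.cuda.current_stream())
        self._preload()
        return batch
```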
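And for items 2–3, a hypothetical timing harness showing both the up-front synchronize and the non_blocking transfer pattern (function and argument names are illustrative, not the tutorial's):

```python
import time
import torch

def timed_epoch(loader, model, device):
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # drain GPU work left over from earlier benchmarks
    start = time.perf_counter()
    for data, target in loader:
        data = data.to(device, non_blocking=True)
        target = target.to(device, non_blocking=True)
        model(data)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # include still-running kernels in the measurement
    return time.perf_counter() - start
```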


Style / Minor

  1. Line 64 docstring: """A synthetic dataset that simulates expensive data transformations.""" — per repo guidelines, minimize comments. The class name and constructor parameters are self-explanatory. Same for line 84.

  2. Line 139 docstring: Similarly verbose for a benchmark utility class.

  3. Line 48: # Check if CUDA is available — comment restates the code.

  4. Lines 134-135: # Dataset for progressive optimization benchmarks. — the variable name already says this.


Structural Suggestion

The in_order and snapshot_every_n_steps sections (lines 606–640) are purely prose with no benchmarks. This is fine but feels like a slight shift from the tutorial's progressive-benchmark structure. Consider either:

  • Adding a brief benchmark for in_order=False (a sketch follows this list), or
  • Moving these to the "Summary and Best Practices" section as bullet points
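If the first option is taken, such a benchmark might look like the sketch below — assuming a PyTorch version whose DataLoader accepts in_order, and with `dataset` standing in for the tutorial's synthetic dataset with uneven per-sample cost:

```python
import time
from torch.utils.data import DataLoader

# Compare batch arrival-time variance with and without ordered results;
# `dataset` is a placeholder for the tutorial's synthetic dataset.
for in_order in (True, False):
    loader = DataLoader(dataset, batch_size=64, num_workers=4, in_order=in_order)
    gaps = []
    prev = time.perf_counter()
    for batch in loader:
        now = time.perf_counter()
        gaps.append(now - prev)
        prev = now
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    print(f"in_order={in_order}: mean batch gap {mean:.4f}s, variance {var:.6f}")
```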

Summary

The tutorial is well-organized and covers an important topic. The main blockers are:

  • Missing index.rst entries and thumbnail (required by repo)
  • Typo/grammar fixes (lines 31, 737, 805, 825)
  • Missing trailing newline

The code is functionally correct and the progressive benchmarking approach makes the optimizations easy to follow.


Comment thread on intermediate_source/intermediate_data_loading_tutorial.py
divyanshk and others added 12 commits May 7, 2026 14:41
…mples

- Generate synthetic data lazily to avoid large upfront memory allocation
- Add GPU transfer and synchronization to batch size benchmark
- Add in_order benchmark comparing batch timing variance
- Add snapshot_every_n_steps code example for stateful DataLoader
- Replace runtime py-spy check with static RST documentation note
- Fix parameter shadowing in create_optimized_dataloader
- Add proper timing to prefetcher demo with cuda.synchronize
- Add end-to-end training loop combining all optimizations
divyanshk force-pushed the data-loading-guide branch from 46e4ab7 to 7408887 on May 7, 2026 21:41
svekars (Contributor) left a comment

divyanshk merged commit e5afb46 into pytorch:main on May 8, 2026.
52 of 53 checks passed.