3 changes: 3 additions & 0 deletions intermediate_source/transformer_building_blocks.py
@@ -79,6 +79,9 @@
# sequence lengths. They eliminate the need for the bug-prone practices of explicit
# padding and masking (think ``key_padding_mask`` in ``nn.MultiHeadAttention``).
#
# .. warning::
# Nested tensors are not currently under active development. Use at your own risk.
#
# * `scaled_dot_product_attention <https://pytorch.org/tutorials/intermediate/scaled_dot_product_attention_tutorial.html>`_
#
# ``scaled_dot_product_attention`` is a primitive for
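The warning added above sits next to a claim worth illustrating: nested tensors replace the bug-prone practice of explicit padding and masking. A minimal pure-Python sketch (no PyTorch required; `pad_batch` is a hypothetical helper, not a library API) of what that manual bookkeeping looks like:

```python
# Conceptual sketch of the explicit padding/masking that nested tensors
# are meant to eliminate. Pure Python lists stand in for tensors; real
# code would build torch tensors and pass the mask as ``key_padding_mask``.

def pad_batch(sequences, pad_value=0):
    """Pad a ragged batch to the longest sequence and build a padding mask.

    Returns (padded, mask), where mask[i][j] is True at padding positions,
    following the ``key_padding_mask`` convention in ``nn.MultiheadAttention``.
    """
    max_len = max(len(seq) for seq in sequences)
    padded = [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]
    mask = [[j >= len(seq) for j in range(max_len)] for seq in sequences]
    return padded, mask

batch = [[1, 2, 3], [4, 5], [6]]
padded, mask = pad_batch(batch)
# padded == [[1, 2, 3], [4, 5, 0], [6, 0, 0]]
# mask   == [[False, False, False], [False, False, True], [False, True, True]]
```

Every consumer of `padded` must remember to apply `mask`; forgetting it silently attends to pad positions, which is exactly the class of bug the nested-tensor representation avoids.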
4 changes: 2 additions & 2 deletions unstable_source/nestedtensor.py
@@ -3,6 +3,8 @@
Getting Started with Nested Tensors
===============================================================

**Warning: Nested tensors are not currently under active development. Use at your own risk.**

Nested tensors generalize the shape of regular dense tensors, allowing for representation
of ragged-sized data.

@@ -21,8 +23,6 @@
they are invaluable for building transformers that can efficiently operate on ragged sequential
inputs. Below, we present an implementation of multi-head attention using nested tensors that,
combined with ``torch.compile``, outperforms operating naively on padded tensors.

Nested tensors are currently a prototype feature and are subject to change.
"""

import numpy as np
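The claim in the hunk above, that nested tensors outperform naive padded tensors, comes down to arithmetic: padding to the longest sequence makes every shorter sequence pay for positions it does not use. A rough illustration (pure Python, hypothetical sequence lengths; `padding_overhead` is not a PyTorch API):

```python
# Estimate how much of a padded batch is dead weight. Nested tensors
# store only the real positions, avoiding this overhead entirely.

def padding_overhead(lengths):
    """Fraction of positions in a max-length-padded batch that are padding."""
    max_len = max(lengths)
    total = max_len * len(lengths)   # positions actually allocated
    real = sum(lengths)              # positions carrying data
    return (total - real) / total

lengths = [512, 37, 128, 64]  # hypothetical per-sequence token counts
print(f"{padding_overhead(lengths):.0%} of positions are padding")  # → 64% of positions are padding
```

Attention cost scales at least quadratically in sequence length, so the wasted compute in practice is even larger than this per-position fraction suggests.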