From dd0a452a7992007b692276711406152bc4b7ac61 Mon Sep 17 00:00:00 2001
From: venom1204
Date: Fri, 29 May 2026 19:10:27 +0000
Subject: [PATCH 1/6] added clarification

---
 man/assign.Rd                               | 2 +-
 vignettes/datatable-reference-semantics.Rmd | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/man/assign.Rd b/man/assign.Rd
index b0c038349a..eb7000a3c2 100644
--- a/man/assign.Rd
+++ b/man/assign.Rd
@@ -35,7 +35,7 @@ set(x, i = NULL, j, value)
 }
 \arguments{
 \item{LHS}{ A character vector of column names (or numeric positions) or a variable that evaluates as such. If the column doesn't exist, it is added, \emph{by reference}. }
-\item{RHS}{ A list of replacement values. It is recycled in the usual way to fill the number of rows satisfying \code{i}, if any. To remove a column use \code{NULL}. }
+\item{RHS}{ A list of replacement values. It is recycled in the usual way to fill the number of rows satisfying \code{i}, if any. To remove a column use \code{NULL}. }\item{RHS}{ A list of replacement values. It is recycled in the usual way to fill the number of rows satisfying \code{i}, if any. To remove a column use \code{NULL}. Note that a zero-length \code{RHS} (other than \code{NULL}) is an error, unless \code{by} is used, in which case it is treated as a no-op for that group. }
 \item{x}{ A \code{data.table}. Or, \code{set()} accepts \code{data.frame}, too. }
 \item{i}{ Optional. Indicates the rows on which the values must be updated. If not \code{NULL}, implies \emph{all rows}. Missing or zero values are ignored. The \code{:=} form is more powerful as it allows adding/updating columns by reference based on \emph{subsets} and \code{joins}. See \code{Details}.
 
diff --git a/vignettes/datatable-reference-semantics.Rmd b/vignettes/datatable-reference-semantics.Rmd
index 8b93085a13..91d9268d0d 100644
--- a/vignettes/datatable-reference-semantics.Rmd
+++ b/vignettes/datatable-reference-semantics.Rmd
@@ -232,6 +232,10 @@ head(flights)
 
 * We could have also provided `by` with a *character vector* as we saw in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette, e.g., `by = c("origin", "dest")`.
 
+#### Note on zero-length RHS and `by`
+
+* If the `RHS` of an assignment results in a zero-length vector (e.g., `numeric(0)`), `data.table` will usually throw an error. However, when using `by`, a zero-length result for a specific group is treated as a no-op (the column remains unchanged for that group) and no error is thrown. This is intentional to allow functions that might return no data for certain subsets to complete without crashing the entire operation.
+
 #
 
 ### e) Multiple columns and `:=`

From 74e590de06287bfccad93f6ed2e628829b9b670b Mon Sep 17 00:00:00 2001
From: venom1204
Date: Fri, 29 May 2026 19:17:23 +0000
Subject: [PATCH 2/6] modified

---
 man/assign.Rd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man/assign.Rd b/man/assign.Rd
index eb7000a3c2..d13af2665d 100644
--- a/man/assign.Rd
+++ b/man/assign.Rd
@@ -35,7 +35,7 @@ set(x, i = NULL, j, value)
 }
 \arguments{
 \item{LHS}{ A character vector of column names (or numeric positions) or a variable that evaluates as such. If the column doesn't exist, it is added, \emph{by reference}. }
-\item{RHS}{ A list of replacement values. It is recycled in the usual way to fill the number of rows satisfying \code{i}, if any. To remove a column use \code{NULL}. }\item{RHS}{ A list of replacement values. It is recycled in the usual way to fill the number of rows satisfying \code{i}, if any. To remove a column use \code{NULL}. Note that a zero-length \code{RHS} (other than \code{NULL}) is an error, unless \code{by} is used, in which case it is treated as a no-op for that group. }
+\item{RHS}{ A list of replacement values. It is recycled in the usual way to fill the number of rows satisfying \code{i}, if any. To remove a column use \code{NULL}. Note that a zero-length \code{RHS} (other than \code{NULL}) is an error, unless \code{by} is used, in which case it is treated as a no-op for that group. }
 \item{x}{ A \code{data.table}. Or, \code{set()} accepts \code{data.frame}, too. }
 \item{i}{ Optional. Indicates the rows on which the values must be updated. If not \code{NULL}, implies \emph{all rows}. Missing or zero values are ignored. The \code{:=} form is more powerful as it allows adding/updating columns by reference based on \emph{subsets} and \code{joins}. See \code{Details}.
 

From b141ec0b3cce8a4862def7c0076013f45f98ab56 Mon Sep 17 00:00:00 2001
From: venom1204
Date: Fri, 29 May 2026 20:55:11 +0000
Subject: [PATCH 3/6] ..

---
 man/assign.Rd                               | 1 +
 vignettes/datatable-reference-semantics.Rmd | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/man/assign.Rd b/man/assign.Rd
index d13af2665d..aa9c075cd9 100644
--- a/man/assign.Rd
+++ b/man/assign.Rd
@@ -54,6 +54,7 @@ set(x, i = NULL, j, value)
  DT[i, colvector := val, with = FALSE] # OLD syntax. The contents of "colvector" in calling scope determine the column(s).
  DT[i, (colvector) := val] # same (NOW PREFERRED) shorthand syntax. The parens are enough to stop the LHS being a symbol; same as c(colvector).
  DT[i, colC := mean(colB), by = colA] # update (or add) column called "colC" by reference by group. A major feature of `:=`.
+ DT[, x := if (.N > 2) sum(v) else integer(0), by = g] # zero-length RHS is treated as a no-op for groups with <= 2 rows
  DT[,`:=`(new1 = sum(colB), new2 = sum(colC))] # Functional form
  DT[, let(new1 = sum(colB), new2 = sum(colC))] # New alias for functional form.
 }
diff --git a/vignettes/datatable-reference-semantics.Rmd b/vignettes/datatable-reference-semantics.Rmd
index 91d9268d0d..49d20ceb20 100644
--- a/vignettes/datatable-reference-semantics.Rmd
+++ b/vignettes/datatable-reference-semantics.Rmd
@@ -234,7 +234,9 @@ head(flights)
 
 #### Note on zero-length RHS and `by`
 
-* If the `RHS` of an assignment results in a zero-length vector (e.g., `numeric(0)`), `data.table` will usually throw an error. However, when using `by`, a zero-length result for a specific group is treated as a no-op (the column remains unchanged for that group) and no error is thrown. This is intentional to allow functions that might return no data for certain subsets to complete without crashing the entire operation.
+#### Note on zero-length RHS and `by`
+
+* If the `RHS` of a `:=` assignment evaluates to a zero-length vector, an error is normally raised. When `:=` is used with `by`, however, a zero-length result for a group is treated as a no-op for that group and no error is thrown. This allows grouped operations to continue even when some groups produce no result.
 
 #
 

From 6e22c729e3503a3496f295ea24b4d11fc68630d9 Mon Sep 17 00:00:00 2001
From: venom1204
Date: Fri, 29 May 2026 21:39:23 +0000
Subject: [PATCH 4/6] added j and value

---
 man/assign.Rd | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/man/assign.Rd b/man/assign.Rd
index aa9c075cd9..80cd98420f 100644
--- a/man/assign.Rd
+++ b/man/assign.Rd
@@ -40,8 +40,8 @@ set(x, i = NULL, j, value)
 \item{i}{ Optional. Indicates the rows on which the values must be updated. If not \code{NULL}, implies \emph{all rows}. Missing or zero values are ignored. The \code{:=} form is more powerful as it allows adding/updating columns by reference based on \emph{subsets} and \code{joins}. See \code{Details}.
 
  In \code{set}, only integer type is allowed in \code{i} indicating which rows \code{value} should be assigned to. \code{NULL} represents all rows more efficiently than creating a vector such as \code{1:nrow(x)}. }
-\item{j}{ Column name(s) (character) or number(s) (integer) to be assigned \code{value} when column(s) already exist, and only column name(s) if they are to be created. }
-\item{value}{ A list of replacement values to assign by reference to \code{x[i, j]}. }
+\item{j}{ Column name(s) (character) or number(s) (integer). For \code{set}, these specify the columns of \code{x} to be updated. }
+\item{value}{ A list or vector of replacement values to be assigned by reference to \code{x[i, j]}. For \code{set}, if multiple columns are specified in \code{j}, \code{value} should be a list. }
 }
 \details{
 \code{:=} is defined for use in \code{j} only. It \emph{adds} or \emph{updates} or \emph{removes} column(s) by reference. It makes no copies of any part of memory at all. Please read \href{../doc/datatable-reference-semantics.html}{\code{vignette("datatable-reference-semantics")}} and follow with examples. Some typical usages are:

From c906d4e048027cca58a46e0b399827bab1d38411 Mon Sep 17 00:00:00 2001
From: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com>
Date: Mon, 1 Jun 2026 09:53:13 +0200
Subject: [PATCH 5/6] final touch ups

Co-authored-by: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com>
---
 man/assign.Rd                               | 6 +++---
 vignettes/datatable-reference-semantics.Rmd | 2 --
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/man/assign.Rd b/man/assign.Rd
index 80cd98420f..cb74f4b121 100644
--- a/man/assign.Rd
+++ b/man/assign.Rd
@@ -35,13 +35,13 @@ set(x, i = NULL, j, value)
 }
 \arguments{
 \item{LHS}{ A character vector of column names (or numeric positions) or a variable that evaluates as such. If the column doesn't exist, it is added, \emph{by reference}. }
-\item{RHS}{ A list of replacement values. It is recycled in the usual way to fill the number of rows satisfying \code{i}, if any. To remove a column use \code{NULL}. Note that a zero-length \code{RHS} (other than \code{NULL}) is an error, unless \code{by} is used, in which case it is treated as a no-op for that group. }
+\item{RHS}{ A list of replacement values. It is recycled in the usual way to fill the number of rows satisfying \code{i}, if any. To remove a column use \code{NULL}. Note that a zero-length \code{RHS} (other than \code{NULL}) is an error, unless \code{by} is used, in which case it is treated as a no-op for that group, leaving existing values in those rows unchanged. }
 \item{x}{ A \code{data.table}. Or, \code{set()} accepts \code{data.frame}, too. }
 \item{i}{ Optional. Indicates the rows on which the values must be updated. If not \code{NULL}, implies \emph{all rows}. Missing or zero values are ignored. The \code{:=} form is more powerful as it allows adding/updating columns by reference based on \emph{subsets} and \code{joins}. See \code{Details}.
 
  In \code{set}, only integer type is allowed in \code{i} indicating which rows \code{value} should be assigned to. \code{NULL} represents all rows more efficiently than creating a vector such as \code{1:nrow(x)}. }
-\item{j}{ Column name(s) (character) or number(s) (integer). For \code{set}, these specify the columns of \code{x} to be updated. }
-\item{value}{ A list or vector of replacement values to be assigned by reference to \code{x[i, j]}. For \code{set}, if multiple columns are specified in \code{j}, \code{value} should be a list. }
+\item{j}{ Column name(s) (character) or number(s) (integer) to be assigned \code{value} when column(s) already exist, and only column name(s) if they are to be created. }
+\item{value}{ A list or vector of replacement values to be assigned by reference to \code{x[i, j]}. }
 }
 \details{
 \code{:=} is defined for use in \code{j} only. It \emph{adds} or \emph{updates} or \emph{removes} column(s) by reference. It makes no copies of any part of memory at all. Please read \href{../doc/datatable-reference-semantics.html}{\code{vignette("datatable-reference-semantics")}} and follow with examples. Some typical usages are:
diff --git a/vignettes/datatable-reference-semantics.Rmd b/vignettes/datatable-reference-semantics.Rmd
index 49d20ceb20..5353440e74 100644
--- a/vignettes/datatable-reference-semantics.Rmd
+++ b/vignettes/datatable-reference-semantics.Rmd
@@ -234,8 +234,6 @@ head(flights)
 
 #### Note on zero-length RHS and `by`
 
-#### Note on zero-length RHS and `by`
-
 * If the `RHS` of a `:=` assignment evaluates to a zero-length vector, an error is normally raised. When `:=` is used with `by`, however, a zero-length result for a group is treated as a no-op for that group and no error is thrown. This allows grouped operations to continue even when some groups produce no result.
 
 #

From 5ebff1b6ade7caa77d69489f131e9cc79a8f195f Mon Sep 17 00:00:00 2001
From: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com>
Date: Mon, 1 Jun 2026 09:56:36 +0200
Subject: [PATCH 6/6] tweak column name

---
 man/assign.Rd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man/assign.Rd b/man/assign.Rd
index cb74f4b121..91c988cd8e 100644
--- a/man/assign.Rd
+++ b/man/assign.Rd
@@ -54,7 +54,7 @@ set(x, i = NULL, j, value)
  DT[i, colvector := val, with = FALSE] # OLD syntax. The contents of "colvector" in calling scope determine the column(s).
  DT[i, (colvector) := val] # same (NOW PREFERRED) shorthand syntax. The parens are enough to stop the LHS being a symbol; same as c(colvector).
  DT[i, colC := mean(colB), by = colA] # update (or add) column called "colC" by reference by group. A major feature of `:=`.
- DT[, x := if (.N > 2) sum(v) else integer(0), by = g] # zero-length RHS is treated as a no-op for groups with <= 2 rows
+ DT[, colD := if (.N > 2) sum(v) else integer(0), by = g] # zero-length RHS is treated as a no-op for groups with <= 2 rows
  DT[,`:=`(new1 = sum(colB), new2 = sum(colC))] # Functional form
  DT[, let(new1 = sum(colB), new2 = sum(colC))] # New alias for functional form.
 }