add shared guard for setkey #7769

Draft
ben-schwen wants to merge 1 commit into master from setkey_shallow_copy

Conversation

@ben-schwen
Member

Closes #5230

@ben-schwen ben-schwen requested review from MichaelChirico and aitap and removed request for MichaelChirico May 29, 2026 20:01
@ben-schwen ben-schwen marked this pull request as draft May 29, 2026 20:07
@github-actions

  • HEAD=setkey_shallow_copy is much slower for the test case "as.data.table.array improved in #7019"
    Comparison Plot

Generated via commit 1d17aaa

Download link for the artifact containing the test results: ↓ atime-results.zip

| Task | Duration |
| --- | --- |
| R setup and installing dependencies | 6 minutes and 27 seconds |
| Installing different package versions | 12 minutes and 13 seconds |
| Running and plotting the test cases | 5 minutes and 23 seconds |

@aitap
Member

aitap commented Jun 1, 2026

Part of the problem here is that it's too easy for data.table columns to accumulate a large reference count because we use shallow copies:

x <- as.data.table(copy(mtcars))
.Internal(inspect(x, 1))
# @55bf23a15f20 19 VECSXP g0c7 [OBJ,REF(1),gp=0x20,ATT] (len=11, tl=1035)
#   @55bf262bdbf0 14 REALSXP g0c7 [REF(4)] (len=32, tl=0) 21,21,22.8,21.4,18.7,...
#   @55bf23cc0f60 14 REALSXP g0c7 [REF(4)] (len=32, tl=0) 6,6,4,6,8,...
#   @55bf23cc17d0 14 REALSXP g0c7 [REF(4)] (len=32, tl=0) 160,160,108,258,360,...
#   @55bf23d49d40 14 REALSXP g0c7 [REF(4)] (len=32, tl=0) 110,110,93,110,175,...
#   @55bf239f09a0 14 REALSXP g0c7 [REF(4)] (len=32, tl=0) 3.9,3.9,3.85,3.08,3.15,...
x[,mean(mpg),by=.(gear,carb)]
.Internal(inspect(x, 1))
# @55bf23a15f20 19 VECSXP g0c7 [OBJ,REF(3),gp=0x20,ATT] (len=11, tl=1035)
#   @55bf262bdbf0 14 REALSXP g0c7 [REF(6)] (len=32, tl=0) 21,21,22.8,21.4,18.7,...
#   @55bf23cc0f60 14 REALSXP g0c7 [REF(5)] (len=32, tl=0) 6,6,4,6,8,...
#   @55bf23cc17d0 14 REALSXP g0c7 [REF(5)] (len=32, tl=0) 160,160,108,258,360,...
#   @55bf23d49d40 14 REALSXP g0c7 [REF(5)] (len=32, tl=0) 110,110,93,110,175,...
#   @55bf239f09a0 14 REALSXP g0c7 [REF(5)] (len=32, tl=0) 3.9,3.9,3.85,3.08,3.15,...

So far, the guidance from above was to clean up our temporary shallow copies by hand in order to lower the reference counts on the columns.

Development

Successfully merging this pull request may close these issues.

DT1 row selection broken after DT2 = DT1[!b%in%x], setkey(DT2,a)

2 participants