data_read() can encrypt/decrypt R data files#675
Conversation
|
@gemini-code-assist review |
There was a problem hiding this comment.
Pull request overview
This PR introduces optional password-based encryption/decryption support to the data_write() / data_read() I/O wrappers, primarily targeting native R file formats.
Changes:
- Add a
passwordargument todata_write()anddata_read()and implement encrypt/decrypt helpers. - Add tests for encrypted
.rds/.rdataround-trips and ensure unsupported formats error whenpasswordis provided. - Update documentation, NEWS, and package metadata to reflect the new feature and dependency.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
R/data_write.R |
Adds password plumbing plus encryption helpers and format-specific guards. |
R/data_read.R |
Adds password plumbing plus decryption helpers for base R readers. |
tests/testthat/test-data_write.R |
Adds test coverage for encrypted R formats and unsupported encryption paths. |
man/data_read.Rd |
Documents the new password parameter for both read/write. |
NEWS.md |
Announces the new encryption/decryption capability and supported formats. |
DESCRIPTION |
Adds openssl to Suggests and bumps version. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
data_read() can encrypt/decrypt data silesdata_read() can encrypt/decrypt R data files
etiennebacher
left a comment
There was a problem hiding this comment.
I don't feel knowledgeable enough to review this PR. I've never had to interact with password-protected datafiles and I don't know the best practices in encryption/decryption.
Also, I don't see any feature request to handle encrypted files in haven or readr, so I think this is rare enough to let users explicitly handle it rather than doing some internal magic for them. I don't think the potential usability gain is worth the extra maintenance effort and risk.
|
I used this script to encrypt/decrypt data for a longer time now, and it's quite simple. I don't think the maintenance effort is very high. It's quite convenient, because data encryption is increasingly important (and awareness is raising). My colleagues use a password-protected zip-folder, which is of course annoying to handle. Same for loading and running the R script. I think a direct implementation is very practical. |
|
My concern is that this becomes the top result when people search for "R write password protected rda" while I'm not confident this follows best practices in security. For example, the current implementation uses:
My point is that there are various key components that we're just hiding from the user here and where we assume we make the right choice for them, but I'd rather have them being explicit about the functions they're using to secure data. The implementation is 4-5 lines for each so I don't think it's too hard for users (we could also add a line in docs to redirect them to openssl to encrypt/decrypt data). Curious about @easystats/core-team opinions here |
|
Ok, I see the point. So the minimum, if we decide on integrating such a feature, would be to use more up-to-date encryption standards? |
|
Yes but also give the user more choice to use the method they want. But my point was also that since the implementation is very simple, I think we'd just be better off documenting how to do this and redirect to |
|
After a little chat with Gemini, I learned that sha512 is the most secure hash key generation, and gcm-encryption is the most secure encryption method, so I updated the code here. I think this is in line with the help file from openssl.
But part of the easystats philosophy is to keep these confusing stuff away from users. Most of the people around me who work with R can use basics, maybe a bit of programming, but would not know how to encrypt/decrypt data (or other stuff). That's why I think it's convenient to have it here, and I don't think it will lead to "bad practices" in terms of "bad encryption" or security. Rather, if there's no easy-to-understand way how to encrypt files, it's often not done at all. So we would definitely have an improvement here for casual users, that's my impression. |
etiennebacher
left a comment
There was a problem hiding this comment.
Ok, that sounds reasonable to me. I suggested several changes to docs and tests below.
| #' @section Data encryption: | ||
| #' `data_read()` and `data_write()` support data encryption for R file formats | ||
| #' (`.rds`, `.rda` and `.rdata`). To encrypt a file, provide a password to the | ||
| #' `password` argument in `data_write()`. To decrypt the file, provide the same | ||
| #' password to `data_read()`. The encryption is based on the **openssl** package | ||
| #' and uses the AES-GCM algorithm (see `?openssl::aes_gcm_encrypt`) with a | ||
| #' 256-bit key (see `?openssl::sha256`). **Warning:** Do not lose your | ||
| #' `password`, else you will not be able to decrypt the data again! | ||
| #' |
There was a problem hiding this comment.
sha512 instead of sha256?
Also, this should mention that the decryption method only works when the file was encrypted with data_write() or in another function but still using openssl::sha512
There was a problem hiding this comment.
I first switched to sha512, but that doesn't seem to be supported by the openssl-functions. That's why I switched back to sha256:
passphrase <- charToRaw("This is super secret")
x <- serialize(iris, NULL)
key <- openssl::sha256(passphrase)
y <- openssl::aes_cbc_encrypt(x, key = key)
key <- openssl::sha512(passphrase)
y <- openssl::aes_cbc_encrypt(x, key = key)
#> Error in `aes_any()`:
#> ! key must be of length 16 (aes-128), 24 (aes-192) or 32 (aes-256)
y <- openssl::aes_ctr_encrypt(x, key = key)
#> Error in `aes_any()`:
#> ! key must be of length 16 (aes-128), 24 (aes-192) or 32 (aes-256)
y <- openssl::aes_gcm_encrypt(x, key = key)
#> Error in `aes_any()`:
#> ! key must be of length 16 (aes-128), 24 (aes-192) or 32 (aes-256)Created on 2026-04-16 with reprex v2.1.1
Co-authored-by: Etienne Bacher <[email protected]>
…er to open with openssl
|
Will continue working on docs/tests later |
|
Ok, I think all comments are addressed now. |
|
Thanks! Just waiting for CI to be green |
No description provided.