feat: add optional JSON functions support #1466
Add `datafusion-functions-json` as an optional feature (`json`), giving Python users `json_get_str`, `json_get`, `->`, `->>` and other JSON operators in SQL queries.

When built with `--features json`, JSON functions are automatically registered with every `SessionContext`. Default builds are unaffected.

Tested locally: `json_get_str` extracts values, nested paths work, `GROUP BY` on extracted JSON fields works.

Changes:
- Add `datafusion-functions-json` to workspace dependencies
- Add optional dependency and `json` feature flag to core crate
- Register JSON functions in `SessionContext` creation when feature is enabled

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
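For readers unfamiliar with path-based JSON extraction, the behaviors listed as tested locally (value extraction, nested paths, grouping on an extracted field) can be modeled in plain Python. This is only a sketch of the semantics, not the `datafusion-functions-json` implementation: `get_str` is a hypothetical helper, the sample rows are made up, and returning `None` for missing or non-string values is an assumption. The SQL in the comment is an assumed call shape, not output from this PR.

```python
import json
from collections import Counter
from typing import Optional, Union


def get_str(doc: str, *path: Union[str, int]) -> Optional[str]:
    """Hypothetical model: follow `path` into a JSON document and
    return the value found there only if it is a string."""
    try:
        node = json.loads(doc)
    except json.JSONDecodeError:
        return None  # assumption: invalid JSON yields a null result
    for key in path:
        if isinstance(key, str) and isinstance(node, dict) and key in node:
            node = node[key]  # object member lookup
        elif isinstance(key, int) and isinstance(node, list) and 0 <= key < len(node):
            node = node[key]  # array element lookup
        else:
            return None  # path does not exist
    return node if isinstance(node, str) else None


# Made-up sample data standing in for a JSON string column.
rows = [
    '{"user": {"name": "ada", "country": "UK"}}',
    '{"user": {"name": "linus", "country": "FI"}}',
    '{"user": {"name": "alan", "country": "UK"}}',
    '{"user": {"name": "anon"}}',
]

# Assumed SQL shape of the same query:
#   SELECT json_get_str(payload, 'user', 'country') AS country, COUNT(*)
#   FROM events GROUP BY country
counts = Counter(get_str(row, "user", "country") for row in rows)
```

Rows without the requested path fall into a single `None` group here, which is the behavior one would usually expect from a SQL `GROUP BY` over a nullable expression.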
Force-pushed from 7bac57e to 4c7d253.
|
Sounds like a very nice feature to have, but it would mean that we're now pulling non-official code/functions into the official release. I don't know if that's a hard blocker, but I do want to bring the topic up on the mailing list for a wider audience before we merge this. I'm moving it to draft for that reason.
|
Also, for this PR to go in, we would want first-class DataFrame API support, not just SQL support, and unit tests to cover it. Since you're using Claude, you might be able to use at least portions of the skill I've started working on in #1460 to help write those pieces. But first let's get a temperature read on the community. I'm 50/50 on the idea.
|
Thanks Tim. I have use cases that need JSON support. I am seeing a material speed-up using DataFusion over DuckDB with the unofficial library. Let me know how I can help.
Thanks,
Christian
(Sent by email in reply to Tim Saucer's comment of Mon, Mar 30, 2026 at 8:32 AM.)
|
That makes sense. What we need to decide is whether to include this in the core repository or simply expose a Python library under datafusion-contrib that you can install and register with one line.
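The two options being weighed here can be contrasted with a toy model. Everything in it is hypothetical: `Registry`, `register_json_functions`, the `json_feature` argument, and `_json_get` are stand-ins invented for illustration and are not names from datafusion-python or datafusion-functions-json.

```python
import json
from typing import Any, Callable, Dict


def _json_get(doc: str, key: str) -> Any:
    """Toy JSON lookup, present only so there is something to register."""
    value = json.loads(doc)
    return value.get(key) if isinstance(value, dict) else None


class Registry:
    """Hypothetical stand-in for a session context's function registry."""

    def __init__(self, json_feature: bool = False) -> None:
        self.functions: Dict[str, Callable[..., Any]] = {}
        # Option 1: the feature is compiled into the core package and
        # functions are registered automatically at creation time.
        if json_feature:
            register_json_functions(self)


def register_json_functions(registry: Registry) -> None:
    """Option 2: a separate package exposes a single registration call."""
    registry.functions["json_get"] = _json_get


default_ctx = Registry()                   # default build: no JSON functions
builtin_ctx = Registry(json_feature=True)  # models a build with the feature on
contrib_ctx = Registry()
register_json_functions(contrib_ctx)       # models the one-line opt-in
```

In this model both routes end in the same registry state; the difference is whether the third-party code ships inside the official package or is installed and enabled explicitly by the user.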
|
    [features]
    default = ["mimalloc"]
    json = ["dep:datafusion-functions-json"]
nit: maybe we should name this `community_json` so that it's explicit that this comes from a community contribution, not the Apache repo.
Thoughts?
Which issue does this PR close?
Closes N/A — new feature request.
Rationale
DataFusion Python users currently have no way to query JSON fields in SQL. The `datafusion-functions-json` crate (under `datafusion-contrib`) provides `json_extract`, `json_get`, `->`, `->>` and other JSON operators, but these are only available in Rust. This PR exposes them to Python users via an optional feature flag.
What changes are included in this PR?
- Add `datafusion-functions-json` (v0.53) to workspace dependencies
- Add optional dependency and `json` feature flag to core crate
- Register JSON functions in `SessionContext` creation when feature is enabled
3 files changed, 11 insertions.
Are these changes tested?
Not yet; requesting feedback on approach before adding tests. Tests would verify:
- `json_extract_string(col, '$.path')` works in SQL queries
- Default build (without `json` feature) compiles and runs without regression
- JSON functions are registered on `SessionContext()` creation
Are there any user-facing changes?
When built with `--features json`, JSON functions are automatically registered with every `SessionContext`. Default builds are unaffected.
🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com