feat: add optional JSON functions support #1466

Draft
crm26 wants to merge 1 commit into apache:main from crm26:feat/json-functions

@crm26 crm26 commented Mar 30, 2026

Which issue does this PR close?

Closes N/A — new feature request.

Rationale

DataFusion Python users currently have no built-in way to query JSON fields in SQL. The datafusion-functions-json crate (under datafusion-contrib) provides json_get, json_get_str, ->, ->> and other JSON functions and operators, but these are only available from Rust. This PR exposes them to Python users via an optional feature flag.

What changes are included in this PR?

  • Add datafusion-functions-json (v0.53) to workspace dependencies
  • Add optional dependency and json feature flag to core crate
  • Register JSON functions in SessionContext creation when feature is enabled

3 files changed, 11 insertions.

Are these changes tested?

Not yet — requesting feedback on approach before adding tests. Tests would verify:

  • json_get_str(col, 'key') works in SQL queries
  • Default build (no json feature) compiles and runs without regression
  • JSON functions are available immediately after SessionContext() creation
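
A minimal sketch of how the availability checks above might be probed from Python. It assumes the function is exposed under the name json_get_str given in the commit message, and that it takes the JSON document followed by one or more keys (an assumption about the upstream crate's call form, not something this PR states). The helper name json_functions_available is hypothetical.

```python
def json_functions_available(ctx) -> bool:
    """Return True if a trivial JSON query plans and runs on `ctx`.

    `ctx` is expected to behave like datafusion.SessionContext:
    `ctx.sql(query)` returns an object with a `collect()` method.
    """
    # Assumed call form: json_get_str(<json text>, <key>, ...).
    probe = """SELECT json_get_str('{"a": "b"}', 'a') AS v"""
    try:
        ctx.sql(probe).collect()
    except Exception:
        # An unknown function surfaces as a planning error; treat any
        # failure as "JSON functions are not registered".
        return False
    return True
```

On a build with the json feature, json_functions_available(SessionContext()) would be expected to return True. On a default build the same probe should return False rather than crash, which is one way to cover the no-regression case.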

Are there any user-facing changes?

When built with --features json:

from datafusion import SessionContext

ctx = SessionContext()
# JSON extraction now available in SQL
result = ctx.sql("""
    SELECT json_get_str(data, 'address', 'state') as state,
           COUNT(*) as cnt
    FROM my_table
    GROUP BY state
""").collect()

Default builds are unaffected.
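
For the planned tests, a small pure-Python reference could compute the per-state counts that the query above is meant to produce, so the SQL result can be compared against it. This is a sketch only: the helper name expected_state_counts is hypothetical, and it assumes each row of the data column is a JSON string (or NULL) and that a missing or non-string state should be grouped as NULL.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Optional


def expected_state_counts(rows: Iterable[Optional[str]]) -> Dict[Optional[str], int]:
    """Count rows per address.state using only the standard library."""
    counts: Counter = Counter()
    for raw in rows:
        state = None
        if raw is not None:
            try:
                doc = json.loads(raw)
            except json.JSONDecodeError:
                doc = None  # unparseable JSON is treated like a missing value
            address = doc.get("address") if isinstance(doc, dict) else None
            candidate = address.get("state") if isinstance(address, dict) else None
            state = candidate if isinstance(candidate, str) else None
        # SQL GROUP BY puts all NULLs in one group; None plays that role here.
        counts[state] += 1
    return dict(counts)
```

A test could load the same rows into my_table, run the query, and compare the collected (state, cnt) pairs with this dictionary.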

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

Add `datafusion-functions-json` as an optional feature (`json`), giving
Python users `json_get_str`, `json_get`, `->`, `->>` and other JSON
operators in SQL queries.

When built with `--features json`, JSON functions are automatically
registered with every SessionContext. Default builds are unaffected.

Tested locally: json_get_str extracts values, nested paths work,
GROUP BY on extracted JSON fields works.

Changes:
- Add `datafusion-functions-json` to workspace dependencies
- Add optional dependency and `json` feature flag to core crate
- Register JSON functions in SessionContext creation when feature is enabled

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@crm26 crm26 force-pushed the feat/json-functions branch from 7bac57e to 4c7d253 Compare March 30, 2026 12:15
@timsaucer (Member)

Sounds like a very nice feature to have, but it would mean that we're now pulling in non-official code/functions into the official release. I don't know if that's a hard blocker, but I do want to bring the topic up on the mailing list for a wider audience before we merge this. I'm moving it to draft for that reason.

@timsaucer timsaucer marked this pull request as draft March 30, 2026 12:30
@timsaucer (Member)

Also, for this PR to go in, we would want first-class DataFrame API support, not just SQL support, plus unit tests to cover both. Since you're using Claude, you might be able to use at least portions of the skill I've started working on in #1460 to help write those pieces.

But first let's get a temperature read on the community. I'm 50/50 on the idea.

crm26 (Author) commented Mar 30, 2026 via email

@timsaucer (Member)

> Thanks Tim. I have use cases that need JSON support. I am seeing a material speedup using DataFusion over DuckDB with the unofficial library. Let me know how I can help. Thanks, Christian

That makes sense. What we need to decide is whether to include this in the core repository or simply expose a Python library under datafusion-contrib that you can install and register with one line.
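
To make the "install and register with one line" option concrete, here is a rough sketch of the shape such a library's entry point could take. Everything named here is hypothetical: register_json_functions, get_str and the address_state UDF are illustrative only. As a stand-in for the Rust crate, it registers a plain Python UDF built on the standard json module, which would be far slower than the native functions.

```python
import json
from typing import Any, Optional, Sequence


def get_str(doc: Optional[str], keys: Sequence[str]) -> Optional[str]:
    """Walk `keys` into a JSON document; return the value only if it is a string."""
    if doc is None:
        return None
    try:
        value: Any = json.loads(doc)
    except json.JSONDecodeError:
        return None
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value if isinstance(value, str) else None


def register_json_functions(ctx) -> None:
    """Hypothetical one-line entry point: register_json_functions(ctx)."""
    # Imported lazily so the pure-Python helper above has no hard dependencies.
    import pyarrow as pa
    from datafusion import udf

    def address_state(data: pa.Array) -> pa.Array:
        # Fixed path for illustration; a real library would accept the keys.
        values = [get_str(d, ("address", "state")) for d in data.to_pylist()]
        return pa.array(values, type=pa.string())

    ctx.register_udf(
        udf(address_state, [pa.string()], pa.string(), "immutable", "address_state")
    )
```

After register_json_functions(ctx), a query such as SELECT address_state(data) FROM my_table would go through the Python UDF. A contrib package wrapping the Rust crate could keep the same one-call shape while registering the native functions instead.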


[features]
default = ["mimalloc"]
json = ["dep:datafusion-functions-json"]

Contributor review comment on the line above:

nit: maybe we should name this community_json so that it's explicit that this comes from a community contribution, not the Apache repo.

Thoughts?
