Conversation
…ntries, element_at) Closes apache#1448 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
element_at is already a Python-only alias for map_extract, so the Rust binding is unnecessary. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
make_map now takes a dict for the common case and also supports separate keys/values lists for column expressions. Non-Expr keys and values are automatically converted to literals. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
map() now supports three calling conventions matching upstream:
- map({"a": 1, "b": 2}) — from a Python dictionary
- map([keys], [values]) — two lists that get zipped
- map(k1, v1, k2, v2, ...) — variadic key-value pairs
Non-Expr keys and values are automatically converted to literals.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Add examples for all three map() calling conventions - Use clearer descriptions instead of jargon (no "zipped" or "variadic") - Break map_keys/map_values/map_extract/map_entries examples into two steps: create the map column first, then call the function Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
There was a problem hiding this comment.
Pull request overview
Exposes DataFusion’s map-related scalar functions to the Python API (wrapping new Rust bindings) and adds Python-level helpers plus unit tests.
Changes:
- Add Rust
pyo3bindings formake_mapand map accessors (map_keys,map_values,map_extract,map_entries). - Add Python wrappers (
map,make_map,map_keys,map_values,map_extract,map_entries,element_at) and export them via__all__. - Add unit tests covering the new map construction and accessor functions.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
crates/core/src/functions.rs |
Adds make_map binding and registers map accessor functions in the internal Rust module. |
python/datafusion/functions.py |
Adds Python-facing wrappers for map functions and updates exported symbol list. |
python/tests/test_functions.py |
Adds unit tests validating map creation and map accessor behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| result = ( | ||
| df.select(f.map_extract(m, literal("z")).alias("val")).collect()[0].column(0) | ||
| ) | ||
| assert result[0].as_py() == [None] |
There was a problem hiding this comment.
map_extract is documented as returning an “empty list if absent”, but the new test test_map_extract_missing_key asserts [None] for a missing key. These two behaviors are different (empty list vs list containing null). Please verify the actual DataFusion semantics and update either the docstring or the test expectation so they match.
| assert result[0].as_py() == [None] | |
| assert result[0].as_py() == [] |
| """Returns the value for the given key in the map, or an empty list if absent. | ||
|
|
||
| See Also: | ||
| This is an alias for :py:func:`map_extract`. | ||
| """ |
There was a problem hiding this comment.
The issue/PR context describes element_at as “access element in a map or array by key/index”, but this implementation is a direct alias for map_extract, which appears map-specific and returns a list of matches. If upstream DataFusion provides a dedicated element_at function (and/or supports array indexing), consider binding that directly so element_at matches upstream semantics rather than being limited to map_extract behavior.
| """Returns the value for the given key in the map, or an empty list if absent. | |
| See Also: | |
| This is an alias for :py:func:`map_extract`. | |
| """ | |
| """Returns the element for the given key/index in a map or array. | |
| On DataFusion versions that provide a dedicated ``element_at`` function, | |
| this will use that function and thus match upstream semantics (including | |
| support for both maps and arrays). On older versions, this falls back to | |
| :py:func:`map_extract`, which returns a list of matching values for maps. | |
| """ | |
| # Prefer the upstream `element_at` implementation when available so that | |
| # semantics (map/array support, scalar return) match DataFusion. Fall back | |
| # to the existing `map_extract`-based behavior for compatibility with | |
| # older backends that do not yet expose `element_at`. | |
| element_at_fn = getattr(f, "element_at", None) | |
| if element_at_fn is not None: | |
| return Expr(element_at_fn(map.expr, key.expr)) |
| elif ( | ||
| len(args) == 2 # noqa: PLR2004 | ||
| and isinstance(args[0], list) | ||
| and isinstance(args[1], list) | ||
| ): | ||
| key_list = args[0] | ||
| value_list = args[1] | ||
| elif len(args) >= 2 and len(args) % 2 == 0: # noqa: PLR2004 |
There was a problem hiding this comment.
When called as map([keys], [values]), the docstring states both lists must be the same length, but the implementation doesn’t validate len(key_list) == len(value_list). If lengths differ, the error will be deferred to the Rust/DataFusion layer (likely with a less actionable message). Consider adding an explicit length check here and raising ValueError with a clear message.
Which issue does this PR close?
Closes #1448
Rationale for this change
These functions were not exposed but are available upstream.
What changes are included in this PR?
Expose functions.
Add python wrappers.
Add unit tests.
Are there any user-facing changes?
New addition only.
Disclaimer
Most of this code was written with an AI agent, but every line has been reviewed by me and I am willing to discuss all design decisions.