Skip to content

Add map functions#1461

Open
timsaucer wants to merge 7 commits intoapache:mainfrom
timsaucer:add-map-functions
Open

Add map functions#1461
timsaucer wants to merge 7 commits intoapache:mainfrom
timsaucer:add-map-functions

Conversation

@timsaucer
Copy link
Copy Markdown
Member

@timsaucer timsaucer commented Mar 29, 2026

Which issue does this PR close?

Closes #1448

Rationale for this change

These functions were not exposed but are available upstream.

What changes are included in this PR?

Expose functions.
Add python wrappers.
Add unit tests.

Are there any user-facing changes?

New addition only.

Disclaimer

Most of this code was written with an AI agent, but every line has been reviewed by me and I am willing to discuss all design decisions.

timsaucer and others added 7 commits March 29, 2026 12:52
…ntries, element_at)

Closes apache#1448

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
element_at is already a Python-only alias for map_extract,
so the Rust binding is unnecessary.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
make_map now takes a dict for the common case and also supports
separate keys/values lists for column expressions. Non-Expr keys
and values are automatically converted to literals.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
map() now supports three calling conventions matching upstream:
- map({"a": 1, "b": 2}) — from a Python dictionary
- map([keys], [values]) — two lists that get zipped
- map(k1, v1, k2, v2, ...) — variadic key-value pairs

Non-Expr keys and values are automatically converted to literals.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Add examples for all three map() calling conventions
- Use clearer descriptions instead of jargon (no "zipped" or "variadic")
- Break map_keys/map_values/map_extract/map_entries examples into
  two steps: create the map column first, then call the function

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@timsaucer timsaucer marked this pull request as ready for review March 29, 2026 23:03
@timsaucer timsaucer requested a review from Copilot March 30, 2026 16:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Exposes DataFusion’s map-related scalar functions to the Python API (wrapping new Rust bindings) and adds Python-level helpers plus unit tests.

Changes:

  • Add Rust pyo3 bindings for make_map and map accessors (map_keys, map_values, map_extract, map_entries).
  • Add Python wrappers (map, make_map, map_keys, map_values, map_extract, map_entries, element_at) and export them via __all__.
  • Add unit tests covering the new map construction and accessor functions.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
crates/core/src/functions.rs Adds make_map binding and registers map accessor functions in the internal Rust module.
python/datafusion/functions.py Adds Python-facing wrappers for map functions and updates exported symbol list.
python/tests/test_functions.py Adds unit tests validating map creation and map accessor behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

result = (
df.select(f.map_extract(m, literal("z")).alias("val")).collect()[0].column(0)
)
assert result[0].as_py() == [None]
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

map_extract is documented as returning an “empty list if absent”, but the new test test_map_extract_missing_key asserts [None] for a missing key. These two behaviors are different (empty list vs list containing null). Please verify the actual DataFusion semantics and update either the docstring or the test expectation so they match.

Suggested change
assert result[0].as_py() == [None]
assert result[0].as_py() == []

Copilot uses AI. Check for mistakes.
Comment on lines +3491 to +3495
"""Returns the value for the given key in the map, or an empty list if absent.

See Also:
This is an alias for :py:func:`map_extract`.
"""
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue/PR context describes element_at as “access element in a map or array by key/index”, but this implementation is a direct alias for map_extract, which appears map-specific and returns a list of matches. If upstream DataFusion provides a dedicated element_at function (and/or supports array indexing), consider binding that directly so element_at matches upstream semantics rather than being limited to map_extract behavior.

Suggested change
"""Returns the value for the given key in the map, or an empty list if absent.
See Also:
This is an alias for :py:func:`map_extract`.
"""
"""Returns the element for the given key/index in a map or array.
On DataFusion versions that provide a dedicated ``element_at`` function,
this will use that function and thus match upstream semantics (including
support for both maps and arrays). On older versions, this falls back to
:py:func:`map_extract`, which returns a list of matching values for maps.
"""
# Prefer the upstream `element_at` implementation when available so that
# semantics (map/array support, scalar return) match DataFusion. Fall back
# to the existing `map_extract`-based behavior for compatibility with
# older backends that do not yet expose `element_at`.
element_at_fn = getattr(f, "element_at", None)
if element_at_fn is not None:
return Expr(element_at_fn(map.expr, key.expr))

Copilot uses AI. Check for mistakes.
Comment on lines +3396 to +3403
elif (
len(args) == 2 # noqa: PLR2004
and isinstance(args[0], list)
and isinstance(args[1], list)
):
key_list = args[0]
value_list = args[1]
elif len(args) >= 2 and len(args) % 2 == 0: # noqa: PLR2004
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When called as map([keys], [values]), the docstring states both lists must be the same length, but the implementation doesn’t validate len(key_list) == len(value_list). If lengths differ, the error will be deferred to the Rust/DataFusion layer (likely with a less actionable message). Consider adding an explicit length check here and raising ValueError with a clear message.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add map functions (map, map_keys, map_values, map_extract, map_entries, element_at)

2 participants