Skip to content

feat: import user-defined physical optimizer rules over FFI#1557

Open
timsaucer wants to merge 12 commits into
apache:mainfrom
timsaucer:feat/df54-followups-wave4
Open

feat: import user-defined physical optimizer rules over FFI#1557
timsaucer wants to merge 12 commits into
apache:mainfrom
timsaucer:feat/df54-followups-wave4

Conversation

@timsaucer
Copy link
Copy Markdown
Member

@timsaucer timsaucer commented May 22, 2026

Which issue does this PR close?

Related to #1533.
Closes #1238

Rationale for this change

The upstream DataFusion project supports custom physical optimizer rules. These have not been available to datafusion-python. Now that we have FFI versions of these rules, expose them.

What changes are included in this PR?

  • Add a method to add a physical optimizer rule to a session context.
  • Add integration test.

Are there any user-facing changes?

New SessionContext.add_physical_optimizer_rule(rule) method. Additive, non-breaking.

timsaucer and others added 3 commits May 21, 2026 15:28
Expose `SessionContext.add_optimizer_rule` and
`SessionContext.add_analyzer_rule` symmetric with the existing
`remove_optimizer_rule`. Each accepts a Python subclass of the new
`datafusion.optimizer.OptimizerRule` / `AnalyzerRule` ABCs.

Implementation:

* New `crates/core/src/optimizer_rules.rs` wraps user Python instances
  in `PyOptimizerRuleAdapter` / `PyAnalyzerRuleAdapter`, which
  implement the upstream `OptimizerRule` / `AnalyzerRule` traits.
* `OptimizerRule.rewrite(plan)` returns `None` for "no change" or a
  new `LogicalPlan`. The adapter maps that to
  `Transformed::no` / `Transformed::yes` so the upstream optimizer's
  fixed-point loop terminates correctly.
* `AnalyzerRule.analyze(plan)` must always return a `LogicalPlan`;
  returning `None` surfaces a `DataFusionError::Execution` naming the
  offending rule.
* The upstream `&dyn OptimizerConfig` / `&ConfigOptions` arguments are
  not surfaced to Python in this MVP; rules that need configuration
  should capture it at construction time (for example by holding a
  `SessionContext` reference) or be implemented in Rust.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the Python-defined OptimizerRule/AnalyzerRule approach with FFI-imported physical optimizer rules.

The Python logical-rule approach could observe plans but not transform them: there are no Python constructors for LogicalPlan node variants, so a rule could only return None or the input plan unchanged. The audience for custom rules also overlaps strongly with people who can write Rust.

DataFusion exposes no FFI bridge for the logical OptimizerRule/AnalyzerRule traits, but it does export FFI_PhysicalOptimizerRule for the physical PhysicalOptimizerRule trait. This commit imports those instead.

Changes:

* Remove crates/core/src/optimizer_rules.rs, python/datafusion/optimizer.py, python/tests/test_optimizer.py, and the SessionContext.add_optimizer_rule / add_analyzer_rule methods. remove_optimizer_rule is unchanged (pre-existing).
* New crates/core/src/physical_optimizer.rs reads a __datafusion_physical_optimizer_rule__ capsule and converts it via Arc<dyn PhysicalOptimizerRule>::from(&FFI_PhysicalOptimizerRule).
* SessionContext gains a physical_optimizer_rules constructor argument. Upstream offers no API to add physical rules to a live context, so they are appended to the builder at construction time only.
* The datafusion-ffi-example crate gains MyPhysicalOptimizerRule, a counter-backed rule used by _test_physical_optimizer_rule.py to prove the rule fires over FFI during physical planning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the `list[Any]` hint on the SessionContext `physical_optimizer_rules` argument with a `PhysicalOptimizerRuleExportable` Protocol, matching the existing `TableProviderExportable` / `*Exportable` pattern used for other FFI-capsule objects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@timsaucer timsaucer changed the title feat: user-defined optimizer and analyzer rules feat: import user-defined physical optimizer rules over FFI May 27, 2026
timsaucer and others added 7 commits May 27, 2026 16:03
…string

Point the `physical_optimizer_rules` argument docs at the new
`PhysicalOptimizerRuleExportable` Protocol instead of describing the duck type inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The PyCapsule / FFI_PhysicalOptimizerRule mechanics describe the Protocol, not the SessionContext constructor. Move that detail onto PhysicalOptimizerRuleExportable and leave the constructor argument docs focused on behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove the explanatory comment about FFI bridge availability; the same information already lives on PhysicalOptimizerRuleExportable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sibling FFI-import modules (udf, udaf, catalog, table) carry no module-level docs, and the rst-style markup did not match Rust conventions. The function doc comment already states intent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the hand-written crates/core/src/physical_optimizer.rs with a `from_pycapsule!` invocation in the util crate, matching `physical_codec_from_pycapsule` and the other FFI capsule importers. The macro already handles the hasattr/getattr/cast/validate/pointer_checked sequence and the infallible `Arc::from(&FFI)` conversion, so the dedicated module is no longer needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the sentence about logical-rule FFI availability; it is background, not type-hint information, and keeps the Protocol docstring in line with the other *Exportable hints.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@timsaucer timsaucer marked this pull request as ready for review May 27, 2026 20:32
@timsaucer timsaucer mentioned this pull request May 27, 2026
11 tasks
@timsaucer timsaucer marked this pull request as draft May 29, 2026 13:33
timsaucer and others added 2 commits May 29, 2026 09:45
Drop the `physical_optimizer_rules` constructor argument on
`SessionContext` and replace it with `add_physical_optimizer_rule`,
matching the existing `register_*` shape on the same class. The new
method rebuilds the session state via `SessionStateBuilder::new_from_existing`
so previously registered tables, UDFs, and catalogs are preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Coverage subsumed by test_ffi_physical_optimizer_rule_runs_during_planning,
which exercises the same capsule export via add_physical_optimizer_rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@timsaucer timsaucer marked this pull request as ready for review May 29, 2026 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose physical plan optimizer

1 participant