feat[arrow-ord]: suppport REE comparisons#9621
Conversation
|
Unfortunately did not see there is an open PR #9448 before working on this, however I think this PR is more complete since #9448 does not seem to support REE with scalar or mixed comparisons. Also, this PR is more performant: |
79d81af to
eee179c
Compare
|
Super interesting. I suspect that the perf gain might go lower as # of phys indices grow as I'm avoiding copying them, but this definitely seems like a better implementation. |
| fn apply<T: ArrayOrd>( | ||
| op: Op, | ||
| l: T, | ||
| l_s: bool, |
There was a problem hiding this comment.
Unless I'm mistaken, you can have only one of l_s == true / l_v.is_some() / l_ree.is_some() at a time. How about introducing an enum? It will reduce the arg # and make them clearer
There was a problem hiding this comment.
I don't think that's right. Scalars can have ree and/or dictionary types and l_v and l_ree are both Some for Ree
There was a problem hiding this comment.
Ah, right! Still, I think that the too_many_arguments link is legit, and I'd consider groupin the args for each side -- either anonymous 3-tuple or a real struct. It would make the call site much cleaner. e.g.
struct ArrayInfo<'a> {
is_scalar: bool,
dict_values: &'a dyn AnyDictionyArray,
run_array: Option<&'a ReeInfo>,
}
85034d0 to
791ec8c
Compare
|
Thanks for the review @brunal. I addressed all your comments. |
| fn apply<T: ArrayOrd>( | ||
| op: Op, | ||
| l: T, | ||
| l_s: bool, |
There was a problem hiding this comment.
Ah, right! Still, I think that the too_many_arguments link is legit, and I'd consider groupin the args for each side -- either anonymous 3-tuple or a real struct. It would make the call site much cleaner. e.g.
struct ArrayInfo<'a> {
is_scalar: bool,
dict_values: &'a dyn AnyDictionyArray,
run_array: Option<&'a ReeInfo>,
}
| } | ||
| } | ||
|
|
||
| fn ree_physical_indices(info: &ReeInfo) -> Vec<usize> { |
There was a problem hiding this comment.
you can turn this into a .skip().map_while(...) to return an Iterator<Item = usize>. Then in logical_indices's Some-Some branch, you can directly map on it. This avoid materializing the intermediate physical indices vector.
This is very nitpicky, as the one branch that benefits from this, REE<dict>, seems like quite a niche use case :)
There was a problem hiding this comment.
I think I'm going to keep this as is currently written to mirror the dict normalized_keys materialization. We can do this as a follow up improvement if we see it's worth it.
791ec8c to
a3669d9
Compare
|
Updated, apologies for the delay. |
brunal
left a comment
There was a problem hiding this comment.
Looks excellent to me, now you just need someone with repo rights :-)
a3669d9 to
08e2e39
Compare
|
Friendly ping for a review or triage @alamb (not sure if there's another code owner) |
|
Thanks for the ping @asubiotto -- I'll try and find time to review this, but I am having a hard time finding time |
|
But nwo I see @brunal has already done it so that will make it easier |
|
Thanks! Appreciate the time. Yes, @brunal conducted an in-depth review so I only need a final look from a maintainer. |
|
run benchmarks comparison_kernels |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing asubiotto/reecmp (08e2e39) to ec771cc (merge-base) diff File an issue against this benchmark runner |
|
I think there is a bug in this code with sliced arrays. See this PR here for reproducer |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
|
Thanks! I will take a look and update you when it's fixed. |
08e2e39 to
1ad1b90
Compare
|
Fixed and added your test. We were incorrectly computing the expansion multiple for the comparison by using only the left offset as pos. We're now using a logical position (starting at 0 and incremented to the end of the segment i.e. the minimum run). |
|
FYI there's a p too many in the title: "suppport"! |
alamb
left a comment
There was a problem hiding this comment.
I think this PR looks good to me in general. I would like to see the REE code isolated a little more before merging to make understanding the comparison logic easier, but otherwise I think it is good to go
Thank you for bearing with me @asubiotto and @vegarsti
| ))); | ||
| } | ||
|
|
||
| let l_side = SideInfo { |
There was a problem hiding this comment.
if we are going to introduce a struct like this, perhaps as a follow on PR we could refactor all the argument to apply into SideInfo and then make apply a function of SideInfo
There was a problem hiding this comment.
Possibly but I'm not sure this will make the code clearer. As currently written, SideInfo can be constructed only once before type the type dispatch. I do think we could probably clean up this code as a whole though but would need to think more deeply about how. I can create an issue but I don't like doing so if there isn't a clear actionable item.
| { | ||
| // Both non-scalar with indirection. Pure REE-vs-REE uses segment-based | ||
| // bulk comparison; other combinations fall back to index vectors. | ||
| if let (Some(li), None, Some(ri), None) = (l_info.ree, l_info.dict, r_info.ree, r_info.dict) |
There was a problem hiding this comment.
Can we please refactor this into its own function as it is REE specific and I think makes it harder t read cmp
|
|
||
| /// Compare two REE arrays by walking both run_ends simultaneously, comparing | ||
| /// once per aligned segment and bulk-filling the result. | ||
| fn apply_op_segments<T: ArrayOrd>( |
There was a problem hiding this comment.
Can we please name this something with ree in the name so it is clearer it is ree specific?
1ad1b90 to
cfc2a0a
Compare
This commit implements native comparisons on REE-encoded arrays which are treated similarly to dictionary indirection. This commit implements REE to scalar comparisons by operating on the physical values only then bulk expanding the boolean result. REE-to-REE comparisons are also optimized by computing aligned physical value runs to minimize comparisons. Mixed cases (REE vs flat) materialize a logical index mapping similar to dictionaries. This commit also supports REE<Dict>. For comparison, here are the benchmark results with flat arrays as a reference on my local machine: ``` eq Int32 time: [14.955 µs 15.162 µs 15.396 µs] eq scalar Int32 time: [11.379 µs 11.418 µs 11.459 µs] ree_comparison/eq_ree_scalar(phys=64,log=65536) time: [453.31 ns 454.88 ns 456.43 ns] ree_comparison/eq_ree_scalar(phys=1024,log=65536) time: [4.1224 µs 4.1298 µs 4.1368 µs] ree_comparison/eq_ree_scalar(phys=32768,log=65536) time: [93.506 µs 94.085 µs 94.993 µs] ree_comparison/eq_ree_ree(phys=64,log=65536) time: [413.96 ns 414.82 ns 415.87 ns] ree_comparison/eq_ree_ree(phys=1024,log=65536) time: [4.1597 µs 4.1660 µs 4.1749 µs] ree_comparison/eq_ree_ree(phys=32768,log=65536) time: [128.74 µs 144.40 µs 161.53 µs] ``` As is expected, the more we take advantage of REE encoding, the faster the comparisons are. Signed-off-by: Alfonso Subiotto Marques <[email protected]>
|
Addressed your comments @alamb, this should be good to go |
cfc2a0a to
a0a7521
Compare
This commit implements native comparisons on REE-encoded arrays which are treated similarly to dictionary indirection.
This commit implements REE to scalar comparisons by operating on the physical values only then bulk expanding the boolean result.
REE-to-REE comparisons are also optimized by computing aligned physical value runs to minimize comparisons.
Mixed cases (REE vs flat) materialize a logical index mapping similar to dictionaries.
This commit also supports
REE<Dict>.For comparison, here are the benchmark results with flat arrays as a reference on my local machine:
As is expected, the more we take advantage of REE encoding, the faster the comparisons are.
Which issue does this PR close?
RunArray(Run Length Encoding (RLE) / Run End Encoding (REE) support) #3520Rationale for this change
Feature enhancement for comparisons
What changes are included in this PR?
Support for REE comparisons, including tests and benchmarks.
Are these changes tested?
Yes.
Are there any user-facing changes?
REE comparisons are now supported.