
HIVE-27370: support 4 bytes characters #6340

Merged
okumin merged 20 commits into apache:master from ryukobayashi:HIVE-27370 on Mar 27, 2026

@ryukobayashi (Contributor) commented Feb 27, 2026

What changes were proposed in this pull request?

When a SUBSTR UDF argument contains 4-byte characters, the vectorized and non-vectorized code paths behave differently. The vectorized path handles 4-byte characters correctly, but the non-vectorized path does not, so it needs similar logic.
This fix applies the same logic as the vectorized implementations:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStartLen.java#L89-L130
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/StringSubstrColStart.java#L78-L109
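
For readers unfamiliar with the problem, here is a minimal Java sketch of why 4-byte characters are easy to mishandle. It is not the code from this patch or from the vectorized classes linked above; `CodePointSubstr` and `substrByCodePoints` are hypothetical names used only for illustration. A character that needs 4 bytes in UTF-8 is stored in a Java `String` as a surrogate pair, so APIs that index by `char` count it twice and can split it in half.

```java
import java.nio.charset.StandardCharsets;

public class CodePointSubstr {
    // Hypothetical helper: 0-based start and length, both counted in
    // Unicode code points rather than UTF-16 code units.
    static String substrByCodePoints(String s, int start, int length) {
        int total = s.codePointCount(0, s.length());
        if (start < 0 || start >= total || length <= 0) {
            return "";
        }
        int end = Math.min(start + length, total);
        // Translate code point positions into char indexes.
        int from = s.offsetByCodePoints(0, start);
        int to = s.offsetByCodePoints(from, end - start);
        return s.substring(from, to);
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (4 bytes in UTF-8, a surrogate pair in UTF-16), 'b'
        String s = "a\uD83D\uDE00b";

        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 6
        System.out.println(s.length());                                // 4
        System.out.println(s.codePointCount(0, s.length()));           // 3

        // Indexing by char cuts the surrogate pair in half.
        String broken = s.substring(1, 2);
        System.out.println(Character.isHighSurrogate(broken.charAt(0))); // true

        // Indexing by code point returns the whole character.
        String whole = substrByCodePoints(s, 1, 1);
        System.out.println(whole.equals("\uD83D\uDE00")); // true
    }
}
```

The sketch works on UTF-16 strings for brevity. An implementation that operates directly on UTF-8 bytes would instead have to walk character boundaries in the byte array, but the underlying issue is the same: positions must be counted in characters, not in storage units.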

Previous PR: #5624

Why are the changes needed?

The vectorized and non-vectorized code paths return different results for input containing 4-byte characters.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added test patterns to itest to verify that these cases work correctly.

@ryukobayashi
Contributor Author

@okumin, I didn't have time to work on the previous PR (#5624), so it was closed automatically, and I have opened this one instead. The previous PR required extensive testing; we have since verified the change through our internal simulations and found no issues.

Co-authored-by: Shohei Okumiya <okumin@apache.org>

@okumin (Contributor) left a comment

+1

@okumin merged commit bbd83df into apache:master on Mar 27, 2026
2 checks passed
@okumin (Contributor) commented Mar 27, 2026

@ryukobayashi Thanks for submitting the patch!
