-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-26447: Vectorization: wrong results when filter on repeating map key orc table #3492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| <module>testutils</module> | ||
| <module>packaging</module> | ||
| <module>standalone-metastore</module> | ||
| <module>upgrade-acid</module> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the pom.xml from the PR
| final boolean isRepeating = keyColVector.isRepeating; | ||
| for (int i = 0; i < count; i++) { | ||
| final int keyOffset = offset + i; | ||
| final int len = isRepeating? keyLength[i]: keyLength[keyOffset]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that supposed to be [i] or [0]
The two rows look a bit wrong there.
I would write keyOffset = isRepeating ? 0 : offset + i
if the intention was to use =0 for all repeating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that the key is set isRepeating only when there is a single key (count=1), but I am not sure if different orc versions may have multiple keys, if there are multiple keys in repeating, keyLength may have multiple entries. Would it be better just deal with the isRepeating and count=1 separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Gopal. If isRepeating=true, then keyLength and keyVector will have to be looked at index = 0. I was also expecting something like below
final int len = isRepeating? keyLength[0]: keyLength[keyOffset];
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review @prasanthj ! updated. I only see isRepeating is true for one single key, but not sure if there can be multiple keys and still have isRepeating=true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you update the test case with multiple keys and see if that produces expected result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added test cases of repeating multiple keys and mixed keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@prasanthj appreciate your review!
|
+1 |
|
Test failures unrelated. Merging now. |
|
Hi @sunchao / @yigress - I think some of the test failures could be related to this patch. http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-3492/3/pipeline/439 (which is from the run on this PR) [2022-08-05T04:08:23.465Z] [ERROR] org.apache.hadoop.hive.cli.split16.TestMiniLlapLocalCliDriver.testCliDriver[orc_map_key_repeating] Time elapsed: 3.725 s <<< FAILURE! This PR directly added orc_map_key_repeating. |
|
I've attempted to fix the situation - #3509 |
|
@jfsii oops sorry, I'll take a look at the fix |
| byte[] rowKey = isRepeating? keyVector[0]: keyVector[keyOffset]; | ||
| if (StringExpr.equal(key, 0, key.length, | ||
| keyVector[keyOffset], keyStart[keyOffset], keyLength[keyOffset])) { | ||
| rowKey, keyStart[keyOffset], len)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keyStart[keyOffset] should also be assigned based on isRepeating key.
… key orc table (apache#3492) (cherry picked from commit 4b1f01f)
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
How was this patch tested?