Add DoFnRunner::finishKey() method#38454
Draft
arunpandianp wants to merge 8 commits into
Draft
Conversation
Moving the processTimers call from finish() to finishKey(). In upcoming changes there'll be multiple streaming work items in a single beam bundle. With multiple work items, we've to process elements and timers of each work item before moving to the next work items. finishKey() will be called by the NativeIterator classes after iterating through all elements from a work item. Batch processes timers in BatchModeUngroupingParDoFn and does not rely on the processTimers() in ParDoOperation::finish(). So removing the processTimers() call from ParDoOperation::finish() is safe. Batch also does not use the new finishKey() method.
In upcoming changes there'll be multiple dataflow streaming work items in a single beam bundle. With multiple work items, we've to process elements and timers of each work item before moving to the next work items. The new finishKey method allows the DoFnRunners to cleanup/persist state (that should not be carried over) before switching work items on multi key bundles. Streaming SideInputDoFnRunners are the only classes using the finishKey method right now. The finishKey() is not exposed to DoFns and is not visible in user apis.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
In upcoming changes there'll be multiple dataflow streaming work items
in a single beam bundle. With multiple work items, we've to process elements
and timers of each work item before moving to the next work items.
The new finishKey method allows the DoFnRunners to
cleanup/persist state (that should not be carried over) before switching work items
on multi key bundles.
Dataflow streaming SideInputDoFnRunners are the only classes using the finishKey
method right now.
The finishKey() is not exposed to DoFns and is not visible in user apis.
This is on top of #38430