Problem
np.maximum, np.minimum, and np.clip make two passes over the data when an @out parameter is provided:
np.copyto(@out, lhs) - copy input to output
ClipArrayMin(@out, min) - clip in-place
Root Cause
ClipArrayMin<T>(T* output, T* minArr, int size) operates in place. It has no separate src parameter, so the input must be copied into output before clipping.
Fix
Add 3-operand kernel variants:
ClipArrayMinFromSource<T>(T* dest, T* src, T* minArr, int size)
// dest[i] = max(src[i], minArr[i]) - single pass
Add equivalent variants for ClipArrayMax and ClipArrayBounds.
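A minimal sketch of the two kernel shapes, for illustration only. It assumes a scalar loop and an IComparable<T> constraint; the real kernels may be SIMD-vectorized and may treat NaN differently. The class names ClipKernelSketch and Demo are hypothetical. Compiling it requires unsafe code to be enabled (AllowUnsafeBlocks).

```csharp
using System;

// Hypothetical container class; only the two method signatures come from the issue.
public static unsafe class ClipKernelSketch
{
    // Existing in-place shape: output must already hold the input values,
    // which is why callers copy the input into output first.
    public static void ClipArrayMin<T>(T* output, T* minArr, int size)
        where T : unmanaged, IComparable<T>
    {
        for (int i = 0; i < size; i++)
            if (output[i].CompareTo(minArr[i]) < 0)
                output[i] = minArr[i];
    }

    // Proposed 3-operand shape: reads src, writes dest, one pass, no prior copy.
    // Assumption: plain CompareTo ordering; NaN propagation is not modelled here.
    public static void ClipArrayMinFromSource<T>(T* dest, T* src, T* minArr, int size)
        where T : unmanaged, IComparable<T>
    {
        for (int i = 0; i < size; i++)
            dest[i] = src[i].CompareTo(minArr[i]) < 0 ? minArr[i] : src[i];
    }
}

public static class Demo
{
    public static unsafe void Main()
    {
        int[] src = { 1, 5, 3 };
        int[] min = { 2, 2, 4 };
        int[] twoPass = new int[3];
        int[] onePass = new int[3];

        fixed (int* s = src, m = min, a = twoPass, b = onePass)
        {
            // Current behaviour: copy pass, then in-place clip pass.
            for (int i = 0; i < 3; i++) a[i] = s[i];
            ClipKernelSketch.ClipArrayMin<int>(a, m, 3);

            // Proposed behaviour: a single pass from src into dest.
            ClipKernelSketch.ClipArrayMinFromSource<int>(b, s, m, 3);
        }

        Console.WriteLine(string.Join(", ", twoPass)); // prints "2, 5, 4"
        Console.WriteLine(string.Join(", ", onePass)); // prints "2, 5, 4"
    }
}
```

Both paths produce the same result. The 3-operand form reads src once and writes dest once, where the two-pass form reads and writes the output buffer twice.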
Impact
Affects np.maximum, np.minimum, np.fmax, np.clip when @out is provided.