[Optimization] merge matmul and add #6986
Thanks for your contribution!
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

| Metric | develop | #6986 |
|---|---|---|
| Coverage | ? | 73.99% |
| Files | ? | 376 |
| Lines | ? | 53406 |
| Branches | ? | 8470 |
| Hits | ? | 39518 |
| Misses | ? | 11138 |
| Partials | ? | 2750 |
/re-run all-failed
PaddlePaddle-bot
left a comment
🤖 AI Code Review | 2026-04-03 17:42 CST
📋 Review Summary
PR overview: Replaces paddle.matmul + paddle.add in UnquantizedLinearMethod.apply() with paddle.nn.functional.linear to reduce operator call overhead.
Scope of changes: model_executor/layers/linear.py (core optimization), test baseline path update
Impact tag: OP Optimization
Issues
No blocking issues found.
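✅ Code Analysis
| Check | Result |
|---|---|
| PR title convention | ✅ Includes the [Optimization] tag |
| PR description completeness | ✅ Motivation/Modifications are filled in, with a performance comparison chart attached |
| Code logic correctness | ✅ PaddlePaddle's F.linear(x, weight, bias) computes x @ weight + bias, which is equivalent to the original matmul + add implementation |
| New assert check | ✅ Verifies that the bias shape matches the last dimension of weight, which helps catch configuration errors early |
| Test coverage | ✅ Baseline updated (0402 → 0403); accuracy verification passed |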
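The equivalence and the shape check noted in the table can be illustrated numerically. The sketch below uses NumPy as a stand-in for Paddle (an assumption, so that it runs without Paddle installed). `linear_ref` is a hypothetical helper, not code from the PR; it follows the quoted formula x @ weight + bias with weight laid out as [in_features, out_features].

```python
import numpy as np


def linear_ref(x, weight, bias=None):
    """Hypothetical NumPy reference for the formula x @ weight + bias."""
    out = x @ weight  # weight is assumed to be [in_features, out_features]
    if bias is not None:
        # Same idea as the assert added in the PR: bias must match
        # the last dimension of weight.
        assert bias.shape[-1] == weight.shape[-1], "bias/weight shape mismatch"
        out = out + bias
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 8)).astype(np.float32)     # [batch, seq, in_features]
weight = rng.standard_normal((8, 16)).astype(np.float32)  # [in_features, out_features]
bias = rng.standard_normal(16).astype(np.float32)         # [out_features]

two_step = np.add(np.matmul(x, weight), bias)  # separate matmul, then add
fused = linear_ref(x, weight, bias)            # single "linear" call

print(fused.shape)
print(np.allclose(two_step, fused))
```

A bias whose length differs from `weight.shape[-1]` trips the assert immediately instead of failing later with a less obvious broadcasting error.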
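Overall Assessment
The change is logically correct and the performance optimization is reasonable. With a bias, using F.linear reduces Python-level dispatch overhead; without a bias, the code keeps using matmul to avoid the performance loss on small shapes. This matches the performance analysis in the PR description.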
* merge matmul and add
* modify format
* using paddle.nn.functional.linear
* using _C_ops.linear
* using paddle.nn.functional.linear
* add FLAGS_use_legacy_linear env var in test case
* fix format
* add assert and remove env
* modify format
* using matmul for no bias
* modify accurate baseline
Motivation
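Performance optimization.
Modifications
Replace the matmul and add in UnquantizedLinearMethod with linear.

With a bias, the change is faster in most cases. Without a bias, performance drops for small shapes, mainly because of Python-level dispatch overhead such as the `if` checks; linear is itself implemented with matmul internally.
Usage or Command
None
Accuracy Tests
Accuracy is unchanged.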
Checklist
[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
`pre-commit` before commit.
`release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.