v0.1.9
What's Changed
- Fix sink error for asm fmha by @LJ-underdog in #1652
- add guard in case pynccl init failed by @valarLip in #1671
- One shot pa by @fsx950223 in #1670
- fix(pa_ps): fix pa_ps_asm .co for gfx950 by @dbyoung18 in #1669
- modify test_bf16gemm_test by @amd-ruitang3 in #1678
- [FIX/CI] Fix ruff CI check by @Boss2002n in #1675
- fix mha bwd golden perf issue by @JaxChen29 in #1666
- topk uplift v1 by @steamedMantou in #1662
- fix missing return in mha_bwd by @yuguo68 in #1688
- Remove the input parameter "out" in gemm_a4w4 by @junhaha666 in #1679
- fwd v3 hd192 optimize inst alignment for causal mode by @shay-li77 in #1663
- fix swa case mismatch by @JaxChen29 in #1694
- fixing the fp4 gemm tune script Exception caused by csv title inconsistency with code by @hongxiayang in #1686
- CI: Migrate Triton tests to aiter-1gpu-runner by @gyohuangxin in #1690
- add ntile 128 for a8 blkQ moe 1 stage by @zufayu in #1695
- Optimize RoPE in the cases that hdim is small by @ruanjm in #1698
- rm garbage from whl by @amd-ruitang3 in #1696
- enhance prebuild logic by @zufayu in #1672
- LLfp4 qr cap for atom by @amirumoAMD in #1673
- [MLA] MLA conditions rewrite by @Zzz9990 in #1665
- [MLA] fake non persistent fix dp causal by @Zzz9990 in #1677
- add two fp4 tune shapes and tuned config by @hongxiayang in #1687
- Dev/a8w4 and a8w8splitk by @yadaish in #1667
- bf16_gemm_clean_in_kl by @amd-ruitang3 in #1700
- fix tuner by @valarLip in #1701
- add gen_fake for 4 gemm operators by @mqhc2020 in #1456
- fix llvm issue by @valarLip in #1703
- feat: Adaptive topk algorithm selection based on input characteristics by @ClementLinCF in #1578
- fix mha bwd build error by @JaxChen29 in #1705
- fix moe bug when pipever=v1 and nblk=64 by @lalala-sh in #1707
- fix by @valarLip in #1710
New Contributors
- @JaxChen29 made their first contribution in #1666
- @amirumoAMD made their first contribution in #1673
Full Changelog: v0.1.8...v0.1.9