Eval bug: Performance degradation on ARM starting from b8057

### Name and Version

version: 8772 (bafae2765)                                                                       
built with Clang 21.1.8 for Linux aarch64

### Operating systems

Linux

### GGML backends

CPU

### Hardware

Dimensity 9000+ (X2+3xA710)

### Models

https://huggingface.co/ggml-org/Qwen3-0.6B-GGUF/resolve/main/Qwen3-0.6B-Q4_0.gguf

### Problem description & steps to reproduce

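I benchmarked Qwen3 0.6B Q4_0 on my phone and noticed a severe prompt-processing slowdown starting from commit 684b36101c9eeb7e89c9e602f9ded05f1353a0c6 (b8057): pp512 drops from 351.60 t/s to 161.30 t/s, while tg128 stays at about 80 t/s. The latest build I tested (b8772) is still affected.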

| model                          |       size |     params | backend    | threads | n_ubatch | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -: | --------------: | -------------------: |
| qwen3 0.6B Q4_0                | 403.42 MiB |   751.63 M | CPU        |       4 |       64 |  1 |           pp512 |        351.60 ± 0.00 |
| qwen3 0.6B Q4_0                | 403.42 MiB |   751.63 M | CPU        |       4 |       64 |  1 |           tg128 |         79.88 ± 0.00 |

build: 3a00c9858 (8056)

| model                          |       size |     params | backend    | threads | n_ubatch | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -: | --------------: | -------------------: |
| qwen3 0.6B Q4_0                | 403.42 MiB |   751.63 M | CPU        |       4 |       64 |  1 |           pp512 |        161.30 ± 0.00 |
| qwen3 0.6B Q4_0                | 403.42 MiB |   751.63 M | CPU        |       4 |       64 |  1 |           tg128 |         80.05 ± 0.00 |

build: 684b36101 (8057)

| model                          |       size |     params | backend    | threads | n_ubatch | fa | mmap |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -------: | -: | ---: | --------------: | -------------------: |
| qwen3 0.6B Q4_0                | 403.42 MiB |   751.63 M | CPU        |       4 |       64 |  1 |    0 |           pp512 |        155.72 ± 0.00 |
| qwen3 0.6B Q4_0                | 403.42 MiB |   751.63 M | CPU        |       4 |       64 |  1 |    0 |           tg128 |         79.44 ± 0.00 |

build: bafae2765 (8772)
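To put a number on the regression, here is a minimal sketch that computes the relative change of each result against the b8056 baseline. The t/s figures are copied from the three tables above; the `results` dictionary and the `relative_change` helper are illustrative names, not part of llama.cpp or llama-bench.

```python
# Throughput figures (tokens/s) copied from the llama-bench tables in this issue.
results = {
    "b8056": {"pp512": 351.60, "tg128": 79.88},
    "b8057": {"pp512": 161.30, "tg128": 80.05},
    "b8772": {"pp512": 155.72, "tg128": 79.44},
}


def relative_change(baseline: float, value: float) -> float:
    """Return the percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


# Compare every later build against the last known good build (b8056).
baseline = results["b8056"]
for build in ("b8057", "b8772"):
    for test, tps in results[build].items():
        change = relative_change(baseline[test], tps)
        print(f"{build} {test}: {change:+.1f}% vs b8056")
```

By this calculation pp512 throughput falls by roughly 54% at b8057 and remains about 56% below the baseline at b8772, while tg128 moves by less than 1% in either direction.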

### First Bad Commit

684b36101c9eeb7e89c9e602f9ded05f1353a0c6

### Relevant log output

bin/llama-bench -m /sdcard/Download/Qwen3-0.6B-Q4_0.gguf -fa 1 -ub 64 -r 1 -mmp 0 -t 4
