support #10
Description
Does official llama.cpp not support https://huggingface.co/tiiuae/Falcon3-10B-Instruct-1.58bit-GGUF? Loading the model fails with the following log:
main: n_parallel is set to auto, using n_parallel = 4 and kv_unified = true
system info: n_threads = 4, n_threads_batch = 4, total_threads = 8
system_info: n_threads = 4 (n_threads_batch = 4) / 8 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
init: using 8 threads for HTTP server
start: binding port with default address family
main: loading model
srv load_model: loading model 'C:\Users\admin\Downloads\falcon3.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
gguf_init_from_file_ptr: tensor 'blk.0.ffn_down.weight' of type 36 (TYPE_IQ4_NL_4_4 REMOVED, use IQ4_NL with runtime repacking) has 23040 elements per row, not a multiple of block size (0)
gguf_init_from_file_ptr: failed to read tensor info
llama_model_load: error loading model: llama_model_loader: failed to load model from C:\Users\admin\Downloads\falcon3.gguf
llama_model_load_from_file_impl: failed to load model
llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model
llama_params_fit: fitting params to free memory took 0.04 seconds
gguf_init_from_file_ptr: tensor 'blk.0.ffn_down.weight' of type 36 (TYPE_IQ4_NL_4_4 REMOVED, use IQ4_NL with runtime repacking) has 23040 elements per row, not a multiple of block size (0)
gguf_init_from_file_ptr: failed to read tensor info
llama_model_load: error loading model: llama_model_loader: failed to load model from C:\Users\admin\Downloads\falcon3.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'C:\Users\admin\Downloads\falcon3.gguf'
srv load_model: failed to load model, 'C:\Users\admin\Downloads\falcon3.gguf'
srv operator (): operator (): cleaning up before exit...
main: exiting due to model loading error
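
For anyone trying to reproduce this, the log above rejects the file because of a tensor type id (36) that this build does not accept. A minimal sketch for checking which type ids a GGUF file declares, without going through the loader, is below. It is not part of llama.cpp; the function name `tensor_infos` is hypothetical, and it assumes the GGUF v2/v3 little-endian header layout (magic, version, tensor count, metadata count, metadata key/value pairs, then tensor infos). It only reports the numeric type ids and makes no attempt to map them to names.

```python
import struct
import sys
from collections import Counter

# GGUF metadata value type ids -> struct format for fixed-size scalars.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one little-endian scalar described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("unexpected end of file while reading header")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """GGUF string: uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _skip_value(f, vtype):
    """Skip one metadata value of the given type id."""
    if vtype in _SCALARS:
        f.seek(struct.calcsize(_SCALARS[vtype]), 1)
    elif vtype == _STRING:
        f.seek(_read(f, "<Q"), 1)
    elif vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        if elem_type in _SCALARS:
            f.seek(count * struct.calcsize(_SCALARS[elem_type]), 1)
        else:
            for _ in range(count):
                _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown metadata value type {vtype}")


def tensor_infos(path):
    """Return a list of (name, dims, type_id) from a GGUF header."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        _read(f, "<I")                 # format version (assumed 2 or 3)
        n_tensors = _read(f, "<Q")
        n_kv = _read(f, "<Q")
        for _ in range(n_kv):          # skip all metadata key/value pairs
            _read_string(f)
            _skip_value(f, _read(f, "<I"))
        infos = []
        for _ in range(n_tensors):
            name = _read_string(f)
            n_dims = _read(f, "<I")
            dims = [_read(f, "<Q") for _ in range(n_dims)]
            type_id = _read(f, "<I")
            _read(f, "<Q")             # data offset, not needed here
            infos.append((name, dims, type_id))
        return infos


if __name__ == "__main__" and len(sys.argv) > 1:
    infos = tensor_infos(sys.argv[1])
    # Count how many tensors use each numeric type id.
    for type_id, n in sorted(Counter(t for _, _, t in infos).items()):
        print(f"type {type_id}: {n} tensors")
```

Running it with the path to the downloaded file as the only argument prints one line per type id. Comparing those ids against the type enum of the llama.cpp build in use would show whether the file relies on a type that build does not recognize.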