I saw memory usage climb to ~98% while the LLM was being loaded to the GPU (I didn't have much free RAM at that moment), and then the process printed "Terminated" in the terminal. So I enabled the MMAP option in koboldcpp and the model loaded, albeit more slowly; the delay happened at the "Model warm up" message in the terminal. It then took about 20 seconds to generate the first token.
A smaller model that loaded fine without MMAP ran equally fast whether loaded with or without it. With MMAP, RAM usage for the smaller model was lower the whole time, including after the load. For the larger model, memory usage was about the same as for the smaller one (both loaded with MMAP). The larger model used to run fine (fast) before, back when free RAM was abundant.
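For context on what I think MMAP changes: my understanding (an assumption on my part, not something from the koboldcpp docs) is that a memory-mapped file is not copied into RAM up front; the OS only creates a virtual-memory mapping and faults pages in from disk the first time they are touched, which would explain both the lower RAM usage and the slow warm-up/first token. A minimal sketch of that behavior, using a small dummy file as a stand-in for a model file:

```python
import mmap
import os
import tempfile

# Create a small dummy "model file" on disk (stand-in for a real GGUF file).
path = os.path.join(tempfile.mkdtemp(), "model.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * (1 << 20))  # 1 MiB of fake weights

# mmap only establishes a virtual-memory mapping; no data is read yet.
# Each 4 KiB page is loaded from disk on first access (a page fault),
# so the cost of reading the file is paid lazily, during first use.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_byte = mm[0]        # touching a byte faults in that page
    mapped_size = len(mm)     # full file size is mapped, not resident
    mm.close()

print(mapped_size, first_byte)
```

If this picture is right, it would also hint at question 3 below: with a mapping backed by the file on disk, the pages can't simply be dropped the way a plain in-RAM copy can, though that part is my guess.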
Questions:
1. Why is the larger model so slow if it should be fully in VRAM after load? If it is not in VRAM, why not? Can I fully put the LLM into VRAM with koboldcpp (or, if not, with another tool)?
2. It seems that after the load onto the GPU (Vulkan on NVIDIA), the RAM used during model loading is not released. Why?
3. What does the MMAP help text in the koboldcpp GUI mean by "model will not be unloadable"? How do I unload models if MMAP is not used?