Pull requests: waybarrios/vllm-mlx
fix: report prompt_tokens correctly for LLM models in SimpleEngine
#236
opened Mar 30, 2026 by sjswerdloff
3 tasks
perf(reasoning): O(1) state-machine streaming parser (13-19x faster at 2k+ tokens)
#234
opened Mar 29, 2026 by penumbraforge
Add TurboQuant KV cache compression for prefix cache (4.6x)
#233
opened Mar 29, 2026 by arozanov
9 tasks done
fix: suppress tool call XML from streaming text content (#129)
#232
opened Mar 29, 2026 by sjswerdloff
fix: add missing return in load_model_with_fallback
#230
opened Mar 29, 2026 by sjswerdloff
fix: populate tokens field in BatchedEngine.generate()
#229
opened Mar 28, 2026 by mmcaulif
3 tasks done
fix: bump mlx-lm minimum to 0.31.0 for hybrid model batching
#227
opened Mar 25, 2026 by krystophny
test: make Python 3.13 async suite pass and cover it in CI
#226
opened Mar 25, 2026 by krystophny
feat: MTP per-request routing in BatchedEngine
#223
opened Mar 24, 2026 by Thump604
2 of 3 tasks
simple-engine: keep tool chat on the streaming execution path
#222
opened Mar 24, 2026 by krystophny
scheduler: preserve prompt checkpoints in chunked prefill resume path
#221
opened Mar 24, 2026 by krystophny
engine: keep SimpleEngine serialized across cancellation
#220
opened Mar 24, 2026 by krystophny
chat: forward chat_template_kwargs on simple-engine paths
#218
opened Mar 24, 2026 by krystophny
prefix_cache: preserve hybrid recurrent state across blocks
#217
opened Mar 24, 2026 by krystophny
server: add OpenAI-compatible /v1/responses endpoint
#214
opened Mar 24, 2026 by krystophny
feat: full sampling parameter support (top_k, min_p, presence_penalty, repetition_penalty)
#213
opened Mar 23, 2026 by Thump604
5 tasks done
fix: respect tool_choice="none" by excluding tools from template
#210
opened Mar 23, 2026 by awanawana
fix: Don’t truncate base64 images before hashing
#206
opened Mar 22, 2026 by BelieveDiffusion
feat: add lifecycle-managed residency for the default server model
#205
opened Mar 22, 2026 by lyonsno
fix: skip RNN snapshots in MTP optimistic mode to prevent memory leak
#196
opened Mar 21, 2026 by Thump604
4 tasks
fix: streaming detokenizer for UTF-8-safe incremental decode
#195
opened Mar 21, 2026 by Thump604
5 tasks