Issues · lablup/mlxcel · GitHub


fix: externally quantized 4-bit Gemma 4 (OptiQ) decodes to degenerate output that is trapped entirely in reasoning_content

#467

· pjb7687 opened

on Jun 28, 2026

feat: autodetect and clean up prematurely downloaded model

type:enhancement

#465

· pjb7687 opened

on Jun 28, 2026

fix: mlxcel-server does not understand `--draft-model`

#464

· pjb7687 opened

on Jun 28, 2026

fix: mlxcel serve panics when downloading a model

#463

· pjb7687 opened

on Jun 28, 2026

feat: evaluate a Rust-native StableHLO emitter as the compiler-family authoring path (spike)

area:architecture

priority:medium

type:enhancement

#451

· inureyes opened

on Jun 26, 2026

feat: OpenXLA reference backend - export-route spike through 4-bit quantized decode

area:architecture

priority:medium

type:enhancement

#449

· inureyes opened

on Jun 26, 2026

feat: distribute the mlxcel binary via pip so `pip install` yields a runnable managed mode

priority:medium

type:enhancement

#416

· inureyes opened

on Jun 24, 2026

fix(router): emit usage on the disaggregated /v1/chat/completions responses (streaming and non-streaming)

area:architecture

#398

· inureyes opened

on Jun 22, 2026

perf(core): adaptive selector for the native paged-attention decode kernel

priority:medium

type:performance

#331

· inureyes opened

on Jun 17, 2026

perf(moe): backend-aware fused-MoE Dff cap (CUDA crossover) and dispatch heuristic

priority:medium

type:performance

#330

· inureyes opened

on Jun 17, 2026

perf(nemotron-h): decode gap is MoE-block op-density (routed + shared expert), not SSM/attention

priority:medium

type:performance

#284

· inureyes opened

on Jun 14, 2026

feat: need a logo

priority:medium

type:enhancement

#59

· inureyes opened

on May 21, 2026
