Issues
is:issue state:open
#467 fix: externally quantized 4-bit Gemma 4 (OptiQ) decodes to degenerate output that is trapped entirely in reasoning_content
Labels: priority:high, status:ready, type:bug. Status: Open. In lablup/mlxcel.

#465 feat: autodetact and cleanup premature downloaded model
Labels: status:ready, type:enhancement. Status: Open. In lablup/mlxcel.

#464 fix: mlxcel-server does not understand --draft-model
Labels: status:ready, type:bug. Status: Open. In lablup/mlxcel.

#463 fix: mlxcel serve panics when it comes to downloading a model
Labels: status:ready, type:bug. Status: Open. In lablup/mlxcel.

#451 feat: evaluate a Rust-native StableHLO emitter as the compiler-family authoring path (spike)
Labels: area:architecture, priority:medium, status:ready, type:enhancement. Status: Open. In lablup/mlxcel.

#449 feat: OpenXLA reference backend - export-route spike through 4-bit quantized decode
Labels: area:architecture, priority:medium, status:ready, type:enhancement. Status: Open. In lablup/mlxcel.

#416 feat: distribute the mlxcel binary via pip so pip install yields a runnable managed mode
Labels: priority:medium, status:ready, type:enhancement. Status: Open. In lablup/mlxcel.

#398 fix(router): emit usage on the disaggregated /v1/chat/completions responses (streaming and non-streaming)
Labels: area:architecture, priority:low, status:backlog, type:bug. Status: Open. In lablup/mlxcel.

perf(core): adaptive selector for the native paged-attention decode kernel
Labels: area:core, area:inference, priority:medium, type:performance. Status: Open.

perf(moe): backend-aware fused-MoE Dff cap (CUDA crossover) and dispatch heuristic
Labels: area:core, area:models, priority:medium, type:performance. Status: Open.

perf(nemotron-h): decode gap is MoE-block op-density (routed + shared expert), not SSM/attention
Labels: area:inference, area:models, platform:macos, priority:medium, type:performance. Status: Open.

feat: need a logo
Labels: area:docs, help wanted, priority:medium, type:enhancement. Status: Open.