A hardware-aware guide to data structures for system software engineers.
Updated Dec 17, 2025
VS Code extension: Go struct layout, padding, reorder
High-performance limit order book engine with C++ core and Python SDK. Processes 20M+ msgs/sec with µs latency. Supports real crypto/equity data replay, spread/imbalance/impact analytics, and backtesting of VWAP, TWAP, POV, and market-making strategies with reproducible PnL and risk metrics.
Cache & In-Memory optimizations for Rust, revived from the slabs of Sumer.
CCProf: Lightweight Detection of Cache Conflicts
Freeze Claude Code's prompt prefix so DeepSeek's automatic cache always hits — alignment proxy + coalescing + keepalive, installable as a CC plugin. Measured 64% cheaper on real Claude Code traffic.
Contains implementations of cache-optimized and external memory algorithms.
Field level cache optimizations for Rust (no_std)
High-performance C++ optimization guide with lock-free data structures, SIMD, and memory optimization examples.
A dotnet tool for moving project files into the directories specified by the solution (.sln) file.
Cache-aware orchestration for LLM agents. Fork helpers that share cached prefixes, detect cache breaks, and cut token costs by 38%+.
GitHub Action & CLI to analyze binary memory layouts: detect padding, compare diffs, enforce budgets. Parses DWARF debug info for C/C++/Rust/Go.
Modular Spectrum of Pi: reference implementation of the Stride-6 engine. Unifies Chudnovsky's series with DSP polyphase decomposition in Z/6Z. Validated at 100M digits with 95% parallel efficiency. Features a Shared-Nothing architecture to bypass the memory wall and maximize cache alignment.
DeepSeek cache optimizer v1.1: Reasonix four pillars + semantic compression (hit rate +30%)
DeepSeek-native coding agent harness. 99.8% cache hit rate, 6 code-level gates, Kevix structured harness methodology. L0 verified. Discord: https://discord.gg/GcNhAHPZu
Computer Systems Architecture (Arhitectura Sistemelor de Calcul) - UPB 2020
Comparison of parallel matrix multiplication methods using OpenMP, focusing on cache efficiency, runtime, and performance analysis with Intel VTune.
A .NET CLI that scans GitHub Actions workflows for cache misconfigurations, weak cache keys, and missed dependency-cache opportunities.
A highly optimized CPU-based matrix multiplication algorithm in C++. Explores performance improvements using OpenMP, cache tiling, SIMD vectorization, and Intel VTune Profiler. Our project for the course "Computer Architecture" at the University of Pisa.
Matrix multiplication using cache memory optimizations.