feat(logging): add spdlog backend behind ICEBERG_SPDLOG (5/6)#726
Open
kamcheungting-db wants to merge 12 commits into
Open
feat(logging): add spdlog backend behind ICEBERG_SPDLOG (5/6)#726kamcheungting-db wants to merge 12 commits into
kamcheungting-db wants to merge 12 commits into
Conversation
c6831cb to
2b80ee1
Compare
1136f53 to
bad2c20
Compare
Second block: the backend-agnostic logging API and the swappable default logger. - Logger: pure-virtual sink (ShouldLog/Log/SetLevel/level/Flush/IsNoop), with ShouldLog() as the single authority for runtime filtering and a documented no-reentrancy / thread-safety / noexcept contract. Mirrors MetricsReporter. - LogMessage owns its formatted text (moved in) so a sink may retain records; reserves an owning attributes vector for future structured logging without an ABI break. logger.h is backend-agnostic -- it never includes the build config header nor references the spdlog feature macro, so consumers see one stable API. - Process-global default logger: GetDefaultLogger / SetDefaultLogger / SetDefaultLevel, plus a lock-free thread-local generation cache on the hot path (no lock or refcount churn in steady state; slot mutex only on swap/refresh). NoopLogger is the placeholder default here; CerrLogger and the spdlog backend arrive in the following blocks. Builds the source into both CMake and Meson and installs logger.h alongside log_level.h; logger_test covers the default-logger API, level lowering taking effect immediately, and concurrent swap/read safety. Co-authored-by: Isaac
bad2c20 to
4c8d9a9
Compare
Resolves @wgtmac's review comments on PR apache#723: - Mark ShouldLog()/SetLevel()/level() noexcept so a custom logger cannot throw from the hot filter path; propagated to NoopLogger and CapturingLogger. - Make CurrentLogger() safe during thread teardown without UB: the per-thread cache is reached via a trivially-destructible thread_local pointer to a never-freed heap slot (kept LSan-reachable by a registry), so logging from a thread_local destructor in any order is safe. Replaces the AliveFlag guard. - Add ICEBERG_EXPORT to LogAttribute/LogMessage, matching Error/ScanReport. - Trim the logger.h \file brief (the macros it referenced arrive in PR apache#725). - Tests: use the IsError matcher and std::ignore; strengthen the teardown test to log from a thread_local destroyed after the cache. Co-authored-by: Isaac
Addresses @wgtmac's review comment on PR apache#723 asking for a std::format-style API that builds the string only after the level check. - A single overloaded iceberg::Log(): Log(level, "fmt {}", args...) -> process-default logger Log(logger, level, "fmt {}", args...) -> explicit logger disambiguated by the first argument. - Formats only after ShouldLog() (no wasted work when disabled) and never throws -- a formatting failure routes to EmitFormatError, matching the macros. - The call-site source_location is captured via a small consteval FmtWithLoc wrapper (the std::print technique), preserving compile-time format-string checking without a macro. The sink interface (Logger::Log(LogMessage&&)) intentionally stays pre-formatted: passing std::format_args across the virtual boundary is lifetime-unsafe and would break safe record retention. The format-style win is delivered at the call layer instead. Co-authored-by: Isaac
Addresses @wgtmac's review comment on PR apache#723 that LogMessage::attributes had no easy fill path. Adds a fluent LogMessage::Builder so engine loggers can attach structured fields (query id, task id, table name, snapshot id, file path, ...): auto record = LogMessage::Builder(LogLevel::kInfo) .Message("scan finished") .Attribute("table", table_name) .Attribute("snapshot_id", std::to_string(id)) .Build(); logger->Log(std::move(record)); - Message() sets the formatted text; Attribute() appends an owned key/value; Location() overrides the source location (defaults to the build site). - Build() moves the accumulated state into a LogMessage. This defines how attributes get populated, so structured logging can grow without an ABI break. Co-authored-by: Isaac
Replaces the intentional permanent per-thread leak with a cache that is freed at thread exit, while staying safe to call CurrentLogger() from any thread_local destructor during teardown. The per-thread state is split by lifetime: - `dead` (bool) and `cache` (pointer) are trivially destructible, so their storage stays valid for the whole thread and is readable from any other thread_local's destructor, in any destruction order. - A `guard` (the only piece with a destructor) frees the cache and sets `dead` at thread exit. A thread_local destroyed after it reads `dead == true` and falls back to an immortal no-op; one destroyed before it still sees a live cache. Either order is safe. This is the correct version of the earlier AliveFlag attempt, which was UB because the flag object itself had a destructor (reading it after that destructor ran was the very use-after-lifetime it tried to prevent). Drops the LSan-reachability registry, which is no longer needed now that nothing leaks. Validated with a dedicated stress harness under ThreadSanitizer and AddressSanitizer (no leaks): logging from a thread_local destroyed after the cache; logging interleaved with concurrent SetDefaultLogger swaps and thread teardown; and a thread whose only logging happens in a thread_local destructor. Adds ConcurrentTeardownAndSwapIsSafe to cover the combined path in CI. Co-authored-by: Isaac
Bind a logger for the current thread + scope so the default path (CurrentLogger/Log(level,...)/LOG_* macros) routes to it, with the global default as fallback. Engines can route Iceberg's own internal logs into a per catalog/session/query/task context without touching call sites.
Third block: the first concrete sink, and the process default until the spdlog backend lands. - CerrLogger writes to std::cerr with a fixed line layout `YYYY-MM-DDThh:mm:ss.mmmZ LEVEL [tid] file:line] message`. Timestamps use UTC std::chrono floored to milliseconds (no gmtime/localtime -- thread-unsafe); the thread id is the OS-native id, cached per thread. - Level is a std::atomic<LogLevel>; a mutex guards the whole formatted-line write so concurrent lines never interleave. Log()/Flush() wrap stream ops in try/catch so the noexcept contract holds even if the stream throws. - MakeDefaultLogger() now returns CerrLogger. Wired into both builds and installed; cerr_logger_test covers layout, level filtering, and concurrent-write safety. Co-authored-by: Isaac
Fourth block: the application-facing macros, the only part most callers touch.
- ICEBERG_LOG_{TRACE,DEBUG,INFO,WARN,ERROR,CRITICAL,FATAL} plus the generic
ICEBERG_LOG(level, ...), ICEBERG_LOG_TO(logger, level, ...) for an explicit
logger, and ICEBERG_LOG_RUNTIME_FMT for a runtime (non-literal) format string.
- ICEBERG_LOG_ACTIVE_LEVEL is a compile-time severity floor: statements below it
are removed entirely via `if constexpr` (no format call site, no source
location emitted). ICEBERG_LOG_FATAL is never gated by the floor -- its abort
is always compiled in; it emits, best-effort Flush()es the same logger it
emitted to, then std::abort().
- Filtering is decided solely by Logger::ShouldLog(); formatting is wrapped in
try/catch so logging never throws (a format failure routes to EmitFormatError).
- Bare Java-style aliases (LOG_INFO, ...) are opt-in via ICEBERG_LOG_SHORT_MACROS
to avoid polluting consumers / colliding with glog/abseil.
Header-only addition to logger.h. macros_test covers injection, the
guard-before-format short-circuit, never-throws, and FATAL aborts;
macros_active_level_test verifies compile-time stripping in a kOff translation unit.
Co-authored-by: Isaac
Fifth block: the default production backend and the build option that selects it. - SpdLogger wraps spdlog::logger (kCritical/kFatal -> spdlog critical, others 1:1), forwarding the pre-formatted message and source location. Synchronous only in v1 (spdlog's source_loc is a non-owning const char*, unsafe with async sinks). It lives in logging/internal/, is gated by #ifdef ICEBERG_HAS_SPDLOG, and is NOT installed -- consumers obtain it via the default logger or the registry, never by including spdlog headers. - New ICEBERG_SPDLOG CMake option (default ON). config.h is ALWAYS generated (only ICEBERG_HAS_SPDLOG's definedness varies) so logger.cc compiles in both configurations; MakeDefaultLogger() prefers SpdLogger when compiled in, else CerrLogger. - Critically, ICEBERG_SPDLOG=OFF now UNWIRES the previously-unconditional spdlog link (interface-lib lists + resolve_spdlog_dependency), not just the new source -- so an OFF build has no spdlog dependency at all. spdlog_logger_test (compiled only on the ON path) covers the level mapping including fatal->critical and source-location forwarding. Co-authored-by: Isaac
4c8d9a9 to
3d05bb5
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Part 5 of the logging stack (builds on #725). Adds the spdlog backend and the build option that selects it — the default in production builds.
What's here
SpdLoggerwrapsspdlog::logger(synchronous), maps levels (fatal/critical → spdlog critical; the abort stays in the macro layer), and forwards source location. Default sink is a colored stderr sink.ICEBERG_SPDLOGCMake option (ON by default). When OFF, the build has no spdlog dependency at all andCerrLoggeris the default.patternproperty is honored here viaspdlog set_pattern(the cerr backend keeps its fixed layout);levelworks on both backends.SpdLoggerlives inlogging/internal/and is not installed — apps get it via the default logger or the"spdlog"registry type.Tests —
spdlog_logger_test(compiled only when spdlog is ON): level mapping incl. fatal→critical, source-location forwarding, and thepatternproperty. clang/libc++ with spdlog 1.15.3.This pull request and its description were written by Isaac.