Skip to content

feat(logging): add Loggers registry (6/6)#737

Open
kamcheungting-db wants to merge 13 commits into
apache:mainfrom
kamcheungting-db:logging-block6-registry
Open

feat(logging): add Loggers registry (6/6)#737
kamcheungting-db wants to merge 13 commits into
apache:mainfrom
kamcheungting-db:logging-block6-registry

Conversation

@kamcheungting-db

@kamcheungting-db kamcheungting-db commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Part 6 of the logging stack (builds on #726). Completes the system with configuration-driven backend selection, mirroring MetricsReporters.

What's here

  • Loggers::Register(type, factory) registers a named backend; Loggers::Load(properties) builds one, choosing the type from the logger-impl property.
  • Built-ins: noop, cerr, and (when built with spdlog) spdlog. With no logger-impl set, the default is spdlog when compiled in, else cerr — i.e. logs by default.
  • Loggers::LoadAndSetDefault(properties) builds a logger and installs it as the process default.

End-to-end testslogging_end_to_end_test drives the public surface the way an app does: configure a backend + level via properties, install it, log through the macros, and check the real output. Also covers the compiled-in default backend and a macro reaching a real spdlog sink. clang/libc++, spdlog ON and OFF.

This pull request and its description were written by Isaac.

@kamcheungting-db kamcheungting-db force-pushed the logging-block6-registry branch 7 times, most recently from 7cd7508 to 00008ae Compare June 16, 2026 03:47
Second block: the backend-agnostic logging API and the swappable default logger.

- Logger: pure-virtual sink (ShouldLog/Log/SetLevel/level/Flush/IsNoop), with
  ShouldLog() as the single authority for runtime filtering and a documented
  no-reentrancy / thread-safety / noexcept contract. Mirrors MetricsReporter.
- LogMessage owns its formatted text (moved in) so a sink may retain records;
  reserves an owning attributes vector for future structured logging without an
  ABI break. logger.h is backend-agnostic -- it never includes the build config
  header nor references the spdlog feature macro, so consumers see one stable API.
- Process-global default logger: GetDefaultLogger / SetDefaultLogger /
  SetDefaultLevel, plus a lock-free thread-local generation cache on the hot path
  (no lock or refcount churn in steady state; slot mutex only on swap/refresh).
  NoopLogger is the placeholder default here; CerrLogger and the spdlog backend
  arrive in the following blocks.

Builds the source into both CMake and Meson and installs logger.h alongside
log_level.h; logger_test covers the default-logger API, level lowering taking
effect immediately, and concurrent swap/read safety.

Co-authored-by: Isaac
@kamcheungting-db kamcheungting-db force-pushed the logging-block6-registry branch from 00008ae to bb7c0af Compare June 16, 2026 03:58
Resolves @wgtmac's review comments on PR apache#723:

- Mark ShouldLog()/SetLevel()/level() noexcept so a custom logger cannot throw
  from the hot filter path; propagated to NoopLogger and CapturingLogger.
- Make CurrentLogger() safe during thread teardown without UB: the per-thread
  cache is reached via a trivially-destructible thread_local pointer to a
  never-freed heap slot (kept LSan-reachable by a registry), so logging from a
  thread_local destructor in any order is safe. Replaces the AliveFlag guard.
- Add ICEBERG_EXPORT to LogAttribute/LogMessage, matching Error/ScanReport.
- Trim the logger.h \file brief (the macros it referenced arrive in PR apache#725).
- Tests: use the IsError matcher and std::ignore; strengthen the teardown test
  to log from a thread_local destroyed after the cache.

Co-authored-by: Isaac
Addresses @wgtmac's review comment on PR apache#723 asking for a std::format-style API
that builds the string only after the level check.

- A single overloaded iceberg::Log():
    Log(level, "fmt {}", args...)           -> process-default logger
    Log(logger, level, "fmt {}", args...)   -> explicit logger
  disambiguated by the first argument.
- Formats only after ShouldLog() (no wasted work when disabled) and never throws
  -- a formatting failure routes to EmitFormatError, matching the macros.
- The call-site source_location is captured via a small consteval FmtWithLoc
  wrapper (the std::print technique), preserving compile-time format-string
  checking without a macro.

The sink interface (Logger::Log(LogMessage&&)) intentionally stays
pre-formatted: passing std::format_args across the virtual boundary is
lifetime-unsafe and would break safe record retention. The format-style win is
delivered at the call layer instead.

Co-authored-by: Isaac
Addresses @wgtmac's review comment on PR apache#723 that LogMessage::attributes had no
easy fill path.

Adds a fluent LogMessage::Builder so engine loggers can attach structured fields
(query id, task id, table name, snapshot id, file path, ...):

  auto record = LogMessage::Builder(LogLevel::kInfo)
                    .Message("scan finished")
                    .Attribute("table", table_name)
                    .Attribute("snapshot_id", std::to_string(id))
                    .Build();
  logger->Log(std::move(record));

- Message() sets the formatted text; Attribute() appends an owned key/value;
  Location() overrides the source location (defaults to the build site).
- Build() moves the accumulated state into a LogMessage.

This defines how attributes get populated, so structured logging can grow
without an ABI break.

Co-authored-by: Isaac
Replaces the intentional permanent per-thread leak with a cache that is freed at
thread exit, while staying safe to call CurrentLogger() from any thread_local
destructor during teardown.

The per-thread state is split by lifetime:
- `dead` (bool) and `cache` (pointer) are trivially destructible, so their
  storage stays valid for the whole thread and is readable from any other
  thread_local's destructor, in any destruction order.
- A `guard` (the only piece with a destructor) frees the cache and sets `dead`
  at thread exit. A thread_local destroyed after it reads `dead == true` and
  falls back to an immortal no-op; one destroyed before it still sees a live
  cache. Either order is safe.

This is the correct version of the earlier AliveFlag attempt, which was UB
because the flag object itself had a destructor (reading it after that
destructor ran was the very use-after-lifetime it tried to prevent). Drops the
LSan-reachability registry, which is no longer needed now that nothing leaks.

Validated with a dedicated stress harness under ThreadSanitizer and
AddressSanitizer (no leaks): logging from a thread_local destroyed after the
cache; logging interleaved with concurrent SetDefaultLogger swaps and thread
teardown; and a thread whose only logging happens in a thread_local destructor.
Adds ConcurrentTeardownAndSwapIsSafe to cover the combined path in CI.

Co-authored-by: Isaac
Bind a logger for the current thread + scope so the default path
(CurrentLogger/Log(level,...)/LOG_* macros) routes to it, with the global
default as fallback. Engines can route Iceberg's own internal logs into a
per catalog/session/query/task context without touching call sites.
Third block: the first concrete sink, and the process default until the spdlog
backend lands.

- CerrLogger writes to std::cerr with a fixed line layout
  `YYYY-MM-DDThh:mm:ss.mmmZ LEVEL [tid] file:line] message`. Timestamps use UTC
  std::chrono floored to milliseconds (no gmtime/localtime -- thread-unsafe);
  the thread id is the OS-native id, cached per thread.
- Level is a std::atomic<LogLevel>; a mutex guards the whole formatted-line write
  so concurrent lines never interleave. Log()/Flush() wrap stream ops in
  try/catch so the noexcept contract holds even if the stream throws.
- MakeDefaultLogger() now returns CerrLogger.

Wired into both builds and installed; cerr_logger_test covers layout, level
filtering, and concurrent-write safety.

Co-authored-by: Isaac
Fourth block: the application-facing macros, the only part most callers touch.

- ICEBERG_LOG_{TRACE,DEBUG,INFO,WARN,ERROR,CRITICAL,FATAL} plus the generic
  ICEBERG_LOG(level, ...), ICEBERG_LOG_TO(logger, level, ...) for an explicit
  logger, and ICEBERG_LOG_RUNTIME_FMT for a runtime (non-literal) format string.
- ICEBERG_LOG_ACTIVE_LEVEL is a compile-time severity floor: statements below it
  are removed entirely via `if constexpr` (no format call site, no source
  location emitted). ICEBERG_LOG_FATAL is never gated by the floor -- its abort
  is always compiled in; it emits, best-effort Flush()es the same logger it
  emitted to, then std::abort().
- Filtering is decided solely by Logger::ShouldLog(); formatting is wrapped in
  try/catch so logging never throws (a format failure routes to EmitFormatError).
- Bare Java-style aliases (LOG_INFO, ...) are opt-in via ICEBERG_LOG_SHORT_MACROS
  to avoid polluting consumers / colliding with glog/abseil.

Header-only addition to logger.h. macros_test covers injection, the
guard-before-format short-circuit, never-throws, and FATAL aborts;
macros_active_level_test verifies compile-time stripping in a kOff translation unit.

Co-authored-by: Isaac
Fifth block: the default production backend and the build option that selects it.

- SpdLogger wraps spdlog::logger (kCritical/kFatal -> spdlog critical, others 1:1),
  forwarding the pre-formatted message and source location. Synchronous only in
  v1 (spdlog's source_loc is a non-owning const char*, unsafe with async sinks).
  It lives in logging/internal/, is gated by #ifdef ICEBERG_HAS_SPDLOG, and is
  NOT installed -- consumers obtain it via the default logger or the registry,
  never by including spdlog headers.
- New ICEBERG_SPDLOG CMake option (default ON). config.h is ALWAYS generated
  (only ICEBERG_HAS_SPDLOG's definedness varies) so logger.cc compiles in both
  configurations; MakeDefaultLogger() prefers SpdLogger when compiled in, else
  CerrLogger.
- Critically, ICEBERG_SPDLOG=OFF now UNWIRES the previously-unconditional spdlog
  link (interface-lib lists + resolve_spdlog_dependency), not just the new
  source -- so an OFF build has no spdlog dependency at all.

spdlog_logger_test (compiled only on the ON path) covers the level mapping
including fatal->critical and source-location forwarding.

Co-authored-by: Isaac
Final block: configuration-driven backend selection, mirroring MetricsReporters.

- Loggers::Register(type, factory) registers a named backend; Loggers::Load(props)
  builds one, selecting the type from the "logger-impl" property key.
- Built-in factories: "noop", "cerr", and (only when built with ICEBERG_SPDLOG)
  "spdlog". With no logger-impl set, the default is spdlog when compiled in, else
  cerr -- logs by default, an intentional divergence from the metrics registry's
  noop default.
- Loggers::LoadAndSetDefault(props) loads a logger and installs it as the process
  default.

This completes the system end to end: levels -> Logger interface + default logger
-> CerrLogger/SpdLogger backends -> macros -> configuration-driven selection.
loggers_test covers load default/noop/cerr, unknown-type errors, empty-factory
rejection, custom Register, and LoadAndSetDefault.

Adds logging_end_to_end_test, which drives the public surface as an application
does -- now that every layer is present: configure a backend via the registry,
install it as the default, log through the LOG_* macros, and observe real output.
Covers registry -> default-slot -> macro -> backend -> std::cerr output, level
filtering through the full macro path, the compiled-backend identity of the
default (spdlog when ON, cerr when OFF), the "spdlog" factory by name, and a macro
statement reaching a real spdlog sink.

Co-authored-by: Isaac
@kamcheungting-db kamcheungting-db force-pushed the logging-block6-registry branch from bb7c0af to 5e57438 Compare June 21, 2026 10:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant