BE: Fall back to per-topic rates for cluster throughput when broker aggregate is absent#1875

Open
Sangyong-Jeon wants to merge 1 commit into
kafbat:mainfrom
Sangyong-Jeon:bugfix/cluster-broker-throughput-topic-fallback

Conversation

@Sangyong-Jeon Sangyong-Jeon commented Jun 12, 2026

  • Breaking change?

What changes did you make? (Give an overview)

Closes #1874.

When the per-broker topic-less BrokerTopicMetrics aggregate is absent from the scraped metrics, InternalClusterState resolved cluster bytesInPerSec/bytesOutPerSec to null ("0 B/s") even though every per-topic rate was scraped successfully. This affects brokers that don't surface the topic-less aggregate over JMX (observed on Confluent cp-kafka 7.9.7): the cluster dashboard shows 0 throughput while topic details show real numbers.

InternalClusterState now falls back to summing the per-topic rates when the per-broker map is empty. This is exact, since bytes in/out are additive across topics and counted once at the leader broker, so sum(per-topic) == all-topics broker aggregate.
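The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `ClusterRateFallback`, the parameter names, and the key/value types (`Integer` broker ids, `String` topic names, `BigDecimal` rates) are assumptions; only the helper name `sumWithTopicFallback` comes from the PR.

```java
import java.math.BigDecimal;
import java.util.Map;

// Hypothetical holder class for this sketch; in the PR the helper lives in InternalClusterState.
class ClusterRateFallback {

  // Assumed signature: sums per-broker rates when that map has entries,
  // otherwise sums per-topic rates, and returns null when both are empty.
  static BigDecimal sumWithTopicFallback(Map<Integer, BigDecimal> perBrokerRates,
                                         Map<String, BigDecimal> perTopicRates) {
    if (perBrokerRates != null && !perBrokerRates.isEmpty()) {
      return sum(perBrokerRates);
    }
    if (perTopicRates != null && !perTopicRates.isEmpty()) {
      return sum(perTopicRates);
    }
    return null; // nothing scraped at either level
  }

  // Adds up all values of a rate map, ignoring the keys.
  private static BigDecimal sum(Map<?, BigDecimal> rates) {
    return rates.values().stream().reduce(BigDecimal.ZERO, BigDecimal::add);
  }
}
```

Preferring the per-broker map whenever it has entries keeps the existing behavior unchanged for brokers that do expose the topic-less aggregate.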

Is there anything you'd like reviewers to focus on?

Whether the fallback belongs in InternalClusterState (cluster aggregate) or IoRatesMetricsScanner (per-broker). The per-broker map can't be reconstructed in the scanner because topicBytes*PerSec is already cluster-wide (merged across brokers), so the cluster aggregate is the correct layer for the fallback. Per-broker dashboard values for such brokers remain a separate, lower-impact gap.
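To illustrate why the per-broker map can't be rebuilt from the merged per-topic rates, here is a small sketch. The class `TopicRateMerge` and the method `mergeAcrossBrokers` are hypothetical and only model the "merged across brokers" behavior described above; they are not the scanner's actual code.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each list element is one broker's scraped per-topic rates.
class TopicRateMerge {

  // Merges per-topic rates across brokers by adding the values reported for the same topic.
  static Map<String, BigDecimal> mergeAcrossBrokers(List<Map<String, BigDecimal>> perBrokerTopicRates) {
    Map<String, BigDecimal> merged = new HashMap<>();
    for (Map<String, BigDecimal> rates : perBrokerTopicRates) {
      rates.forEach((topic, rate) -> merged.merge(topic, rate, BigDecimal::add));
    }
    return merged;
  }

  public static void main(String[] args) {
    // Two brokers each lead part of "orders" ...
    Map<String, BigDecimal> split = mergeAcrossBrokers(List.of(
        Map.of("orders", new BigDecimal("100")),
        Map.of("orders", new BigDecimal("50"))));
    // ... versus a single broker leading all of it.
    Map<String, BigDecimal> single = mergeAcrossBrokers(List.of(
        Map.of("orders", new BigDecimal("150"))));
    // Both layouts produce the same merged map, so the broker split is not recoverable.
    System.out.println(split.equals(single));
  }
}
```

Different broker layouts collapse to the same merged map, so only the cluster-wide total can be derived from it.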

How Has This Been Tested? (put an "x" (case-sensitive!) next to an item)

  • Unit checks
  • Added InternalClusterStateTest: per-broker present (used as-is), per-broker empty + per-topic present (fallback), both empty (null).
  • Manually validated the root cause against a cp-kafka 7.9.7 broker: the global BrokerTopicMetrics:BytesInPerSec MBean is present via JmxTool but absent from kafka-ui's scraped set; per-topic throughput populated, cluster shows 0.
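
For readers unfamiliar with the distinction, the topic-less aggregate and the per-topic MBeans differ only by the `topic` key in their JMX object names. The sketch below classifies a name using the standard `javax.management.ObjectName` API; the class `BrokerTopicMetricsNames` and the method `isTopicLessAggregate` are hypothetical helpers for illustration, not part of kafka-ui.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper for this sketch; not taken from the kafka-ui code base.
class BrokerTopicMetricsNames {

  // Returns true for the all-topics aggregate, e.g.
  // kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
  // and false for a per-topic MBean, e.g.
  // kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=orders
  static boolean isTopicLessAggregate(String mbeanName) {
    ObjectName name;
    try {
      name = new ObjectName(mbeanName);
    } catch (MalformedObjectNameException e) {
      throw new IllegalArgumentException("Not a valid JMX object name: " + mbeanName, e);
    }
    return "kafka.server".equals(name.getDomain())
        && "BrokerTopicMetrics".equals(name.getKeyProperty("type"))
        && name.getKeyProperty("topic") == null;
  }
}
```

A scraped set in which every `BrokerTopicMetrics` name fails this check corresponds to the situation described above: per-topic rates are present, but the aggregate is missing.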

Checklist (put an "x" (case-sensitive!) next to all the items, otherwise the build will fail)

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (e.g. ENVIRONMENT VARIABLES)
  • My changes generate no new warnings (e.g. Sonar is happy)
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced cluster-wide I/O throughput metrics calculation with improved fallback mechanisms to ensure accurate reporting across different data availability scenarios.
    • Added comprehensive test coverage for metrics aggregation logic.

…egate is absent

- InternalClusterState derived cluster bytesIn/Out solely from the per-broker
  topic-less BrokerTopicMetrics aggregate; brokers that don't expose it over JMX
  (e.g. Confluent cp-kafka) left the per-broker map empty, so the dashboard showed
  0 B/s even though every per-topic rate was scraped correctly.
- Sum per-topic rates when the per-broker map is empty; this equals the all-topics
  broker aggregate since bytes in/out are additive across topics (counted once at
  the leader broker).
- Add InternalClusterStateTest covering broker-present, topic-fallback, empty cases.
@Sangyong-Jeon Sangyong-Jeon requested a review from a team as a code owner June 12, 2026 02:08
@kapybro kapybro Bot added status/triage/manual Manual triage in progress and removed status/triage/manual Manual triage in progress labels Jun 12, 2026

kapybro Bot commented Jun 12, 2026

AI Summary

The issue describes a problem where Kafka UI's cluster dashboard incorrectly shows zero throughput when the per-broker topic-less BrokerTopicMetrics aggregate is missing, even though per-topic rates are available. The proposed solution is to fall back to summing per-topic rates when the per-broker map is empty, as the sum of per-topic rates equals the broker aggregate. The fallback logic is placed in InternalClusterState rather than IoRatesMetricsScanner because the cluster aggregate is the correct layer for this calculation. Unit tests and manual validation confirm the fix works.

@kapybro kapybro Bot changed the title from "BE: Fall back to per-topic rates for cluster throughput when the broker aggregate is absent" to "Fall back to per-topic rates for cluster throughput when broker aggregate is absent" Jun 12, 2026
@kapybro kapybro Bot added area/brokers Broker / broker configs related issues impact/changelog A PR with changes which should be addressed in the changelog explicitly impact/documentation A PR with changes which should be addressed in the documentation scope/backend Related to backend changes type/bug Something isn't working labels Jun 12, 2026

coderabbitai Bot commented Jun 12, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 266d05db-8a01-48f9-b88d-4ed33514b24d

📥 Commits

Reviewing files that changed from the base of the PR and between c842552 and f451c60.

📒 Files selected for processing (2)
  • api/src/main/java/io/kafbat/ui/model/InternalClusterState.java
  • api/src/test/java/io/kafbat/ui/model/InternalClusterStateTest.java

📝 Walkthrough

This PR updates InternalClusterState to compute cluster-wide byte I/O rates more robustly. A new sumWithTopicFallback helper method aggregates per-broker rates when available, falling back to per-topic rates when broker JMX metrics omit topic-less aggregates. The new behavior is tested with three cases covering broker-rate usage, topic-rate fallback, and null handling.

Changes

Fallback aggregation for cluster IO rates

| Layer / File(s) | Summary |
| --- | --- |
| Rate aggregation helper implementation: api/src/main/java/io/kafbat/ui/model/InternalClusterState.java | Adds java.util.Map import, introduces sumWithTopicFallback(@nullable) static method that sums broker-level IO rates when present or falls back to per-topic rates, and integrates it into bytesInPerSec and bytesOutPerSec computation. |
| Rate aggregation helper tests: api/src/test/java/io/kafbat/ui/model/InternalClusterStateTest.java | New test class validates sumWithTopicFallback behavior: returns broker-rate sum when present, falls back to topic-rate sum when broker rates are empty, and returns null when both sources lack data. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

Poem

A rabbit hops through broker rates with care, 🐰
When JMX data isn't there,
Topic aggregates come to the rescue bright,
Fallback logic makes the metrics right! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding a fallback mechanism to use per-topic rates for cluster throughput when the broker aggregate is absent, which directly matches the implementation in InternalClusterState. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@github-actions github-actions Bot left a comment

Hi Sangyong-Jeon! 👋

Welcome, and thank you for opening your first PR in the repo!

Please wait for triaging by our maintainers.

Please take a look at our contributing guide.

@Sangyong-Jeon Sangyong-Jeon changed the title from "Fall back to per-topic rates for cluster throughput when broker aggregate is absent" to "BE: Fall back to per-topic rates for cluster throughput when broker aggregate is absent" Jun 12, 2026

Sangyong-Jeon commented Jun 12, 2026

Author

Real-world verification on Confluent cp-kafka

Built this PR's branch into a Docker image (JDK 25) and ran it side by side with the current ghcr.io/kafbat/kafka-ui:latest image against the same Confluent cp-kafka 3.9 broker over JMX (METRICS_TYPE=JMX) — same cluster, 64 topics, 118 partitions, live traffic.

| Cluster throughput | latest | this PR |
| --- | --- | --- |
| Production (bytesInPerSec) | 0 Bytes | 3 KB/s |
| Consumption (bytesOutPerSec) | 0 Bytes | 2 KB/s |

This broker only surfaces per-topic BrokerTopicMetrics in the scraped set (the topic-less aggregate is not present), so latest resolves the cluster aggregate to null → "0 Bytes" even though every per-topic rate is scraped correctly. With the per-topic fallback the aggregate reflects real traffic. (Both instances run with identical config — only the throughput differs; the live rate fluctuates over time.)

latest (before):

latest shows 0 Bytes

this PR (after):

this PR shows real throughput

InternalClusterStateTest and checkstyleMain pass on JDK 25.

@Haarolean Haarolean added this to the 1.6 milestone Jun 16, 2026

Labels

area/brokers Broker / broker configs related issues impact/changelog A PR with changes which should be addressed in the changelog explicitly impact/documentation A PR with changes which should be addressed in the documentation scope/backend Related to backend changes type/bug Something isn't working

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

Cluster/broker throughput shows 0 B/s despite per-topic rates being populated

2 participants