Metric collector v2 #286

Open
reggeenr wants to merge 56 commits into main from metric-collector-v2

Conversation

@reggeenr commented Feb 13, 2026

Collaborator

This PR enables resource metric integration within Code Engine by running a metrics-collector that emits CPU and memory usage metrics to IBM Cloud Logs.

Furthermore, this PR contains a dashboard that can be imported into IBM Cloud Monitoring:
monitoring-dashboard-ce-component-resources

In addition, this PR demonstrates a way for Code Engine jobs and apps to emit custom metrics, which are sent to Sysdig.

See Readme for further details: https://github.com/IBM/CodeEngine/blob/metric-collector-v2/metrics-collector/README.md

To demonstrate custom metric collection, this PR extends the network-test-app (see its README: https://github.com/IBM/CodeEngine/blob/metric-collector-v2/network-test-app/README.md).

@reggeenr reggeenr requested a review from norman465 March 2, 2026 23:32
Comment thread metrics-collector/setup/ibm-cloud-monitoring/README.md
Comment thread metrics-collector/main.go
Comment on lines +410 to +411
cpuCurrent := podMetric.Containers[0].Usage.Cpu().ToDec().AsApproximateFloat64() * 1000
memoryCurrent := podMetric.Containers[0].Usage.Memory().ToDec().AsApproximateFloat64() / 1000 / 1000

Member

Is Containers[0] guaranteed to be the right container?

Collaborator Author

Good point. For apps and jobs, yes!
For builds, it seems that the larger container appears first in the list.

I will double-check.
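
One way to make this independent of list order is to aggregate over all containers, or to select a container by name. The following is only a sketch under stated assumptions: `containerUsage` is a hypothetical, simplified stand-in for an entry of `podMetric.Containers`, and the container names `sidecar` and `user-container` are illustrative, not taken from this PR.

```go
package main

import "fmt"

// containerUsage is a hypothetical, simplified stand-in for one entry of
// podMetric.Containers; the real entries carry resource quantities.
type containerUsage struct {
	Name          string
	CPUMillicores int64
	MemoryBytes   int64
}

// totalUsage sums CPU and memory over every container in the pod, so the
// result does not depend on the order of the list.
func totalUsage(containers []containerUsage) (cpuMillicores, memoryBytes int64) {
	for _, c := range containers {
		cpuMillicores += c.CPUMillicores
		memoryBytes += c.MemoryBytes
	}
	return cpuMillicores, memoryBytes
}

// usageByName returns the usage of the named container, and false if the
// pod has no container with that name.
func usageByName(containers []containerUsage, name string) (containerUsage, bool) {
	for _, c := range containers {
		if c.Name == name {
			return c, true
		}
	}
	return containerUsage{}, false
}

func main() {
	// Illustrative data only; the names are assumptions.
	pod := []containerUsage{
		{Name: "sidecar", CPUMillicores: 5, MemoryBytes: 20_000_000},
		{Name: "user-container", CPUMillicores: 120, MemoryBytes: 256_000_000},
	}
	cpu, mem := totalUsage(pod)
	fmt.Printf("total: %dm CPU, %d MB\n", cpu, mem/1000/1000)
	if c, ok := usageByName(pod, "user-container"); ok {
		fmt.Printf("%s: %dm CPU\n", c.Name, c.CPUMillicores)
	}
}
```

Whether summing or selecting by name is the right choice depends on what the dashboard is meant to show per instance; the thread does not settle that.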

Comment thread metrics-collector/prometheus.yml.template
Comment thread metrics-collector/prometheus.yml.template Outdated
@reggeenr reggeenr requested a review from norman465 March 30, 2026 21:57

1. The metrics collector exposes Prometheus metrics on `localhost:9100/metrics`
2. The embedded Prometheus agent scrapes these metrics every 30 seconds
3. The agent also discovers and scrapes pods with the `codeengine.cloud.ibm.com/userMetricsScrape: 'true'` annotation

Member

I think the agent would discover those pods and scrape additional custom metrics from them. Default CPU/memory metrics would be collected for all workloads irrespective of that annotation, right?
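
The distinction raised here can be sketched as follows. This is a minimal illustration of the reviewer's reading, which the thread does not confirm: resource metrics for every workload, custom metrics only for annotated pods. The function name `metricSources` and the source names `resource` and `custom` are hypothetical; only the annotation key comes from the README excerpt.

```go
package main

import "fmt"

// userMetricsAnnotation is the annotation key quoted in the README excerpt.
const userMetricsAnnotation = "codeengine.cloud.ibm.com/userMetricsScrape"

// metricSources lists which kinds of metrics would be gathered for a pod
// with the given annotations. It assumes resource (CPU/memory) metrics are
// always collected and custom metrics only when the pod opts in.
func metricSources(annotations map[string]string) []string {
	sources := []string{"resource"}
	if annotations[userMetricsAnnotation] == "true" {
		sources = append(sources, "custom")
	}
	return sources
}

func main() {
	plain := map[string]string{}
	optedIn := map[string]string{userMetricsAnnotation: "true"}
	fmt.Println(metricSources(plain))
	fmt.Println(metricSources(optedIn))
}
```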

- `eu-de` - EU Central (Frankfurt)
- `eu-es` - EU Spain (Madrid)
- `eu-gb` - EU GB (London)
- `jp-tok` - Japan (Tokyo)

Member

Suggested change
- `jp-tok` - Japan (Tokyo)
- `jp-tok` - Japan (Tokyo)
- `in-che` - India (Chennai)

["eu-de"]="https://eu-de.monitoring.cloud.ibm.com"
["eu-es"]="https://eu-es.monitoring.cloud.ibm.com"
["eu-gb"]="https://eu-gb.monitoring.cloud.ibm.com"
["jp-tok"]="https://jp-tok.monitoring.cloud.ibm.com"

Member

Suggested change
["jp-tok"]="https://jp-tok.monitoring.cloud.ibm.com"
["jp-tok"]="https://jp-tok.monitoring.cloud.ibm.com"
["in-che"]="https://in-che.monitoring.cloud.ibm.com"
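
For comparison, the same region lookup could be expressed in Go as below. This is a sketch, not code from the PR: the names `monitoringEndpoints` and `monitoringEndpoint` are hypothetical, and the map holds only the entries visible in this excerpt plus the suggested `in-che` entry; the complete region list is not shown here.

```go
package main

import "fmt"

// monitoringEndpoints maps a region ID to its IBM Cloud Monitoring URL.
// Only the regions visible in the excerpt above are listed, plus the
// in-che entry from the suggested change.
var monitoringEndpoints = map[string]string{
	"eu-de":  "https://eu-de.monitoring.cloud.ibm.com",
	"eu-es":  "https://eu-es.monitoring.cloud.ibm.com",
	"eu-gb":  "https://eu-gb.monitoring.cloud.ibm.com",
	"jp-tok": "https://jp-tok.monitoring.cloud.ibm.com",
	"in-che": "https://in-che.monitoring.cloud.ibm.com",
}

// monitoringEndpoint returns the endpoint for a region, or an error if the
// region is not in the map, so a missing entry fails loudly.
func monitoringEndpoint(region string) (string, error) {
	url, ok := monitoringEndpoints[region]
	if !ok {
		return "", fmt.Errorf("unsupported region %q", region)
	}
	return url, nil
}

func main() {
	for _, region := range []string{"in-che", "xx-none"} {
		url, err := monitoringEndpoint(region)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(region, "->", url)
	}
}
```

Returning an error for unknown regions makes an omission like the missing `in-che` entry visible at lookup time rather than producing an empty URL.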

Comment thread metrics-collector/verify
#!/bin/bash
set -euxo pipefail

docker build --platform linux/amd64 .

Member

What is the purpose of this compile & build verify script? It's currently not used.

Comment thread metrics-collector/main.go
Comment on lines +93 to +102
sb.WriteString("# HELP ibm_codeengine_instance_cpu_usage_millicores Current CPU usage in millicores\n")
sb.WriteString("# TYPE ibm_codeengine_instance_cpu_usage_millicores gauge\n")
for _, m := range metrics {
	labels := fmt.Sprintf("ibm_codeengine_instance_name=\"%s\",ibm_codeengine_component_type=\"%s\",ibm_codeengine_component_name=\"%s\"",
		escapeLabelValue(m.Name),
		escapeLabelValue(m.ComponentType),
		escapeLabelValue(m.ComponentName))
	sb.WriteString(fmt.Sprintf("ibm_codeengine_instance_cpu_usage_millicores{%s} %d\n", labels, m.Cpu.Current))
}
sb.WriteString("\n")

Member

I would prefer to move all of these out so that every metric type has its own method, leaving a clean format method that calls them one after another.
