NetOpsBench: Open Arena for NetOps in AI Infrastructure

Fair, reproducible benchmarks for agentic network troubleshooting.

Website · Quickstart · Build Your Agent · Trace Dataset

NetOpsBench is an open benchmark arena for agentic network troubleshooting — run reproducible fault scenarios on live SONiC-VS / Containerlab topologies, plug in any troubleshooting agent, and score it across quality and efficiency dimensions.

Why NetOpsBench

Developing and evaluating agentic root cause analysis methods for network troubleshooting remains challenging, with three core bottlenecks hindering further advancement:

Gap	The problem	How NetOpsBench closes it
No fair comparison	Varied network topologies, fault sets, observability tools, and evaluation metrics hinder the comparison of agentic troubleshooting strategies across the research community.	NetOpsBench unifies fault scenarios, observability access and scoring rules to support agent comparison on a shared benchmark.
Non-reproducible faults	Real network incidents cannot be reliably reproduced or labeled with consistent ground truth, slowing iterative improvement and evaluation of troubleshooting agents.	Containerlab + SONiC-VS inject controlled, reproducible faults with stable labels, so every run is an identical, repeatable episode.
Non-Interactive Environment	Static topology snapshots and logs cannot provide live probing and telemetry signals required by agents for diagnostic work.	NetOpsBench offers an interactive environment for agents to operate within live networks, capturing real-time Pingmesh data, gNMI telemetry and switch CLI evidence during every episode.

Overview

NetOpsBench provides: (1) an interactive and realistic environment mimicking production networks, with common tracing and telemetry tooling; (2) comprehensive and reproducible benchmarks covering a wide range of faults and failures; (3) an extensible architecture with an open SDK to readily integrate with various agent paradigms and observability tools, allowing users to try out their own agentic workflows.

It is built for researchers and engineers who want to compare LLM-backed, symbolic, heuristic, or hybrid troubleshooting strategies on the same operational benchmark, not just on static logs or hand-written prompts.

News

2026-05: 🎉 Initial Release - NetOpsBench is now available as an open arena for agentic network troubleshooting.
- Provide public SDK with run_scenario() and run_suite() APIs to launch live network environments from Python.
- Equip native MCP tools of complete observability utilities and pre-configured SONiC-VS network covering XS, Small, Medium and Large scales.
- Offer fault scenario generation scripts and an expanding repository of reproducible fault cases with standard ground truth labels.
- A full-fledged benchmark evaluator that accesses detection accuracy and token utilization efficiency.

Quick Start

NetOpsBench runtime execution requires Linux because Containerlab depends on Linux networking primitives.

Install and run via CLI

git clone https://github.com/NetX-lab/NetOpsBench.git
cd NetOpsBench

python -m venv .venv
source .venv/bin/activate
pip install -e ".[agent]"

netopsbench benchmark prepare --scales xs
export OPENAI_API_KEY=...
PYTHONPATH=. python examples/01_run_scenario.py --vendor openai

The first successful run produces a BenchmarkReport with case-level scores, timing, and artifact paths. For Docker, Containerlab, and runtime setup details, read Quickstart.

Run a scenario from Python

from examples.agents import MinimalDeepAgent
from netopsbench.sdk import NetOpsBench

scenario = "scenarios/generated/xs/generated_link_down_xs_001.yaml"

with NetOpsBench(workspace=".") as bench:
    agent = bench.agents.wrap(MinimalDeepAgent(vendor="openai"))
    run = bench.sessions.run_scenario(scenario=scenario, agent=agent)
    report = run.wait()

print(report.summary)

Scenario YAML files define the benchmark case: topology scale, traffic profile, fault type, target device, and interface-level ground truth when applicable. Use the Python API Guide for run_scenario(...), run_suite(...), and workers=N; see Custom Troubleshooting Agents when you are ready to replace MinimalDeepAgent with your own strategy.

Benchmark Results

NetOpsBench reports detection, fault type, device/interface localization, runtime, tool calls, and token usage so troubleshooting quality and operational cost can be compared together.

Read Benchmark Methodology for scoring definitions and Benchmark Results for an example completed suite.

Public agent trajectory artifacts are available in the NetOpsBench Trace Dataset, including Harbor/ATIF traces, run reports, and summary CSVs for reproducible analysis.

Learn More

Goal	Start here
Run one scenario	Quickstart
Run scenarios, suites, and batches	Running Benchmarks
Plug in your own troubleshooting agent	Custom Troubleshooting Agents
Use NetOpsBench from Python	Python API Guide
Interpret benchmark scores	Benchmark Methodology
Debug observability or runtime state	Operations
Understand the benchmark loop	System Overview

Community

Global community: NetOpsBench Slack
Chinese-language community: NetOpsBench Feishu group

Contributing

Contributions are welcome for benchmark scenarios, fault types, SDK ergonomics, documentation, and evaluation workflows.

License

NetOpsBench is released under the MIT License. See LICENSE.

Citation

If you use NetOpsBench in your research, please cite:

@software{netopsbench2026,
  author  = {Yang, Yitao and Xu, Hong},
  title   = {{NetOpsBench}: Open Arena for NetOps in AI Infrastructure},
  year    = {2026},
  url     = {https://github.com/netx-lab/NetOpsBench},
}

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
.github/workflows		.github/workflows
docs		docs
examples		examples
netopsbench		netopsbench
observability		observability
scenarios		scenarios
scripts		scripts
tests		tests
.editorconfig		.editorconfig
.env.example		.env.example
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
AGENTS.md		AGENTS.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NetOpsBench: Open Arena for NetOps in AI Infrastructure

Why NetOpsBench

Overview

News

Quick Start

Install and run via CLI

Run a scenario from Python

Benchmark Results

Learn More

Community

Contributing

License

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

NetOpsBench: Open Arena for NetOps in AI Infrastructure

Why NetOpsBench

Overview

News

Quick Start

Install and run via CLI

Run a scenario from Python

Benchmark Results

Learn More

Community

Contributing

License

Citation

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages