
Open-Source Monitoring Tools for Open Network Switches & Prometheus

written by Asterfusion

February 27, 2025

In our last article, we briefly introduced open-source network monitoring tools and their strengths and weaknesses. This time around, we'll look at some of the most popular open-source network monitoring tools available today: what they're all about, where they shine, and where they fall short.

But we’re not stopping there. We’ll also focus on how Asterfusion integrates Prometheus with our very own enterprise-grade SONiC-AsterNOS to build an open-source, visual monitoring solution. This combination offers both power and flexibility for your network monitoring needs.

So, let’s start with the big players in the open-source network monitoring space:

Comparison of main open-source network monitoring tools on the market

| Tool | Use Case | Main Features | Key Characteristics | Pros and Cons |
|---|---|---|---|---|
| Cacti | Network Device Monitoring | Network bandwidth monitoring, device status graph generation, SNMP data collection | Graphical display, SNMP data collection, primarily used for network bandwidth and device status monitoring | Easy to use but limited in functionality, mainly for network monitoring |
| Nagios | Comprehensive IT Infrastructure Monitoring | Monitors servers, network devices, applications, alerting and notification system, plugin extensibility | Flexible plugin architecture, supports multiple protocols, strong extensibility and alerting mechanisms, extensive community support | Complex configuration, suitable for small to medium-sized environments |
| Icinga 2 | IT Infrastructure Monitoring | Complex configuration, but powerful, suitable for large-scale deployments, high-performance requirements | Inherits from Nagios, supports distributed monitoring, modern web interface, powerful notification system, flexible configuration | Complex configuration, suitable for medium to large environments, higher learning curve |
| Zabbix | Comprehensive IT Infrastructure Monitoring | Monitors servers, network devices, virtualization, applications, automatic discovery, alerting and notifications | Powerful data collection (supports SNMP, JMX, IPMI, etc.), built-in graphs and reports, supports automatic discovery | AI-based automated detection and root cause analysis, deep application performance monitoring, supports cloud-native environments, high automation |
| Prometheus | Time Series Data Monitoring, Containerized Environments | Time series data monitoring, service auto-discovery, PromQL query language, alerting | Powerful time series data storage and querying capabilities (PromQL), efficient data collection and processing, often used with Grafana, suitable for dynamic environments | Needs to be paired with Grafana, steep learning curve, focused on time series data |
| Dynatrace | Enterprise Application Monitoring, Cloud and Microservice Monitoring | Application performance monitoring (APM), root cause analysis, automated detection, cloud-native environment monitoring | AI-based automated detection and root cause analysis, deep application performance monitoring, supports cloud-native environments, high automation | Primarily a commercial paid product, expensive, best suited for enterprise applications |

And now, let’s zoom in on Prometheus.

What is Prometheus?

Prometheus was born at SoundCloud in 2012 out of a need to solve the monitoring problems of dynamic, service-oriented architectures. Traditional monitoring tools at the time simply couldn't keep up with the speed and complexity of modern systems. In 2016, Prometheus joined the Cloud Native Computing Foundation (CNCF) as its second hosted project after Kubernetes, and in 2018 it became the second project to graduate from the CNCF. Today, it is a de facto standard for cloud-native monitoring, trusted by enterprises and developers worldwide for its power and adaptability.

Core Functionality of Prometheus

At its core, Prometheus is an open-source toolkit designed specifically to handle time-series data. Let’s break down its key features:

  1. Collecting and Storing Time-Series Data: Prometheus pulls metrics from servers, network devices, and apps at regular intervals, like taking periodic snapshots of your system's health. It then stores them efficiently, allowing you to track and analyze changes over time.
  2. PromQL, the Power of Queries: Prometheus comes with PromQL, a flexible query language that lets you dive deep into the metrics, aggregate them, and extract exactly the data you need. It's like having a customizable lens to view your system's performance.
  3. Alerting System: Alerts are crucial, and Prometheus has got you covered. It evaluates alerting rules against the collected metrics and, when your system crosses an important threshold, hands the alert to Alertmanager, which sends the notifications (email, Slack, etc.), keeping you in the loop and ready to act fast.
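To make the PromQL idea a bit more concrete, here is a minimal Python sketch of what a query such as `rate(some_counter[1m])` roughly computes: the per-second increase of a counter over a window of samples. It is a simplification under stated assumptions (the real `rate()` also extrapolates to the window boundaries, which is skipped here), and the function name and sample values are made up for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second average increase of a counter over the given samples.

    Simplified relative to PromQL's rate(): counter resets are handled,
    but there is no extrapolation to the window boundaries.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. after a reboot),
        # so the whole current value counts as new increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    return increase / elapsed


# Hypothetical byte counter of a switch port, scraped every 15 seconds.
# The counter resets between t=30 and t=45.
window = [(0.0, 1000.0), (15.0, 2500.0), (30.0, 4000.0), (45.0, 500.0), (60.0, 2000.0)]
bytes_per_second = simple_rate(window)
```

Treating a drop as a reset is why counters survive device restarts without producing negative rates.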

Prometheus’ Component Overview

Prometheus is built like a modular puzzle, with each piece playing a vital role. Here’s how it all fits together:

  1. Prometheus Server: The heartbeat of the system, this lightweight service scrapes data from your targets, stores it in a time-series database, and serves up those metrics via PromQL. It's simple to deploy but handles some heavy lifting.
  2. Exporters: These are the little helpers that convert metrics from various systems into a format Prometheus can understand. Whether it's a server's health stats or network device data, there's likely an Exporter for it (e.g., SNMP Exporter for network devices, Node Exporter for servers). The community has built hundreds of them to cover nearly every use case.
  3. Push Gateway: For jobs that can't be scraped directly (like short-lived batch jobs), the Push Gateway steps in. Such jobs push their metrics to the gateway, and Prometheus then scrapes the gateway like any other target, so those metrics are not lost.
  4. Alertmanager: No system is complete without alerts. The Alertmanager handles the alerts generated by Prometheus by grouping, deduplicating, and routing them to the right channels (like email or Slack).
  5. Grafana, the Visualization Powerhouse: While Prometheus offers a basic web UI for querying and graphing data, Grafana takes things to the next level. This visualization platform allows you to build interactive dashboards that help you spot trends and make informed decisions at a glance.
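The grouping and deduplication that Alertmanager performs can be pictured with a small Python sketch. This is only a rough illustration of the idea, not Alertmanager's actual implementation, and the alert names and labels below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alert = Dict[str, str]  # an alert is identified by its set of labels


def group_alerts(alerts: List[Alert], group_by: Tuple[str, ...]) -> Dict[Tuple[str, ...], List[Alert]]:
    """Drop exact duplicates, then bucket the rest by the group_by labels."""
    seen = set()
    groups = defaultdict(list)
    for alert in alerts:
        identity = frozenset(alert.items())
        if identity in seen:  # the same label set was already received
            continue
        seen.add(identity)
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical alerts coming from two switches.
incoming = [
    {"alertname": "LinkDown", "instance": "leaf-1", "port": "Ethernet0"},
    {"alertname": "LinkDown", "instance": "leaf-1", "port": "Ethernet0"},  # duplicate
    {"alertname": "LinkDown", "instance": "leaf-1", "port": "Ethernet4"},
    {"alertname": "HighTemp", "instance": "leaf-2"},
]
grouped = group_alerts(incoming, group_by=("alertname", "instance"))
```

Each resulting group would become a single notification, so two ports going down on one switch produce one message instead of two.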

How It All Works Together

[Figure: How Prometheus and Grafana work together]

To visualize the relationship between these components, imagine this:

  • Prometheus Server sits at the center, periodically scraping data from Exporters (like an SNMP Exporter on a switch).
  • The Push Gateway receives occasional data pushed by batch jobs, and Prometheus scrapes it from there like any other target.
  • Prometheus evaluates alerting rules and forwards threshold breaches to Alertmanager, which sends out notifications via channels like email or Slack.
  • Grafana connects to Prometheus, displaying the collected metrics on interactive, customizable dashboards.

In this ecosystem, Prometheus pulls the data, Alertmanager keeps you informed, and Grafana brings the data to life visually. It’s a monitoring dream team!
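The pull model described above can be sketched in a few lines of Python: fetch the text a target exposes, parse it into samples, and attach a timestamp and an `instance` label. This is a simplified sketch, not Prometheus' own code. The parser ignores optional timestamps and escaped characters in label values, and the target address and metric names are placeholders.

```python
import re
import time
from typing import Callable, Dict, List, Tuple

# name, optional {labels}, then the value
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

Parsed = Tuple[str, Dict[str, str], float]
Stored = Tuple[float, str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Parsed]:
    """Parse a simplified subset of the Prometheus text format."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        match = LINE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def scrape_once(targets: List[str], fetch: Callable[[str], str],
                now: Callable[[], float] = time.time) -> List[Stored]:
    """Pull metrics from every target and tag them with time and instance."""
    store = []
    for target in targets:
        timestamp = now()
        for name, labels, value in parse_metrics(fetch(target)):
            store.append((timestamp, name, {**labels, "instance": target}, value))
    return store


def fake_fetch(target: str) -> str:
    # Stand-in for an HTTP GET of http://<target>/metrics.
    return '# HELP up Demo metric\nup 1\nport_in_octets_total{port="Ethernet0"} 1234\n'


samples = scrape_once(["192.0.2.11:8000"], fake_fetch, now=lambda: 100.0)
```

In a real setup, `fetch` would perform the HTTP request and `scrape_once` would run on a timer, which is the part the Prometheus server takes care of for you.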

Asterfusion SONiC-based Cloud Network Switch Monitoring Solution Powered by Prometheus

[Figure: SONiC-based switch monitoring solution powered by Prometheus]

Asterfusion's AsterNOS, powered by SONiC, is an enterprise-grade network OS designed to meet the demands of both data centers and campuses. With robust L2/L3 features and virtualization capabilities, it supports hardware ranging from 1G to 800G and adapts to diverse environments. AsterNOS takes full advantage of its containerized architecture to simplify the deployment of Exporters that collect switch data and expose it via HTTP. This makes it easy for Prometheus and Grafana to access key metrics, streamlining monitoring and boosting efficiency. With Prometheus, users gain real-time insights into switch performance, health, security, and feature status, which helps keep networks stable and high-performing and improves the switch operation and maintenance experience.

Prometheus Deployment on Asterfusion SONiC-Based Cloud Switches

Asterfusion's self-developed Exporter container interacts with Redis, extracting data from the database and providing it to Prometheus. This method is both efficient and aligned with the architecture of modern cloud-native environments.

Exporter Container:

  • The AsterNOS Exporter is a monitoring component developed by Asterfusion, running as a container on the AsterNOS operating system. It integrates with the high-performance monitoring platform Prometheus and supports the features shown in the figures below.
  • The Exporter container interacts with the Redis database to retrieve time-series data or other monitoring metrics. The data is then made available to Prometheus.
[Figure: Prometheus deployment on SONiC-based switches, functions]
[Figure: Grafana interface dashboards]

Data transfer to Prometheus:

  • The data extracted by the Exporter container is exposed over HTTP in a format supported by Prometheus (e.g., the Prometheus text exposition format), allowing the Prometheus server to scrape this data at regular intervals.
[Figure: Data extracted by the Exporter container and transferred to Prometheus]
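As a rough picture of this step, the Python sketch below turns Redis-style hashes into Prometheus text format and serves them on `/metrics`. It is not the AsterNOS Exporter's code. The in-memory dictionary stands in for Redis, and the field names, metric names, and port number are all hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict

# Stand-in for hashes read from Redis: port name -> {field: value as string}.
FAKE_REDIS: Dict[str, Dict[str, str]] = {
    "Ethernet0": {"in_octets": "1024", "out_octets": "2048"},
    "Ethernet4": {"in_octets": "0", "out_octets": "512"},
}


def render_metrics(db: Dict[str, Dict[str, str]]) -> str:
    """Render the counters as Prometheus text format, one metric per field."""
    lines = []
    for field in ("in_octets", "out_octets"):
        metric = f"switch_port_{field}_total"  # hypothetical metric name
        lines.append(f"# TYPE {metric} counter")
        for port in sorted(db):
            lines.append(f'{metric}{{port="{port}"}} {int(db[port][field])}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(FAKE_REDIS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve /metrics until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` and opening `http://localhost:8000/metrics` would show the same text that a Prometheus scrape receives.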

Prometheus Server:

  • The Prometheus server is responsible for collecting data from multiple Exporters, storing the time-series data, answering PromQL queries, and evaluating PromQL-based alerting rules.
  • The Prometheus server can be deployed locally or in the cloud for more flexible scaling and management.

Grafana integrated visualization interface

  • Once the data is collected by Prometheus, it can be visualized using Grafana. Grafana integrates with Prometheus, allowing the creation of custom panels and views based on the Prometheus-defined metric formats.
  • Grafana offers powerful flexibility, allowing users to customize and modify data presentation, creating personalized monitoring dashboards.
[Figure: Grafana integrated visualization interface, 1]
[Figure: Grafana integrated visualization interface, 2]

Alert Mechanism

Prometheus has a built-in alerting function that sends notifications (such as emails or Slack messages) via Alertmanager when metrics exceed preset thresholds. This is critical in network monitoring as it enables timely detection of failures or performance bottlenecks, ensuring system stability.

[Figure: Prometheus alert mechanism interface]
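The threshold logic can be sketched in Python as follows. In Prometheus, an alerting rule can carry a `for` duration, so an alert is first pending and only fires once the condition has held long enough. The sketch below imitates that behavior in a simplified way. The threshold, duration, and utilization samples are invented for illustration.

```python
from typing import List, Optional, Tuple


def alert_states(samples: List[Tuple[float, float]], threshold: float,
                 for_seconds: float) -> List[Tuple[float, str]]:
    """Return (timestamp, state) per sample: inactive, pending, or firing."""
    states = []
    breach_start: Optional[float] = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp  # the condition just became true
            held = timestamp - breach_start
            state = "firing" if held >= for_seconds else "pending"
        else:
            breach_start = None  # the condition cleared, start over
            state = "inactive"
        states.append((timestamp, state))
    return states


# Hypothetical link utilization (0..1), evaluated every 30 seconds.
utilization = [(0.0, 0.50), (30.0, 0.95), (60.0, 0.97), (90.0, 0.96), (120.0, 0.40)]
timeline = alert_states(utilization, threshold=0.9, for_seconds=60.0)
```

The waiting period filters out short spikes, so operators are only notified about breaches that persist.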

Integration and Expansion

In a containerized environment, Prometheus can integrate with other components (such as network configuration tools and log systems) to build a complete monitoring ecosystem. Its modular design enhances scalability and adaptability.

Asterfusion Grafana interface screenshots

Grafana Interface – MCLAG, NTP

[Figure: Grafana interface, MCLAG]
[Figure: Grafana interface, NTP]

Grafana Interface – DOM state

[Figure: Grafana interface, DOM state]

Grafana Interface – CRM state

[Figure: Grafana interface, CRM state]

Grafana – Interface state

[Figure: Grafana interface state]

Grafana – EVPN

[Figure: Grafana EVPN]

Grafana – RoCE Congestion Control

[Figure: Grafana RoCE congestion control]

Grafana – Buffer

[Figure: Grafana buffer, 1]
[Figure: Grafana buffer, 2]

Deploying the Exporter on the Asterfusion CX-N switch takes just three steps:

  1. Enable the Exporter feature on the switch.
  2. Install Prometheus on the server and add the switches to be monitored to its configuration file.
  3. Install Grafana on the server and import the AsterNOS-Exporter_Dashboard_XXX.json provided by us as the base monitoring template.
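For step 2, the device list goes into the `scrape_configs` section of `prometheus.yml`. The Python sketch below generates a minimal configuration of that shape. The job name, scrape interval, addresses, and port are placeholders chosen for illustration, so check the AsterNOS documentation for the port the Exporter actually listens on.

```python
from typing import List


def build_scrape_config(devices: List[str], job_name: str = "asternos",
                        interval: str = "15s") -> str:
    """Build a minimal prometheus.yml with one static scrape job."""
    lines = [
        "global:",
        f"  scrape_interval: {interval}",
        "scrape_configs:",
        f'  - job_name: "{job_name}"',  # hypothetical job name
        "    static_configs:",
        "      - targets:",
    ]
    # One "host:port" entry per switch to be monitored.
    lines += [f'          - "{device}"' for device in devices]
    return "\n".join(lines) + "\n"


# Placeholder addresses from the documentation range, placeholder port.
switches = ["192.0.2.11:8000", "192.0.2.12:8000"]
config_text = build_scrape_config(switches)
```

Writing `config_text` to `prometheus.yml` and restarting Prometheus would make it start scraping the listed switches.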

Seamless integration, rapid deployment—no complicated steps required. Experience an efficient cloud-native monitoring solution right away!

Prometheus Runs on SONiC-based campus layer 2/3 switches

Prometheus can run not only on our data center switches but also on our SONiC-based campus switches!

For details, please refer to this article: Prometheus Monitoring SONiC-based PoE Switches and Sending Alerts to Slack

Case Study

[Figure: Open network switches monitoring solution]

Background

As a leading third-party IDC service provider in Asia, the user faced complex device monitoring issues and high operational costs due to traditional vendor switches and SDN controllers. The conventional maintenance model resulted in slow fault detection and low efficiency, and the existing system lacked the flexibility to scale, so it could not meet future development needs. To reduce operational costs and improve efficiency, the user chose Asterfusion's Prometheus-based monitoring solution, built on an open network architecture.

Solution Value

  1. Precise Monitoring, Real-Time Network Insights: After introducing Prometheus, the user could monitor the entire infrastructure in real time, capturing key performance metrics such as traffic, latency, and bandwidth utilization from switches, routers, and other devices. Prometheus' pull-based monitoring model retrieves network status at regular, configurable intervals, giving the operations team a current and accurate view of the network.
  2. Efficient Fault Localization, Data-Driven Maintenance Decisions: With Prometheus' powerful query capabilities, the user could easily query real-time data from devices and visualize it via Grafana. Deep integration with Asterfusion's maintenance platform automatically generated historical data trend reports, enabling quick fault root cause analysis and reducing troubleshooting time. Prometheus' custom queries (PromQL) and flexible alerting policies further enhanced monitoring precision and operational response efficiency.
  3. Unified Operation Management, Simplified Processes: A unified monitoring dashboard allowed the user to manage all network devices centrally. Custom alert rules, integrated with Alertmanager, ensured that operations personnel received immediate notifications and could quickly act on alerts. The visual display of data flows made it easier for operations staff to understand network health, reducing the time spent on manual checks and troubleshooting.
  4. Flexible Expansion, Strong Integration Capabilities: Prometheus' HTTP API simplified integration with other third-party systems, such as fault management and automation tools. The user could easily expand the monitoring system to create customized monitoring and management solutions, supporting fine-tuned operations for complex business scenarios.
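As an illustration of the integration point in item 4, a third-party tool can run an instant query against the Prometheus HTTP API endpoint `/api/v1/query` and read the JSON result. The Python sketch below builds such a request URL and parses a response body. The server address and the canned response are made up so that the example runs without a live Prometheus.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against the Prometheus HTTP API."""
    params = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def extract_vector(response_body: str) -> List[Tuple[Dict[str, str], float]]:
    """Return (labels, value) pairs from an instant-vector query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector result")
    # Each item carries its labels and a [timestamp, "value"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


# Placeholder server address and a canned response for illustration.
url = instant_query_url("http://localhost:9090/", "up")
canned = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "192.0.2.11:8000"},
             "value": [1740614400.0, "1"]},
        ],
    },
})
results = extract_vector(canned)
```

A fault management or automation tool would fetch `url` over HTTP and feed the response body to `extract_vector` instead of the canned text.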

With Prometheus’s formidable time-series monitoring capabilities, users gain real-time insights into switch performance, health, and security status, ensuring stable and efficient network operations. Paired with Grafana’s intuitive visualization dashboards, this solution delivers a precise and user-friendly monitoring experience for operations teams, empowering users to confidently tackle the challenges of network management with ease!

Summary

Deploying the Exporter on an Asterfusion SONiC-based containerized switch and pairing it with Prometheus allows for efficient collection, storage, and querying of network performance data. Coupled with Grafana's visualization and Alertmanager's alert mechanisms, potential issues can be quickly addressed. This architecture combines the strengths of open-source network operating systems and modern monitoring tools, providing powerful and flexible support for network operation and maintenance.

Refer to:

Prometheus Monitoring SONiC-based PoE Switches and Sending Alerts to Slack

4 Ways to Enhance Visibility & Unified Management on SONiC Open Network Switches

Critical Information You Should Know about Open Source Network Monitoring Tools and Prometheus
