14W to Power a 30B LLM? The High-Efficiency AIOps Platform: ET2508

May 21, 2026

AI Module Empowers Campus Edge Inference Capabilities

As edge computing and AIOps continue to converge, running large-parameter AI models smoothly on power-constrained campus edge devices remains a major industry challenge.

On the ET2508 platform, we integrated an AI acceleration module through the M.2 slot. The module provides up to 160 TOPS of AI compute performance. In addition to supporting OpenClaw, this deployment produced real-world benchmark results: a 30-billion-parameter LLM ran locally on the device (firmware version: HiModel_xh2_qwen3-2507_30b_a3b_256_32k_b1_1chip_2cores_v1.0.0_20260210.gguf).

This article discusses the benchmark results and how an AI module enables an AIOps platform on campus network edge devices. If you are interested, you are welcome to continue reading.

Note: A token is the smallest unit processed by an AI model. It is also the basic unit used for model metering and capacity calculation. A token can represent a word, character, punctuation mark, or subword, rather than being strictly divided by word count or character count.

Understanding the Test Metrics

This evaluation focuses on six key dimensions: power efficiency, long-context processing, latency, throughput, concurrency, and thermal stability.

The test results show that this solution breaks the traditional limitation of “large models cannot run efficiently at the edge.” It also delivers ultra-low latency, providing a solid foundation for autonomous campus networks and AIOps-driven operations.

Before reviewing the results, here are the LLM performance metrics used throughout the test:

TTFT (Time To First Token): The latency from sending a request to receiving the first token.
TPOT (Time Per Output Token): The average generation time per output token during decoding.
Prefill TPS: Prefill throughput. (Input Tokens / TTFT)
Decode TPS: Decoding throughput. ((Output Tokens – 1) / Generation Time)
Total TPS: Overall system throughput. (Total Tokens / Wall Clock Time)
Perf/Watt: System throughput generated per watt of power consumption. (TPS / Power)
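The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the test harness used for the benchmark: the `RequestTiming` fields and the `llm_metrics` function are hypothetical names, the sample numbers are invented, and Perf/Watt is assumed here to use Total TPS because the article does not say which TPS figure it divides by power.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Timestamps in seconds plus token counts for one request (hypothetical names)."""
    t_sent: float         # request sent
    t_first_token: float  # first output token received
    t_done: float         # last output token received
    input_tokens: int
    output_tokens: int


def llm_metrics(r: RequestTiming, power_watts: float) -> dict:
    """Compute the metrics listed above from one request's timing data."""
    if r.output_tokens < 2:
        raise ValueError("need at least two output tokens to measure decoding")
    ttft = r.t_first_token - r.t_sent
    generation_time = r.t_done - r.t_first_token
    wall_clock = r.t_done - r.t_sent
    decode_tps = (r.output_tokens - 1) / generation_time
    total_tps = (r.input_tokens + r.output_tokens) / wall_clock
    return {
        "ttft_s": ttft,
        # Assumption: TPOT is taken as the inverse of Decode TPS.
        "tpot_s": generation_time / (r.output_tokens - 1),
        "prefill_tps": r.input_tokens / ttft,
        "decode_tps": decode_tps,
        "total_tps": total_tps,
        # Assumption: Perf/Watt uses Total TPS.
        "perf_per_watt": total_tps / power_watts,
    }


# Invented sample values, chosen only to make the arithmetic easy to follow.
sample = RequestTiming(t_sent=0.0, t_first_token=0.5, t_done=10.5,
                       input_tokens=1000, output_tokens=201)
for name, value in llm_metrics(sample, power_watts=10.0).items():
    print(f"{name}: {value:.4f}")
```

Note that Decode TPS excludes the first token, since that token's cost is already counted in TTFT.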

Result Analysis

Low Power Consumption: Under full-load conditions, the AI acceleration module consumes only about 14.4W to run a 30-billion-parameter LLM. The measured energy efficiency reaches 1.966 TPS/W.
Strong Long-Context Performance: With a 32K-token input (approximately 25,000 English words), TTFT remains as low as 70.2 ms. This indicates strong prefill performance for long-context workloads. The platform is well suited for long-document summarization, extended conversations, and retrieval-augmented generation (RAG) scenarios.
Ultra-Low Latency: With a standard 2048-token input, TTFT is reduced to below 40 ms. At the human perception level, 40 ms is nearly instantaneous. The extremely low first-token latency allows the model to respond immediately after the administrator submits a request, making it suitable for high-frequency interactive applications.
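As a quick sanity check on the power figures, the reported efficiency and module power can be combined using the Perf/Watt definition (TPS / Power). The sketch below only rearranges the two reported numbers; the function names are illustrative, and the article does not state which TPS figure the efficiency value is based on.

```python
MODULE_POWER_W = 14.4   # reported module power under full load
PERF_PER_WATT = 1.966   # reported efficiency in TPS/W


def implied_throughput(perf_per_watt: float, power_w: float) -> float:
    """TPS = (TPS per watt) * watts."""
    return perf_per_watt * power_w


def joules_per_token(perf_per_watt: float) -> float:
    """One watt is one joule per second, so W / TPS gives joules per token."""
    return 1.0 / perf_per_watt


print(f"{implied_throughput(PERF_PER_WATT, MODULE_POWER_W):.2f} tokens/s")  # 28.31 tokens/s
print(f"{joules_per_token(PERF_PER_WATT):.3f} J/token")                      # 0.509 J/token
```

In other words, the reported figures correspond to roughly half a joule of module energy per token.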

Throughput Performance:

Stable Decoding Throughput: Decode throughput remains stable at around 29.2 tokens/s. This indicates consistent token generation during output, resulting in a smoother reading experience.
TPOT Performance: TPOT represents the generation latency per token. The test shows stable TPOT performance between 34–38 ms, equivalent to approximately 26–29 tokens per second. This is significantly faster than normal human reading speed. In practice, the model output appears continuous and fluid, rather than displaying tokens one by one with visible delays.
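The relationship between TPOT and tokens per second is a simple reciprocal, which the short sketch below makes explicit. The words-per-token ratio is an assumption derived from the article's own estimate (32K tokens is about 25,000 English words, with 32K read as 32,768), and the function names are illustrative.

```python
WORDS_PER_TOKEN = 25_000 / 32_768  # assumption: "32K" means 32,768 tokens


def tpot_ms_to_tps(tpot_ms: float) -> float:
    """Tokens per second for a given per-token latency in milliseconds."""
    return 1000.0 / tpot_ms


def tps_to_tpot_ms(tps: float) -> float:
    """Per-token latency in milliseconds for a given tokens-per-second rate."""
    return 1000.0 / tps


def tps_to_words_per_minute(tps: float) -> float:
    """Rough English words per minute, using the assumed words-per-token ratio."""
    return tps * WORDS_PER_TOKEN * 60.0


for tpot in (34.0, 38.0):
    print(f"TPOT {tpot:.0f} ms -> {tpot_ms_to_tps(tpot):.2f} tokens/s")
print(f"29.2 tokens/s -> {tps_to_tpot_ms(29.2):.2f} ms per token")
print(f"29.2 tokens/s -> about {tps_to_words_per_minute(29.2):.0f} words/min")
```

Under these assumptions the decode rate works out to well over 1,000 words per minute, while typical silent reading speed is usually cited at a few hundred words per minute.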

An AIOps Platform for Conversational Network Execution

Based on the measured power efficiency, latency, and throughput results above, the 30B LLM running on the ET2508 platform is capable of serving as a practical productivity tool for front-line network operations.

8 Core Arm Neoverse N2 Open Intelligent Gateway Based on Marvell OCTEON 10 CN102

By learning from official documentation and enterprise-specific knowledge bases, the solution can function as a conversational network assistant and support automated configuration deployment through natural language interactions. The workflow is shown below:

[Natural Language Input From Network Operators]
            │
            ▼
[ET2508 AI Acceleration Module (30B LLM)]
            │ 🚀 TTFT < 40 ms: Fast first-token response with minimal perceived delay
            │ 🚀 Decode TPS ~29.2 t/s: Smooth and consistent text generation
            ▼
[Inline Streaming Output]
        ├──► ① Step-by-step configuration examples (CLI command snippets)
        ├──► ② Risk warnings and conflict detection
        └──► ③ One-click deployment confirmation (RESTful API)

The same flow can also handle other tasks, such as device status checks.

Conversational assistant sample for the AIOps platform
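The workflow above (streamed output, risk warnings, then confirmation before deployment) can be sketched as follows. This is a hypothetical illustration and not the ET2508 software: the risky-command list, the function names, and the sample CLI lines are all assumptions, and `deploy` is a stand-in that sends nothing over the network.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical command prefixes that an operator may want flagged for review.
RISKY_PREFIXES = ("no ", "shutdown", "reload", "write erase")


def stream_lines(fragments: Iterable[str]) -> Iterator[str]:
    """Reassemble streamed text fragments into complete CLI lines."""
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            yield line
    if buffer:
        yield buffer


def review(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Collect the generated lines and a warning for each risky-looking command."""
    all_lines: List[str] = []
    warnings: List[str] = []
    for line in lines:
        all_lines.append(line)
        if line.strip().lower().startswith(RISKY_PREFIXES):
            warnings.append(f"risky command: {line.strip()}")
    return all_lines, warnings


def deploy(lines: List[str], confirmed: bool) -> str:
    """Stand-in for the RESTful deployment step; nothing is actually sent."""
    if not confirmed:
        return "skipped"
    return f"deployed {len(lines)} lines"


# Simulated model output arriving in arbitrary fragments (illustrative CLI only).
fake_stream = ["router bgp 65001\n nei", "ghbor 10.0.0.2 remote-as 65002\n", "no shutdown"]
config, warnings = review(stream_lines(fake_stream))
print(warnings)
print(deploy(config, confirmed=not warnings))
```

Keeping the confirmation step separate from generation means a flagged command blocks deployment until an operator has looked at it.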

Why These Benchmark Results Matter for Agentic Operations

No More “Laggy Output” During Interactive Operations: The measured 34–38 ms TPOT and stable 29.2 t/s decode throughput enable smooth streaming responses during live interactions. After an operator submits a request, the system can maintain real-time dialogue while continuously generating CLI command snippets and configuration examples on screen in a steady, readable flow.

Real-Time Verification During Generation: The output speed closely matches, and in some cases exceeds, normal human reading speed. This allows network engineers to review commands and validate configuration logic while the text is being generated. Compared with traditional LLM behavior — where the system pauses for several seconds and suddenly outputs hundreds of lines at once — this significantly reduces operational uncertainty and the risk of misconfiguration.

Localized Knowledge Base Integration: With only 70.2 ms TTFT under a 32K long-context workload, the platform can load complete device documentation and operational guides in real time. Operators can ask straightforward questions such as, “How do I enable BGP on this white-box switch?” The model can then reference local documentation and begin streaming the corresponding configuration script within milliseconds, eliminating the need to manually search through hundreds of pages of PDF manuals.
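One way to picture loading local documentation into a 32K context is a simple budget check before the prompt is sent. The sketch below is an assumption-laden illustration rather than the platform's retrieval logic: it reads 32K as 32,768 tokens, estimates tokens from word counts using the article's 25,000-word figure instead of a real tokenizer, and uses hypothetical function names.

```python
from typing import List, Tuple

CONTEXT_TOKENS = 32_768                    # assumption: "32K" means 32,768 tokens
TOKENS_PER_WORD = CONTEXT_TOKENS / 25_000  # from the article's 25,000-word estimate


def estimate_tokens(text: str) -> int:
    """Rough token estimate from the whitespace word count (not a real tokenizer)."""
    return int(len(text.split()) * TOKENS_PER_WORD) + 1


def pack_context(question: str, chunks: List[str],
                 reserve_for_output: int = 1024) -> Tuple[List[str], int]:
    """Greedily add documentation chunks, assumed pre-ranked by relevance,
    until the next chunk would exceed the remaining token budget."""
    budget = CONTEXT_TOKENS - reserve_for_output - estimate_tokens(question)
    selected: List[str] = []
    used = 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > budget:
            break
        selected.append(chunk)
        used += cost
    return selected, used


# Three placeholder "manual sections" of 10,000 words each.
docs = ["word " * 10_000 for _ in range(3)]
picked, used = pack_context("How do I enable BGP on this white-box switch?", docs)
print(f"{len(picked)} chunks selected, about {used} tokens")
```

Reserving part of the window for the answer is a design choice in this sketch; whether the 32K figure covers input only or input plus output is not stated in the article.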

Conclusion: Bringing AI Operations to Campus Networks

The combination of the ET2508 platform and the AI acceleration module enables campus edge devices to evolve into an AI-native agentic platform.

Based on an M.2 Key hardware architecture, the AI acceleration module consumes 14.4W of power and runs on the ET2508 platform, whose full-system power envelope is 60W under full load. Within this low-power budget, it delivers up to 160 TOPS of edge AI compute capability and runs a 30B-parameter large language model locally. Under a 32K long-context workload, it maintains a first-token latency of 70.2 ms.

This set of metrics demonstrates how an AIOps tool at the edge can enable autonomous network operations in campus environments:

Offloading cloud dependency: Sensitive network configurations, local documentation, and internal knowledge base queries are fully executed on-device. This creates a closed-loop execution environment and reduces the risk of exposing critical enterprise network architecture data.
Redefining the interaction model: With smooth token throughput (29.2 tokens/s) and ultra-low latency, traditional workflows such as manual CLI configuration and navigating large documentation sets are restructured into a reliable conversational execution flow. Configurations are generated in a streaming manner, with continuous validation during output.

In an era where edge intelligence is becoming practical, the ET2508 with the AI acceleration module is no longer just a high-performance white-box networking device. It functions as a localized AI expert for campus network operators. It replaces manual effort with compute-driven execution and reduces troubleshooting cycles to millisecond-level responsiveness, marking a shift toward autonomous operations in campus networks.
