14W to Power a 30B LLM? The High-Efficiency AIOps Platform: ET2508
written by Asterfusion
AI Module Empowers Campus Edge Inference Capabilities
As edge computing and AIOps continue to converge, running large-parameter AI models smoothly on power-constrained campus edge devices remains a major industry challenge.
On the ET2508 platform, we integrated the M50 AI acceleration module through the M.2 slot. The module provides up to 160 TOPS of AI compute. The platform supports running OpenClaw, and the benchmark results below come from a real deployment in which a 30-billion-parameter LLM ran locally (model file: HiModel_xh2_qwen3-2507_30b_a3b_256_32k_b1_1chip_2cores_v1.0.0_20260210.gguf).
This article walks through those benchmark results and explains how an AI module turns a campus network edge device into an AIOps platform.
Note: A token is the smallest unit processed by an AI model. It is also the basic unit used for model metering and capacity calculation. A token can represent a word, character, punctuation mark, or subword, rather than being strictly divided by word count or character count.
Understanding the Test Metrics
This evaluation focuses on six key dimensions: power efficiency, long-context processing, latency, throughput, concurrency, and thermal stability.
The test results show that this solution breaks the traditional limitation of “large models cannot run efficiently at the edge.” It also delivers ultra-low latency, providing a solid foundation for autonomous campus networks and AIOps-driven operations.
Before reviewing the results, the following LLM performance metrics are used throughout the test:
- TTFT (Time To First Token): The latency from sending a request to receiving the first token.
- TPOT (Time Per Output Token): The average generation time per output token during decoding.
- Prefill TPS: Prefill throughput, calculated as Input Tokens / TTFT.
- Decode TPS: Decoding throughput, calculated as (Output Tokens - 1) / Generation Time.
- Total TPS: Overall system throughput, calculated as Total Tokens / Wall Clock Time.
- Perf/Watt: System throughput generated per watt of power consumption, calculated as TPS / Power.
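The definitions above can be sketched as a small calculation over per-request timestamps. This is a minimal illustration rather than the benchmark tool used in the test: the `RequestTiming` fields and the `llm_metrics` function are hypothetical names, "Total Tokens" is assumed to mean input plus output tokens, and Perf/Watt is assumed to be based on Total TPS.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Timestamps (in seconds) and token counts for one request (illustrative fields)."""
    t_sent: float         # request submitted
    t_first_token: float  # first output token received
    t_done: float         # last output token received
    input_tokens: int
    output_tokens: int


def llm_metrics(r: RequestTiming, power_watts: float) -> dict:
    """Compute the metrics listed above from one request's timing data."""
    if r.output_tokens < 2:
        raise ValueError("need at least 2 output tokens to measure decoding")
    ttft = r.t_first_token - r.t_sent
    generation_time = r.t_done - r.t_first_token  # decode phase only
    wall_clock = r.t_done - r.t_sent
    # Assumption: "Total Tokens" = input tokens + output tokens.
    total_tps = (r.input_tokens + r.output_tokens) / wall_clock
    return {
        "ttft_s": ttft,
        "tpot_s": generation_time / (r.output_tokens - 1),
        "prefill_tps": r.input_tokens / ttft,
        "decode_tps": (r.output_tokens - 1) / generation_time,
        "total_tps": total_tps,
        # Assumption: Perf/Watt uses Total TPS as the throughput figure.
        "perf_per_watt": total_tps / power_watts,
    }


# Made-up numbers, chosen only to keep the arithmetic easy to follow.
example = RequestTiming(t_sent=0.0, t_first_token=0.5, t_done=10.5,
                        input_tokens=1000, output_tokens=101)
metrics = llm_metrics(example, power_watts=10.0)
print(metrics)
```

The first token is excluded from the decode calculation because its latency is already counted in TTFT, which is why the formulas use Output Tokens - 1.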
Result Analysis
- Low Power Consumption: Under full-load conditions, the M50 AI acceleration module consumes only about 14.4W to run a 30-billion-parameter LLM. The measured energy efficiency reaches 1.966 TPS/W.
- Strong Long-Context Performance: With a 32K-token input (approximately 25,000 English words), TTFT remains as low as 70.2 ms. This indicates strong prefill performance for long-context workloads. The platform is well suited for long-document summarization, extended conversations, and retrieval-augmented generation (RAG) scenarios.
- Ultra-Low Latency: With a standard 2048-token input, TTFT is reduced to below 40 ms. At the human perception level, 40 ms is nearly instantaneous. The extremely low first-token latency allows the model to respond immediately after the administrator submits a request, making it suitable for high-frequency interactive applications.
Throughput Performance:
- Stable Decoding Throughput: Decode throughput remains stable at around 29.2 tokens/s. This indicates consistent token generation during output, resulting in a smoother reading experience.
- TPOT Performance: TPOT represents the generation latency per token. The test shows stable TPOT performance between 34–38 ms, equivalent to approximately 26–29 tokens per second. This is significantly faster than normal human reading speed. In practice, the model output appears continuous and fluid, rather than displaying tokens one by one with visible delays.
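The relationship between TPOT and tokens per second quoted above is a simple reciprocal. A short sketch makes the conversion explicit (the function name `tpot_ms_to_tps` is ours, used only for illustration):

```python
def tpot_ms_to_tps(tpot_ms: float) -> float:
    """Convert time per output token (milliseconds) into tokens per second."""
    return 1000.0 / tpot_ms


for tpot_ms in (34, 38):
    print(f"{tpot_ms} ms/token -> {tpot_ms_to_tps(tpot_ms):.1f} tokens/s")
# 34 ms/token -> 29.4 tokens/s
# 38 ms/token -> 26.3 tokens/s
```

The 34 ms end of the range corresponds to the measured decode throughput of about 29.2 tokens/s.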
An AIOps Platform for Conversational Network Execution
Based on the measured power efficiency, latency, and throughput results above, the 30B LLM running on the ET2508 platform is capable of serving as a practical productivity tool for front-line network operations.
By learning from official documentation and enterprise-specific knowledge bases, the solution can function as a conversational network assistant and support automated configuration deployment through natural language interactions. The workflow is shown below:
[Natural Language Input From Network Operators]
│
▼
[ET2508 AI Acceleration Module (30B LLM)]
│ 🚀 TTFT < 40 ms: Fast first-token response with minimal perceived delay
│ 🚀 Decode TPS ~29.2 t/s: Smooth and consistent text generation
▼
[Inline Streaming Output]
├──► ① Step-by-step configuration examples (CLI command snippets)
├──► ② Risk warnings and conflict detection
└──► ③ One-click deployment confirmation (RESTful API)
The same flow can also be used for other tasks, such as device status checks.
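The three output stages in the workflow above (configuration snippets, risk warnings, and confirmation before deployment) can be sketched as follows. This is an illustrative outline under stated assumptions, not the ET2508 implementation: the risky-command list, the function names, and the `confirm` and `push` callbacks are all hypothetical, and `push` stands in for whatever RESTful API call the platform exposes.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical policy: command prefixes flagged for operator review.
RISKY_PREFIXES = ("no ", "shutdown", "reload")


def review_stream(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Consume generated CLI lines as they arrive; return (commands, warnings)."""
    commands: List[str] = []
    warnings: List[str] = []
    for line in lines:
        cmd = line.strip()
        if not cmd:
            continue  # skip blank lines in the model output
        commands.append(cmd)
        if cmd.lower().startswith(RISKY_PREFIXES):
            warnings.append(f"review before applying: {cmd}")
    return commands, warnings


def deploy_if_confirmed(
    commands: List[str],
    warnings: List[str],
    confirm: Callable[[List[str], List[str]], bool],
    push: Callable[[List[str]], None],
) -> bool:
    """Call push(commands) only after the operator has confirmed."""
    if not confirm(commands, warnings):
        return False
    push(commands)
    return True


# Simulated model output; a real client would iterate over the streamed response.
generated = iter(["interface Ethernet1", "shutdown", "", "  no ip address  "])
commands, warnings = review_stream(generated)

pushed: List[List[str]] = []
rejected = deploy_if_confirmed(commands, warnings, lambda c, w: False, pushed.append)
accepted = deploy_if_confirmed(commands, warnings, lambda c, w: True, pushed.append)
```

Keeping the confirmation step as an explicit gate means nothing reaches the device unless the operator approves the generated commands together with their warnings.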

Why These Benchmark Results Matter for Agentic Operations
No More “Laggy Output” During Interactive Operations: The measured 34–38 ms TPOT and stable 29.2 t/s decode throughput enable smooth streaming responses during live interactions. After an operator submits a request, the system can maintain real-time dialogue while continuously generating CLI command snippets and configuration examples on screen in a steady, readable flow.
Real-Time Verification During Generation: The output speed closely matches, and in some cases exceeds, normal human reading speed. This allows network engineers to review commands and validate configuration logic while the text is being generated. Compared with non-streaming output, where the system pauses for several seconds and then emits hundreds of lines at once, this significantly reduces operational uncertainty and the risk of misconfiguration.
Localized Knowledge Base Integration: With only 70.2 ms TTFT under a 32K long-context workload, the platform can load complete device documentation and operational guides in real time. Operators can ask straightforward questions such as, “How do I enable BGP on this white-box switch?” The model can then reference local documentation and begin streaming the corresponding configuration script within milliseconds, eliminating the need to manually search through hundreds of pages of PDF manuals.
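Loading documentation into a 32K context means deciding which excerpts fit. A minimal sketch of that budgeting step is shown below. It assumes a word-based token estimate derived from the article's own figure (32K tokens for roughly 25,000 English words), treats "32K" as 32,000 tokens, and assumes that room for the answer must be reserved inside the same context window. All names here are hypothetical, and a real deployment would count tokens with the model's tokenizer.

```python
import math
from typing import List

CONTEXT_TOKENS = 32_000             # "32K" context, taken as 32,000 for this sketch
TOKENS_PER_WORD = 32_000 / 25_000   # ratio implied by the article's word estimate


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count (an approximation)."""
    return math.ceil(len(text.split()) * TOKENS_PER_WORD)


def pack_context(chunks: List[str], question: str,
                 reserve_for_answer: int = 1024) -> List[str]:
    """Select documentation chunks, in the given order, that fit the budget.

    Assumes `chunks` is already sorted by relevance to the question.
    """
    budget = CONTEXT_TOKENS - reserve_for_answer - estimate_tokens(question)
    selected: List[str] = []
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if cost > budget:
            continue  # too large for the remaining budget; try the next chunk
        selected.append(chunk)
        budget -= cost
    return selected


# Placeholder documents of about 10,000, 20,000, and 5,000 words.
docs = ["word " * 10_000, "word " * 20_000, "word " * 5_000]
question = "How do I enable BGP on this white-box switch?"
selected = pack_context(docs, question)
print(len(selected), "of", len(docs), "chunks fit in the context")
```

With these placeholder sizes, the first and third documents fit while the second is skipped, since it would exceed the remaining budget.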
Conclusion: Bringing AI Operations to Campus Networks
The combination of the ET2508 platform and the M50 AI acceleration module enables campus edge devices to evolve into an AI-native agentic platform.
Based on an M.2 Key hardware architecture, the M50 AI acceleration module consumes 14.4W and runs on the ET2508 platform, which has a full-system power envelope of 60W under full load. Within this low-power budget, the module delivers 160 TOPS of edge AI compute and runs a 30B-parameter large language model locally. Under a 32K long-context workload, it maintains a first-token latency of 70.2 ms.

This set of metrics demonstrates how an AIOps tool at the edge can enable autonomous network operations in campus environments:
- Offloading cloud dependency: Sensitive network configurations, local documentation, and internal knowledge base queries are fully executed on-device. This creates a closed-loop execution environment and reduces the risk of exposing critical enterprise network architecture data.
- Redefining the interaction model: With smooth token throughput (29.2 tokens/s) and ultra-low latency, traditional workflows such as manual CLI configuration and navigating large documentation sets are restructured into a reliable conversational execution flow. Configurations are generated in a streaming manner, with continuous validation during output.
In an era where edge intelligence is becoming practical, the ET2508 with the M50 AI acceleration module is no longer just a high-performance white-box networking device. It functions as a localized AI expert for campus network operators. It replaces manual effort with compute-driven execution and reduces troubleshooting cycles to millisecond-level responsiveness, marking a shift toward autonomous operations in campus networks.



