Fail Open vs. Fail Close? Neither Is Ideal. Introducing Inline-Bypass HA for NPB
Written by Asterfusion
What Is Fail-Open vs. Fail-Close?
In modern high-performance cybersecurity and observability architectures, traffic inspection has shifted from traditional SPAN/mirroring deployments to inline monitoring. Security tools such as DPI engines, IPS, NGFW, malware sandboxes, and traffic analyzers are now directly inserted into the production traffic path to achieve real-time filtering, deep packet inspection, and threat blocking.

But this architectural evolution introduces a critical question: What happens if the inline security device itself fails?
When a DPI appliance crashes, powers off, freezes, or becomes overloaded, the network is forced into a binary decision: Fail-Open or Fail-Close.
- Fail-Open: Service Availability First, Security Second.
If the security appliance fails, the network remains open and all traffic bypasses the security inspection path directly. Business services and user access stay online, but the network immediately loses traffic visibility and security enforcement.
- Fail-Close: Security First, Service Availability Second.
Also referred to as Fail-Secure. If the security appliance goes down, the system blocks all traffic to prevent the network from entering an uncontrolled state. Security policies remain enforced, but the production network becomes unavailable, which can lead to significant business impact.
Both modes involve a trade-off between availability and security. So why not achieve both?
Our approach integrates bypass functionality into the inline deployment architecture of the Packet Broker. By leveraging Inline-Bypass high availability capabilities, the solution maintains production network continuity while preserving inline security visibility and traffic inspection.
Let’s dive into it.

Note: This article focuses only on the bypass capabilities integrated into the NPB platform. In real-world deployments, a dedicated Inline Bypass Switch is still required to provide additional protection against NPB failure scenarios.
What Will the Packet Broker Do When Security Tools Go Down?
Now, let’s assume all DPI appliances become unavailable.
How does an NPB with Inline-Bypass HA capability avoid both extremes — Fail-Close network outages and completely uncontrolled Fail-Open forwarding?
The answer lies in its built-in hardware state machine design.
Step 1: Accurate Failure Detection — T1 Detection Delay
The SONiC-based NPB solution uses the T1 Delay mechanism to determine whether the link is truly down or only experiencing temporary flapping before triggering a bypass switchover.
Detection is performed at both the Layer 1 physical level and the Layer 2 protocol level. More importantly, the system can distinguish between transient interface instability and deep software freezes inside the DPI appliance.
The platform supports a configurable 1–30 second sliding detection window. This mechanism filters short network jitters and prevents unnecessary traffic switching caused by temporary fluctuations, improving overall network stability.
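The debounce behavior described above can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: it assumes a polling model in which each sample reports whether all DPI links are down, and the class name `T1Detector` and its parameters are hypothetical.

```python
class T1Detector:
    """Hypothetical debounce timer: reports a failure only after the links
    have stayed down continuously for t1_seconds."""

    def __init__(self, t1_seconds: float):
        # The article states a configurable range of 1 to 30 seconds.
        if not 1 <= t1_seconds <= 30:
            raise ValueError("T1 must be within the 1-30 second range")
        self.t1 = t1_seconds
        self.down_since = None  # timestamp at which the current outage began

    def update(self, now: float, all_links_down: bool) -> bool:
        """Feed one sample; return True when a bypass switchover is due."""
        if not all_links_down:
            self.down_since = None  # any recovery restarts the window
            return False
        if self.down_since is None:
            self.down_since = now
        return now - self.down_since >= self.t1


detector = T1Detector(t1_seconds=5)
# (timestamp in seconds, all DPI links down?)
samples = [(0, True), (2, True), (3, False), (4, True), (8, True), (9, True)]
decisions = [detector.update(t, down) for t, down in samples]
print(decisions)  # [False, False, False, False, False, True]
```

In this run the outage that starts at t=0 clears at t=3 and is filtered as a flap. The outage that starts at t=4 persists for the full 5 seconds and trips the switchover at t=9.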

Step 2: ASIC-Level Shadow Forwarding Paths
Inside the underlying ASIC, the NPB predefines dual forwarding paths, also known as Shadow Paths, to enable fast traffic switchover.
- Normal Mode (example based on a field deployment): ISP Ingress → Add VLAN Tag → Distribute to Downlink LAG → Receive DPI Return Traffic → Strip VLAN Tag → Service Egress
- Bypass Mode: ISP Ingress → ACL Matching → Direct Forwarding to Service Egress (preserving the original packet format)
For example, if the sliding detection window is configured to 5 seconds, and all DPI links remain unavailable after the timer expires, the NPB immediately switches traffic from Normal Mode to Bypass Mode.
This is exactly where the platform achieves a balance between Fail-Open and Fail-Close behavior.
In inline bypass deployment mode, the NPB does not blindly forward all traffic like traditional Fail-Open designs. Instead, it activates predefined bypass policies using ACL filtering based on VLAN IDs, IP addresses, transport ports, and other match fields.
Critical production traffic can still be selectively identified and forwarded, while unnecessary or risky traffic remains controlled. This preserves basic security enforcement while maintaining production network availability through direct forwarding.
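To make the selective forwarding idea concrete, here is a rough sketch of first-match ACL evaluation in Bypass Mode. The rule structure, the field names, and the default action of dropping unmatched traffic are all assumptions made for illustration; the article does not specify the actual rule syntax or default behavior.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class BypassRule:
    """Hypothetical bypass ACL entry; an unset field matches anything."""
    action: str                     # "forward" or "drop"
    vlan: Optional[int] = None
    dst_net: Optional[str] = None   # CIDR string, e.g. "10.0.0.0/8"
    dst_port: Optional[int] = None

    def matches(self, pkt: dict) -> bool:
        if self.vlan is not None and pkt.get("vlan") != self.vlan:
            return False
        if self.dst_net is not None and \
                ip_address(pkt["dst_ip"]) not in ip_network(self.dst_net):
            return False
        if self.dst_port is not None and pkt.get("dst_port") != self.dst_port:
            return False
        return True


def bypass_decision(pkt: dict, rules: list, default: str = "drop") -> str:
    """First matching rule wins; the default action is an assumption."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.action
    return default


rules = [
    BypassRule("forward", vlan=100),
    BypassRule("forward", dst_net="10.0.0.0/8", dst_port=443),
    BypassRule("drop", dst_port=23),
]
packets = [
    {"vlan": 100, "dst_ip": "192.0.2.1", "dst_port": 80},
    {"vlan": 200, "dst_ip": "10.1.2.3", "dst_port": 443},
    {"vlan": 200, "dst_ip": "192.0.2.9", "dst_port": 23},
    {"vlan": 200, "dst_ip": "192.0.2.9", "dst_port": 8080},
]
actions = [bypass_decision(p, rules) for p in packets]
print(actions)  # ['forward', 'forward', 'drop', 'drop']
```

The contrast with plain Fail-Open is in the last two packets: they are not forwarded simply because the inspection path is unavailable.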

Step 3: Safe Traffic Restoration — T2 Recovery Delay Protection
Once the DPI appliances recover or complete a warm restart, traffic must transition back from Bypass Mode to Normal Inspection Mode.
This restoration phase is often the highest-risk period for secondary failures. If the DPI signature database is still loading or the inspection engine has not fully initialized, redirecting traffic too early may overload the appliance again immediately.
To address this, the NPB introduces the T2 Recovery Delay mechanism, typically recommended at 60 seconds or longer.
The NPB only performs traffic restoration when at least one member link in the downstream DPI LAG remains continuously Up without flapping for the entire T2 interval.
This provides sufficient warm-up time for the DPI system and ensures the security appliance has fully recovered before production traffic is redirected back into the inspection path.
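The revert condition can be sketched as a per-link hold-down timer. This is an illustrative model under assumptions: link states are sampled periodically, and the class name `T2RecoveryTimer` and the member names `eth1` and `eth2` are hypothetical.

```python
class T2RecoveryTimer:
    """Hypothetical hold-down timer for reverting from Bypass to Normal Mode."""

    def __init__(self, t2_seconds: float = 60.0):
        self.t2 = t2_seconds
        self.up_since = {}  # LAG member name -> timestamp it last came Up

    def update(self, now: float, link_states: dict) -> bool:
        """link_states maps LAG member name -> True (Up) or False (Down).
        Returns True once any member has stayed Up for the full T2 interval."""
        for link, is_up in link_states.items():
            if is_up:
                self.up_since.setdefault(link, now)  # keep the earliest Up time
            else:
                self.up_since.pop(link, None)  # a flap restarts that link's timer
        return any(now - since >= self.t2 for since in self.up_since.values())


timer = T2RecoveryTimer(t2_seconds=60)
events = [
    (0, {"eth1": True, "eth2": False}),
    (30, {"eth1": False, "eth2": True}),   # eth1 flaps, eth2 comes up
    (60, {"eth1": True, "eth2": True}),
    (90, {"eth1": True, "eth2": True}),
]
results = [timer.update(t, states) for t, states in events]
print(results)  # [False, False, False, True]
```

Here `eth1` is Up at both t=0 and t=60, but its flap at t=30 resets its timer, so it does not qualify at t=60. Restoration happens only at t=90, when `eth2` has been Up without interruption for 60 seconds.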

Visualization, Operations, and Auditing for Bypass Mode
Centralized operations and maintenance are performed through the Network Packet Broker (NPB) management console.
With a unified management platform, operations teams gain full lifecycle visibility and control over the bypass functionality.
- Device Status Dashboard: Real-time mapping of device status via three colors: Green (Normal), Red (Bypass), and Blue (Manual Forced)

- Dual-Mode Operation: Manual / Automatic Bypass Switching
In Manual mode, traffic bypasses the DPI, allowing for equipment upgrades or manual inspections. In Auto mode, the system monitors DPI status; when switching conditions are met, traffic bypasses the DPI via the Inline-Bypass path and returns to the inspection path once the revert conditions are satisfied.

- Full Lifecycle Logging
Detailed recording of critical operational parameters such as “Total Down Trigger Time,” “Remaining Revert Time,” and “Trigger Port Details,” combined with synchronized Syslog emergency alarms, enables real-time fault correlation and fast operational response. This design ensures continuous network availability and allows operators to quickly identify and resolve issues, minimizing service disruption.
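As a rough illustration of how the dashboard colors and the logged parameters listed above could fit together, the sketch below maps the operating state to a color and formats one event record. The log line layout, the field names, and the SONiC-style port names are hypothetical; the actual console and Syslog formats are not described in this article.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    NORMAL = "green"
    BYPASS = "red"
    MANUAL_FORCED = "blue"


def dashboard_status(manual_forced: bool, bypass_active: bool) -> Status:
    """A manual override takes precedence over the automatic state."""
    if manual_forced:
        return Status.MANUAL_FORCED
    return Status.BYPASS if bypass_active else Status.NORMAL


@dataclass
class BypassEvent:
    total_down_trigger_time_s: float
    remaining_revert_time_s: float
    trigger_ports: list

    def to_log_line(self, status: Status) -> str:
        # Hypothetical key=value layout, not the product's real log format.
        ports = ",".join(self.trigger_ports)
        return (f"status={status.name} color={status.value} "
                f"total_down_trigger_time={self.total_down_trigger_time_s:.0f}s "
                f"remaining_revert_time={self.remaining_revert_time_s:.0f}s "
                f"trigger_ports={ports}")


event = BypassEvent(5, 42, ["Ethernet12", "Ethernet13"])
line = event.to_log_line(dashboard_status(manual_forced=False, bypass_active=True))
print(line)
```

A record like this carries enough context (which ports triggered the switch, how long the outage lasted, how much of the revert delay remains) to correlate a bypass event with alarms from other systems.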

Conclusion
When inline security appliances fail, network operations should not be forced into a binary choice between Fail-Open (keeping business running with no security inspection) and Fail-Close (enforcing security at the cost of service outage).
The Inline-Bypass HA mechanism in the NPB breaks this rigid trade-off. As a third approach, it avoids fully sacrificing security while also preventing production network shutdown. Instead, it leverages underlying hardware-based Shadow Paths and intelligent ACL filtering to maintain critical traffic forwarding during failure scenarios, while preserving a controlled security baseline.
This provides a practical “third path” for modern resilient network architectures—ensuring both service continuity and security enforcement under failure conditions.
Learn More
📥 Click to view: “2026 Inline-Bypass High Availability Technical White Paper”