Deep Dive Into High Availability Features In Cisco Enterprise Devices

SERVIDORES

Part 2 of the 3-part High Availability Series

My recent blog on high availability (HA) for enterprises provided an overview of features in Cisco IOS XE Software that contribute to HA. On three continents, Cisco software engineers are working on IOS XE features that embed multiple processes for failover in bare metal, virtualized, and wireless infrastructure. They are engineering ways to maintain device state without interruption with real-time data synchronization, ensuring data is encrypted and decrypted seamlessly to guard against hacking, and reducing software upgrade times from hours to 30 seconds, all to further decrease downtime.

Here is an expanded view of some of those features that contribute mightily to HA in the enterprise.

Operational Data Manager

Processes in active switches update the database and the database maintains the device's state. Since the standby doesn't communicate with the outside world, it is updated by the active switch, and it uses Operational Data Manager (ODM) to update the database (Figure 1). ODM uses Replication Manager (REPM) to trigger all the data to sync from an active to a standby switch.

Figure 1. Operational Data Manager

The REPM is a Basic Input/Output System (BINOS) process responsible for Crimson DB synchronization from an active switch to a standby switch. The REPM library is initialized as the HA service library where the active and standby role resolution is done. The REPM shim layer registers the databases and tables for monitoring and shadowing. All stateful data is synced by REPM without the direct involvement of the applications.

When the standby starts, the REPM on the standby requests the active REPM to start replication. It makes sure the replicated data goes to the intended target. The update first goes to the database and then updates the processes in the hot standby switch.

The ODM client drains all pending messages before it switches from write to read on the local database so that the subsequent local database write-by feature will not fail. The ODM server owns the consolidated database resources (e.g., tables, records, cursors) and the ODM client owns local operational database resources like cursors.

In wireless deployments and StackWise Virtual Link platforms, there are only two nodes: one active, and one standby. So, two protocols were created to enhance HA in these environments: Redundancy Management Interface (RMI) and Dual Active Detection (DAD).

Redundancy Management Interface

RMI was created as a second interface across the wireless controllers to ensure reachability. If the Redundancy Port (RP) link goes down, the RMI infrastructure on the standby and active controllers communicate via the RMI interface. Then, based on gateway reachability and node status, it moves one controller into recovery mode. It will ensure that one good controller is active at a time in this fault scenario.

There is a heartbeat mechanism between the active and standby controllers over the RP link. Previously, if the heartbeat failed, there was no mechanism to find out if the failure was limited to the link or if the other controller had failed. If the failure was on the link, the standby might assume that the active had failed. The standby would then become the new active node and claim the management interface IP. This happens by sending a gratuitous Address Resolution Protocol (ARP) response by the new active controller that maps the management interface IP to its own MAC address. The standby-turned-active controller starts processing access points and client messages and other traffic. Though the old active is up with the same IP, it will not receive any more traffic, leaving the system in an indeterminate state.

The RMI helps avoid this kind of indeterminate state and failover based on a momentary glitch, which can occur in wireless, especially with outdoor products. This interface is used as a secondary link between the active and the standby controllers and enables both to be active momentarily. The IP address on this interface should be configured in the same subnet as the management interface. The status of the RP link along with the status of the peer as determined by the RMI link determine if a switchover should be triggered.

Dual Active Detection

For StackWise Virtual Link-based platforms,?which provide the ability to visualize two connected switches into a single switch, if the connection between the active and standby switches is lost and one switch fails over to the second, the Dual Active Detection (DAD) process is activated. It queries the node manager for the existence of the lost peer. If it is available, it sends a recovery handshake. Once the handshake is completed, if the lost connection was due to a momentary glitch, the standby switch goes into recovery mode. If the switch is experiencing a failure, the other switch goes into recovery mode and assumes the active role.

DAD provides another connection in a switching topology for confirmation. Before failing over to the second switch, it verifies that the first switch is down versus experiencing a slight and momentary glitch.

Symmetric Early Stacking Authentication

Symmetric Early Stacking Authentication (SESA) is a security mechanism for BIPC and Remote Sync (RSYNC) traffic in Catalyst 9000 series switches. It encrypts and decrypts all the remote inter-process communication in Cisco Catalyst 9000 products to guard against any hacking attempts. SESA works with Stack Manager, StackWise Virtual Link, and wireless and is Federal Information Processing Standards (FIPS) compliant.

When one Catalyst 9000 series switch interacts with another, SESA authenticates the second switch before linking to it as a standby. SESA keys need to be present on the new switch to enable valid authentication. The keys are periodically changed (e.g., every 10 minutes) and the information is sent to all connected nodes.

Extended Fast Software Upgrade

It used to take 6 to 7 minutes to reload software on Cisco switches. With Extended Fast Software Upgrade (xFSU), Cisco engineers have gotten the process down to 30 seconds or less. The traffic keeps flowing as the fast reload is in process. The hardware is never powered off and the control plane is maintained in an operational state.

When the system comes back up, it contacts the hardware and requires only 30 seconds to reprogram it. The timeframe increases with additional hardware, but it still is much faster than before xFSU was available.

Graceful Insertion and Removal

To perform troubleshooting or upgrades, network administrators sometimes need to manually remove one active switch or router and replace it with a standby. To do so, the Graceful Insertion and Removal (GIR) function was created. GIR notifies the protocols of both devices that they should be in maintenance mode but not shut off or disconnect from the network. Traffic is diverted during the maintenance window.

When the active node goes back into production, it doesn't have to recreate the sessions it missed. The objective is to minimize traffic disruption both when it is removed from and re-inserted back into the network, another feature that contributes to HA.

Figure 2. Graceful Insertion and Removal

In-Service Software Upgrade

With the in-service software upgrade (ISSU) feature, Cisco customers using platforms offering redundancy can avoid disruptions from image upgrades. ISSU orchestrates the upgrade on standby and active?processors?one after the other and?switches between them so that there is zero effective downtime and zero traffic loss. The active switch's control plane is always up.

The IOS XE software stack has the capability to do ISSU between any-to-any releases?and the development team has an elaborate feature development testing and governance process to ensure this happens without failures. Cisco defines policies for a smooth ISSU experience based on platform and releases combinations. Customers using the Cisco DNA center can use these policies for a smooth and non-disruptive ISSU experience.

Hot Patching

To speed up the process and lower the complexity, Cisco issues small micro images containing only the code necessary for a critical bug or security fix. Customers can install it on devices in?a?fraction of a second using hot patching without any network?disruption. Hot patching doesn't result in a device reload and the fix takes effect immediately.?Because of the small size of the patches, they are easy?to?distribute. Because of their limited content, customers can have much higher confidence?in?installing these micro patches?in their production network without going through the complete validation process.

The?hot patching feature is a toolchain of integrated technology and is expected to provide a default hitless defect fix.

Stay tuned for coming Cisco IOS XE features that enable HA across clusters of devices in different geographies!

Additional Resources:

Accelerate and Simplify -Guiding Principles in the Design of New Software Image Upgrade and Patching Solutions

Cisco IOS XE -Past, Present, and Future

How IOS XE Developers at Cisco Work Remotely and Cohesively on a 190-million-line Code Base

Native or Open-source Data Models? Use both for Software-defined Enterprise Networks

Cisco Price, Dell Price, Huawei Price, ZTE HPE Fortinet Switch Router Server At Low Price

SERVIDORES

NOTÍCIAS QUENTES

S5735-L48T4S-A-V2: Comprehensive Review and Technical Overview

Best 10Gb Switch for SMB in 2025: Unlock Next-Gen Network Performance

S5735-L48T4S-A: Complete Guide with Features, Specifications, and Benefits

S5735-L48P4X-A1: Reliable PoE+ CloudEngine Switch

S5735-L48LP4XE-A-V2: Scalable, Secure, and PoE-Ready for Demanding Enterprise Deployments

S5735-L48LP4S-A-V2 Powers Smarter Campus Networks with Advanced PoE and Cloud Management

S5735-L24T4X-A1 Empowers Installers with Scalable, Reliable, and Efficient Network Access

Best Ethernet Switches for Business (2025): Selection Guide and Top Picks

Huawei S5735-L24T4S-A1: A Compact, Stackable Access Switch Built for the Future

Huawei S5735-L24T4S-A: High-Performance Stacking Meets Zero-Noise Deployment

S5735-L24P4XE-A-V2: Huawei’s Smart Choice for High-Density Campus Deployments

S5735-L24P4X-A1: Huawei’s High-Performance Access Switch Redefining Campus Networking

Huawei S5735-L24P4S-A1 Review: Reliable Gigabit Access with Enterprise-Grade Features

What Is an Orthogonal Architecture?

Huawei s5735-l24p4s-a-v2 Delivers Scalable, Secure, and Smart PoE Access for Modern IT Infrastructures

Huawei S5735-L48T4XE-A-V2 Switch Delivers Enterprise-Grade Performance in a Compact Design

Huawei S5735-L48P4XE-A-V2 Review: Versatile Campus Switch with iStack and Full L3 Support

Differences Between Huawei CE Series and S Series Switches

Huawei CloudEngine S5735 Switches Set the Benchmark for High-Performance, Energy-Efficient Switching

Huawei CloudEngine S5731‑S48P4X Datasheet

Huawei CloudEngine S5731‑S24P4X Datasheet

Huawei S5731-S Empowers Next-Generation Campus Networks with Advanced Capabilities

Huawei S5731-H24P4XC Switch Review: Power-Packed Performance and Smart PoE

Huawei S5731-H Series Switches Redefine Campus Networking with Intelligent High-Performance Architecture

Top Features of the Huawei S5731-S24T4X: The Ultimate Gigabit Access Switch for Modern Networks

General Power Module Fault Location Procedure (CE8800 & 7800 & 6800 & 5800)

How Do I Split a Stack? How to clear the stacking configuration?

Huawei CloudEngine S5731 Datasheet

Huawei CloudEngine S5731-S24P4X: Powerful Enterprise-Grade Switch Explained

Huawei S5731-S48T4X Review: Powerful Enterprise Switch for High-Speed Networking

Deep Dive into High Availability Features in Cisco Enterprise Devices

Part 2 of the 3-part High Availability Series

Operational Data Manager

Redundancy Management Interface

Dual Active Detection

Symmetric Early Stacking Authentication

Extended Fast Software Upgrade

Graceful Insertion and Removal

In-Service Software Upgrade

Hot Patching

Tags quentes : Catalisador Cisco Cisco IOS XE cisco catalyst 9000 Alta Disponibilidade Atualização de software em serviço hot patching Empilhamento

Ordering Guide

Recursos

Quem somos

Huawei CloudEngine S5731‑S48P4X Datasheet