Day 2 Network Operations: Nine Ways to Reduce Fatigue
Pavan Basetty, Cloud Analytics Architect at WiteSand
With a globally distributed workforce, IT teams must manage complex enterprise networks that span many geographical locations. Traditional tools and strategies are no longer sufficient to keep such networks secure and resilient. In day-to-day operations, teams rarely change the network configuration; troubleshooting and operating the network infrastructure are what cause the most headaches.
AIOps can improve the operational experience, but it is better still to prevent problems in the first place. Here are nine ways to do that.
Avoiding Manual Configurations
Take, for example, a VLAN mismatch between an access point (AP) and the switch access port it connects to. Why can't the network detect that an AP has been connected and automatically configure matching VLANs on both ends? The required settings are well known, so this should be easy to automate.
#1: Utilize automated tools to provision consistent configuration settings instead of leaving room for manual error.
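As an illustration of the kind of check such a tool could automate, here is a minimal Python sketch. It assumes a hypothetical data model in which the AP is managed on the port's untagged (native) VLAN and locally switches each SSID onto a tagged VLAN, so the switch port must be a trunk carrying those VLANs. The class and function names are invented for this example and do not correspond to any vendor's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SwitchPort:
    name: str
    mode: str                 # "access" or "trunk"
    native_vlan: int          # untagged VLAN (the access VLAN in access mode)
    allowed_vlans: frozenset  # tagged VLANs carried when mode == "trunk"

@dataclass(frozen=True)
class AccessPoint:
    name: str
    mgmt_vlan: int            # assumption: AP is managed on the untagged VLAN
    ssid_vlans: frozenset     # assumption: AP tags each SSID onto its own VLAN

def vlan_mismatches(ap: AccessPoint, port: SwitchPort) -> list:
    """Return human-readable problems between an AP and its switch port."""
    problems = []
    if port.native_vlan != ap.mgmt_vlan:
        problems.append(
            f"native VLAN {port.native_vlan} != AP management VLAN {ap.mgmt_vlan}")
    # An access port carries no tagged VLANs at all.
    carried = port.allowed_vlans if port.mode == "trunk" else frozenset()
    missing = ap.ssid_vlans - carried
    if missing:
        problems.append(f"SSID VLANs not carried on {port.name}: {sorted(missing)}")
    return problems

def desired_port_config(ap: AccessPoint, port: SwitchPort) -> SwitchPort:
    """Derive a port configuration that matches what the AP expects."""
    return SwitchPort(port.name, "trunk", ap.mgmt_vlan, frozenset(ap.ssid_vlans))
```

Running vlan_mismatches before and after applying desired_port_config shows the mismatch disappearing; a real tool would then push the resulting configuration to the switch through the vendor's own interface.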
Best-Practice Configurations
Do you agree that inter-switch links should be configured as port channels? Should port security be enabled on every access port? As a networking expert, you already know many of these best practices.
#2: Use tools that can automatically discover and provision best-practice configurations. Such a tool should also let you handle exceptions.
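A best-practice audit can be expressed as a set of small rule functions plus an explicit exception list, so deliberate deviations are recorded rather than silently tolerated. The sketch below assumes interfaces have already been classified as access ports or inter-switch uplinks; the rule IDs, fields, and interface names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Interface:
    name: str
    role: str                   # hypothetical classification: "access" or "uplink"
    port_security: bool = False
    in_port_channel: bool = False

def check_port_security(intf: Interface):
    if intf.role == "access" and not intf.port_security:
        return "port security disabled on access port"
    return None

def check_uplink_port_channel(intf: Interface):
    if intf.role == "uplink" and not intf.in_port_channel:
        return "inter-switch link is not a port-channel member"
    return None

# Rule ID -> check function; each check returns a message or None.
RULES = {
    "port-security": check_port_security,
    "uplink-port-channel": check_uplink_port_channel,
}

def audit(interfaces, exceptions):
    """exceptions: set of (interface name, rule ID) pairs that are deliberately exempt."""
    findings = []
    for intf in interfaces:
        for rule_id, rule in RULES.items():
            if (intf.name, rule_id) in exceptions:
                continue
            message = rule(intf)
            if message:
                findings.append((intf.name, rule_id, message))
    return findings
```

Keeping exceptions as data, rather than as edits to the rules, means the same rule set can be applied everywhere while each exemption stays visible and reviewable.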
Consistent Policies Across All Locations
The policies for your employees, guests, and IoT devices remain the same regardless of where they connect from. Unless compliance or other requirements dictate otherwise, it is best to manage every location uniformly.
#3: Aim for one set of policies (or a few) applied consistently to all locations, avoiding per-site custom configurations.
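One way to keep locations consistent is to treat a single golden policy set as the source of truth and report how each site drifts from it. This minimal sketch assumes policies can be represented as plain dictionaries; the policy names and attributes are hypothetical.

```python
# Hypothetical golden policy set: policy name -> attributes.
GOLDEN = {
    "employee":   {"internet": True,  "internal_apps": True},
    "guest":      {"internet": True,  "internal_apps": False},
    "iot-camera": {"internet": False, "internal_apps": False, "video_servers": True},
}

def policy_drift(golden: dict, site_policies: dict) -> dict:
    """Compare one site's policies with the golden set."""
    return {
        "missing": sorted(golden.keys() - site_policies.keys()),
        "extra": sorted(site_policies.keys() - golden.keys()),
        "changed": sorted(
            name for name in golden.keys() & site_policies.keys()
            if golden[name] != site_policies[name]
        ),
    }

def drift_report(golden: dict, sites: dict) -> dict:
    """Return drift only for sites that deviate from the golden set."""
    report = {}
    for site, policies in sites.items():
        drift = policy_drift(golden, policies)
        if any(drift.values()):
            report[site] = drift
    return report
```

A site that must differ for compliance reasons can be given its own small golden set, which keeps the number of distinct policy sets explicit and small.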
Zero Trust Policies Independent of Network Identifiers
To secure wireless and wired networks, zero trust requires devices and users to be fingerprinted, authenticated, and authorized. Can you therefore express policies in those terms, such as how the user “John Doe” is treated on the network, or how IoT cameras are admitted to it?
#4: Consider tools that let you define intent-based policies instead of tying them to low-level constructs such as VLAN IDs, IP addresses, and switch ports.
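The difference between intent and low-level constructs can be sketched in a few lines: policies are written against identity and fingerprinting results, and only a per-site table translates the resulting segment into a VLAN ID. Everything here (the segments, the VLAN numbers, the first-match rule order) is a hypothetical example, not how any particular product models policy.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Endpoint:
    user: Optional[str]   # authenticated user name, if any
    device_type: str      # fingerprinting result, e.g. "laptop" or "ip-camera"
    authenticated: bool   # e.g. 802.1X or MAC-based authentication succeeded

# Intent: ordered (condition, segment) rules written against identity; first match wins.
INTENT = [
    (lambda ep: ep.authenticated and ep.user == "John Doe", "engineering"),
    (lambda ep: ep.authenticated and ep.device_type == "ip-camera", "cameras"),
    (lambda ep: ep.authenticated, "employees"),
]
DEFAULT_SEGMENT = "quarantine"  # anything unauthenticated

# Per-site translation of segments into low-level constructs (hypothetical VLAN IDs).
SITE_VLANS = {
    "nyc": {"engineering": 110, "cameras": 120, "employees": 100, "quarantine": 999},
    "sfo": {"engineering": 210, "cameras": 220, "employees": 200, "quarantine": 999},
}

def segment_for(ep: Endpoint) -> str:
    for condition, segment in INTENT:
        if condition(ep):
            return segment
    return DEFAULT_SEGMENT

def vlan_for(ep: Endpoint, site: str) -> int:
    return SITE_VLANS[site][segment_for(ep)]
```

Because the intent never mentions a VLAN, renumbering a site means editing only its entry in SITE_VLANS, while "John Doe" is treated the same way everywhere.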
Managing Diverse Hardware
Enterprise customers use a variety of models, and often a mix of vendors, for their access points and switches. This can result from organic growth, newer models at newer locations, or deliberate multi-vendor strategies. Yet even a single vendor typically does not offer one tool that manages all of its own models; multi-vendor management remains a dream today.
For example, one building may run Meraki while another runs Cisco 6K and 9K alongside Aruba. Or the access points may come from Aruba or Mist while the switches come from Cisco. We have seen enterprise customers deploy many combinations of Meraki, Cisco, HPE ProCurve, Aruba, Juniper, Mist, Arista, and other vendors across their locations.
Today, enterprise customers cope with multi-vendor pain by writing Ansible scripts; shouldn't an automated tool be able to do the same?
#5: Seek out a vendor tool that can manage diverse hardware, so that each enterprise does not have to absorb multi-vendor complexity on its own.
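Automation scripts typically handle this by rendering one vendor-neutral description into per-platform configuration. A minimal Python sketch of that idea follows, assuming a hypothetical AccessPortIntent model; the Cisco IOS and Junos (ELS) lines shown are the common forms for an access port, but check them against your platform's documentation before relying on them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessPortIntent:
    interface: str     # platform-specific interface name, supplied by inventory
    vlan: int          # access VLAN ID; assumed to already exist on the switch
    description: str

def render_cisco_ios(intent: AccessPortIntent) -> str:
    return "\n".join([
        f"interface {intent.interface}",
        f" description {intent.description}",
        " switchport mode access",
        f" switchport access vlan {intent.vlan}",
    ])

def render_junos_els(intent: AccessPortIntent) -> str:
    # Junos set-style commands for ELS platforms (VLAN defined under [edit vlans]).
    base = f"set interfaces {intent.interface}"
    return "\n".join([
        f'{base} description "{intent.description}"',
        f"{base} unit 0 family ethernet-switching interface-mode access",
        f"{base} unit 0 family ethernet-switching vlan members {intent.vlan}",
    ])

# Platform key -> renderer; supporting a new vendor means adding one function.
RENDERERS = {"cisco-ios": render_cisco_ios, "junos-els": render_junos_els}

def render(platform: str, intent: AccessPortIntent) -> str:
    """Render one vendor-neutral intent for the given platform."""
    renderer = RENDERERS.get(platform)
    if renderer is None:
        raise ValueError(f"no renderer for platform {platform!r}")
    return renderer(intent)
```

The operator describes what the port should do once; the per-vendor differences live in small, testable renderers instead of in every script.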
Real-time Handling of Alerts
It is critical to detect problems and send alerts to Slack, Microsoft Teams, or any other communication tool as soon as possible. Equally important is a system that can coalesce related alerts and help determine the root cause.
#6: Use a tool that identifies what changed and draws on analytics data to reduce alert fatigue and pinpoint the root cause quickly.
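Coalescing can be as simple as grouping alerts from the same site that arrive within a short window of each other, then attaching any configuration changes made on the affected devices shortly before the first alert. The sketch below assumes that approach; the window and lookback values, and the Alert, Change, and Incident types, are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Alert:
    ts: float        # seconds since epoch
    site: str
    device: str
    message: str

@dataclass(frozen=True)
class Change:
    ts: float
    device: str
    summary: str

@dataclass
class Incident:
    site: str
    alerts: List[Alert] = field(default_factory=list)
    suspect_changes: List[Change] = field(default_factory=list)

def coalesce(alerts, changes, window=300.0, lookback=3600.0):
    """Group alerts per site into incidents, then attach recent changes on involved devices."""
    incidents: List[Incident] = []
    open_by_site = {}
    for alert in sorted(alerts, key=lambda a: a.ts):
        incident = open_by_site.get(alert.site)
        # Start a new incident if the site has none open or its last alert is too old.
        if incident is None or alert.ts - incident.alerts[-1].ts > window:
            incident = Incident(alert.site)
            incidents.append(incident)
            open_by_site[alert.site] = incident
        incident.alerts.append(alert)
    for incident in incidents:
        devices = {a.device for a in incident.alerts}
        start = incident.alerts[0].ts
        # "What changed": changes on involved devices shortly before the first alert.
        incident.suspect_changes = [
            c for c in changes
            if c.device in devices and start - lookback <= c.ts <= start
        ]
    return incidents
```

Each incident, rather than each raw alert, would then be posted to the chat tool. A real system would use richer signals such as topology and device dependencies to decide which alerts belong together; time and site proximity is only the simplest heuristic.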
Access to Network Sensor Data
To troubleshoot a problem after it has occurred, whether manually or automatically, you need access to all kinds of signals, such as syslog, audit logs, configuration data, monitoring data, and network flow records.
#7: Avoid juggling many consoles to gather network sensor data. Look for a well-integrated tool that reduces the time spent troubleshooting.
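A well-integrated tool essentially presents these signals on one timeline. Assuming each source can be exported as a list of (timestamp, source, text) tuples already sorted by time, Python's standard heapq.merge can interleave them without re-sorting everything; the event format and helper names are assumptions for this sketch.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical normalized event: (timestamp in seconds, source name, raw text).
Event = Tuple[float, str, str]

def unified_timeline(*sources: Iterable[Event]) -> Iterator[Event]:
    """Lazily interleave per-source event streams, each already sorted by timestamp."""
    return heapq.merge(*sources, key=lambda event: event[0])

def around(events: Iterable[Event], center: float, radius: float) -> List[Event]:
    """Keep only events within `radius` seconds of `center`, e.g. when a problem was reported."""
    return [e for e in events if abs(e[0] - center) <= radius]
```

Feeding syslog, audit-log, and flow-record streams into unified_timeline and then calling around with the time a user reported a problem yields one ordered view instead of three consoles.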
Tools That Combine AI/ML, Statistical Analysis, and Rules-Based Engines
AIOps applies AI/ML to automate IT operations and to monitor and correlate operations data. A statistical model, by contrast, can establish relationships among a few simple variables, and a rules-based engine lets you turn procedural debugging steps into rules.
For example, an automated tool that applies rules and statistics can quickly determine whether a user's network problem is specific to that user, to a particular location, or to a particular operating system or network device.
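The rules-plus-statistics idea can be sketched as follows: for each dimension (user, site, operating system, AP), compare the failure rate of sessions with a given value against all other sessions, and flag values where failures concentrate. The session fields and the thresholds (min_n, hi, lo) are hypothetical choices, not calibrated values.

```python
from collections import Counter

def localize(sessions, dims=("user", "site", "os", "ap"), min_n=5, hi=0.5, lo=0.1):
    """Flag (dimension, value, rate_in, rate_out) where failures concentrate.

    sessions: dicts with one key per dimension plus a boolean "ok".
    Rule: a value is flagged when it has at least min_n sessions, its own
    failure rate is >= hi, and the failure rate of all other sessions is <= lo.
    """
    total = len(sessions)
    total_fail = sum(1 for s in sessions if not s["ok"])
    findings = []
    for dim in dims:
        seen = Counter(s[dim] for s in sessions)
        failed = Counter(s[dim] for s in sessions if not s["ok"])
        for value, n in seen.items():
            n_other = total - n
            rate_in = failed[value] / n
            rate_out = (total_fail - failed[value]) / n_other if n_other else 0.0
            if n >= min_n and rate_in >= hi and rate_out <= lo:
                findings.append((dim, value, rate_in, rate_out))
    return findings
```

When two dimensions are perfectly correlated, for example every session at a site going through that site's only AP, both are flagged; separating them requires more varied data or topology knowledge.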
ML-based tools can go further: for example, they can baseline SFP sensor readings (such as transmit and receive optical power and temperature), relate deviations to network failures, and predict those failures before they occur.
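The ML models such tools use are beyond a short example, but the underlying idea of baselining can be illustrated with a simpler statistical stand-in: flag readings far from a historical baseline (a z-score test) and extrapolate a least-squares linear trend to estimate when receive power will cross an alarm threshold. The threshold value and sample data are hypothetical; real alarm thresholds come from each transceiver's digital optical monitoring (DOM) data.

```python
from statistics import mean, stdev

def zscore_anomalies(history, recent, z=3.0):
    """Return indexes of recent readings more than z standard deviations from the baseline."""
    mu, sigma = mean(history), stdev(history)  # stdev needs at least two samples
    if sigma == 0:
        return [i for i, x in enumerate(recent) if x != mu]
    return [i for i, x in enumerate(recent) if abs(x - mu) / sigma > z]

def hours_until_threshold(samples, threshold_dbm):
    """Fit a least-squares line to (hour, rx_dbm) samples and estimate the hours
    after the last sample until the line falls to threshold_dbm.
    Returns None if there is too little data or power is not declining."""
    if len(samples) < 2:
        return None
    xs = [t for t, _ in samples]
    ys = [v for _, v in samples]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    if slope >= 0:
        return None  # not degrading, so no predicted crossing
    intercept = my - slope * mx
    crossing_hour = (threshold_dbm - intercept) / slope
    return max(0.0, crossing_hour - xs[-1])
```

A steady decline of 0.2 dBm per hour from -5.0 dBm, for instance, would be projected to reach a hypothetical -8.0 dBm alarm level 15 hours after the first sample, giving operators time to replace the optic before the link fails.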
#8: Look for a tool that combines AI/ML, statistical analysis, and rules-based correlation to improve outcomes.
A Holistic View of Inventory, Visibility, and Health
Do you know all of the assets connected to your network? Are you able to quickly detect an issue regardless of where users connect from?
#9: Identify a tool that provides visibility into all activity on your network, all endpoints, and the health of all devices across global locations.
Over the past 30 years, the WiteSand team has pioneered many networking technologies and consistently delivered on its promises. The WiteSand Cloud Suite builds all of the above elements into its core architecture rather than treating them as afterthoughts.
For a live demonstration of the WiteSand SaaS platform, contact us here.