December 1, 2016

Why Testing and Debugging Networks is so Difficult

by Charlie Elliott

The network lies at the heart of a modern enterprise’s ability to conduct its daily business and operations. When a network outage occurs, whether from a policy misconfiguration or a device failure, business grinds to a halt. Almost every week, it seems, a new headline describes a Fortune 500 company suffering the consequences of a network outage. These incidents are costly: they cause revenue loss and damage corporate reputation and customer loyalty. In the most extreme cases, outages have led to a company’s bankruptcy or a CEO’s dismissal.

Yet, despite the costs and frequency of outages, operating a network remains a manual, error-prone process. In an oft-cited study by Gartner analysts Ronni Colville and George Spafford, the authors note that “80 percent of network outages are caused by people and process issues, with more than 50 percent of those outages caused by change configuration issues.”[1] The reality is that a company’s network team may be one or two changes away from causing a severe outage, even when the network has hardware redundancy. Hiring more people won’t solve the problem either.

A primary reason for this fragility is that networks are inherently hard to test. In the world of software development, a wealth of testing frameworks and continuous integration servers helps ensure that code is correct, while an equally rich set of troubleshooting and debugging tools helps resolve problems when they appear. In networking, there simply isn’t a modern and comprehensive toolset for testing the correctness of multi-vendor device configurations and policies.

 

Why?

To start, the scale and complexity of today’s networks are simply daunting. Not counting servers, the network of a Fortune 500 company typically comprises thousands, if not tens of thousands, of hardware devices (such as switches, routers, load balancers, and firewalls). Add virtual switches inside virtualized servers or container hosts, and this number can grow radically larger. Each device can have thousands of rules determining how to forward and process packets. The emergent interactions of this enormous amount of distributed state define network behavior, yielding a degree of complexity that no human can grasp, let alone test and troubleshoot. Furthermore, this complexity has historically exceeded what computers could analyze as well.

So today, when an outage occurs, network teams turn to simple tools like ping, traceroute, or NetFlow in an attempt to map the symptoms back to the actual root cause. The most common approach is to log into devices box by box, inspect each one via the CLI, attempt to infer its behavior, and then mentally join it all together to divine the root cause of the problem. Such a manual approach is not only time-consuming but also infeasible as networks grow in size and complexity. Most importantly, this method of troubleshooting is inherently reactive. The operator learns of a problem only after the symptoms appear, by which time the customer is already experiencing the damage.
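To make the "mentally join it all together" step concrete, here is a minimal Python sketch of what an operator is doing by hand: looking up a destination in each device's forwarding table and following the next hops until the packet is delivered, dropped, or caught in a loop. The device names, prefixes, and the `TABLES` layout are invented for illustration; real devices also apply ACLs, NAT, and many other rules that this toy model ignores.

```python
import ipaddress

# Hypothetical per-device forwarding tables: prefix -> next hop.
# A next hop of "deliver" means the destination is directly attached.
TABLES = {
    "edge1": {"10.0.0.0/8": "core1"},
    "core1": {"10.1.0.0/16": "leaf1", "10.2.0.0/16": "leaf2"},
    "leaf1": {"10.1.0.0/16": "deliver"},
    "leaf2": {"10.2.0.0/16": "core1"},  # misconfiguration: points back at core1
}

def lookup(table, dst):
    """Longest-prefix match; returns the next hop, or None if no rule matches."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, next_hop)
    return best[1] if best else None

def trace(tables, start, dst):
    """Follow next hops from `start`; returns (path, outcome)."""
    path, device, seen = [start], start, {start}
    while True:
        hop = lookup(tables.get(device, {}), dst)
        if hop is None:
            return path, "dropped"
        if hop == "deliver":
            return path, "delivered"
        if hop in seen:
            return path + [hop], "loop"
        seen.add(hop)
        path.append(hop)
        device = hop

for dst in ("10.1.5.5", "10.2.0.9", "10.3.0.1"):
    print(dst, *trace(TABLES, "edge1", dst))
```

Even in this four-device toy, the loop on `leaf2` is only visible once the tables are joined; no single device's configuration looks wrong on its own.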

 

What about SDN? Does it eliminate this problem?

It is true that SDN in its purest form can bring some order to the chaos by providing a single logically centralized source of policy and configuration. It can also provide a clear abstraction and standardized representation of network configuration and state. However, instead of humans making changes at human timescales, SDN and network automation enable changes to occur at software speeds.

With potentially thousands of changes every hour, what happens when the network goes down? How do operators troubleshoot problems or outages in a constantly evolving network where they’re not triggering most changes? In a modern stack with multiple new vendors, which specific component is at fault? Did the network even make the mistake, or did it receive the wrong input from the operator?

With SDN and automation, the need to make sense of complexity has not gone away. The ability to make more frequent changes simply amplifies the need for new tools and approaches to provide visibility and help with troubleshooting.
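One way to picture the kind of tooling this calls for is a check that runs before an automated change is pushed. The Python sketch below is purely illustrative and does not describe any particular product: the first-match ACL format, the `INTENTS` list, and the helper names are all assumptions made for the example. It applies a proposed change to a copy of the ACL and reports which stated intents would break.

```python
import copy
import ipaddress

# Hypothetical first-match ACL: (action, source prefix, destination port).
# A port of None matches any port.
ACL = [
    ("permit", "10.0.0.0/8", 443),
    ("deny", "0.0.0.0/0", None),
]

# Hypothetical intents: (source address, port, expected action).
INTENTS = [
    ("10.1.2.3", 443, "permit"),
    ("192.0.2.7", 443, "deny"),
]

def evaluate(acl, src, port):
    """Return the action of the first matching rule (implicit deny otherwise)."""
    addr = ipaddress.ip_address(src)
    for action, prefix, rule_port in acl:
        if addr in ipaddress.ip_network(prefix) and rule_port in (None, port):
            return action
    return "deny"

def failed_intents(acl, intents):
    """List every intent whose expected action differs from the ACL's verdict."""
    return [i for i in intents if evaluate(acl, i[0], i[1]) != i[2]]

def check_change(acl, change, intents):
    """Apply `change` to a copy of the ACL and report the intents it would break."""
    candidate = change(copy.deepcopy(acl))
    return failed_intents(candidate, intents)

def block_subnet(acl):
    # Proposed change: deny a subnet ahead of all other rules.
    acl.insert(0, ("deny", "10.1.0.0/16", None))
    return acl

def allow_ssh(acl):
    # Proposed change: permit SSH from internal addresses.
    acl.insert(1, ("permit", "10.0.0.0/8", 22))
    return acl

print(check_change(ACL, block_subnet, INTENTS))
print(check_change(ACL, allow_ssh, INTENTS))
```

The first change would silently cut off HTTPS for `10.1.2.3`, while the second leaves both intents intact. Catching that difference before deployment, rather than after symptoms appear, is the point of the exercise.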

The problem of network assurance is business-critical for legacy as well as SDN environments. Forward Networks has taken on the monumentally ambitious challenge of making networks as testable as software. After years of work, the Forward Platform was recently unveiled to transform the way organizations test, debug, and troubleshoot their networks.

We encourage you to view our online demo or sign up for a Free Trial.

 

[1] Ronni J. Colville and George Spafford, “Top Seven Considerations for Configuration Management for Virtual and Cloud Infrastructure,” Gartner RAS Core Research Note, 2010.
