December 1, 2016

Why Testing and Debugging Networks is so Difficult

by Charlie Elliott

The network lies at the heart of a modern enterprise’s ability to perform its daily business and operations. When a network outage occurs, due to a policy misconfiguration or a device failure, business grinds to a halt. Almost every week, it seems, we read a new headline where a Fortune 500 company suffered the catastrophic consequences of a network outage. These incidents are costly, causing revenue loss and impacting corporate reputation and customer loyalty. In the most extreme cases, outages have triggered both a company bankruptcy and a CEO’s dismissal.

Yet, despite the costs and frequency of outages, operating a network remains a manual, error-prone process. In an oft-cited study by Gartner analysts Ronni Colville and George Spafford, the authors note that “80 percent of network outages are caused by people and process issues, with more than 50 percent of those outages caused by change configuration issues.”[1] The reality is that a company’s network team may be one or two changes away from causing a severe outage, even when the network has hardware redundancy. Hiring more people won’t solve the problem either.

A primary reason for this fragility is that networks are inherently hard to test. In the world of software development, an abundance of testing frameworks and continuous integration servers help to ensure that code is correct, while an abundance of troubleshooting and debugging tools help to resolve problems when they appear. In networking, there simply isn’t a modern and comprehensive toolset for testing the correctness of multi-vendor device configurations and policies.



To start, the scale and complexity of today’s modern networks is simply daunting. Not counting servers, the network of a Fortune 500 company is typically comprised of thousands, if not tens of thousands of hardware devices (i.e. switches, routers, load balancers, and firewalls). Add virtual switches inside virtualized servers or for containers, and this number can grow radically larger. Each device can have thousands of rules determining how to forward and process packets. The emergent interactions of this enormous amount of distributed state defines network behavior yielding a degree of complexity that no human can grasp, let alone test and troubleshoot. Furthermore, this complexity has historically exceeded what silicon-based systems can handle.

So today, when an outage occurs, network teams turn to simple tools like ping, traceroute, or netflow in an attempt to map the symptoms back to the actual root cause. The most common approach is to log into devices, box-by-box, inspect via the CLI, attempt to infer behavior, and then mentally join it all together to divine the root cause of the problem. Such a manual approach is not only time-consuming, but infeasible as networks grow in size and complexity. Most importantly, such a method of troubleshooting is inherently reactive. The operator can know of a problem to fix only after the symptoms appear, by which time the customer is already experiencing the damage.


What about SDN? Does it eliminate this problem?

It is true that SDN in its purest form can bring some order to the chaos by providing a single logically centralized source of policy and configuration. It can also provide a clear abstraction and standardized representation of network configuration and state. However, instead of humans making changes at human timescales, SDN and network automation enable changes to occur at software speeds.

With potentially thousands of changes every hour, what happens when the network goes down? How do operators troubleshoot problems or outages in a constantly evolving network where they’re not triggering most changes? In a modern stack with multiple new vendors, which specific component is at fault? Did the network even make the mistake, or did it receive the wrong input from the operator?

With SDN and automation, the need to make sense of complexity has not gone away. The ability to make more frequent changes simply amplifies the need for new tools and approaches to provide visibility and help with troubleshooting.

The problem of network assurance is business-critical for legacy as well as SDN environments. Forward Networks has taken on the monumentally ambitious challenge of making networks as testable as software. After years of work, the Forward Platform was recently unveiled to transform the way organizations test, debug, and troubleshoot their networks.

We encourage you to view our online demo or sign up for a Free Trial.


  1. Ronni J. Colville, George Spafford, Top Seven Considerations for Configuration Management for Virtual and Cloud Infrastructure, (Gartner RAS Core Research Note, 2010)

Subscribe to our blog!

February 6, 2023
Visit Stand E08 at Cisco Live EMEA

Let the Games Begin! Cisco Live Amsterdam has officially started, and we’re delighted to be here meeting with the best and brightest of the European networking community. Stop by to say hello, and play Forward Quest to learn how easy it is to put your people back in charge of the network and register to […]

Read More
January 25, 2023
MSD Partners Leads Forward Networks $50M Series D Funding

Following 139% year-over-year growth, Forward Networks closed $50M in series D funding. The round was led by MSD Partners with support from new investors, Section 32, and Omega Venture Partners. Demonstrating ongoing support, existing investors Goldman Sachs Asset Management (Goldman Sachs), Threshold Ventures, A. Capital, and Andreessen Horowitz participated in the round. Since its last […]

Read More
January 18, 2023
Forward Networks to Host Cloud Field Day 16

I don’t know which is more exciting: the fact that there’s no rain forecast for the next two weeks or that we’re hosting Cloud Field Day 16 at the Forward Networks headquarters in Santa Clara, CA. It’s a nice dose of synchronicity that we get a break in the rain to dry out and clean […]

Read More
crossmenu linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram