SEE DEMO

April 27, 2017

Seeking Truth in Networking: From Testing To Verification

by Brandon Heller

Sharp network admins already verify the network in a variety of ways, right? Pings, traceroutes, and custom scripts verify expected connectivity. Link and CPU utilization monitoring programs verify normal operation. Maybe pushed configs are read back in to verify that the device accepted them. And isn’t verification just another term for testing, anyway?

No. Just like data ain’t knowledge, testing ain’t verification!

In this blog post, you’ll see how stories from other technology domains (like semiconductors and software) point toward verification as a natural next step in any critical environment, especially one like networking.

Learning from history: verification in other domains

Computer Hardware

Manufacturing the next great computer chip begins with producing a mask set – a stack of extremely detailed circuit patterns – which is then used to etch circuits onto a silicon wafer. If a mistake slips into the circuit design and is discovered in the lab, a new mask set is required. This is a big deal! A single bad mask set can cost millions of dollars and delay a product’s introduction.

But even worse is when a bug escapes the lab and customers find it. Remember the Intel Pentium FDIV bug? A rare but easy-to-replicate math error would corrupt data. Recalling and replacing the affected processors cost Intel nearly $500M.


It’s not as if Intel didn’t test their chip in advance. Skilled engineers wrote unit tests to confirm the expected behavior of their own circuits, created integration tests to confirm the behavior of entire modules, and ran extensive test patterns in the lab. But it’s extremely hard to think of every test worth running.

What if – instead of having more people write more tests and yet still have to cross their fingers – we could write a program that would know whether the chip logic was correct, to avoid any errors like FDIV corruption?

Such programs exist under the category of verification. One particular verification technique, called model checking, could have helped avoid the FDIV error. It works by representing circuit states in a graph and checking all possible transitions; the trick is to do this fast enough and on large enough circuits. In fact, this development was so important that it led to a Turing award for its creators.

Verification techniques can’t prevent chip fabrication errors and won’t design the chip for you, but they provide extremely valuable assurance that you can’t get any other way.

Computer Software

In the software world, correctness is just as important, especially when a bug has real real-world effects. Think software that operates an airport baggage systemcontrols a radiation dosage, or guides a rocket into space, like on June 4, 1996, when the Ariane 5 rocket made its maiden voyage to space.

Thirty-seven seconds into the flight, the rocket exploded.  Why?  To determine the rocket’s position, an inertial measurement unit adds up sensor values over time.  When this code path experienced an integer overflow – but had no way to recover from it – the rocket lost control.

The ESA extensively tested their software in advance and even re-used known-good guidance code from the Ariane 4.  But this rocket followed a slightly different flight trajectory, and the engineers had missed testing with sensor inputs that would have matched the rocket’s actual flight profile.

“Testing shows the presence, not the absence of bugs”
Dijkstra

The standard approach to testing software is familiar – test code by itself, test it with other pieces, and then test with realistic inputs to all the pieces bundled together.  Beyond this, many developers employ coverage tools to measure the completeness of their testing, to confirm that all lines of code are actually tested. But in all likelihood, even coverage testing would not have revealed this software fault; even systematic full-line-coverage testing can miss key input sequences.

What if a program could trace through every execution path in a program, for every possible input, to confirm the absence of this bug?

Software verification programs exist, and Ariane 5 became a poster child for them.  One example, static analyzers, verify that entire classes of problems are absent from a piece of software, including data leaks, race conditions, infinite loops, and – relevant to not just rocket builders – unhandled overflows.


Network Testing Today

A single network testing error can have real-world consequences, too.  Remember when a network configuration change brought down Amazon EBS and much of the Internet?  Just look through the news for examples of planes, trains, and stock-trading terminals brought to a standstill by network issues.  Why are we not doing enough to keep companies off the front page?

Limitations of today’s testing and how to move past them

Partial Coverage vs Complete Coverage

Network testing today is partial.  We run pings, traceroutes, and maybe custom monitoring tools, but do these tests accurately reflect all the real paths used by all the applications?  You don’t know and you can’t even measure how much you’re missing.

Indirect Evidence vs Direct Proof

Network testing today is indirect.  We check utilizations and counters against thresholds, but is a real problem present?  Over-threshold links, ping failures, and non-zero drop counters can trigger for normal transient behavior, while under-threshold links may actually be unavailable for application traffic.  It’s important to not just identify that the network may be broken, but point out where and why so you can actually fix it.

Single-element view vs Whole-Network View

Another common tactic is device-level conformance checking.  An auditing tool gives a big thumbs up to a configuration change to shift traffic, because it matches the master template. It’s better than nothing, but it doesn’t answer your question: will the network behave, end-to-end, as it should?


The Aha Moment: Verifying Networks

At this point, your spider-sense should be tingling. When other industries experienced critical, front-page errors, it spurred the development of new ways to address the limitations of human-driven testing. We’re now in a world where every business is becoming a software business and the network is a crucial enabler.

Other industries acted. We can act too. We’ll focus on one specific technology.

Network Data-plane Verification provides a way to test that the intent of the network matches its implementation, as gleaned from forwarding state. For example, you can verify that all hosts in a subnet can reach each other on any port, despite being spread across multiple data centers.  You can confirm general properties, like freedom from loops.  You can even use this to catch future-outage-causing issues, like both sides of a link having different VLANs defined. Unlike today’s testing, it is comprehensive, it is direct, and it analyzes end-to-end behavior.

Why did it take so long to get here, when other industries had verification tools 20 years ago?

  • One possible answer: Systems that aren’t clearly defined. RFCs and vendor manuals are not formal, unambiguous specifications; interactions between protocols are frequently undefined.
  • A second possible answer: Overwhelming diversity. Who has the time to model the multi-dimensional matrix of protocols, OSes, devices, and versions in every modern network?
  • A third possible answer: Modern networks are huge. Any analysis must scale to an enormous number of potential paths and packets.

We know the answer – all of the above, and more! – because we’ve lived these challenges for the last few years to bring the Forward Platform to market.  Ambiguity and diversity are addressed by careful testing, and now we support a majority of the world’s networking equipment. For scaling, like in other verification contexts, the key is to build a model that captures important behavior details, but abstracts away the rest; we represent packets with wildcards, using the Header Space Analysis framework.  A range of practical challenges pop up too; I highly recommend A Few Billion Lines of Code Later, which describes these for static program analysis.

At Forward Networks, verification is at the core of what we do and how we think about networking.

Traditional testing will always have its value, and verification is not a panacea for all network issues.  But it offers something uniquely valuable: confidence that your network actually works as intended.  

I hope this blog gave you a glimpse of that, and introduced some fascinating studies in engineering failures in the process.

Many thanks to those who contributed to this post, including Andi Voellmy, Nikhil Handigol, and Siva Radhakrishnan.

Learn the modern way that companies stay off the front page

Experience a demo of the Forward Platform

Subscribe to our blog!

RELATED FORWARD CONTENT 
May 10, 2022
In Case You Missed It …

ONUG Spring 2022 is in the books. What a great event! Being able to meet with networking experts in person feels like such a treat after everything we’ve been through. The best thing about ONUG events is the cornucopia of informational sessions. Even if you were there, you probably didn’t get to go to all […]

Read More
April 20, 2022
If you are concerned about Cloud Security, visit us at ONUG Booth 43.

Spring is in the air and that means that ONUG Spring is right around the corner! At Forward Networks, it’s feeling a little like Christmas in April because we’re so excited to meet in-person, and we hope you feel the same. Our booth is polished, our presenters are on fire, and our capabilities for solving […]

Read More
March 30, 2022
How to decide if a network digital twin is right for your company – Consider these ten questions

Interest in digital twin technology is on the rise, likely driven by the pressure placed on IT teams to ensure that their networks are predictable, agile, and secure. Network and security operations teams are actively investigating how implementing a digital twin can help their teams become more proactive and provide confidence that the network will […]

Read More

Forward Networks

Mathematically-accurate network modeling trusted by the world's largest networks.
CONTACT SALES
crossmenu linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram