As companies are trying to become more agile to provide new services to their customers at a faster pace, more and more of them are embracing Network Automation in some shape or form.
Network Automation can be very powerful, but it comes with a price: it’s frighteningly dangerous! A mistake in the automation toolchain can bring down an entire organization! Well, you might argue that an engineer typing the wrong command in a single device SSH session can have a similar effect. That’s true, but the impact of a change made via automation can be an order of magnitude bigger than that of a manual change, because it can propagate the same error to every device in the network. Think about changing or deleting the wrong VLAN ID… scary, huh?
On the other hand, how much time would it take to make the same change manually, device by device? Automation is the only way possible to apply quick changes to a dynamic network environment.
Lately we hear more and more about Intent-Based Networking (IBN), from leading research and advisory companies, as well as incumbent networking vendors. The main idea behind IBN is to define how the network should behave at a high level without providing any detail about the implementation, like device vendor and related specific configurations. The idea is actually not entirely new. If you take the configuration management tools as an example, they have been using the same concept for applications, services, and networks for more than a decade now.
IBN is all about automation! A well-defined IBN solution should provide automation for both Network Configuration and Verification.
While there are already several very good network configuration automation solutions in the market, there are not as many options for feature-rich, scalable, accurate and open Network Verification solutions.
Why? The answer is very simple. It’s freaking hard to build one!
These are the usual steps involved in implementing any sort of network verification:
While the collection phase is usually straightforward, it needs a well-thought-out architecture to make it scalable and fast.
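One common way to keep collection fast is to talk to many devices concurrently. The sketch below is a minimal illustration, not how any particular product does it: `collect_one` is a hypothetical placeholder for a real collector (an SSH session, an API call, and so on), and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def collect_one(host):
    """Hypothetical placeholder: a real collector would open a session here."""
    return f"raw output from {host}"


def collect_all(hosts, fetch=collect_one, max_workers=8):
    """Run fetch() for every host in a thread pool and map host -> raw output."""
    hosts = list(hosts)  # we iterate twice, so materialize any generator
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input hosts
        outputs = pool.map(fetch, hosts)
        return dict(zip(hosts, outputs))


if __name__ == "__main__":
    print(collect_all(["core-1", "leaf-1", "leaf-2"]))
```

Threads fit here because collection is dominated by waiting on the network rather than by CPU work. A production collector would also need timeouts, retries and per-device error handling, which are left out of this sketch.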
Parsing and normalizing the data are by far the most challenging and time-consuming phases! Having to deal with a broad range of devices like switches, routers, load balancers and firewalls, across many different vendors, usually running different OS versions, on-premises and in the cloud, poses a very significant burden for engineers.
Usually, the data collected is plain text: the device configuration plus the output of several “show” commands that capture the device state. Because this data is completely unstructured, parsing it is extremely complicated, especially for device state, which lacks proper documentation and changes constantly and unpredictably from one software release to another.
What about device-level APIs? NETCONF? YANG? OpenConfig?
The very reason the entire networking industry is moving toward standards-based network APIs is to provide well-defined, structured data that makes device configuration and state easier to parse. The final goal is to enable faster and easier configuration and verification automation. That’s the future, no doubt about it, with YANG as a data modeling language and OpenConfig as the vendor-agnostic data model definition likely to be predominant in the market for the foreseeable future.
The reality is that, as of today, network device APIs are available on a limited number of devices, on bleeding edge software releases, very often covering only a subset of the features with very limited support for vendor and OS-agnostic data models like OpenConfig. This unfortunately forces the engineers to stick with CLI outputs only.
What if the data were already collected, parsed, normalized and made available in a fashion similar to OpenConfig? What if you could query your network like a database? That would be a dream come true for most engineers, right? This would allow network teams to focus on the higher-level aspects of their use cases, which is actually making their networks more resilient, agile and robust, without spending time writing collectors and parsers. That would be just amazing!
Well, that’s exactly what the Network Query Engine (NQE) from Forward Networks does! Specifically, it provides an open platform for accessing structured data about the network as JSON data, in a fully-parsed form. Moreover, the information is normalized: it is presented uniformly across all the supported vendors in a scalable, easy, maintainable way. This means that the same sanity, verification and documentation checks can work across many different devices and vendors.
To make NQE easy to consume at scale, Forward Networks has made all the data queryable through a GraphQL API. GraphQL is a flexible data query language developed by Facebook in 2012 and released as an open-source project in 2015. Hundreds of organizations are already leveraging GraphQL.
Moreover, NQE is aligned with OpenConfig. It’s not literally the same data model as any of the various JSON representations of the OpenConfig YANG data models. Rather, Forward Networks NQE can be seen as an idiomatic representation of OpenConfig as a GraphQL data source. It sounds complicated but it’s not. In NQE the OpenConfig YANG data models have been modified to fit within the constraints of GraphQL and to take advantage of the powerful query features available in GraphQL. For example, names are camel-cased because dashes are not permitted in GraphQL names.
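To make the naming point concrete, here is a small sketch of converting a dashed, YANG-style name into camel case and checking it against the GraphQL rule for names (letters, digits and underscores, not starting with a digit). The exact renaming NQE applies is not spelled out here, so treat `to_camel_case` as an assumed convention rather than the actual mapping.

```python
import re

# GraphQL names may contain only letters, digits and underscores,
# and may not start with a digit.
GRAPHQL_NAME = re.compile(r"^[_A-Za-z][_0-9A-Za-z]*$")


def is_valid_graphql_name(name):
    """True if the name is legal as a GraphQL field or type name."""
    return GRAPHQL_NAME.match(name) is not None


def to_camel_case(dashed_name):
    """Assumed convention: 'negotiated-port-speed' -> 'negotiatedPortSpeed'."""
    head, *tail = dashed_name.split("-")
    return head + "".join(part.capitalize() for part in tail)


if __name__ == "__main__":
    for name in ("mac-address", "negotiated-port-speed"):
        print(name, is_valid_graphql_name(name), "->", to_camel_case(name))
```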
Last month we introduced our Network Query Engine (NQE) at Cisco Live Europe and to a very impressive technical audience as part of Tech Field Day 2019. If you didn’t have the chance to read through our introduction blog, NQE leverages the internal network data model that Forward Networks builds and manages to allow users to query their network infrastructure details like a database. These queries can be quickly built to confirm network health, proper configurations, effects of a change, device or interface status, etc. A few representative queries that customers have described to us and that are now possible include:
By viewing all network details as a data source, users are able to query on issues globally across their entire network, looking for any anomalies, in one quick sweep. This has rarely been possible before without an enormous amount of effort, usually custom. The alternative is to check for conditions at each device, one at a time, across a large network. Scripts that automate these kinds of custom checks across network devices are very tedious to develop and maintain, especially across different vendors and device types. Forward Networks now makes it easy to build queries in only a few minutes, based on the normalized, vendor-neutral data model in our platform, with a very flexible new query language, GraphQL.
GraphQL was developed by Facebook and turned into an open source project in 2015. It offers enormous flexibility in defining what information is returned, independent of the data model, making it much more efficient for almost every use case than typical interface APIs. GraphQL query statements are natural to embed in programming or scripting languages, like Python, to further compare or analyze the extracted data, or format the results.
Now See the Demos
But, the best way to get a handle on how NQE works is to see a quick video we built that explains how it can be used inside our Forward Enterprise platform, how a sample query is built and how the information can be leveraged. Check out the short demo below:
A lengthier and more technically advanced use case was presented as part of Tech Field Day. Our lead NQE engineer, Andreas Voellmy, shows how we can compare BGP routes in downstream and upstream routers to confirm they were all exported correctly as advertised. This situation actually caused a severe outage at one of our service provider customers, so they wanted to be able to continually check for this scenario. To be able to programmatically verify this across an entire SP network, with many vendors, on a daily basis is a huge time saver and eliminates future errors for them now. Check out Andreas’ demo that replicates their use case here:
“For years organizations have been trying to extract value from the data available to them in large complex network environments. Unfortunately, manual efforts and inefficient collection and normalization procedures have held them back. Fortunately, Forward Networks has unlocked the ability to quickly, easily and programmatically convert network data into knowledge and actionable information leveraging its Network Query Engine feature.” - Bob Laliberte, ESG
Network IT engineers realize that NQE gives them a really accelerated approach to automate almost any of their network analysis and health status checks. Our platform provides many useful ways to analyze the network end-to-end, but NQE allows customers to query the collected and normalized data in thousands of ways and use cases that we didn’t design for.
A few final quick points to know:
Want to learn more or get a live demo? We’ll show you how NQE can help accelerate your networking tasks and processes in minutes.
There is a wealth of untapped information in your network. Learn to query and extract it!
Even before packets start flowing, enterprise networks are complex, data-intensive repositories of topology, configuration and state information. This information is often required to solve operational issues—like finding sources of unwanted traffic drops or protocol configuration errors—or, to find problems before they become issues.
Yet, this valuable information typically goes untapped, because getting at it requires too much work. The data is scattered across devices and stored in different formats, with disparate methods to access it.
Forward Networks' newest capability, Network Query Engine (NQE), removes the burden of collecting, parsing, and querying network information. NQE takes configuration files and state information from across all your network devices and exposes it in a well-defined schema that can be queried like a database—to enable a new range of network management capabilities and insights. Read on to see how it works and how you can get started with it, today.
Simple questions are surprisingly hard to answer
Today, answering even basic questions about a network can be challenging and time-consuming. Consider this simple task: find all device interfaces in your network whose operational status does not match their configured status. For example, if an interface is configured to be UP, but is operationally DOWN, we want to know about this!
To implement this simple check, we need to 1) log in to all of our devices, 2) use the available method to retrieve the data, 3) extract the information we need from the data, and 4) put this information into some common data structure, so that the check can easily be written. In some cases, we may be able to use proper APIs, like SNMP or NETCONF, to retrieve this data. In other cases (and in almost every significant real-world case we've seen), some of the data on some of the devices has to be retrieved from the device's CLI and parsed from human-oriented, ill-specified, vendor-specific textual output.
To illustrate, consider getting the operational and administrative statuses from all interfaces on a Cisco NX-OS device and on an A10 ACOS device. On Cisco NX-OS, we have to run two commands, "show running-config" and "show interfaces brief", to get at all the data. On A10 ACOS, we have to run "show interfaces". The outputs from these three commands, shown in Figures 2, 3 and 4, are all completely different, and we have to write three ad-hoc parsing functions to grab the data we want.
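To give a feel for what one of these ad-hoc parsers looks like, here is a sketch written against an invented, heavily simplified table. The sample text is an assumption made up for illustration and is not real NX-OS or ACOS output. Real output has more columns, more edge cases and more variation between releases.

```python
import re

# Invented, simplified text loosely resembling a "brief" interface listing.
# It is NOT real NX-OS or ACOS output.
SAMPLE_BRIEF = """\
Interface    Status    Speed
Eth1/1       up        10G
Eth1/2       down      auto
"""

# One row: interface name, operational status, speed.
ROW = re.compile(r"^(?P<name>\S+)\s+(?P<status>up|down)\s+(?P<speed>\S+)\s*$")


def parse_brief(text):
    """Return {interface name: operational status} for the sample format."""
    statuses = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # the header and blank lines simply fail to match
            statuses[match.group("name")] = match.group("status").upper()
    return statuses


if __name__ == "__main__":
    print(parse_brief(SAMPLE_BRIEF))
```

Every additional command, vendor or OS version needs its own pattern like `ROW`, and each one can break silently when the output format shifts.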
Developing programs to collect the needed data on one or two devices or platforms for one or two data points is not that hard. But doing this across all of your vendors, platforms, and OSes, for the thousands of data points (some rather obscure) that you may need, is an enormous effort.
When you can efficiently get answers, many operational tasks can be accelerated
Beyond the simple question about interfaces above, operators have expressed interest in answering a long list of questions:
Every network engineer we talked to has questions like these, which all require network-wide information to answer completely. In some cases, engineers would query manually and get a partial answer. In other cases, where they had tools teams or scripting-savvy network teams, they would invest the time and effort. One network operator who went down this path reported that 80% of the effort was spent in basic collecting and parsing of device and vendor-specific formats and details.
Unsurprisingly, nobody reported that this was work they'd want to do, because it's tedious to get collection, parsing, and querying right across vendors and all device types (switches, routers, firewalls, and load balancers), because it requires ongoing maintenance (as the devices evolve and new firmware versions come out), and because it adds risk (that the star coder who made this leaves). There are libraries to help with network-device collection, access, and parsing, but none get you all the way to the finish line. Existing tools/techniques like SNMP, HTTP APIs, and YANG outputs often only work on a subset of devices, require software upgrades that you are not ready to apply, or don't cover all the data you need.
Given the level of effort required, many questions are simply not answered, leaving operators blind to potential problems in their network or working long weekends hunting down the needles in their haystacks. Fortunately, now, help is on the way...
Leveraging the Forward platform
As part of the underlying engine of its network assurance and intent-based verification platform, Forward Networks already does the hard work to create a centralized, vendor-agnostic internal representation of all this network configuration and state information across all layer 2-4 device types. This information powers applications like search, verification, behavioral diffs, and more, all of which can be accessed via our GUI or API.
But this raw information could do other things—if exposed the right way—to answer questions like those above, as well as enable live queries, custom verification checks, compliance documentation, and new dashboards.
To this end, we are introducing Network Query Engine (NQE) in the Forward Networks platform. NQE provides access to normalized, structured data about the network, enabling network teams to focus directly on the higher-level aspects of their use cases, which is actually making their networks more resilient, agile and robust. Data queries for specific needs can now efficiently be written in a small fraction of the time, usually in a few lines of code or in a graphical query editor. The diagram below shows how we are exposing our core platform data model alongside our Forward Enterprise applications and intent-based verification analytical model.
Our key design goals for enabling this level of access to our structured, normalized network data model were to ensure the availability of data was:
In particular, we aimed to have a normalized data model, in the sense that device information (interface details for example) is represented with the same information structure, regardless of vendor, platform and OS. This normalization allows you to easily write queries and scripts that work across your entire fleet. We also wanted the data to be structured, in the sense that data is fully parsed: you won't need to parse further into some string to extract the embedded structure. Again, this simplifies programming because it eliminates a bunch of annoying details, like dealing with the differences in textual representation of MAC addresses that arise across platforms.
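The MAC address case is a good example of the kind of detail normalization removes. A minimal sketch, assuming three common notations (colon-separated, hyphen-separated and dotted groups of four) and an arbitrary choice of lowercase colon-separated output; the canonical form NQE actually uses is not stated here:

```python
import re

HEX_12 = re.compile(r"[0-9a-f]{12}")


def normalize_mac(text):
    """Convert common MAC notations to lowercase, colon-separated form."""
    # Drop the usual separators, then insist on exactly 12 hex digits.
    digits = re.sub(r"[.:\-]", "", text.strip()).lower()
    if not HEX_12.fullmatch(digits):
        raise ValueError(f"not a MAC address: {text!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


if __name__ == "__main__":
    for raw in ("0011.2233.4455", "00-11-22-AA-BB-CC", "00:11:22:aa:bb:cc"):
        print(raw, "->", normalize_mac(raw))
```

With structured, normalized data, a script never has to carry a helper like this around.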
We believe we've achieved these goals and invite the community to try it out. The data model is far from complete (more on that later), but feedback from operators convinced us to accelerate this feature and make it available as soon as possible.
GraphQL: A simplified and efficient query language to the Forward schema
NQE is based on GraphQL, and this is a crucial ingredient in meeting our design goals. GraphQL, which was developed by Facebook in 2012, and released as an open source project in 2015, is: "a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL provides a complete and understandable description of the data in your API, gives clients the power to ask for exactly what they need and nothing more, makes it easier to evolve APIs over time, and enables powerful developer tools." [graphql.org]
GraphQL lies somewhere in between a REST API and SQL: Like a REST API, GraphQL defines a format for requesting and receiving general data over a web connection. Like SQL, GraphQL defines a query language for requesting and receiving filtered data from a graph-structured data source that has a well-defined schema. GraphQL has gained the backing of hundreds of organizations like Intuit, Netflix, GitHub, and PayPal, and is getting rave reviews. Just to quote one: "at PayPal, GraphQL has been a complete game changer in the way we think about data, fetch data and build applications."
GraphQL provides some key ingredients for NQE. First, GraphQL defines a schema language that NQE uses to provide a clear, simple, and precise description of our network data model in the form of a GraphQL schema. This schema allows developers to know precisely what information is in our data model, how it will be structured, and what it means. For example, here is a fragment of the schema that defines the structure of the object that defines Ethernet attributes of an interface:
This declaration clearly communicates the structure with minimal fuss: it defines a record with a few fields, and each field has a well-defined type in our schema.
Second, GraphQL provides a simple way to write queries against the schema. In fact, the query language is so simple, it almost doesn't feel like a language: it looks like JSON, without the values. For example, here is a query for the macAddress and negotiatedPortSpeed values of the Ethernet datatype:
Beyond its simplicity, the query API is crucial to ease-of-use: as a developer, you don't have to cope with a deluge of data, which can complicate your task. Instead, you ask for, and get, just the small amount of data you need for your task. Compared to a typical REST API, this can make for more resource-efficient and responsive applications.
Finally, the returned data is easy to consume. The returned data is a JSON object that directly follows the structure of the query; it is essentially the same as the query, but with values attached to the requested fields. For example, the above query might return this JSON:
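As a rough illustration of how a response mirrors its query, the sketch below strips the values out of a parsed response and leaves only the field structure. The `ethernet` wrapper and both values are made up for the example and are not taken from a real device or from the NQE schema; only the two field names come from the text above.

```python
import json

# Hypothetical response body; the values are invented for illustration.
RESPONSE_TEXT = """
{
  "ethernet": {
    "macAddress": "00:11:22:33:44:55",
    "negotiatedPortSpeed": "SPEED_10GB"
  }
}
"""


def field_shape(value):
    """Strip the values from parsed JSON, leaving only nested field names."""
    if isinstance(value, dict):
        return {key: field_shape(inner) for key, inner in value.items()}
    if isinstance(value, list):
        # Lists repeat one shape, so the first element is representative.
        return field_shape(value[0]) if value else None
    return None  # a leaf: the query names this field but carries no value


if __name__ == "__main__":
    response = json.loads(RESPONSE_TEXT)
    print(field_shape(response))
```

The shape that comes out is exactly the set of nested field names a query would list, which is why consuming the response takes plain dictionary access and no parsing.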
Answering a simple question is simple with Network Query Engine
Using NQE, we can now easily answer our original question about interfaces with mismatched operational and configured states. In particular, you can get all the data you need, across all devices in your network, with this short (and sweet) query:
And the output is immediately consumable:
That's it! Notice what you did not have to deal with: no collection and storage of data, no reading manuals to find out which command to run, no dirty regular expressions to parse the data, and no vendor-specific hacks. This single query works for all devices supported by Forward Networks!
Moreover, we've made it easy to take a query like the one above, and embed it in a script that integrates this sanity check into larger workflows. Specifically, we've open-sourced a simple Python client library that you can use to query the GraphQL API in any Forward Networks instance. For example, we can drop the above query into the following Python script that uses our client library to print out all interfaces that have different operational and configured states:
Running this program prints out something like this:
Not the prettiest thing in the world, but not too shabby either. In some 30 odd lines of simple Python, you've implemented a script that works on all devices, platforms, and OSes supported by Forward Networks (without needing to update any device firmware or install any agents) and checks a property that may be important for keeping your network sane.
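For readers who want to see the moving parts, a check along those lines can be sketched with only the Python standard library. Everything specific here is an assumption made for illustration: the endpoint URL, the field names (`devices`, `interfaces`, `adminStatus`, `operStatus`) and the sample data are hypothetical and are not the NQE schema or the Forward client library. The one general fact relied on is that GraphQL requests are commonly sent as an HTTP POST whose JSON body carries the query under a `query` key.

```python
import json
import urllib.request

# Hypothetical query; the field names are assumptions, not the NQE schema.
QUERY = """
{
  devices {
    name
    interfaces { name adminStatus operStatus }
  }
}
"""


def build_request(url, query):
    """Build (but do not send) a GraphQL POST request."""
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def mismatched_interfaces(data):
    """List (device, interface, admin, oper) where the two statuses differ."""
    mismatches = []
    for device in data["devices"]:
        for iface in device["interfaces"]:
            if iface["adminStatus"] != iface["operStatus"]:
                mismatches.append((device["name"], iface["name"],
                                   iface["adminStatus"], iface["operStatus"]))
    return mismatches


# Made-up data in the shape the hypothetical query would return.
SAMPLE_DATA = {"devices": [{"name": "leaf-1", "interfaces": [
    {"name": "eth1", "adminStatus": "UP", "operStatus": "UP"},
    {"name": "eth2", "adminStatus": "UP", "operStatus": "DOWN"},
]}]}

if __name__ == "__main__":
    request = build_request("https://forward.example/graphql", QUERY)
    print(request.get_method(), request.full_url)
    print(mismatched_interfaces(SAMPLE_DATA))
```

Sending the request with `urllib.request.urlopen` and adding authentication are left out, since those details depend on the deployment.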
NQE takes inspiration from OpenConfig, an informal working group on network management and operations standards, which includes a vendor-neutral network data model. NQE tries to align with that data model where possible. For more details on specifics of the alignment with OpenConfig, refer to our GitHub README file.
Head over to our GitHub repository to get started with Forward NQE. The repository shows you how to run queries and install the client library, provides a set of examples that you can use to bootstrap, and covers a variety of details about the API. An easy way to get started is to query your network in Forward Enterprise (account required). If you do not already have an account there, you can request an account here.
In particular, our repository guides you to our Network Query Explorer, which is a lightweight, interactive query editor and schema explorer in the Forward platform. The Network Query Explorer provides a simple interface for testing GraphQL queries; you can interactively query the network and get immediate results, as well as prototype queries prior to embedding them in a custom application.
The data that could be exposed about networks is vast. We've started with a small subset of this data, covering the following areas:
Please check out the repository, which has the NQE GraphQL schema in its documentation for full details. Or, check out this interactive visualization of the NQE GraphQL schema:
We welcome your feedback on what data to include going forward. Please send us issues on our GitHub repository to let us know about your use cases and the data you need.
We also invite you to use the GitHub repository to share scripts and queries and collaborate with us on Network Query Engine. Feel free to fork the repository and send us pull requests.
Have fun querying and analyzing your network with our NQE! We're excited to see what you do with it!