Hack weeks and hackathons are like foosball tables: if you don’t have them, are you even a tech company? These events, once revered as engines of innovation, have become blasé and are often perceived as little more than playtime for engineers. As someone who’s worked in tech for longer than I care to admit, I had started to ignore them, until I came to Forward Networks.

I’ve just experienced my third Forward Networks Hack Week, and what I find remarkable is that while teams are free to pursue any idea they want, they always focus on delivering customer value and making life easier for the people who use the platform every day.

A cynic might say they do this because customer value is the most heavily weighted judging criterion, or because they want to win a coveted Hack Week letterman jacket and the bragging rights that come with it. Those who listened to the presentations know better. Our engineers talk about our customers as close personal friends, with a level of caring I had not encountered before joining the Forward Networks family.

Over half of the concepts presented in our Hack Week sessions quickly become new platform features. (We’re still a small startup with a lean engineering team; otherwise, I genuinely believe all the ideas would become part of the platform.) This is a testament to our passion for supporting our customers.

Other companies bill their Hack Weeks as time for employees to “experiment, create, test, and learn” or to “explore novel and even heretical ideas.” Engineers at established companies have described them this way: “We’ll all be building things that are separate from our normal work and not part of our day-to-day jobs.” While this sounds fun and can lead to valuable innovation, it’s unlikely to benefit customers in the short term.

Our Hack Weeks combine creativity and competition with real-world problem-solving. The collective goal is to deliver features that enhance our customer experience within a quarter.

I can’t go into detail about the specific projects our engineers presented without teasing new product features, but I can say this: the underlying theme was delivering otherwise inaccessible data in an intuitive way. The uses for that data include improving efficiency, preventing human error, improving decision-making, securing the network, and reducing manual effort.

Hack Weeks should be about solving your problems, not chasing rainbows.

Forward Networks is a winner in the 2022 Business Insights Excellence in Customer Service Awards. We are over the moon, because our customers are our top priority.

Our platform addresses unprecedented network complexity by collecting configuration and state data for all devices in the network and indexing it in a vendor-agnostic way. The platform collects data across physical, cloud, and virtual environments to help network, security, and cloud engineers make the network more predictable, agile, and secure by providing immediate access to actionable information.

Based on an advanced mathematical model, Forward Enterprise creates a network digital twin of an enterprise environment across on-premises devices and hybrid multi-cloud environments. IT teams can instantly troubleshoot, verify intent, prove compliance, and predict network behavior by computing all possible traffic paths.

Put more simply, we make hard stuff easy for IT engineers, so they can spend time on proactive projects. And we enjoy doing it! In a recent employee survey, one of our people put it this way: “The result of our work first and foremost helps customers in ways nobody could have imagined before; this motivates every person in our company.”

“We work with organizations whose networks cannot go down without significant consequences to the economy, public safety, and financial systems,” said Yadhu Govindarajan, Director, Customer Success, Forward Networks. “Each of these organizations is running highly complex networks and facing unique challenges that we help them address to ensure their networks are predictable, reliable, and secure. Winning this award is especially important to us, as it validates the impact of our commitment and effort.”

The Excellence in Customer Service Awards celebrate those who are winning by supporting their own customers and those who are developing the tools to help others find success. Awards were given to consultants, outsourcing partners, and technology providers for superior performance in the past 12 months.

To learn more about the unique things we’ve done to support our customers, check out our customer value blog series by Yadhu.

What do we do?

At Forward Networks we build digital twins for computer networks. Enterprises with large networks use our software to build a software copy of their network and use that for searching, verifying, and predicting behavior of their network. It is not a simulation. It is a mathematically accurate model of their network.

Why is it a hard problem?

Large enterprise networks contain thousands of devices (switches, routers, firewalls, load balancers, etc.). Each of these devices can have complex behaviors. Now imagine a large graph with thousands of nodes, where each node represents one of these devices and the links between nodes show how they are connected. You need to model exactly how traffic originating from edge devices propagates through the network.
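To make the graph framing concrete, here is a minimal sketch, not Forward's actual model. It assumes each device can be reduced to a hypothetical forwarding table of (prefix, next hop) entries resolved by longest-prefix match, and traces one packet hop by hop. The device names and the `FORWARDING_TABLES`, `lookup`, and `trace` helpers are invented for illustration; real devices add ACLs, NAT, VRFs, and the vendor-specific corner cases described next, none of which this toy models.

```python
from __future__ import annotations

import ipaddress

# Hypothetical toy topology: each device maps destination prefixes to a next-hop
# device name. "DELIVER" means the destination is directly attached to that device.
FORWARDING_TABLES = {
    "edge-a": {"10.0.0.0/8": "core-1", "0.0.0.0/0": "internet-gw"},
    "core-1": {"10.1.0.0/16": "dc-1", "10.2.0.0/16": "dc-2"},
    "dc-1": {"10.1.0.0/16": "DELIVER"},
    "dc-2": {"10.2.0.0/16": "DELIVER"},
    "internet-gw": {"0.0.0.0/0": "DELIVER"},
}


def lookup(device: str, dst: str) -> str | None:
    """Return the next hop chosen by longest-prefix match, or None if no route matches."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in FORWARDING_TABLES[device].items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


def trace(src_device: str, dst: str, max_hops: int = 32) -> tuple[list[str], str]:
    """Follow forwarding decisions from src_device; return (path, outcome)."""
    path = [src_device]
    device = src_device
    for _ in range(max_hops):
        next_hop = lookup(device, dst)
        if next_hop is None:
            return path, "dropped"          # no matching route on this device
        if next_hop == "DELIVER":
            return path, "delivered"
        if next_hop in path:
            path.append(next_hop)
            return path, "loop"             # forwarding loop detected
        path.append(next_hop)
        device = next_hop
    return path, "hop limit exceeded"


print(trace("edge-a", "10.2.3.4"))   # (['edge-a', 'core-1', 'dc-2'], 'delivered')
print(trace("edge-a", "10.3.0.1"))   # (['edge-a', 'core-1'], 'dropped')
```

Even this toy shows why the problem is hard: the answer depends on every table along the path, so a single wrong corner case on one device changes the result for all traffic crossing it.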

To do so, you need to understand exactly how each device handles different packets. A typical enterprise network not only includes different types of devices (routers, firewalls, etc.), but those devices are built by different vendors (Cisco, Arista, Juniper, etc.), and even for the same device type from the same vendor, you typically see many different firmware versions. To build a mathematically accurate model, you need to model every corner case, and many of these are not even documented by the vendors.

At Forward, we have built an automated testing infrastructure for inferring the forwarding behavior of devices. We purchase or lease these devices, put them in our lab, inject various types of traffic into them, and observe how they behave.

Where are we today?

I’m proud to show off publicly today, for the first time, that we can process networks with more than 45,000 devices on a single box (e.g., a single EC2 instance). Here is a screenshot of an example network with about 45k devices:

Some of our customers send us their obfuscated data to help us identify performance bottlenecks and further improve performance. It is a win-win: our software gets better over time, and they get even faster processing. The data is fully obfuscated: every IP and MAC address is randomly changed to a different address, every name is converted to a random name, and these mappings are irreversible. These changes do not materially change the overall behavior of the model, and the obfuscated data is still representative of the complexity and diversity of the original network’s behavior. The network in the example above is built from that data.
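Here is a minimal sketch of what such an obfuscator could look like, under two assumptions that are ours, not a description of Forward's implementation. First, the mapping must be consistent within one dataset, or the model's behavior would change. Second, IP addresses should keep their subnet relationships, so this sketch uses prefix-preserving remapping in the spirit of Crypto-PAn, with HMAC-SHA256 standing in as the pseudorandom function: bit *i* of the output is bit *i* of the input XOR a pseudorandom bit derived from the first *i* input bits. The `Obfuscator` class and the `dev-` name prefix are hypothetical; MAC addresses could get the same consistent-random treatment as names.

```python
from __future__ import annotations

import hashlib
import hmac
import ipaddress
import secrets


class Obfuscator:
    """Hypothetical obfuscator: consistent within one run, irreversible once discarded."""

    def __init__(self) -> None:
        # Random per-run key that is never written out, so the mapping cannot be
        # recomputed or inverted after this object is gone.
        self._key = secrets.token_bytes(32)
        self._names: dict[str, str] = {}

    def ipv4(self, addr: str) -> str:
        """Prefix-preserving remap: addresses sharing an n-bit prefix still share one."""
        bits = int(ipaddress.IPv4Address(addr))
        out = 0
        for i in range(32):
            prefix = bits >> (32 - i)                    # the first i bits of the input
            msg = f"{i}:{prefix}".encode()
            flip = hmac.new(self._key, msg, hashlib.sha256).digest()[0] & 1
            bit = (bits >> (31 - i)) & 1                 # input bit i, counted from the left
            out = (out << 1) | (bit ^ flip)
        return str(ipaddress.IPv4Address(out))

    def name(self, original: str) -> str:
        """Replace a hostname with a random token, reusing it for repeat occurrences."""
        if original not in self._names:
            used = set(self._names.values())
            candidate = "dev-" + secrets.token_hex(4)
            while candidate in used:                     # avoid two names colliding
                candidate = "dev-" + secrets.token_hex(4)
            self._names[original] = candidate
        return self._names[original]


obf = Obfuscator()
print(obf.ipv4("10.1.2.3"), obf.ipv4("10.1.2.200"))  # still share their first 24 bits
print(obf.name("nyc-core-router-1"))
```

Because the outputs depend on a random key, every run prints different addresses and names; what stays fixed is the structure, i.e. which addresses share which prefixes and which configuration lines refer to the same device.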

This network includes more than 10^30 flows. Each flow shows how a group of similar packets traverses the network. For example, one flow might show how email traffic from a specific host to another host starts in one data center, passes through several backbone devices, and finally arrives at the destination data center.

Each of these flows can be complex. Even if we spent only 1 microsecond computing each flow, it would take us more than 10^16 years to compute them all. But with a lot of hard engineering work and algorithmic and performance optimizations, we are able to process this network in under an hour, on a single box. You don’t need a massive cluster for such computation. The best part is that the majority of the computation scales linearly, so if customers want faster processing or higher search and verification throughput, they can use our cluster version and scale based on their requirements.
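For scale, here is the back-of-the-envelope arithmetic at exactly 10^30 flows and 1 microsecond per flow, taking a year as roughly 3.15 × 10^7 seconds:

```latex
\underbrace{10^{30}}_{\text{flows}} \times \underbrace{10^{-6}\,\mathrm{s}}_{\text{per flow}}
  = 10^{24}\,\mathrm{s},
\qquad
\frac{10^{24}\,\mathrm{s}}{3.15 \times 10^{7}\,\mathrm{s/year}} \approx 3.2 \times 10^{16}\ \text{years}
```

Since the network has more than 10^30 flows, the real figure is larger still, which is why computing flows one at a time was never an option.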

How long did it take us to get here?

Forward Networks was founded in July 2013. Our founders are Stanford PhD grads, and as a result the very first test data we got was a 16-device collection from part of the Stanford network. I joined Forward in Sep 2014 after spending a couple of years building and working with large distributed systems at Facebook and Microsoft. I started leading the effort to scale our computation so that we could finish computing that 16-device network in a reasonable amount of time; reaching that point (Mar 2015) took about two years from the company’s founding.

From then on, almost every year we were able to process a network 10x larger. Today, we have tested our software on a very complex network with 45k devices. We are currently working on further optimization and scaling, and we project reaching 100k devices in Dec 2020. The following graph shows our progress over the past several years and the projection through Dec 2020, on a logarithmic scale:

Lessons learned

It takes time to build complex enterprise software

As I mentioned above, we started with data from a very small network. As we made our software better, faster, and more scalable, we were able to go to customers with larger networks to get the next larger dataset, find the next set of bottlenecks, and work on those. We had to rewrite or significantly change the computation core of our software multiple times, because as we got access to larger data, we saw patterns we hadn’t anticipated.

Could we have reduced the time it took us to get here if we had had access to large data in the early days of our startup? Yes. Was it feasible? No. Why would a large enterprise spend the time to install our software, configure its security policies to allow our collector to connect to thousands of devices in its production network and pull their configs, and then send the data to a tiny startup that doesn’t have a proven product yet? It only happens if it is a win-win situation. Every time we got access to the next larger dataset from a customer, we optimized our software based on it and went to other customers with networks of that size, where our software could already process all or most of their network. When they shared their data with us, we would either find new data patterns that needed optimization or combine all the data we had received from customers to build larger datasets for scale testing and improvements. It is a cycle, and it takes time and patience to build complex enterprise software.

Customers with large networks typically have much stricter security policies, which means they wouldn’t share their data with us. This is why we spent the time to build data obfuscation capabilities into our software: customers can obfuscate their data and share the result with us, which reveals the performance bottlenecks without exposing their actual data. Some customers have policies so strict that even this is not possible; for them, we have built tools that aggregate overall statistics, which are typically useful for narrowing down the root cause of performance bottlenecks.
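As an illustration of the aggregate-statistics idea, here is a minimal sketch. The `DeviceRecord` fields and the particular statistics are hypothetical choices of ours, not the contents of Forward's tools; the point is that sensitive fields such as hostnames and management IPs are read locally but never appear in the output, while counts and size distributions that hint at bottlenecks do.

```python
from __future__ import annotations

import statistics
from collections import Counter
from dataclasses import dataclass


@dataclass
class DeviceRecord:
    """Hypothetical per-device summary collected inside the customer's environment."""
    hostname: str        # sensitive: used locally, never exported
    mgmt_ip: str         # sensitive: used locally, never exported
    vendor: str
    os_version: str
    route_count: int
    acl_rule_count: int


def aggregate(devices: list[DeviceRecord]) -> dict:
    """Reduce device records to shareable statistics with no names or addresses."""
    routes = [d.route_count for d in devices]
    acls = [d.acl_rule_count for d in devices]
    return {
        "device_count": len(devices),
        "devices_per_vendor": dict(Counter(d.vendor for d in devices)),
        "distinct_os_versions": len({(d.vendor, d.os_version) for d in devices}),
        "routes_total": sum(routes),
        "routes_max": max(routes, default=0),
        "routes_median": statistics.median(routes) if routes else 0,
        "acl_rules_total": sum(acls),
        "acl_rules_max": max(acls, default=0),
    }
```

A report like this can show, for example, that one device carries an unusually large ACL or routing table, which is often enough to point engineers at the code path that needs optimizing.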

When buying enterprise software, customers typically don’t spend a large amount of money on a platform unless they’re 100% sure it will work for them. There is usually a trial or proof-of-concept period in which they install the software and evaluate it in their environment. In our early years, we worked very hard with our first few trial customers to make the software work well for them. Some of those trials didn’t end in an immediate purchase, but their data gave us invaluable insight for improving our software.

On-prem software should work on minimal hardware

These days it is pretty easy to provision an instance in AWS, Azure, or other cloud providers with 1TB or more of RAM. But you would be amazed how many times we have had to wait weeks or months for a customer to provision a single on-prem instance with 128GB or 256GB of RAM. Large enterprises typically allow small instances to be provisioned quickly, but as soon as your software needs a more powerful instance, there can be a lot of bureaucracy to get it done. And remember, during the initial interactions with customers, you want them to start using your software quickly so they can finish the proof-of-concept period. At that point, they are still evaluating your software and haven’t yet seen its value in their environment. So if someone in a large organization opens a ticket asking the infrastructure team to provision hardware for software they want to try, it may not be among the highest-priority tickets.

At Forward Networks, we have learned to be very careful with any new tool, framework, or dependency we add to our system. In fact, our resource requirements are so low that our developers run the entire stack on their laptops, which is critical for fast debugging and quick iteration.

We have also spent a lot of engineering time and effort on making this possible. Here are some of the high-level approaches:

When you need to scale 1000x or 10000x, you can’t simply use a cluster with 1000 nodes. Even if it were possible, there is no economic justification for it. You have to do the hard engineering work to get the same result with minimal resources. The majority of our customers run our software on a single box, but we also provide a cluster version for customers who want high availability, have more concurrent users, or need higher search or compute throughput.

One of our customers told us that they had to provision and operate a few racks of servers for another piece of software (in the same space as us, but not exactly a competitor), and how pleased and amazed they were by what our software delivers with such low requirements. Not only do low requirements speed up adoption of the software, they also save customers money and give you, as a software vendor, better margins.

Open source tools are not always the answer

In the early years of our startup, we used off-the-shelf platforms and tools like Elasticsearch and Apache Spark for various purposes. Over time it became clear that while these platforms are generic enough to apply to a wide range of applications, they weren’t a great fit when you need major customizations that are critical to your application.

For example, we initially computed all end-to-end network behaviors and indexed and stored them in Elasticsearch. Later it became clear that it is computationally infeasible to pre-compute all such behaviors for storage in Elasticsearch, and even if it were possible, such an index would be enormous. We had to switch to a lazy computation approach: we pre-compute just enough data to perform quick searches, and at search time we do the rest of the computation that is specific to the user’s query.

At first we tried to write plugins or customizations to adapt Elasticsearch to this lazy computation approach, but it soon became clear that it wouldn’t work, and we had to create our own homegrown distributed compute and search platform.

Moving fast without breaking things needs sophisticated testing

We ship one major release of our software every month. Currently, each release includes about 900 changes (git commits), and this number will only grow as we hire more engineers. At this rate of change, we need a lot of testing in place to make sure our releases don’t have regressions.

Every git commit must be verified by Jenkins jobs that run thousands of unit and integration tests to ensure there are no regressions, and any test failure prevents the change from being merged. In addition to these tests, we use Error Prone to detect common bugs and Checkstyle to enforce a consistent coding style.

We also have many periodic jobs that run more expensive tests against the latest merged changes every few hours. These tests typically take a few hours to complete, so it is not feasible to run them on individual changes. Instead, when they detect an issue, we use git bisect to identify the root cause. Not only do these periodic tests check for correctness, they also ensure there are no performance regressions. They upload their performance results to SignalFx, and we receive alerts on Slack and email if there are significant changes.
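The bisect step is a binary search over the commits between the last good run and the first bad one. Here is a minimal sketch of that idea paired with a simple performance-regression predicate. The 10% tolerance, the `is_perf_regression` and `first_bad_commit` helpers, and the timings are hypothetical; in practice `git bisect run <script>` automates the same search, treating the script's exit code as the good/bad verdict.

```python
from __future__ import annotations

from typing import Callable, Sequence


def is_perf_regression(baseline_s: float, measured_s: float, tolerance: float = 0.10) -> bool:
    """Flag a run as regressed if it is more than `tolerance` (10% by default) slower."""
    return measured_s > baseline_s * (1 + tolerance)


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit, the same idea `git bisect` automates.

    Assumes commits[0] is known good, commits[-1] is known bad, and that once a
    commit is bad, every later commit is bad too.
    """
    lo, hi = 0, len(commits) - 1          # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical timings (seconds) of the same scale test at each commit, oldest first.
timings = {"c0": 100.0, "c1": 101.0, "c2": 99.5, "c3": 131.0, "c4": 130.2}
culprit = first_bad_commit(list(timings), lambda c: is_perf_regression(100.0, timings[c]))
print(culprit)  # c3
```

The search needs only about log2(n) test runs, which matters when each run takes hours: a window of 900 commits is narrowed to one in at most 10 runs.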

Are we done?

While we believe we have already built a product that is a significant step forward in how networks are managed and operated, our journey is only 1% complete. Our vision is to become the essential platform for the whole network experience, and we have just started in that direction. If this interests you, please join us. We are hiring for key positions across several departments. Note that prior networking experience is not a requirement for most of our software engineering positions.

If you operate a large-scale complex network, please request a demo to see how our software can de-risk your network operations and return massive business value.

Today’s network environments are far more complex than they were even a few years ago. Businesses are rapidly adopting new IT models to keep up with evolving customer needs: virtualization, cloud applications, IoT deployments, big data analytics, artificial intelligence, and more. This digital transformation has turned businesses’ data operations into intricate network environments, and they are only growing more complicated.

The key issue is that these continued investments in network infrastructure are exceeding what most IT teams can support. Recent research by Enterprise Strategy Group found that 66% of organizations view their IT environments as more or significantly more complex than they were two years ago. The research also found that network complexity is only expected to increase, as 46% of organizations anticipate upgrading and expanding their network infrastructure.

Despite this exponentially increasing complexity, IT teams are expected to ensure modern enterprise networks deliver ubiquitous connectivity, cloud and mobile integration, internal and external collaboration and conferencing, and high levels of data integrity and security. This becomes a herculean task as network engineers need to carefully organize a network spanning thousands of devices – all with their own proprietary operating systems (OS) and different configuration rules – various geographic locations, and numerous corporate environments.

IT teams are being run ragged to keep up. This is driving demand for new network management solutions that empower network engineers to oversee these complexities in a way that doesn’t inhibit growth, increase risk, or rely on regressing back to on-premises, centralized systems.

The need for network transparency

Historically, the major problem has been that it is difficult for IT teams to know and visualize the network. Without a clear understanding of the entire network – the devices on it, their connections, the enacted policies, the geographical layout, etc. – it’s impossible to manage a complicated network in a simpler way.

Without such transparency, organizations cannot verify that networks are operating as intended when reviewing network incidents or implementing new security policies. When considering a network update, determining how it may impact other applications negatively or introduce service-affecting issues becomes difficult with modern networks. An oft-cited study by Gartner notes that 80 percent of network outages are caused by people and process issues, with more than 50 percent of those outages caused by change configuration issues. It’s clear that this problem needs to be addressed, as it’s one of the major issues plaguing network engineers today.

Historically, to assess adherence to policies or the impact of any network change, businesses have relied on a few disparate solutions. Examples include outdated network topology diagrams, device inventories and management systems, CLI commands, and “ping” and “traceroute” utilities. But even using all of these tools in tandem doesn’t provide a reliable and holistic view of network behavior – much less an efficient one – for modern and heterogeneous enterprise environments. This has made it increasingly difficult for network managers to know what is going on within their internal data center networks.

Clearly, then, a better means of network analysis is vital. Today’s businesses need solutions that allow IT teams to map the topology of their entire network infrastructure quickly and efficiently, and to validate configuration and behavior accuracy end to end. The opaque state of modern enterprise networks is the primary force driving a new boom in network verification and analytics software platforms.

The rise of intent-based models

Many of these new and innovative network management and operations tools are built around “intent-based” networking. According to Gartner, this next stage of intelligent networking is quickly growing in popularity among leading businesses. Intent-based networking leverages advanced automation tools to simplify operations, improve agility, and fortify security.

Many of these new technologies serve as intelligent monitoring systems that offer deep visibility across a given network. They use programmable triggering mechanisms to capture and report key events and anomalies in real time, while filtering out irrelevant data. Not only do they serve as an advanced central platform that IT teams can use to visualize and analyze the network, but they can also proactively monitor it without generating additional traffic or requiring the enterprise to build a separate monitoring network.

The capability to create a snapshot of the entire network and the various devices spanning it is invaluable by itself. With such an accurate network copy, teams can easily take the most basic step in managing their networks: eliminating unneeded complexity. While preserving the enterprise network’s scope and reach, the IT team can examine different parts of the network to evaluate which devices or policies add complication with minimal value. Even the most intelligently designed networks include elements of redundancy. By identifying and replacing outdated equipment, software, or policies, businesses can make small, straightforward adjustments to the overall infrastructure that considerably simplify network operations.

The immense value of these new network solutions goes beyond mere network visualization, though. A number of them allow for advanced capabilities such as network verification, querying, and even automation. With such solutions, enterprises can quickly and easily test how their network would be affected by changes. If new firewalls were added to the network, the IT team could verify whether the implementation worked and how it might affect other traffic flows. Teams can also isolate issues by searching for and analyzing all possible network paths that conform to a specific policy or intent.
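Here is a small sketch of checking one common intent ("all traffic from the branch to the payments database crosses a firewall") against every possible path. It is deliberately simplified and hypothetical: it enumerates loop-free paths over the physical topology only, whereas a real verification tool reasons over the forwarding behavior of each device, and exhaustive path enumeration like this grows exponentially with network size. The topology, the device names, and the `violations` helper are invented for illustration.

```python
from __future__ import annotations

# Hypothetical topology (undirected links) and the set of devices acting as firewalls.
LINKS = [
    ("branch", "wan-1"), ("wan-1", "fw-1"), ("fw-1", "dc-core"),
    ("branch", "wan-2"), ("wan-2", "dc-core"),          # a bypass around the firewall
    ("dc-core", "payments-db"),
]
FIREWALLS = {"fw-1"}


def neighbors(links):
    """Build an adjacency map from a list of undirected links."""
    adj: dict[str, set[str]] = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def all_paths(adj, src, dst):
    """Depth-first enumeration of every loop-free path from src to dst."""
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            yield path
            continue
        for nxt in sorted(adj.get(node, ())):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))


def violations(links, src, dst, waypoints):
    """Intent: every path from src to dst crosses at least one waypoint (e.g. a firewall)."""
    adj = neighbors(links)
    return [p for p in all_paths(adj, src, dst) if not waypoints.intersection(p)]


for p in violations(LINKS, "branch", "payments-db", FIREWALLS):
    print(" -> ".join(p))   # branch -> wan-2 -> dc-core -> payments-db
```

The value of checking all paths rather than a sample is visible even here: the intended route through `fw-1` passes, and only exhaustive search surfaces the bypass through `wan-2`.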

The network analytics capabilities in this space are also constantly evolving. A select few of these tools even include query engines designed for networks, which allow enterprises to query their network as they would a database. With access to a normalized set of data across the network, IT teams can quickly check for specific problems and answer specific questions about their network. This simplifies and accelerates troubleshooting, freeing teams to focus instead on making the network more resilient, agile, and robust.
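Normalization is what makes a cross-vendor query possible at all. As a small, hypothetical example: different vendors' CLIs commonly print MAC addresses in different formats (dotted groups of four hex digits, colon-separated, dash-separated), so a lookup only works after every entry is converted to one canonical form. The `MacEntry` record, `ingest`, `where_is`, and the sample tables below are illustrative assumptions, not any product's schema or API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


def normalize_mac(raw: str) -> str:
    """Canonicalize 'AABB.CCDD.EEFF', 'aa-bb-cc-dd-ee-ff', etc. to 'aa:bb:cc:dd:ee:ff'."""
    digits = re.sub(r"[.:\-]", "", raw.strip().lower())
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a MAC address: {raw!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


@dataclass(frozen=True)
class MacEntry:
    device: str
    interface: str
    vlan: int
    mac: str  # always stored in canonical form


def ingest(device: str, rows: list[tuple[str, str, int]]) -> list[MacEntry]:
    """Convert one device's raw (mac, interface, vlan) rows into normalized entries."""
    return [MacEntry(device, intf, vlan, normalize_mac(mac)) for mac, intf, vlan in rows]


# Hypothetical MAC tables, in the formats two different vendors might print them.
TABLE = (
    ingest("access-sw-1", [("0050.56a1.2b3c", "Gi1/0/7", 110)])
    + ingest("leaf-3", [("00:50:56:A1:2B:3C", "Ethernet12", 110),
                        ("3c-22-fb-00-11-22", "Ethernet4", 20)])
)


def where_is(mac: str) -> list[MacEntry]:
    """One normalized lookup across all vendors, in any input format."""
    wanted = normalize_mac(mac)
    return [e for e in TABLE if e.mac == wanted]


for entry in where_is("0050.56A1.2B3C"):
    print(entry.device, entry.interface, entry.vlan)
# access-sw-1 Gi1/0/7 110
# leaf-3 Ethernet12 110
```

Without the normalization step, the same host would look like two unrelated addresses on the two switches, which is exactly the kind of manual cross-referencing these tools aim to remove.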

Closing the IT gap

Large enterprise and service provider networks are becoming exponentially more complicated as network devices are added or upgraded to support increased scale and new applications. And this problem is only going to intensify as enterprise 5G applications begin to take off. The need to manage this growing complexity has led to new, smarter solutions that are revolutionizing network management.

Such networking solutions eliminate many of the manual network configurations and inputs that teams normally rely on, drastically freeing up time for IT to spend on other tasks. It is these new tools that are allowing enterprises to continue investing in their network infrastructure without getting overwhelmed by the growing complexity of it.

These intent-based solutions are helping to close the IT gap – empowering teams to do more in less time and prioritize improving services rather than troubleshooting problems. These flexible and scalable tools help engineers reduce risks and ensure outcomes for SRE teams, DevOps, and CI/CD functions, improving the overall business process. Such technologies are contributing to an evolution in the way IT functions in the enterprise.

“As a customer of Forward Networks I can tell you that I wouldn’t want to ever be without their software. Even the simplest task of finding a mac address or IP address on your network is done instantaneously on Forward Enterprise. The amount of time this has saved my team is worth every penny of the investment.”

—Network Architect & Engineer, Consumer Financial Services company

This was posted on LinkedIn a few weeks ago, in response to a post that didn’t even come from us. As a marketer, I was delighted; it’s not often that customers speak up independently and unprompted about how much they love a particular software package.

Network engineers and network administrators are an especially tough audience. They’re smart, extremely tech-savvy, and they can sniff out marketing BS in a hot minute. They might not trust anything a vendor says (probably from being burned by too many audacious vendors with bold claims), but they might trust their peers. 

Let’s face it: all software providers are trying to convince you that their offering is game-changing. Only in a few cases is that actually true. I get unsolicited email every day from companies trying to sell me the latest and greatest in lead attribution, media monitoring, account-based marketing, event management, and everything else that matters to a marketing team. Everyone wants to solve a problem they assume I have. (On a related note, my personal inbox is stuffed with messages pitching whiter teeth, younger skin, and more comfortable underwear. They’re probably more accurate about the problems. 😉) One learns to read with a suspicious eye.

So why not let customers do all of the talking? After all, who better to evangelize your product than the USERS? At Forward, we’re lucky to have very large, well-known businesses on our client list, including some of the world’s largest banks, payment processors, entertainment providers, media conglomerates, and government entities. For the most part, they’re in highly regulated industries and need to stay confidential, but here are a few snippets they’ve provided to software review portals like G2 and Gartner Peer Insights (or tweeted):

“Forward Networks has really impressed us. It’s a very ambitious solution they’re providing and the quality of both their engineering and support have proven to be really top notch. I’m a tools engineer myself, so I understand how hard it is to do what they’ve done with this product. At our scale many tools just get crushed. Theirs held up really well under a vigorous POC and just keeps getting better.”

—Software Engineer (Network Tools) in the Communications Industry

“Seeing is believing. Forward Networks’ visibility seems pretty close to a holy grail.”

—Network Architect 

“Forward Networks has rapidly developed to make the network visible, easy to understand, improve automation, and reduce outages to the network, applications and business process. This tool has an impressive wow factor to all engineers who review it. It can help reduce risks and ensure outcomes for SRE teams, DevOps and CI/CD functions.”

—Head of Engineering Quality, $30B Communications Business

“Forward Networks has carefully crafted a robust product and provides all the support needed to help its customers achieve success. Passionate engagements from team leaders and strong desire to gather feedback and improve the product makes FN standout as a reliable and continually evolving solution.”

Team Lead, Enterprise Architecture and Technology Innovation

Honestly, I can’t do much better than that when it comes to convincing network operators at large companies that it’s worth piloting the software. Want instant visibility into your network? Check. Want to easily spot landmines in the network before they cause an outage? Check. Want world-class service and support? You bet. 

If your job involves managing a complex network, just humor me and go request a demo. If I turn out to be wrong, email me at lisagarvey@forwardnetworks.com. If I’m right, please join the ranks of happy customers and write a review. 

Last week, Forward Networks presented at Networking Field Day 21. For two hours on a beautiful Thursday afternoon in Palo Alto, we hosted an esteemed group of network industry influencers, bloggers, and all-around experts.

We had the hospitality program on full-tilt. Enormous warm cookies, a “swag buffet,” and a hip tea bar serving custom boba tea. (For the uninitiated, it’s iced milk tea with tapioca pearls, and it’s extremely popular in the Bay Area. Side note: in 1998, after lining up for an hour with my Netscape teammates at the only boba tea place within 10 miles of Mountain View, I mentioned to my husband that we should invest in one because it would eventually go mainstream. He laughed at me, but he’s the same guy who said that no one would regularly spend money on take-out coffee, and that bottled water was a fad. Fortunately, he has other good qualities.)

The NFD delegates seemed delighted with the tea. Who wouldn’t be, after days of being shuttled around Silicon Valley to hear hours-long product pitches and technical deep dives on complex technology? Yes, that’s their job for the week, but I still feel for them.

The team at Forward was really enthusiastic about showing off the latest advancements of our software platform, especially because for many of the NFD delegates, it was their first introduction to Forward Enterprise. There’s a certain satisfaction that comes from demonstrating the benefits of a mathematical network model to people who get it. I love seeing that “aha!” moment, and we had plenty of those in the session.

The questions came quickly:

“Can your software correlate the network overlay and underlay and provide visibility into both?” (Yes.)

“Does this feature only diff configs, or does it diff the output of the show commands?”

(It shows network diffs on everything: IP route tables, ACLs, VLANs, etc., between your chosen points in time, in a canonical, vendor-agnostic way.)

“In a network of, say, 2000 devices, how quickly can someone set up your software and go from day-zero to full search capabilities?” (A few hours.) 
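The network-diff answer in the Q&A above can be pictured with a small sketch. Once route tables from two points in time are converted into the same vendor-neutral records, a diff reduces to set differences. The `Route` fields, the sample snapshots, and `diff_snapshots` are hypothetical and show only the idea, not how Forward Enterprise computes or displays its diffs.

```python
from __future__ import annotations

from typing import NamedTuple


class Route(NamedTuple):
    """Hypothetical vendor-neutral route record extracted from a snapshot."""
    device: str
    vrf: str
    prefix: str
    next_hop: str


def diff_snapshots(before: set[Route], after: set[Route]) -> dict[str, list[Route]]:
    """Compare two points in time as sets: what disappeared and what appeared."""
    return {
        "removed": sorted(before - after),
        "added": sorted(after - before),
    }


monday = {
    Route("core-1", "default", "10.1.0.0/16", "10.0.0.2"),
    Route("core-1", "default", "10.2.0.0/16", "10.0.0.3"),
}
tuesday = {
    Route("core-1", "default", "10.1.0.0/16", "10.0.0.2"),
    Route("core-1", "default", "10.2.0.0/16", "10.0.0.9"),   # next hop changed
}

for change, routes in diff_snapshots(monday, tuesday).items():
    for r in routes:
        print(change, r.device, r.prefix, "via", r.next_hop)
# removed core-1 10.2.0.0/16 via 10.0.0.3
# added core-1 10.2.0.0/16 via 10.0.0.9
```

Because both snapshots use the same canonical records, the comparison is identical whether the routes came from one vendor's `show` output or another's; only the parsing step differs per vendor.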

To watch the whole presentation, or selected parts of it, spend a little time on the Tech Field Day Forward Networks page. If you really want to see how the company has made progress against its founding vision, watch the first videos from NFD 13 in 2016, when the founding team at Forward revealed their first version of the product, and then watch the videos from NFD 21 in 2019. It’s an impressive leap, which has only been made possible by having truly brilliant computer scientists and seasoned product teams respond to requests from very large customers who feel the frequent pain of managing complex, multi-vendor networks.


So, what does boba tea have in common with network verification? 

Ready to learn more and try Forward Enterprise in your network environment? Schedule a demo at your convenience and we’ll be happy to walk you through it. 
