Forward Networks was just named a Great Place to Work in the Bay Area by Fortune. We’re excited to be on this list in an area known for setting the bar high when it comes to treating employees well. Perks are nice, but perks alone don’t get 100% of employees to say they work at a great company — that comes down to people and culture. And 100% of my colleagues said that Forward Networks is a Great Place to Work, beating the national average by 43%.

During my first conversation with a recruiter, I knew in my heart that there was something special about Forward Networks. Throughout the rest of the interview process – my hunch was confirmed, everyone I spoke to was smart, dedicated, and authentic. I knew it was the kind of environment where I could be myself, keep learning, face challenges, and thrive.

I love our platform, and I’m so happy to be working with a technology I believe in – but for me, the best part of Forward Networks is the people I get to work with. Everyone is committed to doing what’s best for our customers.

About two months into my tenure, we had our first hackathon. I’ve always experienced hackathons as a (well-earned) week for engineers to go play with their toys without regard to practical application. That’s not how my amazing co-workers approached it.

Every team focused on resolving a challenge our customers had experienced. Founders worked on teams with junior engineers – side by side as equals. Best of all – we incorporated several of the teams' suggestions into shipped product within a quarter. This is because everyone checked their ego and truly collaborated to support our customers. It’s also because we have some of the smartest people I’ve ever worked with on staff.

I’ve always appreciated working with smart people; I mean, who doesn’t? But sometimes it can be intimidating to ask questions. That’s never been the case here. People are not only willing to share knowledge, but to ask for your expertise as well. We all sit down to lunch together once a week – from our founders and executive leaders to the newest kid on the block.

I asked some of my co-workers why they think Forward Networks is a great place to work, and while the answers were unique, the themes of trust, empowerment, and people rose to the top. Here are some of the responses:

“I enjoy the level of autonomy I have in doing my job. I also appreciate that I have the full support of management anytime I need it.”

– Jack Shen, Customer Success

“Forward Networks fully supports its people. When I decided to pursue a master's degree that required me to move overseas, I had the full support of my team and leaders to pursue my dream and continue to contribute to the company.”

– Aditya Chakraborty, Design

“The people at Forward Networks exhibit a high level of integrity. They are dedicated, smart, hard-working, accountable, and responsive. I can trust them implicitly. At the same time, our product is amazing. There’s no other networking software offering like this in the industry, and we are just getting started.”

– Andi Voellmy, Principal Engineering Manager

“The Forward Networks platform is software I wish I’d had access to earlier in my career. We’re breaking new ground around network digital twins, and I get to be on the forefront of this development all the while working with great people that I trust and appreciate.”

– Tyson Henrie, Customer Success

“Forward Networks understands how to hire and empower people that exhibit a desire for continuous learning and growth. We have lots of people in the company that went to top schools, and we have people with non-traditional backgrounds. We all work together with mutual respect and trust which makes this a great place to work."

– Scott Arky, IT Manager

I feel so fortunate to be at this company with these people. Sure, we have a game room, happy hours, lunches, and great benefits, but none of that matters if the company culture is unappealing. We have a secret sauce that can’t easily be replicated; we have talent, trust, empowerment, and support, combined with the excitement of solving “unsolvable” problems. And … we’re hiring. Check out the careers page if you want to be as happy at work as we are!

Forward Networks is a winner in the 2022 Business Insights Excellence in Customer Service Awards. We are over the moon — because our customers are our top priority.

Our platform addresses unprecedented network complexity by collecting configuration and state data for all devices in the network and indexing it in a vendor-agnostic way. The platform collects data across physical, cloud, and virtual environments to help network, security, and cloud engineers make the network more predictable, agile, and secure by providing immediate access to actionable information.

Based on an advanced mathematical model, Forward Enterprise creates a network digital twin of an enterprise environment across on-premises devices and hybrid multi-cloud environments. IT teams can instantly troubleshoot, verify intent, prove compliance, and predict network behavior by computing all possible traffic paths.

Put more simply, we make hard stuff easy for IT engineers, so they can spend time on proactive projects. And we enjoy doing it! In a recent employee survey, one of our people put it this way: “The result of our work first and foremost helps customers in ways nobody could have imagined before; this motivates every person in our company.”

“We work with organizations whose networks cannot go down without significant consequences to the economy, public safety, and financial systems,” said Yadhu Govindarajan, Director, Customer Success, Forward Networks. “Each of these organizations are running highly complex networks and facing unique challenges that we help them address to ensure their networks are predictable, reliable, and secure. Winning this award is especially important to us, as it validates the impact of our commitment and effort.”

The Excellence in Customer Service Awards celebrate those who are winning by supporting their own customers and those who are developing the tools to help others find success. Awards were given out to consultants, outsource partners, and technology providers for superior performances in the past 12 months.

To learn more about the unique things we’ve done to support our customers, check out our customer value blog series by Yadhu.

What do we do?

At Forward Networks we build digital twins for computer networks. Enterprises with large networks use our software to build a software copy of their network and use that for searching, verifying, and predicting behavior of their network. It is not a simulation. It is a mathematically accurate model of their network.

Why is it a hard problem?

Large enterprise networks contain thousands of devices (switches, routers, firewalls, load balancers, etc). Each of these devices can have complex behaviors. Now imagine a large graph with thousands of nodes where each node represents one of these devices and the links between nodes show how they are connected. You need to model exactly how traffic originating from edge devices is propagated through the network.

To do so, you need to understand the exact behavior of each device in handling different packets. A typical enterprise network not only includes different types of devices (routers, firewalls, etc), but they are built by different vendors (Cisco, Arista, Juniper, etc) and even for the same device type from the same vendor, you typically see many different firmware versions. To build a mathematically accurate model you need to model every corner case and a lot of these are not even documented by vendors.

At Forward we have built an automated testing infrastructure for inferring forwarding behavior of devices. We purchase or lease these devices; put them in our lab; inject various types of traffic to them and observe how these devices behave.

Where are we today?

I’m proud to show off publicly today, for the first time, that we can process networks with more than 45,000 devices on a single box (e.g. a single ec2 instance). Here is a screenshot of an example network with about 45k devices:

Some of our customers send us their obfuscated data to help us identify performance bottlenecks and further improve performance. It is a win-win scenario. Our software gets better over time and they get even faster processing time. The data is fully obfuscated in that every IP and MAC address is randomly changed to a different address and every name is also converted to a random name and these mappings are irreversible. These changes do not materially change the overall behavior of the model and the obfuscated data is still representative of the complexity and diversity of network behaviors of the original network. The network in the above example is built from those data.

This network includes more than 10^30 flows. Each flow shows how a group of similar packets traverses the network. For example one flow might show how email traffic originating from a specific host and destined to another host starts from a datacenter then goes through several backbone devices and finally arrives at the destination data center. 

Each of these flows can be complex. If we were to spend 1 microsecond to compute each of these flows, it would still take us more than 10^17 years to compute this. But with a lot of hard engineering work, algorithmic optimizations and performance optimizations we are able to process this network in under an hour and we are capable of processing this on a single box. You don’t need a massive cluster for such computation. The best part is that the majority of the computation scales linearly. So, if customers want faster processing speed or higher search and verification throughput they can use our cluster version and scale based on their requirements.

How long did it take us to get here?

Forward Networks was founded in July 2013. Our founders are Stanford PhD grads and as a result the very first test data that we got was a 16 device collection from part of the Stanford network. I joined Forward in Sep 2014 after spending a couple of years building and working with large distributed systems in Facebook and Microsoft. I started leading the effort to scale our computation to be able to finish the computation of that 16 device network in a reasonable amount of time and it took us about two years to get there (Mar 2015).

Then almost every year we were able to process a 10x larger network. Today, we have tested our software on a very complex network with 45k devices. We are currently working on further optimization and scaling efforts and our projection is to get to 100k devices in Dec 2020. The following graph shows our progress our last couple of months and the projection till Dec 2020 on logarithmic scale:

Lessons learned

It takes time to build complex enterprise software

As I mentioned above, we started with data of a very small network. As we made our software better, faster and more scalable, we were able to go to customers with larger networks to get the next larger dataset; find the next set of bottlenecks and work on those. We had to rewrite or significantly change the computation core of our software multiple times because as we got access to larger data we would see patterns that we hadn’t anticipated before.

Could we have reduced the time it took us to get here if we had access to large data on early days of our start up? Yes. Was it feasible? No. Why would a large enterprise spend the time to install our software, configure their security policies to allow our collector to connect to thousands of devices in their production network to pull their configs and send the data to a tiny startup that doesn’t have a proven product yet? It is only going to happen if it is a win-win situation. Every time we got access to the next larger dataset from a customer, we optimized our software based on that and went to other customers that had networks of that size where our software was already capable of processing all or majority of their network and when they shared this data with us. We would either find new data patterns that needed to be optimized or combine all the data we had received from customers to build larger datasets for scale testing and improvements. It is a cycle and it takes time and patience to build complex enterprise software.

Customers with large networks typically have much more strict security policies which means that they wouldn’t share their data with us. This is why we had to spend the time and build data obfuscation capabilities in our software to allow them to obfuscate their data and share the result with us which would reveal the performance bottlenecks without sharing their actual data. Some customers have such strict policies that even that is not possible and for those we have built tools that aggregate overall statistics which are typically useful for narrowing down the root cause of performance bottlenecks.

When selling enterprise software, customers typically don’t spend a large amount of money on a software platform if they’re not 100% sure that it would work for them. There is typically a trial or proof-of-concept period where they install the software and evaluate it in their environment. In our early years, we worked very hard with our first few trial customers to make the software work well for them. There were cases which didn’t end up in immediate purchase but their data gave us invaluable insight in improving our software.

On-prem software should work on minimal hardware

These days it is pretty easy to provision an instance in AWS, Azure or other cloud providers with 1TB or more RAM. But you would be amazed to know how many times we have had to wait for weeks or months for some customers to provision a single on-prem instance with 128GB or 256GB RAM. Large enterprises typically allow provisioning small instances pretty quickly. But as soon as your software needs a more powerful instance, there can be a lot of bureaucracy to get it done. And remember, during the initial interactions with customers, you want them to start using your software quickly to finish the proof-of-concept period. During this time, they are still evaluating your software and they haven’t yet seen the value in their environment. So, if someone in a large organization opens a ticket to the infra teams to provision a software he/she wants to try, it may not be among the highest priority tickets that would get resolved.

At Forward Networks, we have learned to be very careful with any new tool, framework or dependency we add to our system. In fact our resource requirements are so low that our developers run the entire stack on their laptops which is very critical for fast debugging and quick iterations.

We have also spent a lot of engineering time and effort on making this possible. Here are some of the high level approaches:

When you need to scale to 1000x or 10000x, you can’t simply use a cluster with 1000 nodes. Even if it is possible, there is no economic justification to that. You have to do the hard engineering work to get the same done with minimal resources. Majority of our customers run our software on a single box. But we also provide the cluster version for those customers that want to ensure high availability or have more concurrent users and want to have higher search or compute throughput. 

One of our customers was telling us that they had to provision and operate a few racks of servers for another software (in the same space as us but not exactly our competitor) and how they were pleased and amazed on what our software delivers with such low requirements. Of course not only this can speed up adoption of the software, it saves customers money and allows you as a software vendor to have better margins.

Open source tools are not always the answer

In the early years of our startup, we were using off-the-shelf platforms and tools like Elasticsearch and Apache Spark for various usages. Over time it became clear that while these platforms are generic enough to be applicable to a wide range of applications, they weren’t a great fit when you need to have major customizations that are critical to your application.

For example, initially we were computing all end to end network behaviors and were indexing and storing them in Elasticsearch. But later it became clear that it is computationally infeasible to pre-compute all such behaviors to be able to store them in Elasticsearch and even if it was possible, such an index would be enormous in size. We had to switch to a lazy computation approach where we would pre-compute just enough data that would be needed to perform quick searches and at search time we would do the rest of the computation that was specific to user query. 

Initially we were trying to write plugins or customizations for Elasticsearch to adapt it to such a lazy computation approach but soon it became clear that it just won’t work and we had to create our own homegrown distributed compute and search platform.

Moving fast without breaking things needs sophisticated testing

Every month we release one major release of our software. Currently, each of these releases includes about 900 changes (git commits); and this is just going to increase as we hire more engineers. At this rate of change, we have to have a lot of testing in place to make sure we don’t have regressions in our releases. 

Every git commit is required to be verified by Jenkins jobs that run thousands of unit and integration tests to ensure there are no regressions. Any test failure would prevent the change from getting merged. In addition to these tests, we also use Error Prone to detect common bugs and Checkstyle to enforce a consistent coding style.

We also have many periodic tests that every few hours run more expensive tests against latest merged changes. These tests typically take a few hours to complete and hence it is not feasible to run them on individual changes. Instead when they detect issues, we use git bisect to identify the root causes. Not only these periodic tests check for correctness, they also ensure there are no performance regressions. These tests upload their performance results to SignalFx and we receive alerts on Slack and email if there are significant changes.

Are we done?

While we believe we have already built a product that is a significant step forward on how networks are managed and operated, our journey is 1% complete. Our vision is to become the essential platform for the whole network experience and we have just started in that direction. If this is something that interests you please join us. We are hiring for key positions across several departments. Note that having prior networking experience  is not a requirement for most of our software engineering positions.

If you operate a large-scale complex network, please request a demo to see how our software can de-risk your network operations and return massive business value.

Top cross