I haven't come across many examples of the earliest months of starting a company, so I thought I would share my experience. In 2014 I decided to start a company and had the luxury to take a "Year On" to explore new domains and technologies. The idea had to meet 3 criteria:
Something I would be excited to dedicate the next 10 years of my career to
A never-attempted idea, rather than the execution of a proven business model
Unlikely to succeed, but one that would create a multi-billion dollar software category if it did
I asked myself:
What big problem can we solve with today's technology that we couldn't 5 years ago?
That idea became Chorus.ai, and I'm humbled by how far Chorus has come in the last 6 years. As Chorus' founding CEO, I raised over $65M in venture capital from Emergence Capital (Salesforce, Zoom), Redpoint Ventures (Stripe, Twilio) and Georgian Partners (Shopify) and developed a product with an NPS of ~70. In 2021, G2 recognized Chorus as the #3 highest-rated software product in the world. We have over 180 incredible employees and serve iconic companies like Uber, Zoom, MongoDB, Redis Labs, Segment, and hundreds of others.
The early days of founding a company are hard. It can be hard to tell the difference between spinning your wheels and making progress. The saying that 9 women can't make a baby in 1 month resonated deeply during that year. If you're going through the early days, be patient with yourself, follow your curiosity, and enjoy the process.
It can be hard to pick a place to start when you're open to anything. I started by imposing a constraint. I had conviction that applying machine learning to new data sets would allow us to solve previously unsolvable problems. In the same way cloud-first companies displaced on-prem, and mobile-first displaced desktop, I believed "ML-first" companies that could capture, analyze and act on new streams of data in real-time would displace static, forms-based databases.
To call myself out, and going against many startup playbooks: I started with a new technology in search of a problem I was excited to solve. I would add that if you take this approach, you should quickly focus on a business function or specific user, understand their Top 3 issues, and test whether your technology can solve one of them.
Being in Action and getting your hands dirty is the fastest way to develop intuition and understanding of a customer pain point or new technology. In 2014 I knew nothing about machine learning and hadn't programmed in over 10 years. Coursera did a short write-up about my experience learning machine learning with the help of my close friend and future co-founder, Russell.
After completing the course I wanted to go beyond the basics and extend it in a direction I was passionate about. I love sound, I love music and I love languages, so I focused on Voice.
How is voice represented as data and how would you feed it into a neural network?
What could an organization learn and predict from its voice data?
Simple tools like Matlab and Octave let me build and play with single-frequency audio signals using the sin() function, and deconstruct mixed-frequency audio using Fourier Transforms. Lessons from my undergrad Signal Processing class started making sense, like why there is a trade-off between sampling rate and frequency resolution when transforming from the time domain to the frequency domain.
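If you want to retrace those steps, here is a minimal sketch in Python/NumPy rather than Matlab/Octave; the tone frequencies, sampling rate, and duration are illustrative choices, not the exact signals I played with:

```python
import numpy as np

fs = 8000          # sampling rate in Hz
duration = 1.0     # seconds of audio
t = np.arange(0, duration, 1 / fs)

# A mixed-frequency signal: two pure tones at 220 Hz and 440 Hz
signal = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)

# Fourier transform: decompose the mixture back into its component tones
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The trade-off in one line: frequency resolution is fs / N, so with a
# fixed number of samples, a higher sampling rate means coarser bins
print(f"frequency resolution: {fs / len(signal):.2f} Hz")

# The two strongest bins recover the original 220 Hz and 440 Hz tones
peaks = freqs[np.argsort(spectrum)[-2:]]
print(f"detected tones: {sorted(peaks)} Hz")
```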
Once I understood the fundamentals, I reviewed published research on voice-based emotion detection, reached out to those experts, acquired the data sets they used, and reproduced their research. Examining the underlying data, I realized the authenticity of the voice recordings the models were trained on was a problem for real-world applications: they were sourced from undergrad acting students reading scripts, and predicting the emotion of a voice snippet didn't solve a practical problem for businesses.
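For a sense of what reproducing that kind of research roughly looks like, here is a hedged sketch; the MFCC features, SVM classifier, and file layout are illustrative assumptions on my part, not the exact methods or data from those papers:

```python
import numpy as np
import librosa                       # audio loading and feature extraction
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def voice_features(path):
    """Summarize a recording as a fixed-length vector of MFCC means."""
    audio, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)

# Hypothetical corpus: (wav path, acted-emotion label) pairs standing in
# for the acted data sets used in the published research
corpus = [
    ("recordings/angry_001.wav", "angry"),
    ("recordings/happy_001.wav", "happy"),
    # ... hundreds more acted clips
]

X = np.array([voice_features(path) for path, _ in corpus])
y = np.array([label for _, label in corpus])

# A simple baseline classifier, similar in spirit to that era's published work
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```

The flaw wasn't in the modeling: a model like this learns to recognize performed emotion, because that is exactly what the acted recordings contain.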
Emotion detection research was cool, but there is a saying that "Companies don't have budget for cool". We would need to capture authentic voice data and apply our analysis to solving a Top 3 business problem.
I could see an untapped multi-billion dollar opportunity in voice and started down the path of creating a consumer application for voice-based restaurant reviews called Bon Appetit. My thesis was that in a voice-first world, hearing the right voice tell you about a restaurant or bar they recommend would help you make a better, faster decision than reading reviews. Over time, we could analyze the voices and swipes to determine what resonated with users on a personalized level. I built a simple iOS application and hit the pavement, in poor German, getting friends and strangers to record recommendations. I then tested the application and its recommendations with the local blind community in Switzerland. Who better to help me understand the power of voice than those highly attuned to it?
I was having fun and learning a lot, but realized that the cost and complexity of building a user base and voice-based reviews meant the big players would be likely to capture that market. The answer was B2B.
It became obvious that voice communication was the largest, most valuable, and untapped enterprise data set, and that capturing and analyzing sales conversations was the place to start.
It took a full year of exploration before two close friends and mentors that I had been speaking with every few weeks helped me realize that I needed to pull the trigger on what became Chorus: Capturing and analyzing enterprise voice data using machine learning and Natural Language Processing.
The next step was recruiting technical co-founders. The ability to recruit a world-class founding team is an acid test of both the opportunity and your ability to articulate it. If you can't get someone world-class excited enough to join you full time, there's likely more work to do. I recruited a world-class engineer and an expert in NLP to join the founding team, and we got to work.
We asked ourselves: why was Chorus possible now and not earlier? Why did we believe the market would be massive in 5 years?
There were three reasons:
Technological progress: Breakthroughs in deep learning improved transcription accuracy to the point where it could solve the problems our customers had, and it would keep improving as we trained our models on more data and algorithms advanced further.
Shift to Online Meetings: Phone calls, WebEx, GoToMeeting and Zoom would be used for more business conversations. These could be captured directly through the systems without asking users to do anything.
Shift to Inside Sales: More businesses were building inside sales teams (those sitting in offices versus visiting customers onsite, over golf, dinners and drinks).
We asked ourselves if we could beat the incumbents providing communications services, despite starting with no data:
Machine learning models require the call recordings, metadata, and outcomes. Recordings on their own were useless. Incumbents could only access recordings, and most companies did not record calls. Some of the metadata would need to be generated from the call recording itself (who spoke, for how long, speech rate, pauses, interruptions, transcripts, topics), which required specific capabilities these companies did not have.
Other metadata and outcome data would come from business systems that these incumbents had never tapped into: Who participated in the meeting? What happened after the conversation? Did we close a deal? All of this information was stored in other business systems (Calendar, Email, CRM). Furthermore, you needed a snapshot of those systems at the time each conversation took place to understand the state and outcome.
So even if an incumbent had millions of hours of call recordings, they wouldn't be able to put them into the context that made them valuable without connecting to all of these other systems and creating a real-time pipeline to surface the insights.
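To make that concrete, here is a rough sketch of the shape of a fully contextualized conversation record; the field names are illustrative assumptions, not Chorus' actual schema:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class CrmSnapshot:
    """State of the deal in the CRM at the moment the call took place."""
    opportunity_stage: str
    deal_size: float
    captured_at: datetime

@dataclass
class Conversation:
    recording_uri: str
    # Metadata that must be generated from the recording itself
    speaker_talk_time: dict[str, float] = field(default_factory=dict)
    speech_rate_wpm: dict[str, float] = field(default_factory=dict)
    interruptions: int = 0
    transcript: list[tuple[str, str]] = field(default_factory=list)  # (speaker, text)
    topics: list[str] = field(default_factory=list)
    # Context joined in from calendar, email, and CRM
    participants: list[str] = field(default_factory=list)
    crm_at_call_time: CrmSnapshot | None = None
    outcome: str | None = None  # e.g. "closed-won", filled in after the fact
```

A recording alone populates only the first field; everything below it is what made the data set valuable, and none of it lived in the incumbents' systems.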
Everyone was starting from the same point.
With this clarity, we developed a killer technology demo (in 2015!) that analyzed a live phone call in real time, combining live transcription, natural language understanding, and voice metadata that appeared on the screen as we spoke. It was a magical experience for anyone watching, and it enabled us to raise our ~$5M Seed round to hire a small team, develop the product, and acquire customers.
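For the technically curious, the demo's shape can be sketched as a simple streaming loop; every function below is a placeholder standing in for a real model or service, not our actual implementation:

```python
import queue

def transcribe(chunk):       # placeholder: streaming speech-to-text
    return f"<words for {len(chunk)} samples>"

def voice_metadata(chunk):   # placeholder: speech rate, pauses, speakers
    return {"speech_rate_wpm": 150, "pauses_ms": []}

def understand(text):        # placeholder: topic / intent extraction
    return {"topics": ["pricing"]}

audio_chunks: queue.Queue = queue.Queue()  # fed by the call-capture layer

def analyze_live_call():
    """Consume audio chunks and render insights as the call happens."""
    while (chunk := audio_chunks.get()) is not None:  # None ends the call
        text = transcribe(chunk)
        # Each chunk's insights appear on screen while the call is still live
        print(text, voice_metadata(chunk), understand(text))
```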
We coined the term Conversation Intelligence for the new category we were creating, and an incredible journey started. My blog post on Chorus' launch goes into our vision in more detail.