A tool that helps people capture, organize and retrieve audio content seamlessly.

Company
Synth
My Role
Lead Product Designer
Timeline
2021 - 2022
Responsibilities
Rapid Prototyping
User Research
Competitor Analysis
Visual Design
UI/UX Design
Overview

In September 2021, Synth was an MVP with a big idea — a second brain for anything audio. It was a tool that would let users capture meetings, YouTube videos, podcasts or any other audio content at a tap. Moreover, it would store that information in a way that made it extremely easy to retrieve whenever required.

When I joined the team, Synth showed potential but lacked the polish and depth to become indispensable. My challenge was to reimagine Synth as an MLP (Minimum Lovable Product), seamlessly integrated into users' workflows while solving deep-seated pain points around audio information. What followed was an exciting journey of discovery, collaboration, and design innovation.

The project was done in close collaboration with co-founders at Synth — Urvin Soneta, Vaibhav Saxena and Suneel Matham. I worked with multiple engineers during the course of the project.

The project lasted around 6 months: 2 months of discovery, then 4 months of building, testing and co-creating with users.

Though the product is positioned differently now, this case study tells the story of my journey taking the product from 1 to 10, and of the learnings and experimentation along the way.

The Problem

We live in a world of audio overload. The average professional consumes over 1,000 hours of meetings, podcasts, and videos annually, yet how much of it do we remember? When we need to, how much of it are we able to refer back to? How much of it do we accurately capture?

Audio content is dense with critical information and hidden connections that could yield valuable insights. However, without the right tools to extract & connect ideas, these opportunities are often lost.

Capturing audio information as notes takes focus away

Manual note-taking often requires intense focus and can’t capture everything, leading to gaps that could lose valuable context or details.

Multiple steps to organize captured information

Audio is a linear format, making it difficult to organize into topics or categories. Unlike written notes or knowledge management tools where content can be organized into hierarchies, tags, or networks, audio information is hard to “see” as an overall structure, which makes synthesizing and contextualizing it more challenging.

Difficult to extract required information as notes pile up

Even when we do manage to capture details through manual notes or transcription, finding that information when needed is a huge challenge. Audio lacks inherent searchability, and common note-taking approaches often don’t include indexing or tagging methods that allow for quick cross-referencing. This leads to wasted time as we sift through notes, re-listen to audio, or rely on faulty memory, which impacts decision-making and productivity.

The MVP (Before)

The Synth team’s simple, core belief — ‘It should be easy to take notes during, and retrieve information after, meetings, videos or podcasts’ — led them to an MVP that enabled people to capture notes in real time without switching screens, timestamp key points as they were discussed, generate summaries and retrieve specific parts of transcripts by asking questions.

Discovery and Key Findings

Over the course of the project, we interviewed and tested novel concepts with 20+ people and ran usability tests with 14+ people. The insights we uncovered not only addressed users’ pain points but also opened up opportunities that took the product in a direction the team had not previously imagined, prompting us to redesign Synth’s core user experience from the ground up. Here are some of the top insights that helped shape the product.

Users engage with audio in two distinct phases

Users focus on understanding and participating in real-time, but they also want to capture important points effortlessly for later reference. After listening, users shift their focus from active participation to reflection and application of the information they consumed. A seamless transition between these phases is essential to ensure the information is not only captured but also retained, organized, and actionable.

The current workflow requires a lot of active attention to take annotations in real time.
After a session, I need to review key points and figure out action items. I wish there was an easy way to pull out just the important parts.
During the meeting, I’m trying to keep up with the discussion. I can’t take detailed notes and stay focused at the same time.
Contextual nuances go missing in transcripts

In spoken content, tone, emphasis, and speaker interactions add layers of meaning. Transcription doesn’t preserve these nuances, so essential context or intent gets lost. This alters the intended meaning, especially in sensitive or high-stakes discussions, potentially leading to misinterpretations down the line.

I can read the transcript, but it doesn’t tell me how something was said. Was it a suggestion, or a decision? I feel like I’m missing the tone.
Sometimes, I need to hear the actual voice to understand why a point was important. The transcript alone feels too flat.
I need to hear their voice again to understand the intent behind their words.
Sheer volume of audio content leads to information overload

Without a clear structure, important points become lost amid the flood of data, leading to inefficiency, stress, and an inability to find valuable knowledge when it’s needed most.

I want a way to systematically look at information. Otherwise, it’s not solving any problem.
I have so many recordings from meetings, and I barely have time to sift through them. It feels like digging through a pile of hay for a needle.
By the time I find the specific part I’m looking for, I’ve wasted so much time that I’ve forgotten why I needed it.
Missed opportunities for serendipity

Since different content types (meetings, podcasts, etc.) are often accessed across various platforms, the information becomes scattered across multiple locations. This fragmentation makes it difficult to connect insights across sources, such as linking a concept from a podcast with a discussion from a recent meeting. This knowledge often remains isolated, reducing its value.

I need a one big sync thing that links all my learnings together - so that I can follow the trail and get to the point I’m trying to get.
I listened to a podcast last week that mentioned a concept, but now I can’t remember enough details to connect it to my project.
It would be amazing if I could link similar ideas across all the audio I’ve captured, instead of treating every recording like an isolated island.
Competitor Analysis

In a crowded landscape of productivity and knowledge tools, Synth aimed to address a critical gap that existing solutions missed.

Tools like Notion, Evernote, and Roam Research excelled at organizing written content but struggled with audio, where capturing and extracting insights was inherently more complex.

On the other hand, tools like Otter.ai and Fireflies.ai facilitated audio capture but primarily catered to meeting-specific workflows and fell short in retrieval and cross-context usability.

Synth was designed to bridge this divide—offering both seamless audio capture and powerful, intuitive retrieval features. This dual focus enabled it to address broader use cases, from podcasts to YouTube videos, while positioning itself uniquely in a market heavily segmented by function. Synth’s ability to unlock actionable insights from audio content provided an edge over competitors, ensuring users could connect knowledge across contexts effortlessly.

Guiding Principles

To translate these insights into actionable design decisions, we established a set of guiding principles. These principles grounded our work and helped us align every feature and interaction with the user’s goals.

Empower without overwhelming

Design features that amplify user productivity without adding complexity. Every interaction should feel like it’s saving time or effort, not creating a new layer of friction.

Seamless integration

Fit naturally into the tools and workflows users already use. Synth should feel like an extension of their existing processes, not a separate task to manage.

Visualize the invisible

Audio is inherently linear and intangible. Translate it into clear, digestible visuals like timelines, chapters, and notes, making dense audio content easier to navigate and comprehend.

The MLP (After)

Crafting the MLP required stripping Synth back to its essence and rebuilding it around the user's mental models. This meant addressing key challenges like simplifying note-taking in real time, organizing linear audio into actionable insights, and ensuring that retrieval was as intuitive as searching for a bookmarked page in a favorite book.

To achieve this, we needed to rethink Synth’s role in the user’s workflow. Could it integrate seamlessly without disrupting their focus? Could it provide value not just in the moment but long after the session ended? These questions guided our approach as we refined the product into something that didn’t just work but worked beautifully.

The entire Synth experience was divided into two phases: ‘In the Moment’ and ‘Post Listening’.

Phase 1: In the Moment

‘In the Moment’ is associated with note-taking, which involves capturing information quickly without losing one’s trail of thought. It offers users shortcuts to capture audio and what’s visible on the screen.

Access Synth from the menu bar anytime without disrupting your flow
Move the widget anywhere on the screen, wherever is least distracting
Switch between the widget view (to be able to focus solely on the audio) or the transcript view (to take notes, highlight parts of transcript) anytime
Capture the last 10 seconds of audio and get it timestamped under notes automatically
Take a screenshot anytime and have it appear in the notes section automatically
Phase 2: Post Listening

In the ‘Post Listening’ phase, people shift their focus from active participation to reflection, retrieval, and application of the information they consumed.

A side-by-side view of the transcript and notes to reference the full context of the conversation while capturing or refining notes
Rich post-session tools and templates for organizing and editing notes
Keywords to identify and highlight important sections of a transcript or recording, allowing users to quickly locate significant points
Advanced search with features like keyword search, topic clustering, or audio snippets for easy retrieval. Ability to search from notes vs. transcripts, based on time and context
Revisit specific audio snippets without having to go through the entire recording, while retaining context so that users can listen to what was said, why, and by whom
A chapter-wise breakdown of transcripts that organizes long audio content into topic-specific segments
Timeline-based navigation providing an intuitive way to locate information quickly

The Outcome

Crafted an effortless, distraction-free experience that seamlessly integrated into users’ workflows.

Increased daily active users by 200% within six months and grew the user base to 20k+ within the first year.
