When Good Tech Lands in the Wrong Place
What building Siri taught me about where voice fails—and why.
"Why isn't Siri good yet?"
I spent years building voice AI at Apple, so I get this question all the time.
Recent speech-to-speech breakthroughs are delivering much better technology: sub-200ms latency, emotional intelligence, and more natural conversations that actually feel human.
But they’re launching into a market where Alexa, Google Assistant, and, yes, Siri trained millions to expect…
…"I didn’t quite get that."
It's like serving gourmet food at a gas station. Even if the product is exceptional, the context kills it.
Voice AI is finally good, but we spent a decade teaching users not to trust it. We’re deploying better tech into a behaviorally damaged market.
And the race is whether we can rebuild trust faster than we’re burning it.
What Broke It
Five years ago, voice AI failed for very real reasons: transcription was brittle, language models were shallow, and latency made even basic interactions feel laggy. The promise was natural conversation. The reality was clunky, inconsistent, and constrained.
At Apple, I saw this in the usage data. People used Siri for basic commands: "Call Mom," "Play Spotify," "What time is it?" Rarely did they attempt anything more complex, like "Add milk to my shopping list and remind me to go to the store at 5pm," even though that’s exactly what voice should be good at.
After enough “Sorry, I didn’t get that” moments, users stopped testing the boundaries. They developed workarounds. Set a timer by voice, but open Maps by hand. Speak slowly. Use known-safe phrases. Always have a backup plan.
That caution calcified into habit. Even as the tech improved, the behavior stayed stuck.
Why Tech Isn’t the Bottleneck
The irony is that the technology is finally ready.
Latency is low. Voice models now carry tone, empathy, even hesitation. Tools like Vapi and LiveKit handle turn-taking, interruptions, and emotional flow. Orchestration has caught up.
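To make "orchestration has caught up" concrete, here's roughly the loop those platforms run for you, sketched with hypothetical interfaces. Every object in it (vad, stt, llm, tts) is an illustrative stand-in, not Vapi's or LiveKit's actual API.

```python
# A rough sketch of a duplex voice loop with turn-taking and barge-in.
# All interfaces here are assumptions for illustration only.
import asyncio

async def conversation_loop(audio_in, audio_out, vad, stt, llm, tts):
    while True:
        # Turn-taking: wait until the voice-activity detector decides
        # the user has finished speaking (e.g. ~300ms of silence).
        user_audio = await vad.capture_utterance(audio_in)

        # Transcribe the turn and generate a reply.
        text = await stt.transcribe(user_audio)
        reply = await llm.respond(text)

        # Speak the reply while still listening. If the user barges in
        # mid-playback, cancel synthesis and loop back to capture the
        # interruption instead of talking over them.
        speaking = asyncio.create_task(tts.speak(reply, audio_out))
        barge_in = asyncio.create_task(vad.wait_for_speech(audio_in))
        done, pending = await asyncio.wait(
            {speaking, barge_in}, return_when=asyncio.FIRST_COMPLETED
        )
        for task in pending:
            task.cancel()
```

The detail that matters is the cancellation: a system that keeps talking after you interrupt it re-teaches distrust in a single turn.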
But none of that fixes the core issue: we’re deploying this new tech into trust-damaged environments. Every bad experience—every bot that loops, every system that mangles your zip code—doesn’t just frustrate. It reinforces the belief that voice doesn’t work.
And the way we’re deploying it now? It’s making that belief stronger.
We build beautifully orchestrated systems with natural pacing and multi-step reasoning… and send them straight into customer support queues.
It’s understandable—contact centers are massive markets. But they’re also ground zero for past failure. The place where users learned to speak slowly, expect errors, and always have a backup plan.
If your first deployment is in the exact place people stopped trusting voice, you're not accelerating adoption. You're deepening the scar tissue.
Where Voice Actually Works
Over the past year, I’ve met with dozens of voice AI founders. The ones getting real traction all seem to have landed on the same insight: most users still don’t trust voice to think for them. So the smartest companies don’t try to change that. They just stay out of its way.
We see it in the use cases that are actually working: a delivery driver speaks a tracking number instead of typing. A doctor dictates notes between patient visits. An insurance agent fills out a form by talking instead of clicking through menus.
None of these are conversations. There’s no ambiguity, no room for interpretation, no expectation that the system understands anything beyond the words themselves. The human knows what they want to say. The system just listens and records. That’s it.
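To see how little "AI" these workflows actually expose, here's a minimal sketch of the tracking-number case. The transcribe function and the carrier format are assumptions I'm making for illustration; the real design is the deterministic validation and the silent fallback to the keyboard.

```python
import re

# Assumed carrier format for illustration; real formats vary.
TRACKING_RE = re.compile(r"^[A-Z0-9]{10,22}$")

def capture_tracking_number(transcribe, audio):
    """Return a validated tracking number, or None to fall back to typing.

    `transcribe` stands in for whatever speech-to-text service you use.
    """
    raw = transcribe(audio)                         # e.g. "1Z 999 AA1 ..."
    candidate = re.sub(r"[\s\-]", "", raw).upper()  # strip spaces and dashes
    if TRACKING_RE.fullmatch(candidate):
        return candidate   # no NLU, no ambiguity: the regex is the contract
    return None            # never guess; the keyboard is the backup plan
```

The fallback path is the feature: the user never has to wonder whether the system "understood" them.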
It’s clean. It’s fast. It’s boring. And it’s working.
Which is exactly the point. This recent wave of success in voice isn’t about cracking natural language understanding. It’s about avoiding it. Most of the momentum in the space right now lives in use cases where trust was never broken—because it was never required.
That’s what makes this moment tricky to read. From the outside, it looks like voice is finally taking off. But when you zoom in, it’s clear that the boom is happening in a narrow slice of the market. Voice is succeeding as infrastructure. As UI. Not as an assistant, not as a companion, and definitely not as a conversation partner.
The most exciting applications—support agents that feel human, assistants that can follow context, interfaces that actually talk with you—those still sit under a heavy layer of skepticism. Not because the tech isn’t there. But because users already learned it wouldn’t work. And most haven’t unlearned that yet.
We’re not rebuilding trust. We’re just staying away from the places it broke.
The Real Game
The hard part right now isn’t building voice systems that work. It’s knowing where they’ll be allowed to.
If you deploy into a context where people already expect failure—support calls, open-ended help desks, anything with history—you’re starting in a hole. One mistake and the user bails. You don’t get another shot.
But in the right environments—places with clear structure, low emotion, and no scar tissue—voice doesn’t just work. It compounds. Every smooth interaction buys you a little more room to operate next time.
That’s what makes this moment strategic. The winners in voice won’t be the ones with the flashiest demos. They’ll be the ones who understand the terrain. Who pick their battles. Who build credibility one workflow at a time.
Right now, trust is the real bottleneck. And the only way past it is to stop trying to wow people—and start making things that quietly work.
The technical race is over. The trust race has just begun.