<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>TTS on Tom Burkert</title><link>https://blog.burkert.me/tags/tts/</link><description>Recent content in TTS on Tom Burkert</description><image><title>Tom Burkert</title><url>https://blog.burkert.me/assets/</url><link>https://blog.burkert.me/assets/</link></image><generator>Hugo -- 0.148.0</generator><language>en-us</language><lastBuildDate>Sun, 04 Jan 2026 23:46:37 +0100</lastBuildDate><atom:link href="https://blog.burkert.me/tags/tts/index.xml" rel="self" type="application/rss+xml"/><item><title>Why voice is not the user interface of the future</title><link>https://blog.burkert.me/posts/voice_future_interface/</link><pubDate>Sun, 04 Jan 2026 23:46:37 +0100</pubDate><guid>https://blog.burkert.me/posts/voice_future_interface/</guid><description>&lt;p>In the last few months, various prominent figures in big tech have claimed that voice is the user interface of the future. It&amp;rsquo;s an interesting revival of voice interaction, which already fell flat on its face in the late 2010s during the era of voice assistants. Despite that failure, the idea is back, and I&amp;rsquo;m here to tell you why voice is &lt;strong>not&lt;/strong> the user interface of the future.&lt;/p>
&lt;p>Let&amp;rsquo;s start with why I&amp;rsquo;m so peeved about this. I almost wrote this blog post back in October, when David Weston, a VP at Microsoft, &lt;a href="https://youtu.be/ccpXNBsTaGk" target="_blank" rel="noopener">claimed&lt;/a> that &amp;ldquo;the world of mousing around and keyboarding around and typing will feel as alien as it does to Gen Z to use DOS.&amp;rdquo; This was part of the unveiling of the Microsoft Windows 2030 Vision, and he also suggested that we will do &amp;ldquo;more talking to our computers.&amp;rdquo; I have some strong feelings about Microsoft, Windows, and their vision for the future, but I won&amp;rsquo;t go into those to avoid getting derailed. At the time, I shrugged it off as one of those &lt;em>things people just say on the internet&lt;/em> or a marketing gimmick.&lt;/p></description><content:encoded><![CDATA[<p>In the last few months, various prominent figures in big tech have claimed that voice is the user interface of the future. It&rsquo;s an interesting revival of voice interaction, which already fell flat on its face in the late 2010s during the era of voice assistants. Despite that failure, the idea is back, and I&rsquo;m here to tell you why voice is <strong>not</strong> the user interface of the future.</p>
<p>Let&rsquo;s start with why I&rsquo;m so peeved about this. I almost wrote this blog post back in October, when David Weston, a VP at Microsoft, <a href="https://youtu.be/ccpXNBsTaGk" target="_blank" rel="noopener">claimed</a> that &ldquo;the world of mousing around and keyboarding around and typing will feel as alien as it does to Gen Z to use DOS.&rdquo; This was part of the unveiling of the Microsoft Windows 2030 Vision, and he also suggested that we will do &ldquo;more talking to our computers.&rdquo; I have some strong feelings about Microsoft, Windows, and their vision for the future, but I won&rsquo;t go into those to avoid getting derailed. At the time, I shrugged it off as one of those <em>things people just say on the internet</em> or a marketing gimmick.</p>
<p>Fast forward to January 2026, and we have reports that <a href="https://techcrunch.com/2026/01/01/openai-bets-big-on-audio-as-silicon-valley-declares-war-on-screens/" target="_blank" rel="noopener">OpenAI bets big on audio as Silicon Valley declares war on screens</a>, which outlines OpenAI&rsquo;s ambition to create a personal audio-first device (<a href="https://www.cnet.com/tech/mobile/humanes-ai-pin-failed-because-it-ignored-what-was-already-in-our-pockets/" target="_blank" rel="noopener">rings</a> a <a href="https://www.wired.com/review/rabbit-r1/" target="_blank" rel="noopener">bell</a>?) and claims that the entire industry is headed towards &ldquo;a future where screens become background noise and audio takes center stage.&rdquo; Mind you, this is Connie Loizos of TechCrunch speaking, not Sam Altman. But he is quite bullish on <a href="https://sequoiacap.com/podcast/sam-altman-training-data/" target="_blank" rel="noopener">voice interaction</a> and the article cites several other companies making similar bets. It&rsquo;s on.</p>
<h2 id="voice-input-can-be-actually-useful">Voice input can actually be useful</h2>
<p>Let&rsquo;s give some nuance to my position. I&rsquo;m not a total voice interaction hater. I barely use voice input myself, but I love listening to audiobooks and podcasts. I also think TTS with a high-quality voice is a great way of &ldquo;reading&rdquo; longer articles. But let&rsquo;s focus on the voice input side, which I think is the most contentious.</p>
<p>There are many legitimate and highly useful ways to use voice input. These applications mostly center around everyday, low-stakes tasks, such as setting reminders and alarms, playing songs, or controlling smart home devices. These are all tasks that most traditional voice assistants do reasonably well, and having a multimodal LLM as the brain definitely expands the possibilities for general inquiries and trivia, while also making the interactions a little less stilted. So far so good.</p>
<p>Another area where voice input is useful, and I&rsquo;d even say potentially life-saving, is hands-free operation: typically while driving, but also for people with disabilities who rely on voice commands for most or all of their interactions with computers. More broadly, and despite my reservations about <a href="https://blog.burkert.me/posts/llm_deanthropomorphization/" target="_blank" rel="noopener">making AI models too human-like</a>, I&rsquo;ll admit that modern voice-enabled chat applications have become quite usable for casual questions and quick lookups.</p>
<p>So what&rsquo;s my beef with it?</p>
<h2 id="when-voice-doesnt-work">When voice doesn&rsquo;t work</h2>
<p>Even in everyday consumer contexts, voice can be impractical or even impossible to use. Talking to your phone in a crowded place or on public transport is a bad experience for both you and everyone around you. You might not want everyone to hear what you&rsquo;re searching for, or what you&rsquo;re replying to your spouse or a friend. In many situations, background noise can make it outright impossible to use voice commands effectively, and so can device or model limitations around different languages or accents.</p>
<p>It can also be a privacy nightmare. I couldn&rsquo;t find reliable recent sources, but household penetration of smart speakers in the US is usually estimated at around 30%. This means that millions of people are already comfortable putting always-listening devices in their living rooms, kitchens, and in some cases bedrooms as well. I&rsquo;m not among them, and I suspect a large portion of the population will be cautious about letting these devices into their personal spaces.</p>
<p>All the reasons above are hurdles to adoption, but unless we have more <a href="https://www.bbc.com/news/articles/cr4rvr495rgo" target="_blank" rel="noopener">high-profile scandals and lawsuits</a>, I imagine the adoption of voice assistants and smart speakers will continue to grow, albeit slowly and definitely not as the primary means of communication or input.</p>
<h2 id="professional-environment-is-where-it-falls-apart">Professional environment is where it falls apart</h2>
<p>The major problem for the voice-first future is the professional environment. It is one thing to bark a few commands at Alexa or Siri at home, and another thing entirely to use voice in your 9-5, especially if it&rsquo;s supposed to be the primary (or even only?) input method. I would call voice input actively bad in most workplace situations.</p>
<p>The voice-first future is primarily aimed at knowledge workers. Many of them already spend a good portion of their day on Zoom/Teams calls. I have had many days with 6-8 hours of calls, and I can tell you: it sucks. Not just because of so-called video call fatigue, but also because it takes a toll on your vocal cords. You know who else spends the majority of their workday talking? Teachers, who are <a href="https://boxlight.com/latest-stories/blog-articles/vocal-strain-in-teachers-leads-to-chronic-voice-disorders%E2%80%94but-it%E2%80%99s-preventable" target="_blank" rel="noopener">much more likely</a> to develop voice-related health issues, and for whom talking is considered an <a href="https://www.nea.org/nea-today/all-news-articles/teacher-voice-problems-are-occupational-hazard-heres-how-reduce-risk" target="_blank" rel="noopener">occupational hazard</a>. A voice-first future sounds like it would have most of us talking as much as teachers do. And I can&rsquo;t imagine doing more talking after a few hours of video calls; I&rsquo;m glad I can just type away! Voice fatigue is a serious concern for this vision.</p>
<p>But it&rsquo;s also about the effect on others: open-space offices are already a bad working environment, but can you imagine an open office where everyone is talking to their computer all day? And what about information confidentiality, whether it&rsquo;s in your company&rsquo;s office, or even worse, in a co-working hub or other shared space?</p>
<p>Your company&rsquo;s information security policies probably already limit how and where you can use your company devices (such as laptops). They usually explicitly forbid employees from displaying confidential information on their screens in a public setting. Now imagine the same employees having to speak about said confidential information out loud.</p>
<p>And don&rsquo;t forget that many tasks are simply easier with a keyboard and/or a mouse. Selecting a file from a list? Resizing a picture? Selecting a part of a larger text? Good luck doing any of that faster by voice than with your mouse. And while speaking is faster than typing for most people, the situation gets more complicated once you start making edits to your text. There could be a future where AI agents handle the small details of tasks and an average knowledge worker just provides general feedback, but that future is probably far away for most fields. Until then, voice is just not practical for a lot of the stuff we do on our computers. Smartphones completely upended how we interact with tech, but they didn&rsquo;t kill keyboards; they just made them virtual and on-screen.</p>
<p>Does this mean voice input will never be adopted in a professional environment? Of course not; voice transcription, meeting assistants, and accessibility tools are already in use. But I don&rsquo;t expect the workplace to shift from keyboards to microphones anytime soon.</p>
<h2 id="what-if-voice-assistants-were-as-good-as-they-could-be">What if voice assistants were as good as they could be?</h2>
<p>A common explanation of the failure of voice assistants is that they were simply not capable enough. And the little they could do, they could not do reliably. But this may not be the whole story. What if voice assistants also flopped because they simply could not be adopted in other settings and for other tasks? What if there are inherent limitations to what people can and want to do using their voice?</p>
<p>My list of reasons against voice input is far from complete, but the reasons on it are serious enough to prevent voice from becoming the primary means of interaction. This definitely holds true in a professional environment, but I&rsquo;ll admit there is still room for voice in personal and low-stakes settings where its convenience can outweigh the drawbacks. The problem is that technology that is useful in the consumer space but not in the professional space typically does not attract enough interest and revenue for wide investment and adoption. Without professional adoption, voice-first interfaces may end up as a permanent side feature rather than the future of computing we are being promised.</p>
<p>I firmly believe we will eventually evolve beyond keyboards and mice, but I am convinced that it is not happening by 2030, and probably not universally. When David Weston mocks DOS as alien to Gen Z, suggesting DOS-like interfaces are relics, I&rsquo;d like to note that one of the <a href="https://ppc.land/claude-code-reaches-115-000-developers-processes-195-million-lines-weekly/" target="_blank" rel="noopener">most beloved</a> AI tools among developers is Claude Code, which is terminal-based and, well, quite DOS-like. Not everything old has to be bad; on the contrary, if it survived this long, there probably is a good reason for that.</p>
<p>Brain-computer interfaces à la Neuralink are one promising alternative to voice interaction. I find them highly dystopian and creepy, but they do not suffer from most of the issues that voice does. They are likely several years from anything that resembles mass adoption, and in the meantime, the privacy concerns about thought-reading will probably only escalate. But that is a story for another time. For now, you will have to pry my beloved mechanical keyboard from my cold, dead hands.</p>
]]></content:encoded></item></channel></rss>