New York Tech Media

From AI Assistants to Self-Driving: The Future of Voice AI with expert Manoj Boopathi Raj

by New York Tech Editorial Team
August 26, 2024
in Benzinga

Written by Jane Clarkson

You tell your phone to set a reminder, ask your smart speaker to change the song, or speak a movie title into your remote—no more tapping through tedious menus and on-screen keyboards. Voice-user interfaces (VUI) have transitioned from niche alternatives to essential components of consumer technology. But when did this transformation begin? What’s next for the industry, where does artificial intelligence (AI) come into play, and how will it impact consumers? To answer these questions, we turn to Voice AI engineering expert Manoj Boopathi Raj, a Senior Software Engineer at Google. With a decade of experience driving breakthroughs in voice recognition and VUI, Mr. Boopathi Raj offers a unique perspective on this technological shift.

Please tell us a little more about yourself and how AI has played a role in your career.

I’ve always been fascinated by how quickly technology is molded by our needs—and AI is perhaps the best example. Many people don’t realize that long before large language models (LLMs) became headline news, we were already using machine learning, only in much more mundane ways. Any software that needed to scale and operate with minimal human intervention likely has some AI in its engineering. Optimizing Google Fi cell network coverage, classifying spam uploads on YouTube—I’ve had the privilege of proposing and leading projects which have been employing these algorithmic solutions for many years now. The current race to create more intuitive interfaces is a natural progression, and it’s thrilling to be part of it.

OpenAI’s GPT-4o comes to mind. The voice module created quite the buzz when the company expressed concerns about users developing “emotional reliance” because of how effective it is. Can you shed a little more light on the history of voice-user interfaces?

Absolutely. This is just the latest milestone in a journey that began decades ago. It seems primitive now, but older readers will remember that the earliest “AI” use of VUI was dictation software in the late ’90s, like Dragon NaturallySpeaking: products where you spoke, the computer listened, then converted the input into text. We don’t think back on them today because they were clunky, demanded unnaturally slow, deliberate enunciation, and required a lot of intervention. In other words, they failed to address the needs of their users. It wasn’t until a decade later, with the introduction of Apple’s Siri and Google Assistant, that VUI began to meet those needs more effectively. Fast forward to today, and VUI systems like Google Assistant have become indispensable, expected to handle complex tasks with near-perfect accuracy. The recent excitement around OpenAI’s GPT-4o reminded both technologists and consumers how compelling it is to communicate with our technology in the same way we communicate with one another: through voice.

As a senior engineer on the Google Assistant team, did these big leaps in VUI become apparent in your work?

Google Assistant is a perfect case study for the evolution of VUI. While I’ve led efforts on the mobile side of the product, namely improvements in its natural language processing (NLP) abilities, my work in automotive environments is where the potential impact of VUI becomes most apparent. Some might be more familiar with the name Android Auto.

We were initially faced with all the distractions and noises a human driver might be used to: engine sounds, passenger conversations, their own music. These are much more disruptive for a machine, which depends on clean audio signals. I first focused on developing a robust data collection infrastructure, because exhaustive big data is everything when it comes to training AI, and then on fine-tuning the speech models so the VUI could handle every imaginable scenario. The result was a spectacular 50% average improvement in word error rates across six languages, and that was just one initiative.

Android Automotive OS, which is now installed in over 200 million cars worldwide, really shows how VUI is becoming indispensable in everyday tech. This isn’t just about convenience; it’s about ensuring that these systems function reliably in the real world, where the stakes, especially in automotive applications, are incredibly high. Android Auto’s VUI is being featured by leading OEMs, so the market will continue to see the technology deployed, if not expanded upon, in the coming years.
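
For readers unfamiliar with the metric mentioned above: word error rate (WER) is commonly computed as the word-level edit distance between what was said and what the system transcribed, divided by the number of words actually spoken. The sketch below is only an illustration of that standard definition. The function names and the sample transcripts are invented for this example and do not come from Google or from Mr. Boopathi Raj's work; it also assumes that an "improvement" means a relative reduction in WER.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word was dropped (deletion)
                curr[j - 1] + 1,     # extra word was transcribed (insertion)
                prev[j - 1] + cost,  # words match, or one replaced the other
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which WER dropped relative to the baseline (assumed meaning)."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical in-car command and two made-up transcriptions of it.
spoken = "navigate to the nearest gas station"
before_tuning = "navigate to the nearest glass station please"
after_tuning = "navigate to the nearest gas station please"

baseline = word_error_rate(spoken, before_tuning)
tuned = word_error_rate(spoken, after_tuning)
print(f"baseline WER: {baseline:.2f}, tuned WER: {tuned:.2f}")
print(f"relative improvement: {relative_improvement(baseline, tuned):.0%}")
```

In this toy case the first transcription contains two errors against six spoken words and the second contains one, so the error rate is halved. Real evaluations average over many utterances and usually normalize punctuation and casing more carefully than this sketch does.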

On that note, what are your thoughts on manufacturers moving towards more voice user interfaces (VUI)? Or, phrased differently, how do you see VUI affecting consumers as it becomes the new standard?

The trend towards VUI coincides, and not accidentally, with autonomous vehicles and the prevalence of LLMs. Trusting an AI to drive a car is a bold proposition, and VUI may play a critical role in that trust, perhaps in the not-too-distant future of the automotive industry. Consumers need to feel confident that the AI can accurately listen to both the environment and their commands, understand context, and make decisions as well as—or better than—a human driver. The skepticism is reasonable, and the stakes are high—if one system fails, it can undermine trust in the entire industry. That trust is very hard to earn back.

The way I see it, VUI has the potential to bridge the gap between current skepticism and future trust in AI. As these systems become more robust, they provide an opportunity to introduce AI in discreet ways that feel natural and intuitive. We want to reduce the learning curve and build consumer confidence. As a bonus, they also offer an unprecedented chance to create accessibility options without requiring users to adapt to new, unfamiliar interfaces: simply say what you need or ask a question. Even if we never fully hand over the wheel, VUI provides an interface that makes the technology behind it more approachable.

Looking even further ahead, the potential for VUI goes beyond driving or smart home commands. VUI and AI together could revolutionize how we access information and receive personalized services. Healthcare is a potent example: VUI could empower patients to manage their immediate needs through voice commands, reducing the barriers to accessing critical information or immediate support. In education, it could create new opportunities for learning, providing students with personalized, voice-driven tutoring and feedback, tailored to individual needs. The possibilities are there, if we’re prepared to make them a reality. The biggest challenge for the industry today is making the technology more reliable and human-centric. It isn’t that people are afraid of speaking to their tech—they’re just afraid of not being heard.

Readers can see some of the work you’re doing to make technical information more accessible on Hackernoon and DZone. Thank you for your time, Mr. Boopathi Raj. We look forward to seeing the industry evolve with engineers like yourself paving the way.

Thank you. I’m excited to be part of this journey, and I look forward to what the future says.

© 2024 All Rights Reserved - New York Tech Media
