Alexa’s new speech recognition abilities showcased at Interspeech

Director of speech recognition Shehzad Mevawalla highlights recent advances in on-device processing, speaker ID, and semi-supervised learning.

At this year’s Interspeech, the largest annual conference on speech-related technologies, Shehzad Mevawalla, the director of automatic speech recognition for Alexa, will deliver a keynote address on “successes, challenges, and opportunities for speech technology in conversational agents.”

Shehzad Mevawalla
Shehzad Mevawalla, director of automatic speech recognition for Alexa.

Ironically, Mevawalla is, in his own description, “not a speech person”.

“I'm a software engineer by trade and by training,” Mevawalla say. “When I started with Amazon, I ran Amazon's business intelligence. Following that, I was running seller performance for third-party sellers, making sure that they were providing a great experience for our customers. Then I spent the next four years in inventory management, where my systems were basically purchasing all of Amazon's inventory worldwide and figuring out how to get it into our warehouses.”

That experience with large-scale systems made Mevawalla an attractive candidate to lead the Alexa speech recognition engineering team, a responsibility he assumed three years ago.

“Alexa has tens of millions of devices out there, and with that kind of scale it’s definitely a challenge,” says Mevawalla. “For example, Christmas morning, millions of people unwrap Alexa devices. They all fire up at the same time. You get a tremendous spike of activity, and you cannot disappoint customers on Christmas morning. So your systems have to be able to scale. 

“And it has to be always on, always reliable. You cannot miss an alarm. You cannot miss a timer. And we continue to add more functionality. So, for example, we added the ability for customers to whisper, and Alexa will whisper back. That cannot add latency. A couple of years ago, we added speaker ID, so we can personalize customers’ Alexa experiences. Again, that has to run with no added latency. Another example is automatic language switching, so you can set your US device to dual languages — for example, an English-Spanish mix. We have to detect whether the customer is speaking English or Spanish in real time and respond in kind. You can't say, ‘Okay, let me listen to this person first for a while, and let's see if they are speaking English or Spanish.’ You have to make the decision immediately.”

Going end-to-end

Mevawalla was no stranger to machine learning when he joined the speech recognition effort. Among other things, his work with Amazon’s third-party sellers involved fraud detection, a problem he had worked on for nine years before joining Amazon and one that depends heavily on machine learning models. Two years ago, his responsibilities expanded to include Alexa speech recognition’s scientific research.

Even in that short time he’s led the science team, Mevawalla says, he’s seen dramatic changes. “I think we’ve made huge strides in the last couple of years,” he says. “For example, we can now run full-capability speech recognition on-device. The models that used to be many gigabytes in size, required huge amounts of memory, and ran on massive servers in the cloud — we're now able to take those models and shrink them into tiny footprints and fit them into devices that are no larger than a tin can.”

In large part, Mevawalla explains, that’s because of a move to end-to-end models, neural networks that take acoustic speech signals as input and directly output transcribed speech. In the past, by contrast, Alexa’s speech recognizers had specialized components that processed inputs in sequence, such as an acoustic model and a language model.

“The acoustic model is trying to recognize the phonemes that are coming in — the basic units of speech,” Mevawalla explains. “Then the language model would do a search using n-grams” — sequences of between three and five words. “You could have multiple combinations of sentences that are possibilities given a string of phonemes,” he says. “Which path do you go down? These large search trees are built by processing large amounts of text data and noting the probability that words occur in a particular order.

“With an end-to-end model, you no longer have the overhead of the various separate components, or those giant search trees. Instead, you now have a full neural representation, which by itself reduces the size of the model to one-hundredth the size. Various quantization techniques minimize the memory and compute footprint even further without losing any accuracy. These models can then be deployed on our devices and executed using our own Amazon neural processor, AZ1 — a neural accelerator that is optimized to run DNNs [deep neural networks].”

Speaker ID

Alexa’s speaker ID function — which recognizes which customer is speaking, so that Alexa can personalize responses — has also moved to an end-to-end model, Mevawalla says.

“We've made a lot of huge improvements,” he says. “We have a two-model approach to speaker ID, where we combine text-dependent and text-independent models. The text-dependent model knows what you're saying ahead of time, so it can match it. With Alexa, most utterances start with the wake word, whether it be ‘Alexa’, ‘Computer’, or ‘Amazon’. So it matches the way you say ‘Alexa’ with the way you said ‘Alexa’ before. Then there's the text-independent model, which matches your voice independent of what you're saying. Moving both these systems to full neural and combining them has given us an order of magnitude improvement in speaker identification systems.”

Mevawalla also points to the speech recognition team’s expanded use of semi-supervised learning (SSL), in which a machine learning model trained on a small body of annotated data itself labels a much larger body of data, which is in turn used for additional training — either of the original model or of a lighter, faster model.

“The volumes of data that we can process using SSL is something we've enhanced over the last year,” Mevawalla says. “Language pooling, when combined with SSL, is another technique that we have leveraged very effectively. And that’s completely non-reviewed, unannotated data that a machine transcribed.”

At this year’s Interspeech, “I'm really excited to share all the innovations that have happened within Alexa,” Mevawalla adds. “I think we've moved the needle in a lot of really new ways.”

Research areas

Related content

ES, M, Madrid
Amazon's International Technology org in EU (EU INTech) is creating new ways for Amazon customers discovering Amazon catalog through new and innovative Customer experiences. Our vision is to provide the most relevant content and CX for their shopping mission. We are responsible for building the software and machine learning models to surface high quality and relevant content to the Amazon customers worldwide across the site. The team, mainly located in Madrid Technical Hub, London and Luxembourg, comprises Software Developer and ML Engineers, Applied Scientists, Product Managers, Technical Product Managers and UX Designers who are experts on several areas of ranking, computer vision, recommendations systems, Search as well as CX. Are you interested on how the experiences that fuel Catalog and Search are built to scale to customers WW? Are interesting on how we use state of the art AI to generate and provide the most relevant content? Key job responsibilities We are looking for Applied Scientists who are passionate to solve highly ambiguous and challenging problems at global scale. You will be responsible for major science challenges for our team, including working with text to image and image to text state of the art models to scale to enable new Customer Experiences WW. You will design, develop, deliver and support a variety of models in collaboration with a variety of roles and partner teams around the world. You will influence scientific direction and best practices and maintain quality on team deliverables. We are open to hiring candidates to work out of one of the following locations: Madrid, M, ESP
US, WA, Bellevue
Imagine being part of an agile team where your ideas have the potential to reach millions of customers. Picture working on cutting-edge, customer-facing solutions, where every team member is a critical voice in the decision making process. Envision being able to leverage the resources of a Fortune 500 company within the atmosphere of a start-up. Welcome to Amazon’s NCRC team. We solve complex problems in an ambiguous space, focusing on reducing return costs and improving the customer experience. We build solutions that are distributed on a large scale, positively impacting experiences for our customers and sellers. Come innovate with the NCRC team! The Net Cost of Refunds and Concessions (NCRC) team is looking for a Senior Manager Data Science to lead a team of economists, business intelligence engineers and business analysts who investigate business problems, develop insights and build models & algorithms that predict and quantify new opportunity. The team instigates and productionalizes nascent solutions around four pillars: outbound defects, inbound defects, yield optimization and returns reduction. These four pillars interact, resulting in impacts to our overall return rate, associated costs, and customer satisfaction. You may have seen some downstream impacts of our work including Amazon.com customer satisfaction badges on the website and app, new returns drop off optionality, and faster refunds for low cost items. In this role, you will set the science vision and direction for the team, collaborating with internal stakeholders across our returns and re-commerce teams to scale and advance science solutions. This role is based in Bellevue, WA Key job responsibilities * Single threaded leader responsible for setting and driving science strategy for the organization. * Lead and provide coaching to a team of Scientists, Economists, Business Intelligence Engineers and Business Analysts. * Partner with Engineering, Product and Machine Learning leaders to deliver insights and recommendations across NCRC initiatives. * Lead research and development of models and science products powering return cost reduction. * Educate and evangelize across internal teams on analytics, insights and measurement by writing whitepapers, knowledge documentation and delivering learning sessions. We are open to hiring candidates to work out of one of the following locations: Bellevue, WA, USA
US, WA, Bellevue
We are designing the future. If you are in quest of an iterative fast-paced environment, where you can drive innovation through scientific inquiry, and provide tangible benefit to hundreds of thousands of our associates worldwide, this is your opportunity. Come work on the Amazon Worldwide Fulfillment Design & Engineering Team! We are looking for an experienced and Research Scientist with background in Ergonomics and Industrial Human Factors, someone that is excited to work on complex real-world challenges for which a comprehensive scientific approach is necessary to drive solutions. Your investigations will define human factor / ergonomic thresholds resulting in design and implementation of safe and efficient workspaces and processes for our associates. Your role will entail assessment and design of manual material handling tasks throughout the entire Amazon network. You will identify fundamental questions pertaining to the human capabilities and tolerances in a myriad of work environments, and will initiate and lead studies that will drive decision making on an extreme scale. .You will provide definitive human factors/ ergonomics input and participate in design with every single design group in our network, including Amazon Robotics, Engineering R&D, and Operations Engineering. You will work closely with our Worldwide Health and Safety organization to gain feedback on designs and work tenaciously to continuously improve our associate’s experience. Key job responsibilities - Collaborating and designing work processes and workspaces that adhere to human factors / ergonomics standards worldwide. - Producing comprehensive and assessments of workstations and processes covering biomechanical, physiological, and psychophysical demands. - Effectively communicate your design rationale to multiple engineering and operations entities. - Identifying gaps in current human factors standards and guidelines, and lead comprehensive studies to redefine “industry best practices” based on solid scientific foundations. - Continuously strive to gain in-depth knowledge of your profession, as well as branch out to learn about intersecting fields, such as robotics and mechatronics. - Travelling to our various sites to perform thorough assessments and gain in-depth operational feedback, approximately 25%-50% of the time. We are open to hiring candidates to work out of one of the following locations: Bellevue, WA, USA
US, NY, New York
Amazon Advertising exists at the intersection of marketing and e-commerce and offers advertisers a rich array of innovative advertising solutions across Amazon-owned and third party properties. We believe that advertising, when done well, can greatly enhance the value of the customer experience and generate a positive return on investment for our advertising partners. We are currently looking for a highly skilled and motivated Data Scientist to help scale our growing advertising business. The Data Scientist is a key member of the Global Marketing Insights team at Amazon Ads, working with marketing, product, retail and other Amazon business partners to analyze and improve advertisers’ performance on Amazon, in support of their marketing objectives. You will work with Amazon's unique data and translate it into high-quality and actionable insights and recommendations to improve the effectiveness of advertiser campaigns and unlock business opportunities. Day to day activities include analyzing advertiser behaviors to develop data-driven insights on what tactics and strategies lead to success. You will also build automated solutions to generate science driven insights at scale, that are distributed to our advertisers across channels. Basic qualifications - Bachelor's or Master's degree in Engineering, Statistics, Economics, or a related technical field - Proven experience in data analytics or data science roles - Proficiency with SQL and Python - Strong understanding of basic statistical techniques and methodologies such as distributions, hypothesis testing, regressions, experimentation, A/B Testing etc. - Excellent organizational, interpersonal, and communication skills (both written and verbal) - Ability to work cross-functionally and with technical and non-technical stakeholders Preferred qualifications - Understanding of advanced statistical techniques and methodologies such as causal inference, propensity score matching, machine learning etc. - Experience with developing and deploying production machine learning models, especially on cloud platforms - Experience building and managing data pipelines - Experience with digital advertising products, performance analytics , marketing and advertising campaigns - MBA, Master’s, or Doctoral degree in Economics, Engineering, Marketing, Statistics, Advertising, or related fields - Publication track record/writing experience (ex. published a paper in a technical journal or trade publication) About the team The Marketing Insights team is responsible for delivering science backed insights to millions of advertisers via our marketing messages. Our team is distributed across the globe and is building cutting edge data science to identify and communicate the impact of various advertising strategies for our products. We are open to hiring candidates to work out of one of the following locations: New York, NY, USA
US, WA, Seattle
We are looking for detail-oriented, organized, and responsible individuals who are eager to learn how to work with large and complicated data sets. Some knowledge of econometrics, as well as basic familiarity with Python is necessary, and experience with SQL and Scala would be a plus. These are full-time positions at 40 hours per week, with compensation being awarded on an hourly basis. You will learn how to build data sets and perform applied econometric analysis collaborating with economists, scientists, and product managers. These skills will translate well into writing applied chapters in your dissertation and provide you with work experience that may help you with placement. Roughly 85% of previous cohorts have converted to full time economics employment at Amazon. If you are interested, please send your CV to our mailing list at econ-internship@amazon.com. We are open to hiring candidates to work out of one of the following locations: Chicago, IL, USA | Seattle, WA, USA | Washington, DC, USA
US, WA, Seattle
We are looking for detail-oriented, organized, and responsible individuals who are eager to learn how to work with large and complicated data sets. Some knowledge of econometrics, as well as basic familiarity with Python is necessary, and experience with SQL and Scala would be a plus. These are full-time positions at 40 hours per week, with compensation being awarded on an hourly basis. You will learn how to build data sets and perform applied econometric analysis collaborating with economists, scientists, and product managers. These skills will translate well into writing applied chapters in your dissertation and provide you with work experience that may help you with placement. Roughly 85% of previous cohorts have converted to full time economics employment at Amazon. If you are interested, please send your CV to our mailing list at econ-internship@amazon.com. We are open to hiring candidates to work out of one of the following locations: Chicago, IL, USA | Seattle, WA, USA | Washington, DC, USA
US, WA, Seattle
We are looking for detail-oriented, organized, and responsible individuals who are eager to learn how to work with large and complicated data sets. Some knowledge of econometrics, as well as basic familiarity with Python is necessary, and experience with SQL and Scala would be a plus. These are full-time positions at 40 hours per week, with compensation being awarded on an hourly basis. You will learn how to build data sets and perform applied econometric analysis collaborating with economists, scientists, and product managers. These skills will translate well into writing applied chapters in your dissertation and provide you with work experience that may help you with placement. Roughly 85% of previous cohorts have converted to full time economics employment at Amazon. If you are interested, please send your CV to our mailing list at econ-internship@amazon.com. We are open to hiring candidates to work out of one of the following locations: Chicago, IL, USA | Seattle, WA, USA | Washington, DC, USA
US, CA, Santa Clara
Machine learning (ML) has been strategic to Amazon from the early years. We are pioneers in areas such as recommendation engines, product search, eCommerce fraud detection, and large-scale optimization of fulfillment center operations. The Generative AI team helps AWS customers accelerate the use of Generative AI to solve business and operational challenges and promote innovation in their organization. As an applied scientist, you are proficient in designing and developing advanced ML models to solve diverse challenges and opportunities. You will be working with terabytes of text, images, and other types of data to solve real-world problems. You'll design and run experiments, research new algorithms, and find new ways of optimizing risk, profitability, and customer experience. We’re looking for talented scientists capable of applying ML algorithms and cutting-edge deep learning (DL) and reinforcement learning approaches to areas such as drug discovery, customer segmentation, fraud prevention, capacity planning, predictive maintenance, pricing optimization, call center analytics, player pose estimation, event detection, and virtual assistant among others. AWS Sales, Marketing, and Global Services (SMGS) is responsible for driving revenue, adoption, and growth from the largest and fastest growing small- and mid-market accounts to enterprise-level customers including public sector. The AWS Global Support team interacts with leading companies and believes that world-class support is critical to customer success. AWS Support also partners with a global list of customers that are building mission-critical applications on top of AWS services. Key job responsibilities The primary responsibilities of this role are to: Design, develop, and evaluate innovative ML models to solve diverse challenges and opportunities across industries Interact with customer directly to understand their business problems, and help them with defining and implementing scalable Generative AI solutions to solve them Work closely with account teams, research scientist teams, and product engineering teams to drive model implementations and new solutions About the team Diverse Experiences AWS values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Why AWS? Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses. Inclusive Team Culture Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness. Mentorship & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why flexible work hours and arrangements are part of our culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. We are open to hiring candidates to work out of one of the following locations: San Francisco, CA, USA | Santa Clara, CA, USA
US, WA, Bellevue
Amazon.com Services, Inc. is looking for a motivated individual with strong analytical skills and practical experience to join our Modeling and Optimization team. We are hiring specialists into our scientific team with expertise in network and combinatorial optimization, simulation-based design, and/or control theory. Amazon is growing rapidly and because we are driven by faster delivery to customers, a more efficient supply chain network, and lower cost of operations, our main focus is in the development of analytical strategic models and automation tools fed by massive amounts of data. You will be responsible for building these models/tools that improve the economics of Amazon’s worldwide fulfillment networks in North America, Europe, and Japan, China, and India as Amazon increases the speed and decreases the cost to deliver products to customers. You will identify and evaluate opportunities to reduce variable costs by improving fulfillment center processes, transportation operations and scheduling, and the execution to operational plans. You will also improve the efficiency of capital investment by helping the fulfillment centers to improve storage utilization and the effective use of automation. Finally, you will help create the metrics to quantify improvements to the fulfillment costs (e.g., transportation and labor costs) resulting from the application of these optimization models and tools. The ideal candidate will have good communication skills with both technical and business people with ability to speak at a level appropriate for the audience. Key job responsibilities * Understand ambiguous business problems, model it in the simplest and most effective manner with limited guidance. * Use new or existing tools to support internal partner-teams and provide the best, science-based guidance. * Contribute to existing tools with highly disciplined coding practices. * Contribute to the growth of knowledge of our team and the scientific community with internal and external publications or presentations. About the team * At the Modeling and Optimization (MOP) team, we use optimization, algorithm design, statistics, and machine learning to improve decision-making capabilities across WW Operations and Amazon Logistics. * We focus on transportation topology, labor and resource planning, routing science, visualization research, data science and development, and process optimization. * We create models to simulate, optimize, and control the fulfillment network with the objective of reducing cost while improving speed and reliability. * We support multiple business line, therefore maintain a comprehensive and objective view, coordinating solutions across organizational lines where possible. We are open to hiring candidates to work out of one of the following locations: Bellevue, WA, USA
US, CA, Santa Clara
Amazon AI is looking for world class scientists and engineers to join its AWS AI. This group is entrusted with developing core natural language processing, generative AI, deep learning and machine learning algorithms for AWS. You will invent, implement, and deploy state of the art machine learning algorithms and systems. You will build prototypes and explore conceptually new solutions. You will interact closely with our customers and with the academic community. You will be at the heart of a growing and exciting focus area for AWS and work with other acclaimed engineers and world famous scientists. A day in the life Inclusive Team Culture Here at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences. Amazon’s culture of inclusion is reinforced within our 14 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust. Work/Life Balance Our team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives. Mentorship & Career Growth Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded engineer and enable them to take on more complex tasks in the future. About the team The Amazon Web Services (AWS) Next Gen DevX (NGDE) team uses generative AI and foundation models to reimagine the experience of all builders on AWS. From the IDE to web-based tools and services, AI will help engineers work on large and small applications. We explore new technologies and find creative solutions. Curiosity and an explorative mindset can find a place here to impact the life of engineers around the world. If you are excited about this space and want to enlighten your peers with new capabilities, this is the team for you. We are open to hiring candidates to work out of one of the following locations: Santa Clara, CA, USA