Pronunciation detection for Alexa's new English-learning experience

Data augmentation, novel loss functions, and weakly supervised training make it possible to build an innovative model for detecting mispronunciations.

In January 2023, Alexa launched a language-learning experience in Spain to help Spanish speakers learn beginner-level English. The experience was developed in collaboration with Vaughan, the leading English-language learning provider in Spain, with the goal of offering an immersive English-learning program focused on improving pronunciation.

We are now expanding this offering to Mexico and to the Spanish-speaking population of the United States, and we plan to add more languages in the future. The experience includes structured lessons on vocabulary, grammar, expression, and pronunciation, with practice exercises and quizzes. To try it, set your device language to Spanish and tell Alexa, "Quiero aprender inglés" ("I want to learn English").

Mini-lesson content page: vocabulary, grammar, expression, and pronunciation lessons.

The centerpiece of this Alexa skill is its pronunciation feature, which provides precise feedback whenever a customer mispronounces a word or phrase. At this year's International Conference on Acoustics, Speech, and Signal Processing (ICASSP), we presented a paper describing our novel method for detecting mispronunciations.

Pronunciation correction: blue text indicates correct pronunciation, and red text indicates incorrect pronunciation. For mispronounced words or phrases, Alexa gives detailed instructions on how to pronounce them.

Our method uses a novel phonetic recurrent neural network transducer (RNN-T) model that predicts phonemes, the smallest units of speech, from the learner's speech. Because it works at the phoneme level, the model can provide a fine-grained pronunciation assessment at the word, syllable, or phoneme level. For example, if a learner mispronounces the word "rabbit" as "rabid", the model will output the five-phoneme sequence R AE B IH D. It can then detect the mispronounced phonemes (IH D) and syllable (-bid) by using Levenshtein alignment to compare that sequence with the reference sequence R AE B AH T.
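The comparison step can be illustrated with a standard Levenshtein (edit-distance) alignment. The Python sketch below is illustrative only and is not the production system. The function names `align` and `mispronounced` are hypothetical, and phonemes are written in ARPAbet as in the "rabbit" example. Mapping the flagged phonemes back to a syllable such as "-bid" would also require a syllabifier, which is not shown.

```python
from typing import List, Optional, Tuple

# (reference phoneme, recognized phoneme); None marks an insertion or deletion gap
Pair = Tuple[Optional[str], Optional[str]]


def align(ref: List[str], hyp: List[str]) -> List[Pair]:
    """Levenshtein alignment of a recognized phoneme sequence against a reference."""
    n, m = len(ref), len(hyp)
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = int(ref[i - 1] != hyp[j - 1])
            dist[i][j] = min(dist[i - 1][j - 1] + sub,  # match or substitution
                             dist[i - 1][j] + 1,        # deletion (phoneme skipped)
                             dist[i][j - 1] + 1)        # insertion (extra phoneme)
    # Walk back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + int(ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    return pairs[::-1]


def mispronounced(ref: List[str], hyp: List[str]) -> List[Pair]:
    """Keep only the aligned pairs that differ (substitutions, deletions, insertions)."""
    return [(r, h) for r, h in align(ref, hyp) if r != h]


reference = "R AE B AH T".split()   # "rabbit"
recognized = "R AE B IH D".split()  # what the learner actually said ("rabid")
print(mispronounced(reference, recognized))  # [('AH', 'IH'), ('T', 'D')]
```

For this example, the alignment yields two substitutions, AH→IH and T→D. These are the two phonemes of the mispronounced final syllable.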

The paper addresses two knowledge gaps that previous pronunciation models had not tackled. The first is the ability to distinguish similar phonemes across languages (for example, the rolled Spanish "r" vs. the English "r"). To close this gap, we designed a multilingual pronunciation lexicon and built a large mixed-language phonetic dataset for the learning program.

The second gap is the ability to learn language learners' distinctive mispronunciation patterns. Here we take advantage of the RNN-T model's autoregressive nature, that is, the fact that its outputs depend on previous inputs and outputs. This context awareness means the model can capture frequent mispronunciation patterns from the training data. Our pronunciation model achieved state-of-the-art results in both phoneme prediction accuracy and mispronunciation detection.

L2 data augmentation

One of the main technical challenges in building a phonetic recognition model for non-native (L2) speakers is that datasets for mispronunciation diagnosis are very limited. In our Interspeech 2022 paper, "L2-GEN: A Neural Phoneme Paraphrasing Approach to L2 Speech Synthesis for Mispronunciation Diagnosis", we proposed closing this gap through data augmentation. Specifically, we built a phoneme paraphraser that can generate realistic L2 phoneme sequences for speakers from a specific locale. For example, it can generate phonemes that represent a native Spanish speaker speaking English.

As is common in grammatical error correction tasks, we used a sequence-to-sequence model. However, we reversed the direction of the task, training the model to mispronounce words rather than to correct mispronunciations. In addition, to further enrich and diversify the generated L2 phoneme sequences, we proposed a diversified, preference-aware decoding component. It combines diversified beam search with a preference loss biased toward human-like mispronunciations.

For each input token, or chunk of speech, the model produces several possible phonemes as output. The phoneme sequences are modeled as a tree in which the possibilities multiply with each new token. Normally, the top-ranked phoneme sequences are extracted from the tree with beam search, which pursues only the branches with the highest probabilities. In our work, however, we proposed a beam search method that prioritizes unusual phonemes, that is, phoneme candidates that differ from most of the others at the same depth of the tree.
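The idea of favoring candidates that differ from the others at the same depth can be sketched as a simplified variant of diverse beam search. This sketch is an assumption and not the paper's exact method. Candidates are selected greedily by log-probability minus a penalty proportional to the number of positions they share with sequences already kept at that depth (a Hamming-style diversity term). The names `diverse_beam_search`, `overlap`, and `toy_model` are hypothetical, and the probabilities in `TOY` are made up for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Seq = Tuple[str, ...]


def overlap(a: Seq, b: Seq) -> int:
    """Number of depths at which two sequences chose the same phoneme."""
    return sum(x == y for x, y in zip(a, b))


def diverse_beam_search(step_logprobs: Callable[[Seq], Dict[str, float]],
                        beam_size: int, steps: int,
                        diversity: float = 1.0) -> List[Tuple[Seq, float]]:
    beams: List[Tuple[Seq, float]] = [((), 0.0)]
    for _ in range(steps):
        # Expand every surviving beam by every candidate phoneme.
        pool = [(prefix + (tok,), score + lp)
                for prefix, score in beams
                for tok, lp in step_logprobs(prefix).items()]
        kept: List[Tuple[Seq, float]] = []
        while pool and len(kept) < beam_size:
            # Rank by log-probability minus a penalty for sharing phonemes
            # with the sequences already kept at this depth.
            best = max(pool, key=lambda c: c[1] - diversity * sum(overlap(c[0], k[0]) for k in kept))
            pool.remove(best)
            kept.append(best)
        beams = kept
    return beams


# Made-up distributions for the first two phonemes of "think" (TH IH ...).
TOY = [{"TH": 0.6, "T": 0.25, "S": 0.15}, {"IH": 0.7, "IY": 0.3}]


def toy_model(prefix: Seq) -> Dict[str, float]:
    # Toy "model": ignores the prefix content and only looks at its length.
    return {tok: math.log(p) for tok, p in TOY[len(prefix)].items()}


plain = diverse_beam_search(toy_model, beam_size=2, steps=2, diversity=0.0)
diverse = diverse_beam_search(toy_model, beam_size=2, steps=2, diversity=1.0)
print([seq for seq, _ in plain])    # [('TH', 'IH'), ('TH', 'IY')]
print([seq for seq, _ in diverse])  # [('TH', 'IH'), ('T', 'IY')]
```

With the penalty switched off, both surviving beams start with TH. With it switched on, the second beam is the less likely but more distinct T IY. The penalty only affects which candidates are kept. The accumulated scores remain plain log-probabilities.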

Drawing on established sources in the language-learning literature, we also compiled lists of common phoneme-level mispronunciations. Each is represented as a pair of phonemes: the standard phoneme of the language and its nonstandard variant. We built a loss function that, during training, favors outputs that use the nonstandard variants on our list.
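The text does not give the formula for this preference term, so the following is purely an assumption. One way to write it is a token-level loss that rewards the model for placing probability mass on listed nonstandard variants wherever the reference phoneme has one. The confusion pairs, the `weight` value, and the function names below are hypothetical. The code only computes loss values and would need an autodiff framework to be used for training.

```python
import math
from typing import Dict, Iterable, List, Tuple

# Hypothetical confusion list of (standard phoneme, nonstandard variant) pairs.
CONFUSIONS: List[Tuple[str, str]] = [("TH", "T"), ("IH", "IY"), ("V", "B")]


def preference_loss(reference: List[str], step_probs: List[Dict[str, float]],
                    confusions: Iterable[Tuple[str, str]] = CONFUSIONS,
                    eps: float = 1e-9) -> float:
    """Mean -log(probability mass on listed variants), taken over the positions
    whose reference phoneme has at least one listed nonstandard variant."""
    confusions = list(confusions)
    terms = []
    for ref, probs in zip(reference, step_probs):
        variants = [v for s, v in confusions if s == ref]
        if variants:
            mass = sum(probs.get(v, 0.0) for v in variants)
            terms.append(-math.log(mass + eps))
    return sum(terms) / len(terms) if terms else 0.0


def total_loss(nll: float, pref: float, weight: float = 0.1) -> float:
    """Hypothetical combination: the usual seq2seq NLL plus a weighted preference term."""
    return nll + weight * pref


ref = ["TH", "IH", "NG", "K"]
# Per-position output distributions from two imaginary models.
leans_to_variants = [{"TH": 0.8, "T": 0.2}, {"IH": 0.7, "IY": 0.3}, {"NG": 1.0}, {"K": 1.0}]
ignores_variants = [{"TH": 0.95, "T": 0.05}, {"IH": 0.95, "IY": 0.05}, {"NG": 1.0}, {"K": 1.0}]
print(preference_loss(ref, leans_to_variants) < preference_loss(ref, ignores_variants))  # True
```

A model that puts more mass on the listed variants gets a lower preference loss, which is the bias toward human-like mispronunciations that the text describes.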

In experiments, we observed accuracy improvements of up to 5% in mispronunciation detection over a baseline model trained without the augmented data.

Balancing false rejection and false acceptance

A key consideration in designing a pronunciation model for a language-learning experience is balancing the rates of false rejections and false acceptances. A false rejection occurs when the pronunciation model flags a mispronunciation even though the learner was actually correct or used a consistent but slightly accented pronunciation. A false acceptance occurs when a learner mispronounces a word and the model fails to detect it.
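The sketch below computes the two metrics over a set of scored words to make them concrete. The text does not say how decisions are made, so this sketch assumes that the detector outputs a per-word mispronunciation score and compares it with a threshold. The rates are normalized by the number of correctly pronounced words and the number of mispronounced words, respectively. The function name is hypothetical, and the labels and scores are made-up toy values.

```python
from typing import List, Tuple


def rejection_acceptance_rates(is_error: List[bool], flagged: List[bool]) -> Tuple[float, float]:
    """is_error[i]: word i was really mispronounced; flagged[i]: the model flagged it.
    Returns (false-rejection rate, false-acceptance rate)."""
    on_correct = [f for e, f in zip(is_error, flagged) if not e]  # decisions on correct words
    on_errors = [f for e, f in zip(is_error, flagged) if e]       # decisions on mispronounced words
    frr = sum(on_correct) / len(on_correct) if on_correct else 0.0              # correct words flagged
    far = sum(not f for f in on_errors) / len(on_errors) if on_errors else 0.0  # errors missed
    return frr, far


# Made-up toy data: ground truth and per-word mispronunciation scores.
is_error = [False, False, False, True, True]
scores = [0.1, 0.6, 0.3, 0.9, 0.4]

for threshold in (0.35, 0.5, 0.7):
    flagged = [s > threshold for s in scores]
    frr, far = rejection_acceptance_rates(is_error, flagged)
    print(f"threshold={threshold}: FRR={frr:.2f}, FAR={far:.2f}")
```

In this toy data, raising the threshold from 0.35 to 0.7 lowers the false-rejection rate from 0.33 to 0. At the same time, the false-acceptance rate rises from 0 to 0.5, so neither metric can be driven down without cost to the other.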

Our system has two design features aimed at balancing these two metrics. To reduce false acceptances, we first combined our standard English and Spanish pronunciation lexicons into a single lexicon with multiple phoneme sequences for each word. We then used that lexicon to automatically annotate unannotated speech samples in three categories: native Spanish, native English, and code-switched Spanish and English. Training the model on this dataset enables it to distinguish very subtle differences between phonemes.
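The lexicon-merging step can be sketched as a union of per-word pronunciation lists. This sketch prefixes every phoneme with a language tag (for example `en:` or `es:`) so that similar English and Spanish sounds stay distinct symbols. That tagging scheme is an assumption made for illustration, as are the entries and the `merge_lexicons` name. The automatic annotation of unannotated speech is not shown.

```python
from typing import Dict, List, Tuple

Pron = Tuple[str, ...]
Lexicon = Dict[str, List[Pron]]


def merge_lexicons(*lexicons: Lexicon) -> Lexicon:
    """Union of several word -> pronunciations lexicons, preserving order and
    dropping exact duplicate pronunciations."""
    merged: Lexicon = {}
    for lex in lexicons:
        for word, prons in lex.items():
            entry = merged.setdefault(word, [])
            for pron in prons:
                if pron not in entry:
                    entry.append(pron)
    return merged


# Hypothetical entries; phoneme symbols carry a language tag so that similar
# English and Spanish sounds (e.g. English R vs. the Spanish trill "rr") remain distinct.
english: Lexicon = {"hotel": [("en:HH", "en:OW", "en:T", "en:EH", "en:L")],
                    "red": [("en:R", "en:EH", "en:D")]}
spanish: Lexicon = {"hotel": [("es:o", "es:t", "es:e", "es:l")],
                    "perro": [("es:p", "es:e", "es:rr", "es:o")]}

bilingual = merge_lexicons(english, spanish)
for word, prons in bilingual.items():
    print(word, [" ".join(p) for p in prons])
```

A word such as "hotel", which exists in both languages, ends up with both an English and a Spanish pronunciation. This lets one lexicon label native, non-native, and code-switched speech.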

To reduce false rejections, we use a multi-reference pronunciation lexicon in which each word is associated with several reference pronunciations. For example, the word "data" can be pronounced "day-tah" or "dah-tah", and the system accepts both variants as correct.
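Accepting any of several reference pronunciations amounts to scoring the recognized sequence against each reference and keeping the closest one. The sketch below assumes ARPAbet-style phonemes: D EY T AH for "day-tah" and D AA T AH for "dah-tah". `MULTI_REF`, `score_against_references`, and `is_accepted` are hypothetical names, and using an exact match as the acceptance rule is an assumption.

```python
from typing import Dict, List, Sequence, Tuple

Pron = Tuple[str, ...]

# Hypothetical multi-reference lexicon: "day-tah" and "dah-tah".
MULTI_REF: Dict[str, List[Pron]] = {"data": [("D", "EY", "T", "AH"), ("D", "AA", "T", "AH")]}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance, keeping only the previous DP row in memory."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (x != y),  # match or substitution
                           prev[j] + 1,             # deletion
                           cur[j - 1] + 1))         # insertion
        prev = cur
    return prev[-1]


def score_against_references(hyp: Pron, references: List[Pron]) -> Tuple[int, Pron]:
    """Distance to, and identity of, the closest reference pronunciation."""
    return min(((edit_distance(hyp, ref), ref) for ref in references), key=lambda t: t[0])


def is_accepted(word: str, hyp: Pron) -> bool:
    # Accept if the recognized phonemes exactly match any listed reference.
    return score_against_references(hyp, MULTI_REF[word])[0] == 0


print(is_accepted("data", ("D", "EY", "T", "AH")))  # True
print(is_accepted("data", ("D", "AA", "T", "AH")))  # True
print(is_accepted("data", ("D", "EH", "T", "AH")))  # False
```

Returning the closest reference also matters for feedback. A learner who says "dah-tah" with one wrong phoneme is then compared against "dah-tah" rather than being penalized for not saying "day-tah".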

We are continuing to explore several ways to improve our pronunciation assessment feature. One is building a multilingual model that can assess pronunciation in many languages. We are also extending the model to diagnose more kinds of mispronunciation, such as errors in tone and lexical stress.
