Search - Amazon Science

RAMEN

2019

Pre-trained models have demonstrated their effectiveness in many downstream natural language processing (NLP) tasks. The availability of multilingual pre-trained models enables zero-shot transfer of NLP tasks from high resource languages to low resource ones. However, recent research in improving pre-trained models focuses heavily on English. While it is possible to train the latest neural architectures

Conversational AI

FEVER 2.0 (2019)

James Thorne, Andreas Vlachos, Oana Cocarascu, Christos Christodoulopoulos, Arpit Mittal

2019

We present the results of the second Fact Extraction and VERification (FEVER2.0) Shared Task. The task challenged participants to both build systems to verify factoid claims using evidence retrieved from Wikipedia and to generate adversarial attacks against other participant’s systems. The shared task had three phases: building, breaking and fixing. There were 8 systems in the builder’s round, three of

Conversational AI

Using adversarial training to recognize speakers’ emotions

Viktor Rozgic

May 21, 2019

Adversarial_autoencoder.jpg._CB462489265_.jpg

A person’s tone of voice can tell you a lot about how they’re feeling. Not surprisingly, emotion recognition is an increasingly popular conversational-AI research topic.

Conversational AI

Should Alexa read “2/3” as “two-thirds” or “February Third”?: The science of text normalization

Ming Sun

May 16, 2019

Text normalization is an important process in conversational AI. If an Alexa customer says, “book me a table at 5:00 p.m.”, the automatic speech recognizer will transcribe the time as “five p m”. Before a skill can handle this request, “five p m” will need to be converted to “5:00PM”. Once Alexa has processed the request, it needs to synthesize the response — say, “Is 6:30 p.m. okay?” Here, 6:30PM will be converted to “six thirty p m” for the text-to-speech synthesizer. We call the process of converting “5:00PM” to “five p m” text normalization and its counterpart — converting “five p m” to “5:00PM” — inverse text normalization.

Conversational AI

Training a Machine Learning Model in English Improves Its Performance in Japanese

Judith Gaspers

May 13, 2019

Recently, we published a paper showing that training a neural network to do language processing in English, then retraining it in German, drastically reduces the amount of German-language training data required to achieve a given level of performance.

Conversational AI

How we add new skills to Alexa’s name-free skill selector

Young-Bum Kim

May 3, 2019

Cosine_normalization.jpg._CB465442559_.jpg

Using cosine similarity rather than dot product to compare vectors helps prevent "catastrophic forgetting".

Conversational AI

“Alexa, Turn Down the Lights and Play Music”: The Science of Handling Compound Requests

Rahul Goel

May 2, 2019

Semantic_parse_tree_updated.png._CB465064517_.png

Traditionally, Alexa has interpreted customer requests according to their intents and slots. If you say, “Alexa, play ‘What’s Going On?’ by Marvin Gaye,” the intent should be PlayMusic, and “‘What’s Going On?’” and “Marvin Gaye” should fill the slots SongName and ArtistName.

Conversational AI

Training Speech Synthesizers on Data from Multiple Speakers

Jakub Lachowicz

April 25, 2019

When a customer asks Alexa to play “Hey Jude”, and Alexa responds, “Playing 'Hey Jude' by the Beatles,” that response is generated by a text-to-speech (TTS) system, which converts textual inputs into synthetic-speech outputs...

Conversational AI

Using wake word acoustics to filter out background speech improves speech recognition by 15%

Xing Fan

April 22, 2019

Seq2Seq_encoder-decoder_with_attention.png._CB466581826_.png

One of the ways that we’re always trying to improve Alexa’s performance is by teaching her to ignore speech that isn’t intended for her. At this year’s International Conference on Acoustics, Speech, and Signal Processing, my colleagues and I will present a new technique for doing this, which could complement the techniques that Alexa already uses.

Conversational AI

Two new papers discuss how Alexa recognizes sounds

Ming Sun

April 18, 2019

Last year, Amazon announced the beta release of Alexa Guard, a new service that lets customers who are leaving the house instruct their Echo devices to listen for glass breaking or smoke and carbon dioxide alarms going off. At this year’s International Conference on Acoustics, Speech, and Signal Processing, our team is presenting several papers on sound detection. I wrote about one of them a few weeks ago, a new method for doing machine learning with unbalanced data sets.

Conversational AI

Signal processor improves Echo’s bass response, loudness, and speech recognition accuracy

Jun Yang

April 11, 2019

scrollingwaveformsV2.gif._CB467417779_.gif

Multiband dynamics processing, which separately modifies volume in different frequency bands of an audio signal, is known to improve listeners’ audio experiences. But in the context of voice-controlled systems like the Amazon Echo family of products, it can also improve automatic speech recognition by making echo cancellation easier.

Conversational AI

Cross-lingual transfer learning for bootstrapping AI systems reduces new-language data requirements

Quynh Ngoc Thi Do, Judith Gaspers

April 8, 2019

Cross-lingual_transfer-learning_architecture.jpg._CB466995745_.jpg

Transfer learning is the technique of adapting a machine learning model trained on abundant data to a new context in which training data is sparse. On the Alexa team, we’ve explored transfer learning as a way to bootstrap new functions and to add new classification categories to existing machine learning systems.

Conversational AI

New speech recognition experiments demonstrate how machine learning can scale

Sree Hari Krishnan Parthasarathi

April 4, 2019

LSTMnetworkanimationV3.gif._CB467045280_.gif

Customer interactions with Alexa are constantly growing more complex, and on the Alexa science team, we strive to stay ahead of the curve by continuously improving Alexa’s speech recognition system. Increasingly, keeping pace with Alexa’s expanding capabilities will require automating the learning process, through techniques such as semi-supervised learning, which leverages a small amount of annotated data to extract information from a much larger store of unannotated data.

Machine learning

Joint training on speech signal isolation and speech recognition improves performance

Kenichi Kumatani

April 1, 2019

Architecture_comparison.png._CB468461802_.png

The idea of using arrays of microphones to improve automatic speech recognition (ASR) is decades old. The acoustic signal generated by a sound source reaches multiple microphones with different time delays. This information can be used to create virtual directivity, emphasizing a sound arriving from a direction of interest and diminishing signals coming from other directions. In voice recognition, one of the more popular methods for doing this is known as “beamforming”.

Conversational AI

Audio watermarking algorithm is first to solve "second-screen problem" in real time

Yuan-Yen Tai

March 28, 2019

Audio watermarking is the process of adding a distinctive sound pattern — undetectable to the human ear — to an audio signal to make it identifiable to a computer. It’s one of the ways that video sites recognize copyrighted recordings that have been posted illegally. To identify a watermark, a computer usually converts a digital file into an audio signal, which it processes internally.

Conversational AI

Adversarial training produces synthetic data for machine learning

Rahul Gupta

March 21, 2019

Sentiment analysis is the attempt, computationally, to determine from someone’s words how he or she feels about something. It has a host of applications, in market research, media analysis, customer service, and product recommendation, among other things. Sentiment classifiers are typically machine learning systems, and any given application of sentiment analysis may suffer from a lack of annotated data for training purposes.

Conversational AI

Machine-labeled data + artificial noise = better speech recognition

Minhua Wu

March 20, 2019

Although deep neural networks have enabled accurate large-vocabulary speech recognition, training them requires thousands of hours of transcribed data, which is time-consuming and expensive to collect. So Amazon scientists have been investigating techniques that will let Alexa learn with minimal human involvement, techniques that fall in the categories of unsupervised and semi-supervised learning.

Conversational AI

To correct imbalances in training data, don’t oversample: Cluster

Ming Sun

March 11, 2019

Model depicting the stages in the training of an embedding network

In experiments involving sound recognition, technique reduces error rate by 15% to 30%.

Machine learning

Innovations from the 2018 Alexa Prize

Behnam Hedayatnia

March 5, 2019

The 2018 Alexa Prize featured eight student teams from four countries, each of which adopted distinctive approaches to some of the central technical questions in conversational AI. We survey those approaches in a paper we released late last year, and the teams themselves go into even greater detail in the papers they submitted to the latest Alexa Prize Proceedings. Here, we touch on just a few of the teams’ innovations.

Conversational AI

AI tools let Alexa Prize participants focus on science

Anushree Venkatesh

February 27, 2019

CoBot_architecture.jpg._CB453492855_.jpg

To ensure that Alexa Prize contestants can concentrate on dialogue systems — the core technology of socialbots — Amazon scientists and engineers built a set of machine learning modules that handle fundamental conversational tasks and a development environment that lets contestants easily mix and match existing modules with those of their own design.

Conversational AI

Search results

Work with us