Identifying sounds in audio streams

On September 20, Amazon unveiled a host of new products and features, including Alexa Guard, a smart-home feature that will be available on select Echo devices later this year. When activated, Alexa Guard can send a customer an alert if it detects the sound of glass breaking or of smoke or carbon monoxide alarms in the home.


At this year’s Interspeech conference, in September, our team presented a pair of papers that describe two approaches we’ve taken to the problem of sound identification in our research. Both approaches use neural networks, but one of the networks — which we call R-CRNN — is larger and takes longer to train than the other.

We believe, however, that in the long run, it also promises greater accuracy. So the two systems might be used in conjunction: the smaller network would run locally on a sound detection device, uploading audio samples to the larger, cloud-based network only if they’re likely to indicate threats to home security.
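A minimal sketch of that two-stage arrangement, assuming the local network emits a threat score between 0 and 1. The function name, the threshold value, and the two stub models are hypothetical and stand in for the real networks.

```python
def triage(clips, local_score, cloud_classify, upload_threshold=0.5):
    """Score every clip locally; escalate only the likely threats.

    local_score and cloud_classify are hypothetical callables standing in
    for the small on-device network and the larger cloud-based network.
    """
    results = []
    for clip in clips:
        score = local_score(clip)
        if score >= upload_threshold:
            # Only clips flagged by the small model reach the larger model.
            results.append((clip, cloud_classify(clip)))
        else:
            # Unflagged clips are never uploaded.
            results.append((clip, None))
    return results


# Stub models, for demonstration only.
demo = triage(
    ["quiet", "crash"],
    local_score=lambda clip: 0.9 if clip == "crash" else 0.1,
    cloud_classify=lambda clip: "glass_break",
)
```

The threshold trades upload volume against missed events: lowering it sends more audio to the cloud model but makes it less likely that the small model filters out a real threat.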

We tested both systems using data that had been provided to contestants in the third annual IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE 2017). Both systems achieved higher scores than the third-place finisher in the competition, and we believe that they have advantages over the top two finishers, as well.

Systems entered in the competition had to analyze 30-second snippets of audio and determine, first, whether they contained particular sounds — such as glass breaking — and, second, where in the snippets the sounds occurred.

Most systems divided the snippets into 46-millisecond units — or “frames” — and then tried to assess whether individual frames contained the acoustic signatures of the target sounds.
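The framing step can be sketched as follows, assuming non-overlapping frames and a signal stored as a flat list of samples. The function name and the choice to drop a trailing partial frame are illustrative assumptions, not details of any contest entry.

```python
def split_into_frames(samples, sample_rate, frame_ms=46.0):
    """Cut a sample sequence into consecutive, non-overlapping frames.

    Assumption: a trailing partial frame is dropped rather than padded.
    """
    frame_len = max(1, round(sample_rate * frame_ms / 1000.0))
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


# A 30-second snippet at a (hypothetical) 1 kHz sampling rate.
snippet = [0.0] * 30_000
frames = split_into_frames(snippet, sample_rate=1_000)
```

Real systems often use overlapping frames and a window function; this sketch omits both for brevity.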

When that task was complete, the systems had to stitch the frames back together into composite sounds. To do that, the top-performing systems used hand-coded rules that took advantage of the specific audio properties of the sounds in the contest dataset. Our systems don’t require this additional, content-specific step, so we believe they will generalize better to new sounds and settings.

Moreover, R-CRNN adapts a machine-learning mechanism that has already delivered state-of-the-art performance on several computer vision tasks. Computer vision research benefits from huge sets of labeled training data accumulated over decades; the data available to train our sound recognition systems was relatively meager. With more training data, R-CRNN’s performance should improve significantly.

The mechanism we adapt is called a region proposal network, which was developed to rapidly identify two-dimensional regions of images likely to contain objects sought by an object classifier. We instead use it to identify one-dimensional regions of an audio stream likely to contain sounds of interest.

This means that our classifier can act on the entire sound at once, rather than splitting it into frames that later must be pieced together again.
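To make the one-dimensional idea concrete, here is a small sketch of candidate time intervals ("anchors") and the interval overlap measure (intersection over union) that region proposal methods typically use to match candidates to labeled events. The anchor lengths, step size, and function names are hypothetical and are not taken from R-CRNN.

```python
def interval_iou(a, b):
    """Intersection over union of two (start, end) time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def make_anchors(num_steps, step_sec, lengths_sec):
    """Place one candidate interval of each length at every time step."""
    anchors = []
    for t in range(num_steps):
        center = (t + 0.5) * step_sec
        for length in lengths_sec:
            anchors.append((center - length / 2, center + length / 2))
    return anchors


# Hypothetical setup: 3 time steps of 1 s each, anchors 1 s and 2 s long.
anchors = make_anchors(num_steps=3, step_sec=1.0, lengths_sec=[1.0, 2.0])
event = (1.0, 2.0)  # labeled sound event, in seconds
best = max(anchors, key=lambda a: interval_iou(a, event))
```

In the image setting the same computation runs over two-dimensional boxes; in audio only the start and end times remain.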

The region proposal network was designed to work with an object detector known as R-CNN, for region-based convolutional neural net. With R-CNN, images are fed to a convolutional neural net that learns to extract features useful for object detection, which then pass to the region proposal network.

In our network, R-CRNN, the extra “R” stands for “recurrent,” because our feature-extraction network is both convolutional and recurrent, meaning that it can factor in the order in which data arrive. That’s not necessary for image classification, but it usually improves audio processing, since it allows the network to learn systematic correlations between successive sounds.

Our feature extraction network is also a residual network, meaning that each of its layers receives not only the output of the layer beneath it but also the input to that lower layer. That way, during training, each layer learns to elaborate on the computations performed by the preceding layers, rather than (at least occasionally) undoing them.
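The skip connection at the heart of a residual network can be sketched in a few lines. This is a generic illustration on plain Python lists, not our feature extraction network; `transform` is a hypothetical stand-in for a learned layer.

```python
def residual_block(x, transform):
    """Return x + transform(x), element by element.

    The layer's input is carried forward unchanged and the learned
    transform only adds a correction on top of it.
    """
    fx = transform(x)
    return [xi + fi for xi, fi in zip(x, fx)]


features = [1.0, 2.0, 3.0]

# A transform that has learned nothing (all zeros) leaves the input intact.
unchanged = residual_block(features, lambda v: [0.0] * len(v))

# A transform that outputs a small correction refines the input.
refined = residual_block(features, lambda v: [0.5 for _ in v])
```

Because the identity path is always present, a layer has to learn something actively harmful in order to erase what earlier layers computed.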

The feature summary vector produced by the extraction network passes to the region proposal network, and then both the summary vector and the region proposals pass to another network, which makes the final classification. In ongoing experiments, we’re evaluating whether this final classifier is necessary. If the region proposal network can draw reliable inferences itself, that will make the R-CRNN model both more compact and easier to train.

Like many of the contestants in the DCASE challenge, our other system splits input signals into 46-millisecond frames. And like them, it passes the frames through a network that learns to extract features of the signal useful for sound identification.

But our system also features an “attention mechanism,” a second network whose output is an array of values, one for each frame, in chronological order. Frames that appear to have characteristics of the target sound receive a high score in the output array; frames that lack them receive a low score.

This array essentially demarcates the part of the input signal that contains the sound of interest, again dispensing with the need to stitch frames back together after the fact. Both the array and the feature vector pass to a classifier that makes the final assessment.
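One common way to combine per-frame attention scores with per-frame features is a weighted average, sketched below. The text above says only that both the array and the feature vector pass to the classifier, so the pooling step, the sigmoid squashing, and all names here are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def attention_pool(frame_features, frame_logits):
    """Weight each frame's feature vector by its attention score.

    Returns the per-frame scores (in chronological order) and the
    score-weighted average of the frame features.
    """
    weights = [sigmoid(z) for z in frame_logits]
    total = sum(weights)
    if total == 0.0:
        # Degenerate case: fall back to a plain average.
        weights_for_avg = [1.0] * len(weights)
        total = float(len(weights))
    else:
        weights_for_avg = weights
    dim = len(frame_features[0])
    pooled = [
        sum(w * f[d] for w, f in zip(weights_for_avg, frame_features)) / total
        for d in range(dim)
    ]
    return weights, pooled


# Three frames with 2-dimensional features; the middle frame scores highest.
scores, summary = attention_pool(
    [[0.0, 1.0], [4.0, 0.0], [0.0, 1.0]],
    [-10.0, 10.0, -10.0],
)
```

With these inputs the summary is dominated by the middle frame, which is the behavior the attention array is meant to produce: high-scoring frames mark where the sound is and contribute most to the final decision.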

This simple architecture significantly reduces the model’s memory footprint and computational overhead, relative not only to R-CRNN but to the top two finishers in the DCASE challenge, too. (Those systems used “ensemble methods,” meaning they comprised multiple, separately trained models, which process data independently before their results are pooled.) It thus holds unusual promise for on-device use.

The model also has one other architectural feature that makes it more accurate. As the input signal passes through the feature extraction network, its time resolution is halved several times: we keep reducing the number of network nodes required to represent the signal. This ensures that the network’s output, the feature vector, will include information relevant to the final classification regardless of the sound’s duration. The feature vector for a half-second’s worth of breaking glass, for instance, will look roughly the same as the feature vector for three seconds’ worth.
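The effect of repeated halving can be illustrated with a toy example. The sketch assumes the halving is done by max pooling over adjacent pairs of time steps, which is one common choice; the text above does not say which operation the network uses, and the activation values are made up.

```python
def halve_time(seq):
    """Halve time resolution by keeping the max of each adjacent pair."""
    return [max(seq[i:i + 2]) for i in range(0, len(seq), 2)]


def reduce_resolution(seq, times):
    """Apply halve_time repeatedly."""
    for _ in range(times):
        seq = halve_time(seq)
    return seq


# Hypothetical per-step activations for a short and a long occurrence
# of the same sound (1 = sound present, 0 = absent).
short_event = [0, 0, 1, 0, 0, 0, 0, 0]
long_event = [0, 1, 1, 1, 1, 1, 1, 0]

short_summary = reduce_resolution(short_event, times=3)
long_summary = reduce_resolution(long_event, times=3)
```

After three halvings both sequences collapse to the same single value, which is the sense in which a short and a long instance of a sound end up with similar representations.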

In fact, this network fared slightly better than R-CRNN on the DCASE challenge test set. On the task of identifying whether a given 30-second input contained a sound of interest, it had an error rate of 20% and an F1 score (which factors in both false positives and false negatives) of 90%; R-CRNN’s scores were 23% and 88%, respectively. (For comparison, the winner had scores of 13% and 93%, and the third-place finisher had scores of 28% and 85%.) But again, we believe that R-CRNN’s performance suffered more from lack of training data than the other models’ did. We consider R-CRNN the most direct way to approach the problem of sound recognition.
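For readers unfamiliar with the two metrics, the sketch below shows how they are typically computed from counts of correct and incorrect detections. The F1 formula is standard. The error-rate formula is the substitution/deletion/insertion definition commonly used in sound event detection evaluation and is an assumption here, since the exact formula is not spelled out above. The counts in the example are hypothetical and are not the contest results.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, from raw counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def error_rate(substitutions, deletions, insertions, num_reference):
    """Errors of all three kinds, relative to the number of reference events.

    Assumed definition: (S + D + I) / N. Lower is better, and the value
    can exceed 1 if there are many false alarms.
    """
    return (substitutions + deletions + insertions) / num_reference


# Hypothetical counts: 10 reference events, 9 found, 1 missed, 1 false alarm.
example_f1 = f1_score(tp=9, fp=1, fn=1)
example_er = error_rate(substitutions=0, deletions=1, insertions=1, num_reference=10)
```

Note that the two metrics move in opposite directions: a better system has a lower error rate and a higher F1 score.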

Acknowledgments: Ming Sun, Chao Wang
