- ECCV 2020. The ubiquity of smartphone cameras has led to more and more documents being captured by cameras rather than scanned. Unlike flatbed scans, photographed documents are often folded and crumpled, resulting in large local variance in text structure. The problem of document rectification is fundamental to the Optical Character Recognition (OCR) process on documents, and its ability to overcome geometric distortions
- Interspeech 2020. Voice assistants such as Siri and Alexa usually adopt a pipeline to process users’ utterances, which generally includes transcribing the audio into text, understanding the text, and finally responding to users. One potential issue is that some utterances could be devoid of any interesting speech, and are thus not worth being processed through the entire pipeline. Examples of uninteresting utterances
- ACM Transactions on Sensor Networks, 2020. Many modern smart building applications are supported by wireless sensors to sense physical parameters, given the flexibility they offer and the reduced cost of deployment. However, most wireless sensors are powered by batteries today and large deployments are inhibited by the requirement of periodic battery replacement. Energy harvesting sensors provide an attractive alternative, but they need to provide
- Interspeech 2020. Small footprint embedded devices require keyword spotters (KWS) with small model size and low detection latency for enabling voice assistants. Such a keyword is often referred to as a wake word, as it is used to wake up voice assistant enabled devices. Together with wake word detection, accurate estimation of wake word endpoints (start and end) is an important task of KWS. In this paper, we propose two new methods
- Interspeech 2020. Wakeword detection is responsible for switching on downstream systems in a voice-activated device. To prevent a response when the wakeword is detected by mistake, a secondary network is often utilized to verify the detected wakeword. Published verification approaches are formulated based on Automatic Speech Recognition (ASR) biased towards the wakeword. This approach has several drawbacks, including high
Related content
- December 12, 2022: Vice president of ML and AI Services says more than 100,000 customers are doing machine learning on AWS.
- December 12, 2022: Vice president Bratin Saha reflects on the past and future of Amazon Web Services’ machine learning tools and AI services.
- December 7, 2022: Learn about a real-time, continual, lifelong learning system that trains machine learning models using production data at scale.
- December 5, 2022: Internal event designed to replicate external science conferences.
- December 5, 2022: Accounting for data heterogeneity across edge devices enables more useful model updates, both locally and globally.
- December 2, 2022: Learn about the development, operational, and process improvements that organizations can adopt to improve the explainability of models while adhering to regulatory requirements.