- Large Language Models (LLMs) have brought with them unprecedented societal interest in AI. This has enabled their use in several day-to-day applications such as virtual assistants or smart home agents. This integration with external tools also brings several risk areas where malicious actors may try to inject harmful instructions in the user query (direct prompt injection) or in the retrieved information
- 2024: We propose a novel framework for pretraining and fine-tuning language models with the goal of determining whether two addresses represent the same physical building. Address matching and building authoritative address catalogues are important to many applications and businesses, such as delivery services, online retail, emergency services, logistics, etc. We propose to view a collection of addresses as
- Journal of the American Medical Informatics Association, 2024. Objectives: Patients are increasingly being given direct access to their medical records. However, radiology reports are written for clinicians and typically contain medical jargon, which can be confusing. One solution is for radiologists to provide a “colloquial” version that is accessible to the layperson. Because manually generating these colloquial translations would represent a significant burden for
- 2024: In e-commerce, high consideration search missions typically require careful and elaborate decision making, and involve a substantial research investment from customers. We consider the task of automatically identifying such High Consideration (HC) queries. Detecting such missions or searches enables e-commerce sites to better serve user needs through targeted experiences such as curated QA widgets that
- 2024: Various types of learning rate (LR) schedulers are used for training or fine-tuning Large Language Models today. In practice, several mid-flight changes to the LR schedule are required, either manually or through careful choices around warmup steps, peak LR, type of decay, and restarts. To study this further, we consider the effect of switching the learning rate at a predetermined time during training
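The last abstract above concerns switching the learning rate at a predetermined point in training. Below is a minimal sketch of that idea, assuming a standard warmup-plus-cosine baseline schedule; the function name, default values, and the choice of holding a constant LR after the switch are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (illustrative, not from the paper): a piecewise LR schedule
# with linear warmup, cosine decay, and a hard switch to a new constant LR
# at a predetermined step. All names and default values are assumptions.
import math

def lr_at_step(step, total_steps, peak_lr=3e-4, warmup_steps=500,
               switch_step=None, switch_lr=None):
    """Return the learning rate to use at a given optimizer step."""
    if switch_step is not None and step >= switch_step:
        # Mid-flight change: from switch_step onward, hold the new LR.
        return switch_lr
    if step < warmup_steps:
        # Linear warmup from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr toward 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * min(1.0, progress)))

# Example: switch to a constant 1e-4 at step 10,000 of a 50,000-step run.
for s in (0, 500, 5_000, 9_999, 10_000, 40_000):
    print(s, lr_at_step(s, total_steps=50_000, switch_step=10_000, switch_lr=1e-4))
```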
Related content
- September 10, 2021: Data augmentation makes examples more realistic, while continual-learning techniques prevent “catastrophic forgetting”.
- September 09, 2021: Model using ASR hypotheses as extra inputs reduces word error rate of human transcriptions by almost 11%.
- September 02, 2021: Branching encoder networks make operation more efficient, while “neural diffing” reduces bandwidth requirements for model updates.
- August 27, 2021: Liu discusses her work in speech recognition and understanding, prosody modeling, summarization, and natural language processing.
- August 27, 2021: New voice for Alexa’s Reading Sidekick feature avoids the instabilities common to models with variable prosody.
- August 25, 2021: Katrin Kirchhoff, director of speech processing for Amazon Web Services, on the many scientific challenges her teams are tackling.