Customer-obsessed science


Research areas
- June 25, 2025: With large datasets, directly generating data ID codes from query embeddings is much more efficient than performing pairwise comparisons between queries and candidate responses.
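As a rough illustration of why that is, here is a minimal, self-contained sketch (the random projection is a hypothetical stand-in for a learned ID decoder, not Amazon's method): pairwise scoring must touch every one of the N candidates per query, while generating a short ID code costs the same regardless of N.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 100_000, 64
candidates = rng.normal(size=(N, d)).astype(np.float32)  # toy corpus
query = rng.normal(size=d).astype(np.float32)

# Pairwise baseline: N dot products per query -- cost scales with corpus size.
pairwise_best = int(np.argmax(candidates @ query))

# Generative-style alternative (toy stand-in): map the query embedding
# directly to a short discrete ID code, here by sign-quantizing a handful
# of projections. Real systems decode the ID with a trained seq2seq model,
# but the key point is the same: decoding cost is independent of N.
proj = rng.normal(size=(d, 16)).astype(np.float32)  # hypothetical "decoder"
id_code = "".join("1" if x > 0 else "0" for x in query @ proj)

print(pairwise_best, id_code)
```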
Featured news
- QIP 2025: Determining the quantum capacity of a noisy quantum channel is an important problem in quantum communication theory. In this work, we consider the Gaussian random displacement channel N_σ, a type of bosonic Gaussian channel relevant to various bosonic quantum information processing systems. In particular, we attempt to make progress on the problem of determining the quantum capacity of a Gaussian…
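For readers unfamiliar with the channel, one standard definition from the bosonic-channel literature (a convention assumed here; the truncated abstract does not spell it out) is a random displacement drawn from a complex Gaussian of variance σ²:

```latex
\[
  \mathcal{N}_\sigma(\rho)
    = \frac{1}{\pi \sigma^{2}}
      \int_{\mathbb{C}} d^{2}\alpha \,
      e^{-|\alpha|^{2}/\sigma^{2}} \,
      D(\alpha)\, \rho\, D^{\dagger}(\alpha),
\]
where $D(\alpha) = e^{\alpha \hat{a}^{\dagger} - \alpha^{*}\hat{a}}$ is the
displacement operator and $\sigma^{2}$ is the noise variance.
```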
- 2025: In real-world NLP applications, Large Language Models (LLMs) offer promising solutions due to their extensive training on vast datasets. However, the large size and high computational demands of LLMs limit their practicality in many applications, especially when further fine-tuning is required. To address these limitations, smaller models are typically preferred for deployment. However, their training is…
- ACM SIGOPS 2025 Workshop on Hot Topics in Operating Systems: A metastable failure is a self-sustaining congestive collapse in which a system degrades in response to a transient stressor (e.g., a load surge) but fails to recover after the stressor is removed. These rare but potentially catastrophic events are notoriously hard to diagnose and mitigate, sometimes causing prolonged outages affecting millions of users. Ideally, we would discover susceptibility to metastable…
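A toy model makes the "self-sustaining" part concrete. The sketch below assumes a retry-amplification feedback loop (an illustrative mechanism, not the paper's method, with made-up parameters): a brief surge saturates a fixed-capacity server, and because each failed request spawns more than one retry on average, the retry traffic alone keeps the system overloaded after the surge ends.

```python
# Toy discrete-time model of a metastable failure driven by retry
# amplification. All parameters are illustrative assumptions.
CAPACITY = 100        # requests served per tick
BASE_LOAD = 80        # healthy steady-state offered load per tick
SURGE = 120           # transient extra load (the stressor), ticks 3-5
AMPLIFICATION = 1.5   # retries generated per failed request
                      # (client plus upstream layers retrying the same work)

backlog = 0.0
for tick in range(15):
    offered = BASE_LOAD + (SURGE if 3 <= tick < 6 else 0) + backlog
    failed = max(0.0, offered - CAPACITY)
    backlog = AMPLIFICATION * failed   # failures re-enter as extra load
    print(f"tick {tick:2d}: offered={offered:8.1f}  failed={failed:8.1f}")

# Because AMPLIFICATION > 1 in the saturated regime, the feedback is
# self-sustaining: long after the surge ends at tick 6, offered load keeps
# climbing instead of returning to the healthy 80-per-tick state.
```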
- 2025: Recent advancements in speech encoders have drawn attention due to their integration with Large Language Models for various speech tasks. While most research has focused on either causal or full-context speech encoders, there has been limited exploration of handling both streaming and non-streaming applications effectively while achieving state-of-the-art performance. We introduce DuRep, a Dual-mode Speech Representation…
- 2025: The use of human speech to train LLMs poses privacy concerns due to these models' ability to generate samples that closely resemble artifacts in the training data. We propose a speaker-privacy-preserving representation learning method via the Universal Speech Codec (USC), a computationally efficient codec that disentangles speech into: (i) privacy-preserving, semantically rich representations, capturing…
Academia
Whether you're a faculty member or a student, there are a number of ways you can engage with Amazon.