Customer-obsessed science
November 20, 2025 · 4 min read · A new evaluation pipeline called FiSCo uncovers hidden biases and offers an assessment framework that evolves alongside language models.
September 2, 2025 · 3 min read
Featured news
2024 · In large language model training, input documents are typically concatenated together and then split into sequences of equal length to avoid padding tokens. Despite its efficiency, the concatenation approach compromises data integrity: it inevitably breaks many documents into incomplete pieces, leading to excessive truncations that hinder the model from learning to compose logically coherent and factually…
ACM FAccT 2024 · 2024 · We present a broad characterization of gender representation in a large heterogeneous sample of retail products. In particular, we study online product textual information, such as titles and descriptions. Our goal is to understand, from a semantic perspective, differences and similarities in how girls (women) and boys (men) are represented. We perform a comparative analysis of the language used in gendered…
2024 · Sequential recommendation systems suggest products based on users’ historical behaviours. The inherent sparsity of user-item interactions in a vast product space often leads to unreliable recommendations. Recent research addresses this challenge by leveraging auxiliary product relations to mitigate recommendation uncertainty, and by quantifying uncertainty in recommendation scores to modify the candidates…
NAACL 2024 Workshop on TrustNLP · 2024 · Language models, pre-trained on large amounts of unmoderated content, have been shown to contain societal biases. Mitigating such biases typically requires access to model parameters and training schemas. In this work, we address bias mitigation at inference time, such that it can be applied to any black-box model. To this end, we propose a belief generation and augmentation framework, BELIEVE, that demonstrates…
ICDAR 2024 · 2024 · Companies and organizations grapple with the daily burden of document processing. As manual handling is tedious and error-prone, automating this process is a significant goal. In response to this demand, research on table extraction and information extraction from scanned documents is gaining increasing traction. These extractions are performed by machine learning models that require large-scale and realistic…
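The first abstract above describes the common practice of concatenating documents and splitting the resulting stream into equal-length sequences. The following is a minimal Python sketch of that baseline only, under stated assumptions: documents are already tokenized into lists of integer IDs, the function names `concat_and_chunk` and `count_split_documents` are hypothetical, and any trailing remainder shorter than the sequence length is dropped (real pipelines may pad it or carry it over instead). The second function counts how many documents straddle a sequence boundary, which makes the truncation problem concrete.

```python
from typing import List


def concat_and_chunk(docs: List[List[int]], seq_len: int) -> List[List[int]]:
    """Concatenate tokenized documents, then split into seq_len-sized sequences.

    Assumption: the trailing remainder shorter than seq_len is dropped.
    """
    # Flatten all documents into one continuous token stream.
    stream = [tok for doc in docs for tok in doc]
    n_full = len(stream) // seq_len
    # Cut the stream into fixed-length windows, ignoring document boundaries.
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]


def count_split_documents(doc_lengths: List[int], seq_len: int) -> int:
    """Count documents whose tokens fall into more than one seq_len window.

    A document that spills into the dropped remainder also counts as split.
    """
    split = 0
    start = 0
    for length in doc_lengths:
        end = start + length  # exclusive end offset in the concatenated stream
        # Compare the window index of the document's first and last token.
        if length > 0 and start // seq_len != (end - 1) // seq_len:
            split += 1
        start = end
    return split


if __name__ == "__main__":
    docs = [[1] * 5, [2] * 3, [3] * 7, [4] * 2]
    print(concat_and_chunk(docs, 4))
    print(count_split_documents([len(d) for d in docs], 4))
```

For example, with document lengths 5, 3, 7 and 2 and a sequence length of 4, three of the four documents straddle a window boundary, and the last token of the final document is dropped entirely. Because the windows are cut without regard to where documents start or end, a sequence can begin mid-document and end mid-document.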
Collaborations
Whether you're a faculty member or a student, there are a number of ways you can engage with Amazon.