- 2025: Crafting effective features is a crucial yet labor-intensive and domain-specific task within machine learning pipelines. Fortunately, recent advancements in Large Language Models (LLMs) have shown promise in automating various data science tasks, including feature engineering. Despite this potential, evaluations thus far are based primarily on the end performance of a complete ML pipeline, providing…
- 2025: Direct alignment algorithms (DAAs), such as direct preference optimization (DPO), have become popular alternatives to Reinforcement Learning from Human Feedback (RLHF) due to their simplicity, efficiency, and stability. However, the preferences used in DAAs are usually collected before alignment training begins and remain unchanged (off-policy). This design leads to two problems where the policy model… (a minimal sketch of the DPO objective appears after this list)
- NAACL 2025 Workshop on TrustNLP, 2025: Warning: This paper includes content that may be considered inappropriate or offensive to some readers. Viewer discretion is advised. Large Language Models (LLMs) have improved dramatically in the past few years, increasing their adoption and the scope of their capabilities over time. A significant amount of work is dedicated to “model alignment”, i.e., preventing LLMs from generating unsafe responses when…
- 2025: Large Language Models (LLMs) are increasingly used as chatbots, yet their ability to personalize responses to user preferences remains limited. We introduce PREFEVAL, a benchmark for evaluating LLMs’ ability to infer, memorize, and adhere to user preferences in a long-context conversational setting. PREFEVAL comprises 3,000 manually curated user preference and query pairs spanning 20 topics. PREFEVAL contains… (an illustrative record layout appears after this list)
- 2025: Large Language Models (LLMs) have demonstrated remarkable performance across various tasks. However, they are prone to contextual hallucination: generating information that is either unsubstantiated by or contradictory to the given context. Although many studies have investigated contextual hallucinations in LLMs, addressing them in long-context inputs remains an open problem. In this work, we take an initial…
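As a companion to the DAA entry above, here is a minimal PyTorch sketch of the standard DPO objective that such methods optimize over a fixed, pre-collected preference dataset (that fixed dataset is what makes the training off-policy). The function name, tensor names, and the beta default are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss over a batch of (chosen, rejected) response pairs.

    Each argument holds the summed token log-probabilities of a response
    under the trainable policy or the frozen reference model.
    """
    # Implicit rewards: log-ratio of policy to reference probabilities.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Push the chosen response's implicit reward above the rejected one's.
    margin = beta * (chosen_rewards - rejected_rewards)
    return -F.logsigmoid(margin).mean()

# Dummy log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-11.0, -10.0]),
                torch.tensor([-12.5, -9.8]), torch.tensor([-10.8, -10.1]))
```

Because the pairs are drawn from a static dataset rather than sampled from the current policy, the loss never sees the model's own latest outputs, which is the off-policy property the abstract highlights.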
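For the PREFEVAL entry, here is a hypothetical sketch of what one preference-and-query pair could look like in a long-context conversational setting. The class, field names, and example values are invented for illustration only; they are not PREFEVAL's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class PreferenceQueryPair:
    """Hypothetical layout of one benchmark example (not PREFEVAL's real schema)."""
    topic: str            # one of the benchmark's 20 topics
    preference: str       # preference the user states early in the conversation
    query: str            # later query whose answer should honor that preference
    # Intervening turns that stress long-context memorization of the preference.
    filler_turns: list[str] = field(default_factory=list)

example = PreferenceQueryPair(
    topic="dining",
    preference="I'm vegetarian and avoid all meat.",
    query="Recommend a restaurant for my anniversary dinner.",
    filler_turns=["(many unrelated conversation turns...)"],
)
```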
Related content
- August 16, 2021: Teams' research papers that outline their approaches to development and deployment are now available.
- August 16, 2021: Team Alquist awarded $500,000 prize for top score in finals competition; teams from Stanford University and the University at Buffalo place second and third.
- August 12, 2021: New metric can be calculated 55 times as quickly as its state-of-the-art predecessor, making it practical for model training.
- August 11, 2021: Holleman, the chief scientist of Alexa Fund company Syntiant, explains why the company’s new architecture allows machine learning to be deployed practically anywhere.
- August 05, 2021: New track of the 10th Dialog System Technology Challenge (DSTC10) will target noisy speech environments.
- August 04, 2021: New approach corrects for cases when average improvements are accompanied by specific regressions.