Customer-obsessed science
Research areas
May 15, 2026 | 5 min read | A new scaling law that relates particular architectural choices to loss helps identify models that improve throughput by up to 47% with no loss of accuracy.
May 14, 2026 | 16 min read
April 15, 2026 | 8 min read
Featured news
UAI 2023 | 2023 | Statistical prediction models are often trained on data drawn from probability distributions that differ from those of their eventual use cases. One approach to proactively prepare for these shifts harnesses the intuition that causal mechanisms should remain invariant between environments. Here we focus on a challenging setting in which the causal and anticausal variables of the target are unobserved. Leaning…
ICML 2023 Workshop on Data-centric Machine Learning Research (DMLR) | 2023 | Despite recent advances in synthetic data generation, the scientific community still lacks a unified consensus on its usefulness. It is commonly believed that synthetic data can be used for both data exchange and boosting machine learning (ML) training. Privacy-preserving synthetic data generation can accelerate data exchange for downstream tasks, but there is not enough evidence to show how or why synthetic…
Applied Marketing Analytics (AMA) | 2023 | Video ads are increasingly popular in digital marketing, but advertisers are unsure how much they improve performance over static ads and which consumer response, such as unmuting or watching through to the end, matters most. Using data from the online retail site Amazon.com, we apply causal inference methods over both a month-long and a year-long time horizon and find support for our hypotheses.
ACL Findings 2023, NeurIPS 2022 Workshop on SyntheticData4ML | 2023 | There has been increasing interest in synthesizing data to improve downstream text-to-SQL tasks. In this paper, we examined the existing synthesized datasets and discovered that state-of-the-art text-to-SQL algorithms did not further improve on popular benchmarks when trained with augmented synthetic data. We observed two shortcomings: illogical synthetic SQL queries from independent column sampling and…
ACL 2023 | 2023 | Recent works show the effectiveness of cache-based neural coreference resolution models on long documents. These models incrementally process a long document from left to right and extract relations between mentions and entities in a cache, resulting in much lower memory and computation cost compared to computing all mentions in parallel. However, they do not handle cache misses when high-quality entities…
Collaborations
Whether you're a faculty member or a student, there are a number of ways you can engage with Amazon.