Workshops on trustworthy NLP help build community

In 2022, the Alexa Trustworthy AI team helped organize a workshop at NAACL and a special session at Interspeech.

The past year saw an acceleration of the recent trend toward research on fairness and privacy in machine learning. The Alexa Trustworthy AI team was part of that, organizing the Trustworthy Natural Language Processing workshop (TrustNLP 2022) at the meeting of the North American chapter of the Association for Computational Linguistics (NAACL) and a special session at Interspeech 2022 titled Trustworthy Speech Processing. Alongside our own research, this organizational work aims to build a community around this important research area.

TrustNLP keynotes

This year was the second iteration of the TrustNLP workshop, with contributed papers, keynote presentations from leading experts, and a panel discussion with a diverse cohort of panelists.

The keynote speakers at TrustNLP 2022 were, from left to right, Diyi Yang of Georgia Tech, Subho Majumdar of Splunk, and Fei Wang of Weill Cornell Medicine.

The morning session kicked off with a keynote address by Subho Majumdar of Splunk, on interpretable graph-based mapping of trustworthy machine learning research, which provides a framework for estimating the fairness risks of machine learning (ML) applications in industry. The Splunk researchers scraped papers from previous ML conferences and used the resulting data to build a word co-occurrence matrix, which they treated as a network in which to detect communities of related terms.

They found that terms related to trustworthy ML separated out into two well-formed communities, one centered on privacy issues and the other on demography and fairness-related problems. Majumdar also suggested that such information could be leveraged to quantitatively assess fairness-related risks for different research projects.
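
As a rough illustration of the idea, the sketch below builds co-occurrence counts from a handful of made-up paper keywords and groups terms that co-occur often. It is not the Splunk researchers' method: the toy documents, the `min_count` cutoff, and the use of connected components as a stand-in for community detection are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count how often each unordered pair of terms appears in the same document."""
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
    return counts


def communities(counts, min_count=2):
    """Group terms into connected components over pairs seen at least min_count times.

    Connected components are a deliberately simple stand-in for a real
    community-detection algorithm.
    """
    neighbors = {}
    for (a, b), c in counts.items():
        if c >= min_count:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical keyword lists standing in for scraped paper text.
docs = [
    "privacy differential leakage",
    "privacy leakage membership",
    "differential privacy membership",
    "fairness bias demographic",
    "fairness demographic parity",
    "bias fairness parity",
]
counts = cooccurrence_counts(docs)
groups = communities(counts, min_count=2)
for group in groups:
    print(sorted(group))
```

On this toy input the terms fall into two groups, one around privacy and one around fairness, which mirrors the kind of separation described above. A real analysis would use a much larger corpus and a proper graph-clustering method.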

Diyi Yang of Georgia Tech, our second keynote speaker, gave a talk titled Building Positive and Trustworthy Language Technologies, in which she described prior work on conceptualizing and categorizing various kinds of trust.

In the context of increasing human trust in AI, she discussed published research from her group. This ranged from formulating the positive-reframing problem, which aims to neutralize a negative point of view in a sentence and give the author a more positive perspective, to the Moral Integrity Corpus, a large dataset capturing the moral assumptions embedded in roughly 40,000 prompt-reply pairs. Novel benchmarks and tasks like these will prove useful resources for building trustworthy language technologies.

In our final keynote session, in the afternoon, Fei Wang of Weill Cornell Medicine gave a talk titled Towards Building Trustworthy Machine Learning Models in Medicine: Evaluation vs. Explanation. This keynote provided a comprehensive overview of the evolution of ML techniques as applied to clinical data, ranging from early work on risk prediction and matrix representations of patients using electronic-health-record data to more recent work on sequence representation learning.

Wang cautioned against common pitfalls of using ML methods for applications such as Covid detection, which include risks of bias in public repositories and "Frankenstein datasets" (datasets hand-massaged to yield ideal model performance). He also emphasized the need for more robust explainability methods that can provide insights on model predictions in medicine.

TrustNLP panel

Our most popular session was the panel discussion in the afternoon, with an exciting and eclectic panel from industry and academia. Sara Hooker of Cohere for AI emphasized the need for more-robust tools and frameworks to help practitioners better evaluate various deployment-time design choices, such as compression or distillation. She also discussed the need for more-efficient ways of communicating research that can help policymakers play an active role in shaping developments in the field.

Ethan Perez of Anthropic argued for red-teaming large language models and described how existing language models could be used to identify new types of weaknesses. Pradeep Natarajan of Alexa AI argued for communicating risks effectively by drawing on developments from old-school analytic fields such as finance and actuarial analysis.

This figure from "An empirical study on pseudo-log-likelihood bias measures for masked language models using paraphrased sentences" shows, for several different masked language models (MLMs), the log-likelihood differences between pairs of sentences in a standard dataset for evaluating bias. The sentences in each pair are the same, except that in one, references to a disadvantaged group have been replaced by references to an advantaged group. The small log-likelihood differences between sentences suggest that changes of wording elsewhere in the sentences can have a significant effect on the resulting bias measure.

Yulia Tsvetkov from the University of Washington argued that models with good performance on predefined benchmarks still fail to generalize well to real-world applications. Consequently, she argued, there is a need for the community to explore approaches that are adaptive to dynamic data streams. Several panel members also acknowledged the expanding landscape in research, including community research groups producing top-quality research, and there was a healthy discussion around the similarities and differences between research in academia and in industry.

TrustNLP papers

Lastly, we had our wonderful list of paper presentations. The workshop website contains the complete list of accepted papers. The best-paper award went to "An empirical study on pseudo-log-likelihood bias measures for masked language models using paraphrased sentences", by Bum Chul Kwon and Nandana Mihindukulasooriya. The researchers study the effect of word choices and paraphrases on log-likelihood-based bias measures, and they suggest improvements, such as thresholding to determine whether there is a significant log-likelihood difference between categories of bias attributes.
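
To make the thresholding idea concrete, here is a minimal sketch. It is not the paper's procedure: the function name `bias_rate`, the score values, and the 0.5 threshold are all hypothetical, and the scores stand in for pseudo-log-likelihoods that a masked language model would assign to each sentence in a pair.

```python
def bias_rate(pairs, threshold=0.0):
    """Fraction of decisive pairs in which the advantaged-group sentence scores higher.

    pairs: iterable of (score_disadvantaged, score_advantaged) log-likelihoods.
    Pairs whose absolute difference is <= threshold are treated as ties and skipped.
    Returns (rate, n_decisive); rate is None when no pair is decisive.
    """
    favored = decisive = 0
    for score_dis, score_adv in pairs:
        diff = score_adv - score_dis
        if abs(diff) <= threshold:
            continue  # difference too small to count as evidence either way
        decisive += 1
        if diff > 0:
            favored += 1
    rate = favored / decisive if decisive else None
    return rate, decisive


# Made-up scores: two pairs differ by about 0.1, two differ by 2.0 or more.
scores = [(-42.0, -41.9), (-30.5, -30.4), (-55.0, -50.0), (-38.0, -40.0)]

naive = bias_rate(scores)                       # every nonzero difference counts
thresholded = bias_rate(scores, threshold=0.5)  # near-ties are ignored
print(naive, thresholded)
```

Without a threshold, the two near-ties count as full evidence of bias and push the rate to 0.75. With the threshold, only the two clearly separated pairs remain and the rate drops to 0.5, which shows how sensitive an unthresholded measure can be to tiny score differences.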

All the video presentations and live recordings for TrustNLP 2022 are available on Underline.

Interspeech session

The special session at Interspeech was our first, and the papers presented there covered a wide array of topics, such as adversarial attacks, attribute and membership inference attacks, and privacy-enhanced strategies for speech-related applications.

We concluded the session with an engaging panel focused on three crucial topics in trustworthy ML: public awareness, policy development, and enforcement. In the discussion, Björn Hoffmeister of the Alexa Speech group stressed the importance of educating people about the risks of all types of data leakage — not just audio recordings and biometric signals — and suggested that this would create a positive feedback cycle with regulatory bodies, academia, and industry, leading to an overall improvement in customer privacy.

Google’s Andrew Hard highlighted the public’s desire to protect personal data and the risks of accidental or malicious data leakage; he stressed the need for continued efforts from the AI community in this space. On a related note, Bhiksha Raj of Carnegie Mellon University (CMU) suggested that increasing public awareness is a bigger catalyst for adoption of trustworthy-ML practices than external regulations, which may get circumvented.

Isabel Trancoso of the University of Lisbon stressed the pivotal role played by academia in raising general awareness, and she called attention to some of the challenges of constructing objective and unambiguous policies that can be easily interpreted in a diverse set of geographic locations and applications. CMU’s Rita Singh expanded on this point and noted that policies developed by a centralized agency would be inherently incomplete. Instead, she recommended a diverse set of — perhaps geographically zoned — regulatory agencies.

Multiple panelists agreed on the need for a concrete and robust measure for trustworthy ML, which can be reported for ML models along with their utility scores. Finally, Shrikanth Narayanan of the University of Southern California (also one of the session cochairs) provided concluding remarks, closing the session with optimism owing to the strong push from all sectors of the AI research community to increase trustworthiness in ML. The full set of papers included in the session is available on the Interspeech site.

We thank all the speakers, authors, and panelists for a memorable and fun learning experience, and we hope to return next year to discuss more exciting developments in the field.
