Amazon at SIGIR: Toward more-inclusive AI

Amazon Visiting Academic Barbara Poblete helps to build safer, more-diverse online communities — and to aid disaster response.

The annual meeting of the ACM Special Interest Group on Information Retrieval (SIGIR) begins next week, and Barbara Poblete, an Amazon Visiting Academic and associate professor of computer science at the University of Chile, is a cochair of both the Doctoral Consortium and the Diversity, Equity, and Inclusion Committee.

Barbara Poblete, an associate professor of computer science at the University of Chile, and an Amazon Visiting Academic.

Poblete, who first attended the conference as a graduate student in 2006, has been a member of the Diversity, Equity, and Inclusion (DEI) Committee since its creation in 2019.

“It’s critical for the SIGIR conferences and the SIGIR community as a whole to be as inclusive as possible,” she says. “Everybody should feel welcome and be treated with respect and dignity. The cochairs represent different communities: I have worked on a lot of women-in-computer-science initiatives — Chile Women in Computing is an event I organize — and I also represent South America. My two cochairs have experience in diversity-related initiatives in South Africa and Europe. We have created a set of guidelines for the SIGIR conferences, which is shared with the organizers. This ‘inclusivity checklist’, as we call it, helps make these conferences more inclusive.”

Hate speech detection

Poblete was a natural choice for SIGIR’s DEI Committee as much of her research focuses on extending the benefits of machine learning to new communities and making members of online communities feel safer and more welcome.


“I work in hate speech detection, and I have been focusing on the multilingual aspect because we have found that prior work is mostly centered on the English language,” she says. “This creates a gap for South America and for other countries where English is not the primary language.”

In this context, Poblete says, the chief technical challenge is to leverage English-language resources in order to build models for non-English linguistic communities with comparatively little training data.

“It's not always easy,” Poblete says. “For example, the hate speech detection problem varies from country to country. Even between my country, Chile, and Argentina, the hate speech vocabularies are different. It’s not just linguistic adaptation; there’s cultural adaptation as well.”

The natural way to try to adapt English-language models to other languages, Poblete explains, is to use multilingual embeddings, in which related words in different languages are mapped to the same regions of a representational space. But, she says, “they don’t usually work that well for this kind of problem.”
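As a rough illustration of what "mapped to the same regions of a representational space" means, the sketch below compares toy word vectors with cosine similarity. The three-dimensional vectors, the word choices, and the helper names are all invented for this example; real multilingual embeddings are learned from data and have hundreds of dimensions.

```python
import math

# Toy vectors invented for illustration only; they are not taken from any
# real embedding model. Keys are (language, word) pairs.
TOY_EMBEDDINGS = {
    ("en", "friend"): [0.9, 0.1, 0.0],
    ("es", "amigo"): [0.8, 0.2, 0.1],
    ("en", "table"): [0.0, 0.2, 0.9],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query_key, embeddings):
    """Return the key, other than the query, whose vector is most similar."""
    query = embeddings[query_key]
    candidates = [key for key in embeddings if key != query_key]
    return max(candidates, key=lambda key: cosine_similarity(query, embeddings[key]))


if __name__ == "__main__":
    # In a well-aligned multilingual space, a word's translation should be
    # among its nearest neighbors.
    print(nearest(("en", "friend"), TOY_EMBEDDINGS))
```

In a shared space like this, a classifier trained on English vectors can in principle be applied to Spanish vectors, which is the kind of transfer that Poblete says often falls short for hate speech.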

“One thing we do is dataset enrichment,” Poblete says. “So, for example, I have my dataset in Spanish, and I will add labeled English data to that to see if I can improve my classifier by adding the multilingual data. Or we try to create specific embeddings for certain domains, such as hate speech. Or we try to train embeddings that are biased towards a particular kind of problem.”
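The dataset-enrichment experiment Poblete describes can be sketched as a comparison between a classifier trained on Spanish data alone and one trained on Spanish plus labeled English data, both scored on the same Spanish test set. The token-vote classifier, the label names, and the tiny datasets below are hypothetical stand-ins chosen to keep the example self-contained; they are not her group's models or data, and the toy data is constructed so that one shared token carries over between languages.

```python
from collections import Counter, defaultdict


def train(examples):
    """Count how often each token appears under each label."""
    token_label_counts = defaultdict(Counter)
    for text, label in examples:
        for token in text.lower().split():
            token_label_counts[token][label] += 1
    return token_label_counts


def predict(model, text, default_label="fine"):
    """Sum the label counts of every known token; fall back if none is known."""
    votes = Counter()
    for token in text.lower().split():
        votes.update(model.get(token, {}))
    return votes.most_common(1)[0][0] if votes else default_label


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)


def compare_enrichment(target_train, auxiliary_train, target_test):
    """Score a target-only model and an enriched model on the same test set."""
    baseline = train(target_train)
    enriched = train(list(target_train) + list(auxiliary_train))
    return {
        "baseline": accuracy(baseline, target_test),
        "enriched": accuracy(enriched, target_test),
    }


# Hypothetical toy data: the token "troll" occurs in both languages.
SPANISH_TRAIN = [("eres un genio", "fine"), ("eres un idiota", "flagged")]
ENGLISH_EXTRA = [("what a troll", "flagged"), ("what a star", "fine")]
SPANISH_TEST = [("qué troll", "flagged"), ("eres un genio", "fine")]

if __name__ == "__main__":
    print(compare_enrichment(SPANISH_TRAIN, ENGLISH_EXTRA, SPANISH_TEST))
```

Whether enrichment helps on real data is an empirical question. The point of the harness is only that both models are evaluated on the same target-language test set, so any difference can be attributed to the added data.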

Disaster detection

Poblete is well known for her work on social-media analysis: at the Web Conference 2021, in April, for instance, she and her colleagues won a test-of-time award for their 2011 paper “Information credibility on Twitter”. In more recent work, her group at the University of Chile has begun to apply techniques of social-media analysis to problems in the field of crisis informatics.

“We use social-media data to try to improve tools for disaster detection and collection of information,” Poblete explains. “Earthquakes and floods happen a lot in Chile, so there's a lot of interest in that. And it's kind of a similar problem: how can we use resources from other languages for our language? How can we create universal tools that anybody could use, that don't require a lot of resources?”

Poblete’s group has developed a website, called twicalli.cl, that uses machine learning models to automatically process tweets in order to gauge the perceived intensity of earthquakes.

“This is used by the National Seismology Center here in Chile,” Poblete says. “It’s used also by the navy, and a lot of emergency offices depend on this. We have a lot of seismographs in Chile — this is a very advanced field in Chile — but they cannot really tell how people felt the earthquake. This is important information because you could have two earthquakes with the same magnitude in different places, but they will be felt differently depending on how deep the earthquake was or the kind of terrain.  

“For areas where you have a large population, and they're tweeting, we can estimate that in 30 minutes. And that used to sometimes take days and require experts to be in the place where the earthquake happened. When you're in crisis management, the first minutes are super important. The information you gather in these first minutes will change how you respond to the emergency and how fast help will arrive.”

Poblete is speaking from her home in Santiago, using Amazon’s Chime videoconferencing service, and suddenly, the onscreen image of her home office begins to jitter.

Barbara Poblete’s real-time screen share, using Amazon Chime, of the twicalli.cl dashboard. The top timeline indicates the frequency of tweets that use earthquake-related language; the lower timeline indicates a zoom-in of the most recent activity.

“Oh look, there's an earthquake here right now,” she says. “Let me just find twicalli to see if it actually detected this.” She shares her screen through Chime. “This is the seismograph, and these are earthquakes we had before. In this portion of the graph, you can see tweets spike with people talking about this. That way, people at the seismology center know that people actually felt this earthquake.”
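The kind of tweet spike Poblete points to can be approximated by counting earthquake-related tweets per minute and flagging minutes whose count far exceeds the recent average. The sketch below is only an illustration of that idea, not a description of how twicalli.cl works; the keyword list, window size, and thresholds are assumptions made for the example.

```python
from collections import Counter
from statistics import mean

# Assumed keyword list for illustration; a real system would be far richer.
KEYWORDS = {"temblor", "terremoto", "sismo", "earthquake"}


def keyword_counts_per_minute(tweets):
    """Count tweets per minute that mention a keyword.

    `tweets` is an iterable of (minute_index, text) pairs.
    """
    counts = Counter()
    for minute, text in tweets:
        tokens = {token.strip(".,!?¡¿#") for token in text.lower().split()}
        if tokens & KEYWORDS:
            counts[minute] += 1
    return counts


def spike_minutes(counts, total_minutes, window=5, factor=3.0, min_count=3):
    """Return minutes whose count is at least `min_count` and exceeds
    `factor` times the mean count of the preceding `window` minutes."""
    spikes = []
    for minute in range(window, total_minutes):
        baseline = mean(counts.get(m, 0) for m in range(minute - window, minute))
        current = counts.get(minute, 0)
        if current >= min_count and current > factor * baseline:
            spikes.append(minute)
    return spikes


if __name__ == "__main__":
    # Hypothetical stream: one stray mention, then a burst at minute 7.
    stream = [(2, "sismo leve")] + [(7, "¡Temblor!")] * 5 + [(7, "buenos días")]
    print(spike_minutes(keyword_counts_per_minute(stream), total_minutes=10))
```

Requiring both a minimum count and a multiple of the recent baseline keeps a single stray mention during a quiet period from registering as a spike.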

“The problem we're working on right now,” Poblete says, “is to detect messages that are relevant to a crisis versus those that are noise. You want to separate the messages that are coming from the place of the event from other people who are just mentioning these things. When you have a hashtag that is popular, like the Nepal earthquake, you get a lot of messages that have nothing to do with it that are just mentioning the same hashtag. To tell those two things apart, we train disaster-specific word embeddings for that problem. And we're testing if we can enhance the information that we have in Spanish for earthquakes in Chile with earthquakes from other countries. And not only cross-language learning but also cross-domain. Can I learn from earthquakes to detect hurricanes or floods or something new that never happened before? Because that is also part of emergency preparedness.”
