How to make AI better at reading comprehension

AI models exceed human performance on public data sets; modified training and testing could help ensure that they aren’t exploiting short cuts.

Question answering through reading comprehension is a popular task in natural-language processing. It’s a task many people know from standardized tests: a student is given a passage and questions based on the passage — say, an article on William the Conqueror and the question “When did William invade England?” The student reads the passage and learns that the answer is 1066. In natural-language processing, we aim to teach machine learning models to do the same thing.

Historical documentation on a computer screen with the date highlighted
In natural-language understanding, reading comprehension involves finding an excerpt from a text that can serve as an answer to a question about that text.
Credit: Glynis Condon

In recent years, question-answering models have made a lot of progress. In fact, models have started outperforming human baselines on public leaderboards such as SQuAD 2.0.

Are the models really learning question answering, or are they learning heuristics that work only in some circumstances? We investigate this question in our paper “What do models learn from question answering datasets?”, which we’re presenting at the Conference on Empirical Methods in Natural Language Processing (EMNLP).

In this paper, we subject question-answering models built atop the popular BERT language model to a variety of simple yet informative attacks. We identify shortcomings that cast doubt on the idea that models are really outperforming humans. In particular, we find that:

(1) Models don’t generalize well

A student who is a good critical reader should be able to answer questions about a variety of articles. A student who can answer questions about William the Conqueror but not Julius Caesar may not have learned reading comprehension, just information about William the Conqueror.

This graph shows the performance of a question-answering model trained on SQuAD and evaluated across five other data sets. While the model does well on its own test set (75.6), its performance is lower on the other data sets.

Question-answering models do not generalize well across data sets. A model that does well on the SQuAD data set doesn’t do well on the Natural Questions data set, even though both contain questions about Wikipedia articles. This suggests that models can solve individual data sets without necessarily learning reading comprehension more generally.

(2) Models take short cuts

When testing question-answering models, we assume that high performance means good understanding of the subject. But tests can be flawed. If a student takes a multiple-choice test where every answer is “C”, it’s hard to judge whether the student really understood the material or exploited the flaw. Similarly, models may be picking up on biases in test questions that let them arrive at the correct answer without doing reading comprehension. 

To probe this, we conducted three experiments. The first was a modification at training time: we corrupted training sets by replacing correct answers with incorrect answers — for instance, “Q: ‘When did William invade England?’ A: ‘William is buried in Caen’”. 
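As a rough illustration of the training-time corruption described above, the sketch below replaces a chosen fraction of gold answers with a sentence from the same passage that does not contain the correct answer. The example format (dicts with `question`, `context`, and `answer` keys), the function name `corrupt_answers`, and the way the wrong answer is picked are all assumptions made for illustration; the procedure used in the paper may differ.

```python
import random


def corrupt_answers(examples, fraction, seed=0):
    """Return a copy of `examples` with roughly `fraction` of answers made wrong.

    Hypothetical format: each example is a dict with "question", "context",
    and "answer" keys. A corrupted answer is a sentence from the same context
    that does not contain the correct answer.
    """
    rng = random.Random(seed)
    n_corrupt = int(round(fraction * len(examples)))
    corrupt_ids = set(rng.sample(range(len(examples)), n_corrupt))

    corrupted = []
    for i, example in enumerate(examples):
        example = dict(example)  # shallow copy so the input is not modified
        if i in corrupt_ids:
            # Naive sentence split on periods; good enough for a sketch.
            sentences = [s.strip() for s in example["context"].split(".") if s.strip()]
            wrong = [s for s in sentences if example["answer"] not in s]
            if wrong:  # leave the example untouched if no wrong span exists
                example["answer"] = rng.choice(wrong)
        corrupted.append(example)
    return corrupted


train = [{
    "question": "When did William invade England?",
    "context": "William invaded England in 1066. William is buried in Caen.",
    "answer": "1066",
}]
print(corrupt_answers(train, fraction=1.0)[0]["answer"])  # William is buried in Caen
```

Setting `fraction=0.5` would correspond to a setting in which half the labeled training answers are wrong.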

The other two were modifications at test time. In one, we shuffled the sentences in the input articles so that they no longer formed coherent paragraphs. In the other, we gave models incomplete questions (“When did William?”, “When?”, or no words at all). 
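The two test-time modifications can be sketched in a few lines. This is a minimal illustration, not the paper's code: the helper names (`split_sentences`, `shuffle_sentences`, `truncate_question`) are hypothetical, and the regular-expression sentence splitter is a simplifying assumption.

```python
import random
import re


def split_sentences(text):
    """Naively split text into sentences after '.', '!' or '?'."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def shuffle_sentences(context, seed=0):
    """Return the context with its sentences in random order."""
    sentences = split_sentences(context)
    random.Random(seed).shuffle(sentences)
    return " ".join(sentences)


def truncate_question(question, keep_words):
    """Keep only the first `keep_words` words of a question.

    keep_words=0 yields the empty string (the "no words at all" case).
    """
    if keep_words <= 0:
        return ""
    words = question.rstrip("?").split()
    return " ".join(words[:keep_words]) + "?"


question = "When did William invade England?"
print(truncate_question(question, 3))  # When did William?
print(truncate_question(question, 1))  # When?
print(repr(truncate_question(question, 0)))  # ''
```

A model's predictions on the modified inputs can then be scored against the original gold answers, since neither change alters what the correct answer is.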

In all these experiments, the models were suspiciously robust, continuing to return correct answers. This means that they didn’t need correct answers at training time, and at test time they didn’t need to understand the structure of the articles or be asked the full question.

How can this be? It turns out that some questions in some data sets can be answered trivially. In our experiments, for example, one model was just answering all “who” questions with the first proper name in the passage. Simple rules like this can get us to almost 40% of current model baselines.
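To make the "first proper name" shortcut concrete, here is a minimal sketch of such a rule. It is an assumption-laden illustration: the regular expression is a crude stand-in for real named-entity recognition, and the names `first_proper_name`, `heuristic_answer`, and the small `_COMMON_WORDS` list are hypothetical.

```python
import re

# Capitalized words that often start a sentence but are not names (hypothetical list).
_COMMON_WORDS = {"The", "A", "An", "In", "On", "At", "It", "He", "She", "They", "This", "That"}


def first_proper_name(passage):
    """Return the first run of capitalized words that is not a common word."""
    for match in re.finditer(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)*", passage):
        if match.group(0) not in _COMMON_WORDS:
            return match.group(0)
    return None


def heuristic_answer(question, passage):
    """Answer "who" questions with the first proper name; abstain otherwise."""
    if question.strip().lower().startswith("who"):
        return first_proper_name(passage)
    return None


passage = "In 1066, William the Conqueror invaded England."
print(heuristic_answer("Who invaded England?", passage))  # William
print(heuristic_answer("When did William invade England?", passage))  # None
```

Running a rule like this over a test set and scoring it like a model gives a quick estimate of how many questions can be answered without reading comprehension.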

(3) Models aren’t prepared to handle variations

This graph shows the performance of a Natural Questions model against various attacks: 50% corrupt, in which half the labeled answers in the training data are wrong; shuffled context, in which the sentences of the test excerpts are out of order; no question, in which the questions in the test data are incomplete; filler words, in which fillers such as “really” and “actually” are added in a syntactically correct way; and negation, in which the negative of the test question is substituted for the positive (“When didn’t William invade England?”). Where we would expect much lower performance in the first three cases, we instead see surprising robustness. Where we would expect to see little change with filler words, we see a drop of almost 7 F1 points. On the negation task, the model answers 94% of questions the same way it did when they were positively framed.

A student should understand that “When did William invade England?”, “When did William march his army into England?”, and “When was England invaded by William?” are all asking the same question. But models can still struggle with this.

We conducted two experiments where we ran variations of questions through reading comprehension models. First, we tried the very simple change of adding filler words to questions (“When did William really invade England?”). In principle, this should have no effect on performance, but we found that it reduces the model’s F1 score — a metric that factors in both false positives and false negatives — by up to 8%. 
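The sketch below shows one way to insert a filler word and to score an answer with F1. It assumes the token-overlap F1 commonly used for SQuAD-style evaluation (without the usual punctuation and article normalization); the exact scoring used in the paper is not specified here. The `add_filler` helper and its hand-picked `position` argument are hypothetical, and the helper does not itself guarantee a syntactically correct result.

```python
from collections import Counter


def add_filler(question, filler="really", position=3):
    """Insert a filler word before the word at index `position`."""
    words = question.split()
    words.insert(position, filler)
    return " ".join(words)


def token_f1(prediction, gold):
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)  # penalizes extra predicted tokens
    recall = overlap / len(gold_tokens)     # penalizes missing gold tokens
    return 2 * precision * recall / (precision + recall)


print(add_filler("When did William invade England?"))
# When did William really invade England?
print(token_f1("1066", "1066"))  # 1.0
```

Comparing the average `token_f1` on original questions with the average on filler-augmented questions gives the kind of drop discussed above.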

Next, we added negation (“When didn’t William invade England?”) to see if models understood the difference between positive and negative questions. We found that models ignore negation up to 94% of the time and return the same answers they would to positive questions.
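One simple way to measure how often negation is ignored is to count the questions whose answer does not change when the question is negated. The sketch below assumes a model exposed as a callable `answer_fn(question, context)`; that interface, the `_NEGATIONS` table, and the function names are hypothetical and only meant to illustrate the measurement.

```python
# Auxiliary verbs and their negated forms (a small, hypothetical table).
_NEGATIONS = {
    "did": "didn't", "does": "doesn't", "do": "don't",
    "is": "isn't", "are": "aren't", "was": "wasn't", "were": "weren't",
}


def negate_question(question):
    """Negate the first auxiliary verb found; return the question unchanged if none."""
    words = question.split()
    for i, word in enumerate(words):
        negated = _NEGATIONS.get(word.lower())
        if negated is not None:
            words[i] = negated.capitalize() if word[0].isupper() else negated
            return " ".join(words)
    return question


def unchanged_rate(answer_fn, examples):
    """Fraction of (question, context) pairs answered identically after negation."""
    if not examples:
        return 0.0
    same = sum(
        1 for question, context in examples
        if answer_fn(question, context) == answer_fn(negate_question(question), context)
    )
    return same / len(examples)


print(negate_question("When did William invade England?"))
# When didn't William invade England?
```

A model that ignores negation entirely would score 1.0 on `unchanged_rate`.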

Conclusions

Our experiments suggest that models are learning short cuts rather than performing reading comprehension. While this is disappointing, it can be fixed. We believe that following these five suggestions can lead to better question-answering data sets and evaluation methods in the future:

  • Test for generalizability: Report performance across multiple relevant data sets to make sure a model is not just solving a single data set;
  • Challenge the models: Discard questions that can be solved trivially — for example, by always returning the first proper noun;
  • Good performance does not guarantee understanding: Probe data sets to ensure models are not taking short cuts;
  • Include variations: Add variations to existing questions to check model flexibility;
  • Standardize data set formats: Consider following a standard format when releasing new data sets, as this makes cross-data-set experimentation easier. We offer some help in this regard by releasing code that converts the five data sets in our experiments into a shared format.

About the Author
Priyanka Sen is a computational linguist in the Alexa AI organization.

Related content

Work with us

See more jobs
US, WA, Seattle
AWS Lambda (https://aws.amazon.com/lambda) is changing the way that companies big and small think about computing in the cloud. Lambda functions offer customers a "serverless" way to create applications, an approach that lets customers turn business logic and application code into scalable, fault-tolerant production systems without requiring every developer to become an expert in distributed systems, deployment technologies, and infrastructure management.We are looking for a proven leader to help lead our Applied Science team as we build a team of talented and passionate scientists to leverage the vast amounts of data that millions of Lambda invocations a second generates and use that to both drive business growth opportunities and service efficiency.We're looking for a leader who combines exceptional technical, research and analytical capabilities to build and lead a team that will be integral to the continued improvement of AWS Lambda. As a Scicene Manager, you will be responsible for leading a team of researchers and data experts in the design, development, testing, and deployment models to solve challenges like:· Predicting customer scaling patterns so that Lambda just meets their needs as if by magic.· Identifying opportunities to more effectively utilize the resources that millions of Lambda functions execute on.· Conducting and coordinating process development leading to improved and streamlined processes for model development. 
Strong customer focus is essential.· Providing technical and scientific guidance to your team members.· Communicating effectively with senior management as well as with colleagues from science, engineering and business backgrounds.· Supporting the career development of your team members.Some recent papers the team produced are:https://www.amazon.science/publications/fireplace-placing-firecracker-virtual-machines-with-hindsight-imitationhttps://www.amazon.science/publications/firecracker-lightweight-virtualization-for-serverless-applicationsThe successful candidate will have an established background in developing analytical models, a strong technical ability, demonstrated experience in people management, excellent project management skills, great communication skills, and the motivation to achieve results in a fast-paced environment.About Us:Inclusive Team CultureOur team is diverse! We drive towards an inclusive culture and work environment. We are intentional about attracting, developing, and retaining amazing talent from diverse backgrounds. Team members are active in Amazon’s 10+ affinity groups, sometimes known as employee resource groups, which bring employees together across businesses and locations around the world. These range from groups such as the Black Employee Network, Latinos at Amazon, Indigenous at Amazon, Families at Amazon, Amazon Women and Engineering, LGBTQ+, Warriors at Amazon (Military), Amazon People With Disabilities, and more.Work/Life BalanceOur team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the flow you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives. 
This position involves on-call responsibilities, typically for one week every two months. We don’t like getting paged in the middle of the night or on the weekend, so we work to ensure that our systems are fault tolerant. When we do get paged, we work together to resolve the root cause so that we don’t get paged for the same issue twice.Mentorship & Career GrowthOur team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded engineer and enable them to take on more complex tasks in the future.
US, WA, Seattle
Alexa is the groundbreaking cloud-based intelligent agent that powers Echo and other devices designed around your voice. The Alexa Economics team is looking for a Research Scientist to join us to support measurement of the Alexa business and to provide actionable insights across the Alexa ecosystem.A day in the lifeAs a Research Scientist you will partner with other scientists to build algorithms and have exposure to senior leadership as we communicate results and provide guidance to the business. You will work with product managers, SDEs, financial analysts, data scientists, and economists to better understand the features customers love and how to optimize customer discovery of these features. You will analyze large amount of business data, build research agenda, and identify key metrics we will use to track customer engagement across a wide range of touch points with Alexa. You will also help the broader team identify new features and business opportunities through your research.A successful candidate will be able to partner effectively with both business and technical teams, including clear communication of results across a variety of stakeholders. He/she will be an expert in statistics and machine learning. This high-impact role provides a great opportunity to demonstrate capabilities to dive deep, deliver results, think big, invent and simplify, and earn trust.About the hiring groupOur team is identifying the key drivers for engaging customers on the Alexa platform across devices, skills and services. As a part of the larger Alexa Customer Experience team, your area of influence and impact will be all of the Alexa organization across the globe. 
You will have a front row seat to the evolving voice assistant industry and opportunities to impact the customer experience of a cutting edge product used every day by people you know.Job responsibilitiesAs a member of the Alexa Econ team, you will have following technical and leadership responsibilities:· Interact with engineering, scientists, and business teams to develop new algorithms to measure impact of Alexa experiences;· Understand and mine the large amount of Alexa data, prototype and implement new learning algorithms and prediction techniques to improve model accuracy;· Contribute to progress of the Alexa and broader research communities by producing publicationsAmazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status. For individuals with disabilities who would like to request an accommodation, please visit https://www.amazon.jobs/en/disability/us.
US, NY, New York
Passionate about Deep Learning, Causal Inference, and Big Data Systems? Interested in building new state-of-the-art measurement products at petabyte scale? Be part of a team of industry leading experts that operates one of the largest big data and machine learning stacks at Amazon. Amazon is leveraging its highly unique data and applying the latest machine learning and big data technologies to change the way marketers optimize their advertising spend. Our campaign measurement and reporting systems apply these technologies on many billions of events in near real time.In this role you will lead a team of scientists to tackle some of the hardest problems in advertising; measuring ads incrementality, providing estimated counterfactuals and predicting the success of advertising strategies. You and your team will develop state of the art causal learning, deep learning, and predictive techniques to help marketers understand and optimize their spend. As the primary leader for the Measurement Science team you will be partnering with VPs and Directors across all of our Ads verticals, driving growth and innovation.Some things you'll do in this role:· Work closely with scientists and engineers to architect and develop the best technical design and approach.· Be a hands-on technical leader and player-coach; inspire and empower innovation and thinking big in those around you.· Hire and develop a high performing team of scientists, raising the bar with each hire and mentoring your team to attain their business and career goals· Develop and execute project plans and delivery commitments; manage the day-to-day activities of the science teamAmazon is an Equal Opportunity-Affirmative Action Employer – Minority / Female / Disability / Veteran / Gender Identity / Sexual Orientation.
US, CA, Sunnyvale
Are you a passionate scientist in the area of computer vision and machine learning who is aspired to develop new and innovative technologies to new product categories? Are you interested in applying your deep knowledge to new and challenging areas? Are you looking to scale capabilities computer vision and machine learning capabilities to new workload sizes? Are you up to the task of delivering innovative and scalable technology that manages automated recognition of millions of items?You will be part of a passionate team whose missions is to push the frontier of computer vision and machine learning technology into the smart home application area. This is a great opportunity for you to innovate in this space by developing algorithms at the edge and in the cloud, and integrating them into consumer services to enable a premium customer experience. In this role, you will be an owner of the full algorithm development cycle, from sensor evaluation and data engineering to algorithm design, implementation, optimization and deployment. This position also requires experience with developing efficient software components on resource-constrained computing platforms on the edge. 
You will collaborate with different Amazon teams to make informed decisions on the best practices in machine learning to build highly-optimized integrated hardware and software platforms.Job responsibilities· Apply best practices to investigate, acquire, process and analyze data sources for algorithm development.· Research and implement the state-of-the-art methods in computer vision and machine learning to deliver algorithms that meets product specifications.· Design, build algorithm evaluation frameworks, schedule and report algorithm performance on a regular basis.· Optimize and deploy algorithms on target hardware platforms.· Establish, develop and maintain frameworks and procedures for image sensor selection and evaluation and image quality monitoring.· Influence system design by making informed decisions on the selection of data sources, algorithms and sensors.Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status. For individuals with disabilities who would like to request an accommodation, please visit https://www.amazon.jobs/en/disability/us.
US, CA, Sunnyvale
Are you a passionate scientist in the area of computer vision and machine learning who is aspired to develop new and innovative technologies to new product categories? Are you interested in applying your deep knowledge to new and challenging areas? Are you looking to scale capabilities computer vision and machine learning capabilities to new workload sizes? Are you up to the task of delivering innovative and scalable technology that manages automated recognition of millions of items?You will be part of a passionate team whose missions is to push the frontier of computer vision and machine learning technology into the smart home application area. This is a great opportunity for you to innovate in this space by developing algorithms at the edge and in the cloud, and integrating them into consumer services to enable a premium customer experience. In this role, you will be an owner of the full algorithm development cycle, from sensor evaluation and data engineering to algorithm design, implementation, optimization and deployment. This position also requires experience with developing efficient software components on resource-constrained computing platforms on the edge. 
You will collaborate with different Amazon teams to make informed decisions on the best practices in machine learning to build highly-optimized integrated hardware and software platforms.Job responsibilities· Apply best practices to investigate, acquire, process and analyze data sources for algorithm development.· Research and implement the state-of-the-art methods in computer vision and machine learning to deliver algorithms that meets product specifications.· Design, build algorithm evaluation frameworks, schedule and report algorithm performance on a regular basis.· Optimize and deploy algorithms on target hardware platforms.· Establish, develop and maintain frameworks and procedures for image sensor selection and evaluation and image quality monitoring.· Influence system design by making informed decisions on the selection of data sources, algorithms and sensors.Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status. For individuals with disabilities who would like to request an accommodation, please visit https://www.amazon.jobs/en/disability/us.
US, WA, Seattle
We are a team of doers working passionately to apply cutting-edge advances in technology to solve real-world problems. As a Research Scientist, you will work with a unique and gifted team developing exciting products for consumers and collaborate with cross-functional teams. Our team rewards intellectual curiosity while maintaining a laser-focus in bringing products to market. Competitive candidates are responsive, flexible, and able to succeed within an open, collaborative, entrepreneurial, startup-like environment. At the cutting edge of both academic and applied research in this product area, you have the opportunity to work together with some of the most talented scientists, engineers, and product managers.Inclusive Team Culture:Here at Amazon, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences. Amazon’s culture of inclusion is reinforced within our 14 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust.Work/Life BalanceOur team puts a high value on work-life balance. It isn’t about how many hours you spend at home or at work; it’s about the you establish that brings energy to both parts of your life. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We offer flexibility in working hours and encourage you to find your own balance between your work and personal lives.Mentorship & Career GrowthOur team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge sharing and mentorship. 
Our senior members enjoy one-on-one mentoring and thorough, but kind, code reviews. We care about your career growth and strive to assign projects based on what will help each team member develop into a better-rounded engineer and enable them to take on more complex tasks in the future.
US, WA, Bellevue
Amazon’s Strategic Sourcing Analytics (SSA) team is looking to hire an experienced Data Scientist to build and improve the world-class science driven supply chain ecosystem that supports sophisticated decision making to improve product availability, lead times, vendor performance and lower costs. Our teams work is focused on saving hundreds of millions of dollars using cutting edge science, machine learning, and scalable distributed software on the cloud that automates and optimizes supply chain under the uncertainty of demand, pricing and supply. This is an opportunity to think big about how to optimize the world’s most dynamic supply chain.Strategic Sourcing is part of Supply Chain Optimization Technology (SCOT) - a centralized organization that owns automated systems for demand forecasting, inventory management, sourcing, inbound optimization, Fast Track promise and order fulfillment. Learn more here: https://www.youtube.com/watch?v=ncwsr1Of6Cw&feature=youtu.beThe Strategic Sourcing team works with vendor inputs/constraints, supply chain signals and other SCOT systems to execute a sourcing strategy, ultimately translating an optimal plan into real world execution by systematically connecting with suppliers. Sourcing owns and develops systems to systematically negotiate cost with vendors, determine the right inbound channel, optimize how we purchase inventory from our suppliers, and automate procurement by integrating with Vendor systems. Sourcing team aims to maximize supply availability and optimize for total sourcing cost that impacts both topline and bottom line of Amazon.The ideal candidate will have extensive experience in Data Science concepts, data analytics and have the aptitude to incorporate new approaches and methodologies while dealing with ambiguities in sourcing processes. Excellent business communication skills are a must to develop and define key business questions and to build data sets that answer those questions. 
You must be comfortable working with business customers in understanding the business requirements and implementing reporting solutions.In joining our team, you'll enjoy working closely with smart engineers and scientists along with growth other benefits. We have a creative and comfortable work environment and this is your opportunity to be part of a fast-paced and a growing data-technology team.
US, WA, Seattle
How to use the world’s richest collection of e-commerce data to improve payments experience for our customers? Amazon Consumer Payments Global Data Science team seeks a Data Scientist for building analytical solutions that will address increasingly complex business questions in the North America Credit space.Amazon.com has a culture of data-driven decision-making and demands insights that are timely, accurate, and actionable. This team provides a fast-paced environment where every day brings new challenges and new opportunities.As a Data Scientist in this team, you will be driving the analytics roadmap and will provide descriptive and predictive solutions to the North America Credit business team through a combination of data mining techniques as well as use statistical and machine learning techniques for segmentation and prediction. You will need to collaborate effectively with internal stakeholders, cross-functional teams to solve problems, create operational efficiencies, and deliver successfully against high organizational standards.Responsibilities· Demonstrate thorough technical knowledge on feature engineering of massive datasets, effective exploratory data analysis, and model building using industry standard regression and classification techniques such as Random Forest, XGBoost package, Keras framework· Understand the business reality behind large sets of data and develop meaningful solutions comprising of analytics as well as marketing management· Work closely with internal stakeholders like the business teams, engineering teams and partner teams and align them with respect to your focus area· Innovate by adapting new modeling techniques and procedures· You should be passionate about working with huge data sets and be someone who loves to bring datasets together to answer business questions. 
You should have deep expertise in creation and management of datasets· You should have exposure at implementing and operating stable, scalable data flow solutions from production systems into end-user facing applications/reports. These solutions will be fault tolerant, self-healing and adaptive· You will extract huge volumes of data from various sources and message streams and construct complex analyses. You will implement data flow solutions that process data real time on message streams from source systems· You should be detail-oriented and must have an aptitude for solving unstructured problems. You should work in a self-directed environment, own tasks and drive them to completion.· You should have excellent business and communication skills to be able to work with business owners to develop and define key business questions and to build data sets that answer those questions· Your teams will work with distributed machine learning and statistical algorithms upon a large Hadoop cluster to harness enormous volumes of online data at scale to serve our customers
US, WA, Seattle
As a SR. Marketing Analyst on the AWS Product Marketing team, your focus will be understanding the key priorities of the business and then digging into the data for our services and digital marketing channels with a goal to identify actionable insights. The ideal candidate has an understanding of marketing channels and is passionate about uncovering ways to optimize resources and processes. In this role, you’ll have an opportunity to use marketing data to identify the most impactful activities for the organization, saving time and money through your work. You’ll also own reporting to the VP of Marketing on behalf of the Product Marketing organization, and you’ll partner closely with Operations and Business Intelligence team to identify new data sources to enhance our insights and improve marketing’s performance. In this role, you’ll also work with Finance and the Data Science team to connect marketing spend to business impact to understand the value of our marketing tactics. This role requires a candidate who is satisfied operating with a high degree of autonomy and who can work collaboratively with leadership to drive rigor around our reporting. As a trustworthy source of data-backed insights, you’ll have an active role in providing recommendations to inform the organization’s marketing strategy. Lastly, this role requires someone who is an expert at prioritizing their time for the biggest impact and who is comfortable saying no when a request or effort isn’t a priority for the business.This role must sit in Seattle, WA. 
Relocation from within the US offered.RESPONSIBILITIES• Solving complex and ambiguous analytic projects for the broader organization• Apply statistical analysis to marketing data, developing actionable insights for stakeholders• Own reporting and insights as part of AWS Marketing’s weekly business review.• Create or streamline Tableau dashboards to monitor marketing channels and communicate performance.• Partner with stakeholders and Business Intelligence teams to ingest required data for robust analysis.• Work with stakeholders to identify new KPIs to evaluate marketing performance.• Bring statistical rigor to A/B tests and campaign measurement.• Employ data visualization to effectively communicate marketing insights.