Building product graphs automatically

Automated system tripled the number of facts in a product graph.

Knowledge graphs are data structures that capture relationships between data in a very flexible manner. They can help make information retrieval more precise, and they can also be used to uncover previously unknown relationships in large data sets.

Manually assembling knowledge graphs is extremely time consuming, so researchers in the field have long been investigating techniques for producing them automatically. The approach has been successful for domains such as movie information, which feature relatively few types of relationships and abound in sources of structured data.

Automatically producing knowledge graphs is much more difficult in the case of retail products, where the types of relationships between data items are essentially unbounded — color for clothes, flavor for candy, wattage for electronics, and so on — and where much useful information is stored in free-form product descriptions, customer reviews, and question-and-answer forums.

The inputs to AutoKnow include an existing product taxonomy, user logs, and a product catalogue. AutoKnow automatically combines data from all three sources into a product graph, adding new product types to the taxonomy, adding new values for product attributes, correcting errors, and identifying synonyms.
Credit: Stacy Reilly

This year, at the Association for Computing Machinery’s annual conference on Knowledge Discovery and Data Mining (KDD), my colleagues and I will present a system we call AutoKnow, a suite of techniques for automatically augmenting product knowledge graphs with both structured data and data extracted from free-form text sources.

With AutoKnow, we increased the number of facts in Amazon’s consumables product graph (which includes the categories grocery, beauty, baby, and health) by almost 200%, identifying product types with 87.7% accuracy.

We also compared each of our system’s five modules, which execute tasks such as product type extraction and anomaly detection, to existing systems and found that they improved performance across the board, often quite dramatically (an improvement of more than 300% in the case of product type extraction).

The AutoKnow framework

Knowledge graphs typically consist of entities — the nodes of the graph, often depicted as circles — and relations between the entities — usually depicted as line segments connecting nodes. The entity “drink”, for example, might be related to the entity “coffee” by the relationship “contains”. The entity “bag of coffee” might be related to the entity “16 ounces” by the relationship “has_volume”.
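
To make the entity-relation picture concrete, here is a minimal sketch that stores facts as (subject, relation, object) triples. The class name `KnowledgeGraph` and its methods are hypothetical and only illustrate the data structure. The two facts are the examples from the paragraph above.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Hypothetical minimal graph: every fact is a (subject, relation, object) triple."""

    def __init__(self):
        self.triples = set()
        self.out_edges = defaultdict(set)  # subject -> {(relation, object), ...}

    def add(self, subject, relation, obj):
        self.triples.add((subject, relation, obj))
        self.out_edges[subject].add((relation, obj))

    def objects(self, subject, relation):
        # All entities linked to `subject` by `relation`.
        return {o for r, o in self.out_edges[subject] if r == relation}

kg = KnowledgeGraph()
kg.add("drink", "contains", "coffee")
kg.add("bag of coffee", "has_volume", "16 ounces")
print(kg.objects("bag of coffee", "has_volume"))  # {'16 ounces'}
```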

In a narrow domain such as movie information, the number of entity types (such as director, actor, and editor) is limited, as is the number of relationships (directed, performed in, edited, and so on). Moreover, movie sources often provide structured data, explicitly listing cast and crew.

In a retail domain, on the other hand, the number of product types tends to grow as the graph expands. Each product type has its own set of attributes, which may be entirely different from the next product type’s — color and texture, for instance, versus battery type and effective range. And the vital information about a product — that a coffee mug gets too hot to hold, for instance — could be buried in the free-form text of a review or question-and-answer section.

AutoKnow addresses these challenges with five machine-learning-based processing modules, each of which builds on the outputs of the one that precedes it:

  1. Taxonomy enrichment expands the set of entity types in the graph;
  2. Relation discovery identifies attributes of products, those attributes’ range of possible values (different flavors or colors, for instance), and, crucially, which of those attributes are important to customers;
  3. Data imputation uses the entity types and relations discovered by the previous modules to determine whether free-form text associated with products contains any information missing from the graph;
  4. Data cleaning sorts through existing and newly extracted data to see whether any of it was misclassified in the source texts; and
  5. Synonym finding attempts to identify entity types and attribute values that have the same meaning.
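
The chaining of the five modules can be sketched as a simple pipeline in which each stage consumes the previous stage's output. Everything below is a toy assumption. In AutoKnow the modules are trained models, but here they are hand-written placeholder functions. The state dictionary and the product IDs are also hypothetical.

```python
from functools import reduce

# Toy stand-ins for the five modules: each takes the state produced by the
# previous module and returns an updated copy (the input is not mutated).

def taxonomy_enrichment(state):
    # Add a newly found product type to the taxonomy.
    return {**state, "types": state["types"] | {"ice cream"}}

def relation_discovery(state):
    # Record which attributes apply to each known type (hard-coded here).
    return {**state, "applicable": {t: {"flavor"} for t in state["types"]}}

def data_imputation(state):
    # Add a value "extracted" from free text, but only if the attribute applies.
    facts = set(state["facts"])
    if "flavor" in state["applicable"].get("ice cream", set()):
        facts.add(("product-1", "flavor", "black cherry cheesecake"))
    return {**state, "facts": facts}

def data_cleaning(state):
    # Drop a fact judged wrong: a volume is not a valid flavor.
    bad = {("product-1", "flavor", "16 ounces")}
    return {**state, "facts": state["facts"] - bad}

def synonym_finding(state):
    # Rewrite known synonyms to one canonical value.
    canonical = {"blk cherry cheesecake": "black cherry cheesecake"}
    return {**state, "facts": {(s, a, canonical.get(v, v))
                               for s, a, v in state["facts"]}}

MODULES = [taxonomy_enrichment, relation_discovery, data_imputation,
           data_cleaning, synonym_finding]

def run_pipeline(state, modules=MODULES):
    # Feed each module's output into the next one, in order.
    return reduce(lambda s, module: module(s), modules, state)

initial = {
    "types": {"coffee"},
    "facts": {("product-1", "flavor", "16 ounces"),
              ("product-2", "flavor", "blk cherry cheesecake")},
}
final = run_pipeline(initial)
print(sorted(final["facts"]))
```

The order matters: imputation reads the `applicable` map that relation discovery creates, just as each AutoKnow module builds on its predecessor.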

The ontology suite

The inputs to AutoKnow include an existing product graph; a catalogue of products that includes some structured information, such as labeled product names, and unstructured product descriptions; free-form product-related information, such as customer reviews and sets of product-related questions and answers; and product query data.

To identify new product types, the taxonomy enrichment module uses a machine learning model that labels substrings of the product titles in the source catalogue. For instance, in the product title “Ben & Jerry’s black cherry cheesecake ice cream”, the model would label the substring “ice cream” as the product type.

The same model also labels substrings that indicate product attributes, for use during the relation discovery step. In this case, for instance, it would label “black cherry cheesecake” as the flavor attribute. The model is trained on product descriptions whose product types and attributes have already been classified according to a hand-engineered taxonomy.
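
Sequence-tagging models of this kind commonly emit one tag per token using a BIO scheme: B marks the beginning of a span, I marks a token inside a span, and O marks a token outside any span. The article does not say which tag scheme AutoKnow uses, so the following sketch assumes BIO. The tags are written by hand rather than produced by a model. The sketch shows how per-token tags turn into the product-type and flavor spans from the example title.

```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (label, text) spans."""
    spans, label, words = [], None, []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if starts_new:
            if label:  # close the previous span
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-"):
            words.append(token)  # continue the current span
        else:  # an "O" tag closes any open span
            if label:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label:
        spans.append((label, " ".join(words)))
    return spans

tokens = ["Ben", "&", "Jerry's", "black", "cherry", "cheesecake", "ice", "cream"]
tags = ["O", "O", "O", "B-FLAVOR", "I-FLAVOR", "I-FLAVOR", "B-TYPE", "I-TYPE"]
print(decode_bio(tokens, tags))
# [('FLAVOR', 'black cherry cheesecake'), ('TYPE', 'ice cream')]
```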

Next, the taxonomy enrichment module classifies the newly extracted product types according to their hypernyms, or the broader product categories that they fall under. Ice cream, for instance, falls under the hypernym “Ice cream and novelties”, which falls under the hypernym “Frozen”, and so on.
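
The hypernym chain in the example can be represented as a child-to-parent map and walked upward. In this minimal sketch, `PARENT` and `hypernym_path` are hypothetical names. The chain stops at “Frozen” because the article does not name that category's parent.

```python
# Child -> parent links taken from the example chain in the text.
PARENT = {
    "ice cream": "Ice cream and novelties",
    "Ice cream and novelties": "Frozen",
}

def hypernym_path(product_type, parent=PARENT):
    """Return the hypernyms of `product_type`, nearest first."""
    path = []
    node = parent.get(product_type)
    while node is not None:
        path.append(node)
        node = parent.get(node)
    return path

print(hypernym_path("ice cream"))  # ['Ice cream and novelties', 'Frozen']
```

A single-parent map like this cannot represent a product type with several hypernyms, which the article lists among its open questions.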

The hypernym classifier uses data about customer interactions, such as which products customers viewed or purchased after a single query. Again, the machine learning model is trained on product data labeled according to an existing taxonomy.
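
Customer-interaction data could feed a hypernym classifier as a co-engagement signal. Suppose customers who issue one query engage with both a new product type and products whose types already sit under some category. That category is then a plausible parent for the new type. The sketch below computes such a signal. It is a hypothetical feature, not AutoKnow's trained model. The session format, the type “gelato”, and the category “Baking” are invented for illustration.

```python
from collections import Counter

def coengagement_hypernym_scores(new_type, sessions, known_parent):
    """Score candidate hypernyms for `new_type` from query sessions.

    sessions: lists of product types a customer viewed or purchased after
              one query (hypothetical input format).
    known_parent: product type -> its existing hypernym in the taxonomy.
    """
    counts = Counter()
    for types in sessions:
        if new_type not in types:
            continue
        for other in types:
            if other != new_type and other in known_parent:
                counts[known_parent[other]] += 1
    total = sum(counts.values())
    # Normalize counts into a distribution over candidate parents.
    return {parent: n / total for parent, n in counts.items()} if total else {}

sessions = [
    ["gelato", "ice cream", "frozen yogurt"],
    ["gelato", "ice cream"],
    ["gelato", "waffle cones"],
]
known_parent = {
    "ice cream": "Ice cream and novelties",
    "frozen yogurt": "Ice cream and novelties",
    "waffle cones": "Baking",
}
print(coengagement_hypernym_scores("gelato", sessions, known_parent))
# {'Ice cream and novelties': 0.75, 'Baking': 0.25}
```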

Relation discovery

The relation discovery module classifies product attributes according to two criteria. The first is whether the attribute applies to a given product. The attribute flavor, for instance, applies to food but not to clothes.

The second criterion is how important the attribute is to buyers of a particular product. Brand name, it turns out, is more important to buyers of snack foods than to buyers of produce.

Both classifiers analyze two kinds of text: text supplied by product providers (product descriptions) and text supplied by customers (reviews and Q&As). For both kinds of text, the classifiers consider how often attribute words occur in the texts associated with a given product. For provider text, they also consider how often a given word occurs across all instances of a particular product type.
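
The frequency signals described above can be sketched as mention rates. The first two are the shares of a product's provider texts and customer texts that mention the attribute word. The third is the share of provider descriptions across the whole product type that mention it. This sketch rests on assumptions: it uses plain case-insensitive substring matching, and the function names are hypothetical. In AutoKnow, these signals feed classifiers trained on annotated data.

```python
def mention_rate(term, texts):
    """Fraction of texts that mention `term` (case-insensitive substring match)."""
    if not texts:
        return 0.0
    term = term.lower()
    return sum(term in text.lower() for text in texts) / len(texts)

def attribute_features(attribute, provider_texts, customer_texts, type_provider_texts):
    return {
        # How often this product's own texts mention the attribute.
        "provider_rate": mention_rate(attribute, provider_texts),
        "customer_rate": mention_rate(attribute, customer_texts),
        # How often provider descriptions across the product type mention it.
        "type_rate": mention_rate(attribute, type_provider_texts),
    }

features = attribute_features(
    "flavor",
    provider_texts=["Black cherry cheesecake flavor, 16 ounces."],
    customer_texts=["Best flavor I've tried.", "Melts too fast.", "Is the flavor very sweet?"],
    type_provider_texts=["Vanilla flavor, 1 pint.", "Mint flavor.",
                         "Dairy-free, 1 pint.", "Strawberry flavor."],
)
print(features)
```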

The models were trained on data that had been annotated to indicate whether particular attributes applied to the associated products.

The data suite

Step three, data imputation, looks for terms in product descriptions that may fit the new product and attribute categories identified in the previous steps, but which have not yet been added to the graph.

This step uses embeddings, which represent descriptive terms as points in a vector space, where related terms are grouped together. The idea is that, if a number of terms clustered together in the space share the same attribute or product type, the unlabeled terms in the same cluster should, too.
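
The cluster intuition can be illustrated with a nearest-neighbor vote: an unlabeled term takes the majority label of the labeled terms closest to it by cosine similarity. This is a simplified stand-in, not AutoKnow's method, which uses a sequence tagger as described next. The 2-D vectors and `k=3` are hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def propagate_label(term_vec, labeled, k=3):
    """Majority label among the k labeled terms most similar to `term_vec`."""
    nearest = sorted(labeled, key=lambda item: cosine(term_vec, item[1]), reverse=True)[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (label, embedding) pairs for terms already in the graph.
labeled = [
    ("flavor", [0.9, 0.1]), ("flavor", [0.8, 0.2]), ("flavor", [0.85, 0.05]),
    ("color", [0.1, 0.9]), ("color", [0.2, 0.8]),
]
print(propagate_label([0.88, 0.12], labeled))  # flavor
```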

Previously, my Amazon colleagues and I, together with colleagues at the University of Utah, demonstrated state-of-the-art data imputation results with a sequence-tagging model much like the one described above, which labeled “black cherry cheesecake” as a flavor.

Here, however, we vary that approach by conditioning the sequence-tagging model on the product type: that is, the tagged sequence output by the model depends on the product type, whose embedding we include among the inputs.
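
The article says only that the product-type embedding is among the tagger's inputs. One common way to do this, assumed here, is to concatenate the type embedding onto each token embedding. The same words then reach the tagger as different inputs under different product types. The embeddings and the “desk” product type are hypothetical.

```python
# Hypothetical 2-D embeddings; learned embeddings would be much larger.
TOKEN_EMBEDDINGS = {"black": [0.2, 0.7], "cherry": [0.9, 0.1]}
TYPE_EMBEDDINGS = {"ice cream": [1.0, 0.0], "desk": [0.0, 1.0]}

def conditioned_inputs(tokens, product_type):
    """Concatenate the product-type embedding onto every token embedding."""
    type_vec = TYPE_EMBEDDINGS[product_type]
    return [TOKEN_EMBEDDINGS[tok] + type_vec for tok in tokens]  # list concatenation

print(conditioned_inputs(["black", "cherry"], "ice cream"))
# [[0.2, 0.7, 1.0, 0.0], [0.9, 0.1, 1.0, 0.0]]
print(conditioned_inputs(["black", "cherry"], "desk"))
# [[0.2, 0.7, 0.0, 1.0], [0.9, 0.1, 0.0, 1.0]]
```

Because the inputs differ, a tagger trained this way can learn to label “black cherry” as a flavor for ice cream while treating it differently for a product type where it names a finish.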

The architecture of the AutoKnow cleaning module.

The next step is data cleaning, which uses a machine learning model based on the Transformer architecture. The inputs to the model are a textual product description, an attribute (flavor, volume, color, etc.), and a value for that attribute (chocolate, 16 ounces, blue, etc.). Based on the product description, the model decides whether the attribute value is misassigned.

To train the model, we collect valid attribute-value pairs that occur across many instances of a single product type (all ice cream types, for instance, have flavors); these constitute the positive examples. We also generate negative examples by replacing the values in valid attribute-value pairs with mismatched values.
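
The training-data recipe above can be sketched as follows. The article does not define “mismatched”, so this sketch assumes a mismatched value is one never observed for that attribute (for example, “16 ounces” offered as a flavor). The function name and example rows are hypothetical. The Transformer classifier that consumes these rows is not shown.

```python
import random
from collections import defaultdict

def make_cleaning_examples(valid_triples, rng):
    """Build (description, attribute, value, label) rows: 1 = valid, 0 = corrupted."""
    values_by_attr = defaultdict(set)
    for _, attr, value in valid_triples:
        values_by_attr[attr].add(value)
    all_values = sorted(set().union(*values_by_attr.values()))

    examples = [(desc, attr, value, 1) for desc, attr, value in valid_triples]
    for desc, attr, _ in valid_triples:
        # Assumed notion of "mismatched": a value never seen for this attribute.
        mismatched = [v for v in all_values if v not in values_by_attr[attr]]
        if mismatched:
            examples.append((desc, attr, rng.choice(mismatched), 0))
    return examples

valid = [
    ("Chocolate ice cream, 16 ounces", "flavor", "chocolate"),
    ("Chocolate ice cream, 16 ounces", "volume", "16 ounces"),
    ("Blue stainless water bottle", "color", "blue"),
]
rows = make_cleaning_examples(valid, random.Random(0))
for row in rows:
    print(row)
```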

Finally, we analyze our product and attribute sets to find synonyms that should be combined in a single node of the product graph. First, we use customer interaction data to identify items that were viewed during the same queries; their product and attribute descriptions are candidate synonyms.
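
Candidate generation from query logs can be sketched by grouping the attribute values of products viewed for the same query and counting how often each pair of values co-occurs. The input formats, product IDs, and queries are hypothetical. Co-occurrence alone is noisy: the example yields “chocolate”/“vanilla” as a candidate, so the candidates still need filtering.

```python
from collections import defaultdict
from itertools import combinations

def synonym_candidates(query_views, value_of):
    """Count co-viewed value pairs.

    query_views: query -> product IDs viewed for that query.
    value_of: product ID -> attribute value (for example, its flavor).
    """
    pairs = defaultdict(int)
    for products in query_views.values():
        values = sorted({value_of[p] for p in products if p in value_of})
        for a, b in combinations(values, 2):
            pairs[(a, b)] += 1
    return dict(pairs)

query_views = {
    "choc ice cream": ["p1", "p2"],
    "chocolate ice cream": ["p1", "p2", "p4"],
}
value_of = {"p1": "chocolate", "p2": "choc", "p4": "vanilla"}
print(synonym_candidates(query_views, value_of))
# {('choc', 'chocolate'): 2, ('choc', 'vanilla'): 1, ('chocolate', 'vanilla'): 1}
```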

Then we filter the candidate terms using a combination of techniques, including edit distance (a measure of how much two strings of characters differ) and a neural network. In tests, this approach yielded a respectable 0.83 area under the precision-recall curve.
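
Edit distance here means the standard Levenshtein measure: the minimum number of single-character insertions, deletions, and substitutions that turn one string into another. Below is a standard dynamic-programming implementation plus a length-normalized similarity. The article does not say how AutoKnow combines this score with its neural network, so any cutoff applied to it would be an assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def normalized_similarity(a, b):
    """1.0 for identical strings, lower as the edit distance grows."""
    if not a and not b:
        return 1.0
    return 1 - edit_distance(a, b) / max(len(a), len(b))

print(edit_distance("kitten", "sitting"))        # 3
print(normalized_similarity("grey", "gray"))     # 0.75
```

Spelling variants such as “grey”/“gray” score high, but an abbreviation such as “choc”/“chocolate” scores only about 0.44. This is one reason a string measure alone is not enough and is combined with a learned model.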

In ongoing work, we’re addressing a number of outstanding questions, such as how to handle products with multiple hypernyms (products that have multiple “parents” in the product hierarchy), cleaning data before it’s used to train our models, and using image data as well as textual data to improve our models’ performance.

Watch a video presentation of the AutoKnow paper, “AutoKnow: Self-driving knowledge collection for products of thousands of types,” from Jun Ma, senior applied scientist.

About the Author
Xin Luna Dong is a principal scientist in the Amazon Product Graph group.

US, WA, Seattle
Do you want to join Alexa AI -- the science team behind Amazon’s intelligence voice assistance system? Do you want to utilize cutting-edge deep-learning and machine learning algorithms to delight millions of Alexa users around the world?If your answers to these questions are “yes”, then come join us at the Alexa Artificial Intelligence team, which is in charge of improving Alexa user satisfaction through real-time metrics monitoring and continuous closed-loop learning. The team owns the modules that reduce user perceived defects and frictions through utterance reformulation, contextual and personalized hypothesis ranking.With the Alexa Artificial Intelligence team, you will be working alongside a team of experienced machine/deep learning scientists and engineers to create data driven machine learning models and solutions on tasks such as sequence-to-sequence query reformulation, graph feature embedding, personalized ranking, etc..You will be expected to:· Analyze, understand, and model user-behavior and the user-experience based on large scale data, to detect key factors causing satisfaction and dissatisfaction (SAT/DSAT).· Build and measure novel online & offline metrics for personal digital assistants and user scenarios, on diverse devices and endpoints· Create and innovate deep learning and/or machine learning based algorithms for utterance reformulation and contextual hypothesis ranking to reduce user dissatisfaction in various scenarios;· Perform model/data analysis and monitor user-experienced based metrics through online A/B testing;· Research and implement novel machine learning and deep learning algorithms and models.
US, WA, Seattle
Are you passionate about the use of Machine Learning to improve the experience for Alexa and Smart Home customers? We have a team that is making revolutionary leaps forward in this space and are in need of a Machine Learning Scientist. You will have an enormous opportunity to impact the customer experience, design, architecture, and implementation of a new machine-learning driven product that will impact the lives of people you know every day.Great candidates for this position will have a passion for machine learning and signal processing and will have hands-on experience with product development. You will have the deep expertise to drive the ML vision for our products and technical breadth to make the right decisions about technology, models and methodology choices.As a Machine Learning Scientist at Amazon, you will connect with world leaders in your field working on similar problems. On this team you will analyze and model sensor data, smart home signals, and contextual data to create new experiences for customers. Meeting business requirements will involve combining several different machine learning algorithms with domain knowledge into sophisticated ML workflows. You will work with large distributed systems of data and will tackle Machine Learning challenges in Supervised, Unsupervised, and One-shot Learning, utilizing modern methods such as Deep Neural Networks and others. 
MLS’s have contributed to and are aware of the state-of-the-art in their respective field of expertise and are constantly focused on advancing the state-of-the-art for improving Amazon’s products and services.KEY RESPONSIBILITIES· Analyze and extract relevant information from large amounts of data to support new experiences for customers· Create novel ML approaches and apply them to achieve project goals· Build ML software and algorithms that cost-effectively scale to millions of customers· Work closely with other teams across Amazon to deliver platform features that require cross-team leadership· Ensure that the quality and timeliness of ML deliverables
IN, KA, Bangalore
What would you do if you had access to the world’s largest product catalog with billions of products, offers, images, reviews, searches, and much more? Amazon Selection and Catalog Systems is looking for an innovative and customer-focused applied scientist to improve the data quality of the world’s biggest product catalog, utilizing state-of-the-art machine learning techniques.An information-rich and accurate product catalog is a critical strategic asset for Amazon. It powers unrivaled product discovery, informs customers’ buying decisions, offers a large selection and positions Amazon as the first stop for our customers. Maintaining and improving the accuracy of product catalog is challenging due to sheer scale (billions of products in the catalog), diversity (products ranging from electronics to groceries to instant video across multiple languages) and multitude of input sources (millions of sellers contributing product data with different quality).You will conceive innovative solutions to measure and improve the quality of various aspects of our product catalog and influence the way millions of our customers discover and buy our products worldwide. The opportunity (puzzle to solve) is that there is no single solution as the problem scope is varied and diverse. The solutions you build will vary from simple rule based systems to machine learning, semantic analysis and text processing. You will have the opportunity to design new data analytical workflows at a scale rarely available elsewhere, utilizing state-of-the-art data science and machine learning tools such as Spark, Python, and Theano and Amazon’s cloud computing technologies such as Elastic Map Reduce (EMR), Kinesis, and Redshift. 
You will apply your knowledge in data science by creating algorithmic solutions that combine techniques such as clustering, pattern mining, predictive modeling, deep learning, statistical testing, information retrieval, and natural language processing and apply them to the voluminous data describing the products in the catalog and the customer interactions. You will evaluate with scientific rigor and provide inputs to business strategy and technical direction. You will collaborate with software engineering teams to integrate your algorithmic solutions into large-scale highly complex Amazon production systems.Responsibilities include:· Map business requirements and customer needs to a scientific problem.· Align the research direction to business requirements and make the right judgments on research/development schedule and prioritization.· Research, design, implement and deploy scalable machine learned models, including the application of state-of-art deep learning, to solve problems that matter to our customers in an iterative fashion.· Mentor and develop junior applied scientists and developers who work on data science problems in the same organization.· Stay informed on the latest machine learning, natural language and/or artificial intelligence trends and make presentations to the larger engineering and applied science communities.