Video classifiers learn to recognize actions they've never seen

New end-to-end approach to zero-shot video classification dramatically outperforms predecessors.

Zero-shot learning is a way to train deep-learning models to generalize to categories they’ve never seen before. The way it’s typically done, the model learns to map inputs — say, videos — to a semantic space, where words are clustered according to their meanings. If all goes well, the model can classify videos it wasn’t trained on, by mapping them to the semantic space and picking the closest word. The technique has great promise for cases where the exact classes of interest are not available during training.

Zero-shot-learning systems map inputs — in this case, videos — to a semantic space, where words are clustered according to meaning. This image shows the video whose mapping is closest to each of six words. The words are labels from both training data (“windsurfing”, “snowboarding”, and so on) and classes of video that were removed from the training data (green border). The model is able to map a video of a kayak (blue border) close to the unseen label “kayaking”.
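The nearest-word step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the model from the paper: the three-dimensional embeddings are made-up numbers, the function names are hypothetical, and cosine distance is an assumed choice of metric, since the article does not say which distance is used.

```python
import math

def cosine_distance(u, v):
    """Cosine distance between two equal-length vectors (0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def classify_zero_shot(video_embedding, label_embeddings):
    """Return the label whose word embedding lies closest to the video embedding."""
    return min(
        label_embeddings,
        key=lambda label: cosine_distance(video_embedding, label_embeddings[label]),
    )

# Toy word embeddings (invented values, for illustration only).
LABEL_EMBEDDINGS = {
    "windsurfing": [0.9, 0.1, 0.2],   # seen during training
    "snowboarding": [0.1, 0.9, 0.2],  # seen during training
    "kayaking": [0.8, 0.1, 0.6],      # never seen during training
}

# Hypothetical output of the video model for a clip of a kayak.
video_embedding = [0.7, 0.0, 0.6]
predicted = classify_zero_shot(video_embedding, LABEL_EMBEDDINGS)
print(predicted)
```

Because the label set is just a dictionary of word embeddings, new classes can be added at test time without retraining the video model, which is the point of the zero-shot setup.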

Research on zero-shot image recognition has seen great successes with end-to-end training, in which a single deep-learning model is trained to map raw inputs directly to outputs. But to our knowledge, this approach has never been applied to the related problem of video classification.

Instead, zero-shot video classifiers typically start with a standard video classifier — one trained to recognize only a limited set of actions — and pass its outputs through a number of special-purpose subsidiary networks that learn to map them to a semantic space. This has been seen as a necessary concession to the computational complexity of processing video.

In a paper that we’re presenting (virtually) at the IEEE Conference on Computer Vision and Pattern Recognition, we apply end-to-end training to the problem of zero-shot video classification and find that it outperforms previous methods by a large margin.

When we compare our network to predecessors of the same capacity and depth, we find that, with around 500,000 training examples, it reduces the error rate of its best-performing predecessor by 29%.

Our end-to-end model (far right) is much simpler than its best-performing predecessors.

Our model is also simpler — and therefore easier to reproduce — than its predecessors. Creating a powerful baseline that is also easy to reproduce is key to our research: our goal is not only to develop a new model but to stimulate future work from other research teams, accelerating progress and maybe catching up with the zero-shot-learning systems that classify static images.

In evaluating our model, we used a new approach to separating data into training and test sets, which better approximates real-world settings. Typically, researchers simply divide a single data set in two, using one part to train a model and the other to test it.

A distance threshold of 0.05 (red line) removes almost 40 classes from our training set.

We instead use different data sets for training and testing. Before training, we calculate the distance in the semantic space between each class in the training set and its nearest neighbor in the test set. Then we throw out every training class whose distance falls below some threshold, so that the test classes remain genuinely unseen.
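The filtering procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the class names and embeddings are invented, `filter_training_classes` is a hypothetical helper, and cosine distance stands in for whatever metric the paper uses. The 0.05 threshold is the one shown in the figure caption.

```python
import math

def cosine_distance(u, v):
    """Cosine distance between two equal-length vectors (0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def filter_training_classes(train_classes, test_classes, threshold):
    """Keep only training classes whose nearest test class is at least `threshold` away."""
    kept = {}
    for name, embedding in train_classes.items():
        nearest = min(
            cosine_distance(embedding, test_embedding)
            for test_embedding in test_classes.values()
        )
        if nearest >= threshold:  # too-close classes are dropped
            kept[name] = embedding
    return kept

# Toy embeddings (invented values, for illustration only).
TRAIN_CLASSES = {
    "canoeing": [0.8, 0.1, 0.6],        # nearly identical to the test class "kayaking"
    "snowboarding": [0.1, 0.9, 0.2],
    "playing guitar": [0.2, 0.2, 0.9],
}
TEST_CLASSES = {
    "kayaking": [0.8, 0.12, 0.58],
    "archery": [0.9, 0.4, 0.0],
}

kept = filter_training_classes(TRAIN_CLASSES, TEST_CLASSES, threshold=0.05)
print(sorted(kept))
```

In this toy example, "canoeing" is removed from the training set because it sits almost on top of the test label "kayaking"; leaving it in would let the model score well on "kayaking" without truly generalizing.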

This project grew out of the realization that existing methods for doing zero-shot video classification have prioritized the ability to handle long input videos, hence the need to reduce computational complexity by using pretrained classifiers and special-purpose modules.

But many of the most successful methods in traditional video classification — which are not zero-shot systems but handle a prescribed subset of classes — do exactly the opposite, extracting a small snapshot of the input video while training the full network end to end. We adapted the same concept to zero-shot learning. Among other advantages, this makes it possible to train the model on large amounts of data.
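One common way to extract a small snapshot is to sample a short window of consecutive frames at random each time a video is drawn during training. The sketch below shows only that sampling step. The function name, the default clip length of 16 frames, and the padding rule for very short videos are all assumptions made for illustration; the article does not specify them.

```python
import random

def sample_clip_indices(num_frames, clip_len=16, rng=random):
    """Pick the frame indices of one short clip from a video of `num_frames` frames.

    `clip_len=16` is an assumed default, not a value taken from the article.
    """
    if num_frames <= 0 or clip_len <= 0:
        raise ValueError("num_frames and clip_len must be positive")
    if num_frames <= clip_len:
        # Video shorter than the clip: repeat the last frame to pad.
        return [min(i, num_frames - 1) for i in range(clip_len)]
    # Otherwise choose a random window of consecutive frames.
    start = rng.randrange(num_frames - clip_len + 1)
    return list(range(start, start + clip_len))

# A seeded generator makes the example reproducible.
rng = random.Random(0)
indices = sample_clip_indices(num_frames=300, clip_len=16, rng=rng)
print(indices[0], indices[-1])
```

Because each training step touches only a fixed, small number of frames, the cost per example no longer depends on video length, which is what makes it feasible to backpropagate through the whole network.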

We hope that our contribution will inspire other research teams to push the boundaries of zero-shot-learning video classification and that soon we will see this technology in commercially available products.
