Amazon at CVPR: Pietro Perona on computer vision's frontiers

Efficient learning and the capacity for abstraction are attributes that will probably require new insights — but self-supervised learning could help.

The Conference on Computer Vision and Pattern Recognition (CVPR) — the premier conference in the field of computer vision — was first held in 1985. Pietro Perona, an Amazon Fellow and the Allan E. Puckett Professor of Electrical Engineering and Computation and Neural Systems at the California Institute of Technology, first attended in 1988, when he was a graduate student at the University of California, Berkeley.

“At the time, computer vision was a field for visionaries — pun intended — where we wanted to solve the question of how we can make a machine see,” Perona says. “The whole conference was maybe 200 people. And we had basically no clear idea how to make progress and so would try different things, and we would try and see if we could split the complex problem of vision into simpler questions. And the results were not very good. Now we see in the conference great systems working really well on very difficult problems. So the level of success and ambition is completely different.”

Much of that success stems, of course, from deep learning, which superseded many earlier computer vision techniques. But, Perona points out, it’s not as if computer vision researchers had simply failed to recognize the utility of deep learning for CVPR’s first 25 years. Until around 2010, he says, using deep learning to tackle computer vision problems wasn’t really an option.

“Deep learning has been around since the late ’80s,” he says, “but we simply didn't have enough computational power to run big experiments on complex images. You have to look to 2008, 2009, when good GPUs began coming out. Then, people in computer vision had to learn how to code up these GPUs. There were no special software tools at the time, so people were just handcrafting software.

“Another factor is the emergence of vast, well-annotated datasets of images, which came about in 2005 to 2010. That was the result of a couple of things. One was the Internet: all of a sudden there were tons of images available. The other thing is Amazon Mechanical Turk, which came out in 2005, and without which we would not be able to have these very large annotated datasets. It's funny, because within Amazon, people are not so aware of it, but Amazon Mechanical Turk was one of the three big factors for the AI revolution to come about. Datasets like ImageNet and COCO would not have been possible without it.”

Unscaled heights

For all of deep learning’s successes on such canonical computer vision tasks as object recognition, there are some respects in which it has made little headway, Perona says.

“One barrier is the efficiency of learning,” he says. “There was a paper from my team looking at classification of plants and animals. If you have 10,000 images per category — each species of bird or species of butterfly — then the machine will beat a human in accuracy. But the efficiency is not even close. If I give you a new species you have never seen before, and I show you three to five pictures of this new species, you become competent at recognizing that species. For a machine that would not be possible.” 

One reason to try to break this barrier is scientific, Perona says. “Humans don't own a special kind of computation,” he says. “So it should be possible for machines to do it. You want to understand this exquisite ability that humans have, how it works.”

But, he adds, there are also practical reasons to worry about learning efficiency.

“If you think of people who are trying to use machine vision in industry or in science, something that is frequent is often not so important,” Perona explains. “What is rare is more important. So if you think of building a machine that can help an ophthalmologist recognize retinal disease, let's suppose, there are some 10 or 20 diseases of the retina that doctors see all the time. So they have no problem. They don't need help from a machine. But then there are another about 600 diseases that they see fewer times. And some of those are seen just by a few doctors per year. 

“The world is a long-tailed distribution. A few things are very frequent, and most things are not frequent at all. How often do you see an elephant cross the road? But if you want to build autonomous vehicles, they should be able to handle elephants crossing the road.”

Another aspect of human visual reasoning that deep learning has struggled to duplicate is the capacity for abstraction, Perona says.

“Right now, we need to train machines with diverse backgrounds,” he says. “If you want to train a machine to recognize toads, you've got to show it pictures of toads in all possible environments and all possible poses for the machine to be able to abstract away the concept of toad. If you had trained the machine with pictures of toads always against the same piece of wallpaper or the same blank background, the machine would not be able to handle the toad in a new scenario. Or take a cow on the beach: machines have a terrible time recognizing a cow that is right in the middle of a picture, and it's on the beach. So we know that machines are not yet seeing objects the same way we see them. From the training examples, they are not able to abstract away the attributes of these objects. What is the face of the cow? And relating the face of the cow with the face of a dog and the face of a person — the machine is not yet able to do that.”

Self-supervised learning

Before machines’ learning efficiency and capacity for abstraction can rival humans’, Perona says, “new insights are needed.” But in the near term, progress on both fronts could come from self-supervised learning, a topic that has, he says, grown in popularity at CVPR in recent years.

“Even if there is nobody teaching a machine what to look for, the machine can teach itself in some way and can be prepared to learn the next task,” Perona explains. “Let's suppose that we have a million images, for example, but no labels telling the machine what is in each picture. The machine has CPU cycles to spare, so what could it do? The images are all upside up, with the sky up and the ground down. But the machine could randomly flip a few and train itself to recognize when the image is flipped versus when the image is as it should be. Here’s another game you can play: each image is color, so there are three channels, RGB [red, green, blue]. So you could try and predict the green from the red and blue.

“Now, it turns out that in order to win at these games, it will have to develop some sense for the key features in the image. And one crucial feature is that trees grow from the ground up in some way. And so it has to recognize the structure of trees or the structure of things that are planted in the ground to recognize what is on the ground and what is not. It doesn't have a high level of semantic knowledge, but it does develop some features that are good preparation for the next step.

“To give you a more advanced example, a student of mine and I have a paper showing how a machine can learn about numbers purely by playing with objects. Suppose that you had a few M&Ms, and you are just tossing them into a cup in front of you, and then you're picking one up and moving it away or putting one in or just scrambling the ones you have and rearranging them like a child would do. We demonstrate that the machine is able to learn the concept of number, an abstract concept, purely by playing with little objects, taking one out and putting one in, and so on. And it's quite interesting how that concept, that abstraction, can emerge from no supervision at all.”
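
The two simpler games Perona describes, spotting flipped images and predicting the green channel from red and blue, both work because the training labels come from the images themselves. The sketch below shows only how such training pairs might be built, not how a model is trained on them. It is an illustration under stated assumptions, not code from Perona's group: images are toy nested lists of (r, g, b) tuples, and every function name is hypothetical.

```python
import random

# Assumption: an image is a list of rows, and each pixel is an (r, g, b) tuple.
SKY = (80, 120, 255)
GROUND = (40, 160, 40)


def toy_image(height=4, width=4):
    """Hypothetical stand-in for a photo: sky rows on top, ground rows below."""
    return [[SKY if y < height // 2 else GROUND for _ in range(width)]
            for y in range(height)]


def flip_vertical(image):
    """Return the image turned upside down (row order reversed)."""
    return image[::-1]


def make_flip_task(images, rng):
    """Build (image, label) pairs for the flip game.

    Label 1 means the image was flipped, 0 means it was left upright.
    No human annotation is needed: the label is whatever we did to the image.
    """
    examples = []
    for image in images:
        flipped = rng.random() < 0.5
        examples.append((flip_vertical(image) if flipped else image, int(flipped)))
    return examples


def make_green_task(image):
    """Split an image into per-pixel inputs (red, blue) and targets (green)."""
    inputs = [[(r, b) for (r, g, b) in row] for row in image]
    targets = [[g for (r, g, b) in row] for row in image]
    return inputs, targets


rng = random.Random(0)  # fixed seed so the sketch is repeatable
dataset = [toy_image() for _ in range(8)]
flip_examples = make_flip_task(dataset, rng)
inputs, targets = make_green_task(dataset[0])
```

A model trained on `flip_examples` would have to learn which way is up, and a model trained on `inputs` and `targets` would have to learn how colors co-occur. Both are the kind of general-purpose features the article says prepare a system for its next task.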
