iNaturalist opens up a wealth of nature data — and computer vision challenges

Amazon Machine Learning Research Award recipient utilizes a combination of people and machine learning models to illuminate the planet's incredible biodiversity.

On a hike in the woods, you spot a colorful little bird. You're pretty sure it's a finch — but what kind? The iNaturalist app was made for this kind of scenario: people all over the world use it to record and identify what they're seeing outside. Increasingly, artificial intelligence enabled by Amazon Web Services (AWS) is playing a role in classifying those observations.

iNaturalist launched about 10 years ago, evolving from a master's project by three students at the University of California, Berkeley. Since then, the app has attracted a community of 1.5 million scientists and nature lovers who post photos of everything from bumblebees to bears.

iNaturalist, which today is a joint initiative of the California Academy of Sciences and the National Geographic Society, once relied solely on its members to identify species. Now computers are helping out.

"iNaturalist's goal is really just to connect people with nature," said Grant Van Horn, a research engineer at the Cornell Lab of Ornithology. Being able to name that flower or insect you see "really ups the engagement level and makes for a completely different experience,” he adds.

A unique computer vision challenge

Grant Van Horn, research engineer at the Cornell Lab of Ornithology
Oisin Mac Aodha, assistant professor of machine learning at the University of Edinburgh

Van Horn and Oisin Mac Aodha, now an assistant professor of machine learning at the University of Edinburgh, began working with iNaturalist five years ago to solve challenges related to the app's data. Both were at the California Institute of Technology; Van Horn was working on his PhD, and Mac Aodha was a postdoctoral researcher. They were interested in how computer vision could help accelerate and validate the identifications that humans were making on the app.

For the researchers, iNaturalist's appeal was that it represented a unique challenge to the computer vision community, Van Horn says.

If you were to build a computer model to identify finches, for example, you might scrape some images from the internet and use those to train it.

But that dataset, likely full of high-quality photos with serenely perched birds, would look quite different from the vast diversity of mostly amateur photos on iNaturalist. There, a hiker may have just barely managed to capture a photo as a bird is flying away, or the bird might be hard to identify against the background.

That all assumes the bird is even standing still. Swallows and swifts, Van Horn noted, rarely perch. A good birder will recognize them in flight, but how do you train a computer to do the same thing?

This is just one in a seemingly endless list of computer vision challenges related to nature.

Many species look strikingly similar. Many also have more than one name: the scientific one (Danaus plexippus, for example) and the common one (monarch butterfly). And a single species can take more than one form: females might look different from their male counterparts, and eggs turn into larvae, which turn into mature insects.

An image provided by the researchers illustrates the difficulty involved in identifying species from images taken in the wild.
Courtesy of Grant Van Horn and Oisin Mac Aodha

These challenges exist across the millions of plant and animal species in the world. Seen from that perspective, the more than 300,000 species catalogued on the AWS-hosted iNaturalist are only a fraction of what might be possible as users continue to add data.

"You could imagine a future system that can reason about all these things at, effectively, an unprecedented level of ability," Mac Aodha said, "because there's no person that's going to be able to tell you which of X million different things this one picture could be."

New machine learning competitions

In 2017, Van Horn and Mac Aodha began hosting competitions with iNaturalist data at the annual Conference on Computer Vision and Pattern Recognition (CVPR). Part of the conference's Workshop on Fine-Grained Visual Categorization, the competitions present a dataset and then rank entries on their accuracy in classifying it. The winning team is the one that generates the lowest error rate.

In the beginning, just the basic taxonomy of iNaturalist's data posed a learning curve for Van Horn and Mac Aodha. "This was not obvious to us: there's no one taxonomic authority in the world," Van Horn said.

They spent considerable time early on learning to work with the taxonomy, clean up the data, and assemble a dataset comprising 859,000 images for the first competition. In the second year, they featured a dataset with more of a long-tailed distribution, meaning there were many species that had relatively few associated images. In 2019, the dataset was reduced to 268,243 images of highly similar categories captured in a wide variety of situations.
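To make "long-tailed" concrete, the minimal sketch below counts images per species and reports what share of species falls below a chosen image count. The toy labels and the threshold are illustrative assumptions, not figures from any iNat competition dataset.

```python
from collections import Counter

def tail_fraction(labels, threshold):
    """Return the fraction of classes that have fewer than `threshold` images."""
    counts = Counter(labels)  # species label -> number of images
    rare = sum(1 for n in counts.values() if n < threshold)
    return rare / len(counts)

# Hypothetical toy data: one well-photographed species and several rare ones.
labels = ["house finch"] * 50 + ["purple finch"] * 2 + ["cassin's finch"] * 1 + ["swift"] * 2
print(tail_fraction(labels, threshold=3))  # 0.75
```

In a long-tailed dataset this fraction stays high even though most images belong to a few common species, which is what makes the rare classes hard to learn.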

The image above is from an earlier iNat competition dataset.
Courtesy of Grant Van Horn and Oisin Mac Aodha

After a break last year, the main iNat competition is back and bigger, with a training dataset of 2.7 million images representing 10,000 species. The iNat Challenge 2021, which began March 8, ends on May 28.

"It's not like we're trying to throw in categories just to make this thing sound big," Van Horn said. "It is big. And it will just continue to get bigger as the years progress."

This year's larger dataset could encourage teams to explore a recent trend in machine learning toward unsupervised and self-supervised learning, in which a model learns from data without labels, or predefined "answers," by finding patterns within the data itself.

"We have quite a lot of images for each of these 10,000 categories," Mac Aodha said. "We're hoping that this will open up some interesting avenues for people who are exploring the self-supervised question in the context of this naturalistic, real-world task."

Each competition entry must provide one predicted classification for every image in the dataset. An error rate of 5% on this year’s dataset would be “amazing,” Van Horn said, adding that one team had already achieved an 8.67% error rate by late March.
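Because each entry submits exactly one prediction per image, the error rate is simply the share of images whose predicted species differs from the true one, and a lower rate ranks higher. The minimal sketch below assumes predictions and ground truth are dictionaries keyed by hypothetical image IDs; it illustrates the metric and is not the scoring code Kaggle uses for the competition.

```python
def top1_error_rate(predictions, ground_truth):
    """Fraction of images whose single predicted label differs from the true label.

    predictions, ground_truth: dict mapping image ID -> species label.
    Every image in ground_truth must have exactly one prediction.
    """
    missing = ground_truth.keys() - predictions.keys()
    if missing:
        raise ValueError(f"missing predictions for {len(missing)} image(s)")
    wrong = sum(1 for img, label in ground_truth.items() if predictions[img] != label)
    return wrong / len(ground_truth)

# Hypothetical example: 1 mistake out of 4 images.
truth = {"img1": "monarch", "img2": "house finch", "img3": "swift", "img4": "bumblebee"}
preds = {"img1": "monarch", "img2": "purple finch", "img3": "swift", "img4": "bumblebee"}
print(top1_error_rate(preds, truth))  # 0.25
```

At an 8.67% error rate, roughly 87 of every 1,000 images would be misclassified.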

A move to Open Data

The ability to classify large groups of images opens up the potential to answer a wide range of scientific questions about habitat, behavior, and variation within a species. For example, iNaturalist users have documented the jaw-clenching mating rituals of alligator lizards in Los Angeles, where the amount of private property makes traditional wildlife studies impossible.

With this type of insight in mind, Mac Aodha and Van Horn have created a new dataset of natural world tasks (NeWT) that moves beyond the question of species classification and explores concepts related to behavior and attributes that are also exhibited in these photographs.

This work will be presented at CVPR this year, and a competition is being planned that will challenge entrants to produce models that generalize to these alternative questions.

So far, winning entries in the CVPR competitions haven't been deployed by iNaturalist itself, because there are performance tradeoffs between code that generates the fewest errors and code that is efficient enough to run on smartphones. But the competition datasets, Mac Aodha said, have been widely used in the computer vision and machine learning literature, generating some 300 citations over the last few years.

FGVC7: Intro to the 7th Workshop on Fine-Grained Visual Categorization at CVPR 2020

The competitions are hosted on Kaggle, a machine learning and data science platform that draws a wide variety of entrants beyond the iNaturalist community. The 2019 competition drew 213 teams from around the world, and the winners were based in China.

In order for the competition to be fair, an entrant must be able to access and work with the thousands or millions of images in a dataset, no matter where they are in the world. The competitions, and now the iNaturalist app itself, are part of Open Data on AWS, which "makes accessing the data insanely easy and very convenient," Van Horn said.

In 2020, iNaturalist received an Amazon Machine Learning Research Award, which provides unrestricted cash funds and AWS promotional credits to academics to advance the frontiers of machine learning. The award helped cover the cost of continuing to store iNaturalist's data on AWS as the team implemented machine learning classification. In March, iNaturalist moved to the Registry of Open Data on AWS, which ensures that its vast collection of observations, some 60 million in all, will remain freely accessible to anyone.

"iNaturalist is doing really important work to bring scientists and everyday citizens together to create a community and drive awareness on biodiversity and environmental sciences," said An Luo, senior technical program manager leading the Amazon Research Awards program. “We are very excited that AWS is empowering them to serve more people as well as conduct advanced machine learning research using the AWS Open Data platform and AWS machine learning services such as Amazon SageMaker.”

Today, iNaturalist has gone from being entirely people-powered to regularly providing machine-generated identifications that are only just beginning to reveal new potential research paths.

"It's important for us that this data lasts and is accessible for a long time, not just for the duration of the competitions," Mac Aodha said. "Having a stable home for these datasets is a really valuable thing."
