Do large language models really need all those layers?

Finding that 70% of attention heads and 20% of feed-forward networks can be excised with minimal effect on in-context learning suggests that large language models are undertrained.

Large language models (LLMs) have been around for a while but have really captured the attention of the public this year, with the advent of ChatGPT. LLMs are typically pretrained on massive volumes of data; recent variants are additionally tuned to follow instructions and incorporate human feedback using reinforcement learning.

A fascinating ability that these LLMs demonstrate is in-context learning, where a model can learn to perform a task just by following a few (or sometimes even zero) good examples provided along with a new input. Following this paradigm of learning, larger LLMs also proved more capable of performing a wide variety of tasks than smaller ones, when the amount of pretraining data was fixed.

In a paper we’re presenting at this year’s meeting of the Association for Computational Linguistics (ACL), we investigate the importance of model scale for in-context learning, from the perspective of architectural interpretability. We specifically ask the question Are all LLM components really needed to perform in-context learning?

Related content
With an encoder-decoder architecture — rather than decoder only — the Alexa Teacher Model excels other large language models on few-shot tasks such as summarization and machine translation.

We conducted our investigation as a case study of the OPT-66B model, a 66-billion-parameter LLM that was open-sourced by Meta last year to serve as an open replica of GPT-3 (and was the largest publicly available decoder-only LLM at the time of our study). We found that a significant portion of the model could be discarded without affecting performance, indicating that OPT-66B and quite likely other prominent LLMs are undertrained.

We believe our findings are useful in helping build more powerful LLMs by identifying (or more generally providing methods to identify) architectural elements that may need to be trained better.

LLM building blocks

Modern LLMs use the Transformer architecture, which depends on an attention mechanism: the model learns to predict which prior tokens in the sequence it should attend to when predicting the current token.

Related content
Amazon’s Yang Liu, general chair of this year’s meeting of the Association for Computational Linguistics, on the road ahead for LLMs.

Specifically, LLMs use multihead attention, meaning that they apply multiple attention mechanisms, or heads, in parallel. OPT-66B has 64 layers with 72 attention heads in each layer. The output of the multihead attention passes through a separate feed-forward network (FFN) at each layer.

Our first method for analyzing OPT-66B was to assign a score to each attention head and FFN indicating how important they were to a given task. On the basis of those scores, we then pruned the model.

We found that important attention heads are primarily clustered in the model’s intermediate layers, and important FFNs are primarily in later layers. The ability to perform zero-/few-shot in-context learning on 14 different natural-language-processing (NLP) datasets/tasks stayed nearly intact when up to 70% (~15.7B parameters in OPT-66B) of the attention heads are removed.

Attention head heat map.png
A heat map representing attention heads’ aggregate importance scores for five-shot in-context learning across 14 NLP tasks, at each layer of the OPT-66B model.

The attention heads that are important (and unimportant) for in-context learning also seemed to overlap across tasks and shots. This indicates that a common task-agnostic subset of the attention heads is responsible for in-context learning. We also found that up to 20% of the FFNs (~8.5B parameters) can be removed with minimal decline in performance on zero-/few-shot in-context learning.

Our second analytic technique was to quantify the capacity of all attention heads in OPT-66B to perform a pair of task-agnostic primitive operations associated with in-context learning. Those primitives are prefix matching and copying: explicitly searching for a prior occurrence of the current token in context and copying over the token that succeeded it (its suffix).

Suffix copying.png
Prefix matching and copying.

Heads specialized for these two operations were first discovered by the machine learning research company Anthropic and termed induction heads. We found that a small set of heads in OPT-66B have nontrivial scores for both primitives. We also found that these heads overlap (to varying degrees) with the heads important for specific tasks identified earlier. This indicates that induction heads are capable of more sophisticated behaviors associated with in-context learning, such as latent concept matching, but are not the only heads with such capabilities.

Related content
Generative AI raises new challenges in defining, measuring, and mitigating concerns about fairness, toxicity, and intellectual property, among other things. But work has started on the solutions.

Our overarching observation that only a core nucleus of attention heads and FFNs seem to be important for in-context learning indicates that OPT-66B and quite likely other prominent LLMs are undertrained. This also reinforces recent research that questions the efficacy of keeping the amount of pretraining data fixed when scaling models up, suggesting that the amount of pretraining data seen must be scaled hand-in-hand with the models themselves to attain optimal performance. It would be interesting to see how newer variants of LLMs released since the publication of our study, such as those tuned to follow instructions, fare in such analyses.

Related content

IL, Tel Aviv
Come join the AWS Agentic AI science team in building the next generation models for intelligent automation. AWS, the world-leading provider of cloud services, has fostered the creation and growth of countless new businesses, and is a positive force for good. Our customers bring problems that will give Applied Scientists like you endless opportunities to see your research have a positive and immediate impact in the world. You will have the opportunity to partner with technology and business teams to solve real-world problems, have access to virtually endless data and computational resources, and to world-class engineers and developers that can help bring your ideas into the world. As part of the team, we expect that you will develop innovative solutions to hard problems, and publish your findings at peer reviewed conferences and workshops. We are looking for world class researchers with experience in one or more of the following areas - autonomous agents, API orchestration, Planning, large multimodal models (especially vision-language models), reinforcement learning (RL) and sequential decision making.
IL, Tel Aviv
Are you a Masters or PhD student interested in a 2026 Internship in Data Science? If so, we want to hear from you! We are looking for a customer obsessed Data Scientist Intern who can innovate in a business environment and is comfortable owning data to drive step-change innovation in the EMEA region or worldwide. If this describes you, come and join our Data Science teams at Amazon for an exciting internship opportunity. If you are insatiably curious and always want to learn more, then you’ve come to the right place. You can find more information about the Amazon Science community as well as our interview process via the links below; https://www.amazon.science/ https://amazon.jobs/content/en/career-programs/university/science Key job responsibilities As a Data Science Intern, you will have the following key job responsibilities: • Work closely with scientists and engineers to develop new algorithms to implement scientific solutions for Amazon problems • Design, run, and analyze A/B tests • Work on an interdisciplinary team on customer-obsessed research • Experience Amazon's customer-focused culture • Create and deliver projects that can be quickly applied starting locally and scaled to EMEA/worldwide • Create and share data with audiences of varying levels technical papers and presentations • Define metrics and design algorithms to estimate customer satisfaction and engagement A day in the life At Amazon, you will grow into the high impact person you know you’re ready to be. Every day will be filled with developing new skills and achieving personal growth. How often can you say that your work changes the world? At Amazon, you’ll say it often. Join us and define tomorrow. Some more benefits of an Amazon Science internship include; • All of our internships offer a competitive stipend/salary • Interns are paired with an experienced manager and mentor(s) • Interns receive invitations to different events such as intern program initiatives or site events • Interns can build their professional and personal network with other Amazon Scientists • Interns can potentially publish work at top tier conferences each year About the team Applicants will be reviewed on a rolling basis and are assigned to teams aligned with their research interests and experience prior to interviews. Start dates are available throughout the year and durations can vary in length from 3-6 months for full time internships or 6-12 months for part time internships. Please note these are not remote internships.
IN, KA, Bengaluru
Alexa+ is the world’s best Generative AI powered personal assistant / agent for consumers. We are seeking an Applied Scientist to join our newly expanding team in India focused on Alexa Conversational Ads and Personalization. In this role, you will build machine learning models that seamlessly and naturally integrate relevant advertising into the Alexa experience while deeply personalizing user interactions. You will work closely with other scientists, engineers, and product managers to take models from conception to production. Key job responsibilities Design, develop, and evaluate innovative deep learning and GenAI models for natural language processing (NLP), recommendation systems, and personalization. Conduct hands-on data analysis and build scalable ML pipelines. Design and run A/B experiments to measure the impact of new models on customer experience and ad performance. Collaborate with software development engineers to deploy models into high-scale, real-time production environments. About the team We are building a new science team in Bangalore to solve some of the most impactful problems in computational advertising. This isn't about tweaking existing models as we are rethinking how ads are ranked, priced, and personalized across voice-first and screen-first surfaces. These are problems that don't have textbook solutions. Key points to note about the team: 🧪 Greenfield team - you are not joining a mature org with rigid processes. You will shape the science roadmap, pick the problems, and define the culture from day one. 📈 Direct business impact — your models directly drive revenue. No yearly cycles to see if your work matters. 🌏 Global scope, local autonomy — collaborate with scientists and engineers across Seattle, Sunnyvale, and Bangalore, but own your problem space end-to-end. 🎓 Ship AND Publish: We encourage top-tier publications (NeurIPS, ACL, EMNLP, KDD, ICML, WWW) while ensuring your research hits production.
IN, KA, Bengaluru
Alexa+ is the world’s best Generative AI powered personal assistant / agent for consumers. We are seeking an Applied Scientist to join our newly expanding team in India focused on Alexa Conversational Ads and Personalization. In this role, you will build machine learning models that seamlessly and naturally integrate relevant advertising into the Alexa experience while deeply personalizing user interactions. You will work closely with other scientists, engineers, and product managers to take models from conception to production. Key job responsibilities Design, develop, and evaluate innovative deep learning and GenAI models for natural language processing (NLP), recommendation systems, and personalization. Conduct hands-on data analysis and build scalable ML pipelines. Design and run A/B experiments to measure the impact of new models on customer experience and ad performance. Collaborate with software development engineers to deploy models into high-scale, real-time production environments. About the team We are building a new science team in Bangalore to solve some of the most impactful problems in computational advertising. This isn't about tweaking existing models as we are rethinking how ads are ranked, priced, and personalized across voice-first and screen-first surfaces. These are problems that don't have textbook solutions. Key points to note about the team: 🧪 Greenfield team - you are not joining a mature org with rigid processes. You will shape the science roadmap, pick the problems, and define the culture from day one. 📈 Direct business impact — your models directly drive revenue. No yearly cycles to see if your work matters. 🌏 Global scope, local autonomy — collaborate with scientists and engineers across Seattle, Sunnyvale, and Bangalore, but own your problem space end-to-end. 🎓 Ship AND Publish: We encourage top-tier publications (NeurIPS, ACL, EMNLP, KDD, ICML, WWW) while ensuring your research hits production.
IN, KA, Bengaluru
Alexa+ is the world’s best Generative AI powered personal assistant / agent for consumers. We are seeking an Applied Scientist to join our newly expanding team in India focused on Alexa Conversational Ads and Personalization. In this role, you will build machine learning models that seamlessly and naturally integrate relevant advertising into the Alexa experience while deeply personalizing user interactions. You will work closely with other scientists, engineers, and product managers to take models from conception to production. Key job responsibilities Design, develop, and evaluate innovative deep learning and GenAI models for natural language processing (NLP), recommendation systems, and personalization. Conduct hands-on data analysis and build scalable ML pipelines. Design and run A/B experiments to measure the impact of new models on customer experience and ad performance. Collaborate with software development engineers to deploy models into high-scale, real-time production environments. About the team We are building a new science team in Bangalore to solve some of the most impactful problems in computational advertising. This isn't about tweaking existing models as we are rethinking how ads are ranked, priced, and personalized across voice-first and screen-first surfaces. These are problems that don't have textbook solutions. Key points to note about the team: 🧪 Greenfield team - you are not joining a mature org with rigid processes. You will shape the science roadmap, pick the problems, and define the culture from day one. 📈 Direct business impact — your models directly drive revenue. No yearly cycles to see if your work matters. 🌏 Global scope, local autonomy — collaborate with scientists and engineers across Seattle, Sunnyvale, and Bangalore, but own your problem space end-to-end. 🎓 Ship AND Publish: We encourage top-tier publications (NeurIPS, ACL, EMNLP, KDD, ICML, WWW) while ensuring your research hits production.
IN, KA, Bengaluru
Alexa+ is the world’s best Generative AI powered personal assistant / agent for consumers. We are seeking an Applied Scientist to join our newly expanding team in India focused on Alexa Conversational Ads and Personalization. In this role, you will build machine learning models that seamlessly and naturally integrate relevant advertising into the Alexa experience while deeply personalizing user interactions. You will work closely with other scientists, engineers, and product managers to take models from conception to production. Key job responsibilities - Design, develop, and evaluate innovative machine learning and deep learning models for natural language processing (NLP), recommendation systems, and personalization. - Conduct hands-on data analysis and build scalable ML pipelines. - Design and run A/B experiments to measure the impact of new models on customer experience and ad performance. - Collaborate with software development engineers to deploy models into high-scale, real-time production environments.
US, CA, San Francisco
Join Amazon's Frontier AI & Robotics team as a Member of Technical Staff, this Technical Program Manager will become the driving force behind breakthrough robotics innovation. You'll orchestrate complex, cross-functional programs that bridge AI research, software, hardware, and production deployment—managing the technical workstreams that enable robots to see, reason, and act in Amazon's warehouse environments. Your program leadership will directly accelerate our mission to build the next generation of embodied intelligence. Key job responsibilities · Establish and drive program management mechanisms and cadence for complex robotics and AI development initiatives spanning research, software engineering, hardware, and operations · Manage end-to-end program execution across the full robotics stack—including AI models, software engineering, and hardware deployment · Drive decision-making velocity by facilitating tradeoff discussions when there are conflicting priorities; determine whether decisions are one-way or two-way doors · Own program-level risk management, proactively identifying technical, schedule, and resource risks; escalate where necessary and drive mitigation strategies · Manage dependencies and scope changes across internal teams and partner organizations, ensuring alignment on commitments, timelines, and technical requirements · Create transparency through clear RACI frameworks, program dashboards, and communication mechanisms that keep stakeholders aligned on status, risks, and decisions · Exercise strong technical judgment to influence program-level decisions on deployment methodology, scalability requirements, and technical feasibility—acting as the voice back to research and engineering teams · Build sustainable program management processes that scale as our organization grows, adapting agile frameworks to the unique challenges of AI robotics A day in the life Your focus centers on driving velocity and alignment across our robotics programs. You might start your morning facilitating tradeoff decisions between AI researchers and software engineers on a critical prototype milestone, then transition to managing dependencies across hardware and operations teams to keep timelines on track. In the afternoon, you could be conducting risk assessments on supply chain constraints that impact our development roadmap, updating program dashboards to provide leadership visibility, or working with partner teams to align on deployment strategies. You'll establish the mechanisms and cadence that keep our fast-moving organization synchronized—from sprint planning rituals to cross-functional design reviews. Throughout the day, you balance hands-on program execution with strategic escalation, ensuring technical decisions align with our long-term vision while removing obstacles that slow teams down. You're the connective tissue that enables researchers, engineers, and operations specialists to move fast together. About the team At Frontier AI & Robotics, we're not just advancing robotics – we're reimagining it from the ground up. Our team is building the future of intelligent robotics through frontier foundation models and end-to-end learned systems. We tackle some of the most challenging problems in AI and robotics, from developing sophisticated perception systems to creating adaptive manipulation strategies that work in complex, real-world scenarios. What sets us apart is our unique combination of ambitious research vision and practical impact. We leverage Amazon's computational infrastructure and rich real-world datasets to train and deploy state-of-the-art foundation models. Our work spans the full spectrum of robotics intelligence – from multimodal perception using images, videos, and sensor data, to sophisticated manipulation strategies that can handle diverse real-world scenarios. We're building systems that don't just work in the lab, but scale to meet the demands of Amazon's global operations. Join us if you're excited about pushing the boundaries of what's possible in robotics, working with world-class researchers, and seeing your innovations deployed at unprecedented scale.
US, CA, San Francisco
We are seeking a hands-on Electrical Engineer to lead the design and integration of electrical systems or subsystems for high-degree-of-freedom robotic platforms. This role involves architecting the robot’s power distribution, sensor wiring, and embedded electrical infrastructure. You will be responsible for designing across the full electrical system for advanced robotics platforms including power distribution, sensing, compute, motor controllers, communication infrastructure, battery system and power electronics in close collaboration with mechanical, controls and software engineers. You’ll play a key role in ensuring high-performance, reliable operation of complex electromechanical systems under real-world conditions. Key job responsibilities * Electrical system architect / owner for power electronics, actuation, PCBAs, battery, ware harness specs and high speed electrical/communications protocols * Design, develop and integrate power distribution, embedded electronics, motor controllers and safety-critical circuits for complex robotic systems * Own board layout of PCBAs including SoCs, microcontrollers, sensors, power devices, etc. using Cadence OrCAD/Allegro or equivalent tools. Oversee bring-up and validation * Determine appropriate high speed electrical and communication protocols (e.g., CAN, EtherCAT, USB, etc) for reliable and efficient system operation * Specify and design custom power electronics and power distribution boards to meet performance, thermal, and safety requirements * Design and route all cabling and wire harnesses across the robotic platform, considering EMI, signal integrity, serviceability, and integration with mechanical structures * Architect and integrate the robot’s battery system, including protection circuitry, battery management, charging systems, and thermal considerations * Define and implement wiring and electrical interfaces for sensors (e.g., lidar, stereo cameras, IMUs, tactile) and compute modules * Ownership over prototyping and bringing up electrical designs and creation of test & validation rigs About the team At Frontier AI & Robotics, we're not just advancing robotics – we're reimagining it from the ground up. Our team is building the future of intelligent robotics through innovative foundation models and end-to-end learned systems. We tackle some of the most challenging problems in AI and robotics, from developing sophisticated perception systems to creating adaptive manipulation strategies that work in complex, real-world scenarios. What sets us apart is our unique combination of ambitious research vision and practical impact. We leverage Amazon's massive computational infrastructure and rich real-world datasets to train and deploy state-of-the-art foundation models. Our work spans the full spectrum of robotics intelligence – from multimodal perception using images, videos, and sensor data, to sophisticated manipulation strategies that can handle diverse real-world scenarios. We're building systems that don't just work in the lab, but scale to meet the demands of Amazon's global operations. Join us if you're excited about pushing the boundaries of what's possible in robotics, working with world-class researchers, and seeing your innovations deployed at unprecedented scale.
US, NY, New York
We are seeking an Applied Scientist to develop and optimize Visual Inertial Odometry (VIO) and sensor fusion systems for our intelligent robots. In this role, you will design, implement, and deploy state estimation and tracking algorithms that enable robots to understand their position and motion in real time, even in challenging and dynamic environments. You will own the full pipeline from algorithm development through embedded deployment, ensuring that perception systems run efficiently on resource-constrained robotic hardware. You will also leverage modern machine learning approaches to push the boundaries of classical perception methods, combining learned representations with geometric techniques to achieve robust, real-time performance. This is a deeply hands-on role. You will work directly with sensors, hardware, and real-world data, while prototyping, testing, and iterating in physical environments. The ideal candidate has strong foundations in VIO and sensor fusion, practical experience optimizing algorithms for embedded platforms, and familiarity with how modern deep learning is transforming perception. Key job responsibilities - Design and implement Visual Inertial Odometry algorithms for robust real-time state estimation on robotic platforms like Sprout - Develop multi-sensor fusion pipelines integrating cameras, IMUs, and other sensing modalities for accurate pose tracking - Optimize perception and tracking algorithms for deployment on embedded hardware (e.g., ARM, GPU-accelerated edge devices) under strict latency and power constraints - Apply modern ML-based perception techniques (learned features, depth estimation, neural odometry) to complement and improve classical geometric approaches - Build and maintain calibration, evaluation, and benchmarking infrastructure for perception systems - Collaborate with hardware, controls, and navigation teams to integrate perception outputs into the robot’s autonomy stack - Lead technical projects from research prototyping through production deployment
US, WA, Bellevue
The candidate in this role will own delivery of science products and solutions to help Amazon Devices Sales and Marketing org. make better decisions: product recommendations to customers, segmentation, financial incrementality of marketing initiatives, A/B testing etc. Key job responsibilities The Amazon Devices organization designs, produces and markets Echo Speakers, Kindle e-readers, Fire Tablets, Fire TV Streaming Media Players, Ring and Blink Smart Home & Security products. We are constantly looking to innovate on behalf of customers with new devices in existing or new categories or improving customer experience on existing platforms. The Devices Data Services (DDS) team provides Data Science, Analytics and Engineering support to the broader organization to enable Sales and Marketing activities across all these product lines. We are looking for an innovative, hands-on and customer-obsessed Data Scientist who can be a strategic partner to the product managers and engineers on the team. Our projects span multiple organizations and require coordination of experimentation, economic and causal analysis, and building predictive machine learning models. A successful candidate will be a problem solver who enjoys diving into data, is excited by difficult modeling challenges, is motivated to build something that will eventually become a production software system, and possesses strong communication skills to effectively interface between technical and business teams. In this role, you will be a technical expert with massive impact. You will take the lead on developing advanced ML systems that are key to reaching our customers with the right recommendations at the right time. Your work will directly impact the success of Amazon's growing Devices business. You will work across diverse science/engineering/business teams. You will work on critical data science problems, building high quality, reliable, accurate, and consistent code sets that are aligned with our business needs. Key Performance Areas - Implement statistical or machine learning methods to solve specific business problems. - Improve upon existing methodologies by developing new data sources, testing model enhancements, and fine-tuning model parameters. - Directly contribute to development of modern automated recommendation systems - Build customer-facing reporting tools to provide insights and metrics to track model performance and explain variance - Collaborate with researchers, software developers, and business leaders to define product requirements, provide analytical support, and communicate feedback A day in the life You will work with other scientists, engineers, product managers, and marketers to develop new products that benefit our customers and help us reach our business goals. You will own solutions from end to end: conceptualization, prioritization, development, delivery, and productionalization. About the team We are a full stack science team that empowers product, marketing, and other business leaders to better understand customers who use Amazon devices, make decisions on product development or optimization, and measure the effectiveness of their efforts against our customer’s expectation. Our focus area is to build analytical frameworks that help the organization either access data, better understand the decisions customers are making and why, or assess customer satisfaction.