Shrinking machine learning models for offline use

"Perfect hashing" is among the techniques that reduce the memory footprints of machine learning models by 94%.

Last week, the Alexa Auto team announced the release of its new Alexa Auto Software Development Kit (SDK), enabling developers to bring Alexa functionality to in-vehicle infotainment systems.

SYNC 3 and Amazon Echo
Ford is working to link home automation devices like Amazon Echo and Wink with its vehicles through Ford SYNC®, allowing consumers to control lights, thermostats and other home systems from their car and interact with their vehicle, including starting and unlocking it, from their home.

The initial release of the SDK assumes that automotive systems will have access to the cloud, where the machine-learning models that power Alexa currently reside. But in the future, we would like Alexa-enabled vehicles — and other mobile devices — to have recourse to some core functions even when they’re offline. That will mean drastically reducing the size of the underlying machine-learning models, so they can fit in local memory.

At the same time, third-party developers have created more than 45,000 Alexa skills, which expand on Alexa’s native capabilities, and that number is increasing daily. Even in the cloud, third-party skills are loaded into memory only when explicitly invoked by a customer request. Shrinking the underlying models would reduce load time, ensuring that Alexa customers continue to experience millisecond response times.

At this year’s Interspeech, my colleagues and I will present a new technique for compressing machine-learning models that reduces their memory footprints by 94% while leaving their performance almost unchanged. We report our results in a paper titled “Statistical model compression for small-footprint natural language understanding.”

Quantization

Alexa’s natural-language-understanding systems, which interpret free-form utterances, use several different types of machine-learning (ML) models, but they all share some common traits. One is that they learn to extract “features” — or strings of text with particular predictive value — from input utterances. An ML model trained to handle music requests, for instance, will probably become sensitized to text strings like “the Beatles”, “Elton John”, “Whitney Houston”, “Adele”, and so on. Alexa’s ML models frequently have millions of features.

Another common trait is that each feature has a set of associated “weights,” which determine how large a role it should play in different types of computation. The need to store multiple weights for millions of features is what makes ML models so memory intensive.

Our first technique for compressing an ML model is to quantize its weights. We take the total range of weights — say, -100 to 100 — and divide it into even intervals — say, -100 to -90, -90 to -80, and so on. Then we simply round each weight off to the nearest boundary value for its interval. In practice, we use 256 intervals, which allows us to represent every weight in the model with a single byte of data, with minimal effect on the network’s accuracy. This approach has the added benefit of automatically rounding low weights to zero, so they can be discarded.

Perfect hashing

Our other compression technique is more elegant. If an Alexa customer says, “Alexa, play ‘Yesterday,’ by the Beatles,” we want our system to pull up the weights associated with the feature “the Beatles” — not the weights associated with “Adele”, “Elton John”, and the rest. This requires a means of mapping particular features to the memory locations of the corresponding weights.

The standard way to perform such mappings is through hashing. A hash function is a mathematical function that takes arbitrary inputs and scrambles them up — hashes them — in such a way that the outputs (1) are of fixed size and (2) bear no predictable relationship to the inputs. If the output size is fixed at 16 bits, for instance, there are 65,536 possible hash values, but “Hank Williams” might map to value 1, while “Hank Williams, Jr.” maps to value 65,000.

Nonetheless, traditional hash functions sometimes produce collisions: Hank Williams, Jr. may not map to the same location as Hank Williams, but something totally arbitrary — the Bay City Rollers, say — might. In terms of runtime performance, this usually isn’t a big problem. If you hash the name “Hank Williams” and find two different sets of weights at the corresponding memory location, it doesn’t take that long to consult a metadata tag to determine which set of weights belongs to which artist.

In terms of memory footprint, however, this approach to collision resolution makes a substantial difference. With quantizing, the weights themselves will require just a few bytes of data; the metadata used to distinguish sets of weights could end up requiring more space in memory than the data it’s tagging.

We address this problem by using a more advanced hashing technique called perfect hashing, which maps a specific number of data items to the same number of memory slots but guarantees there will be no collisions. With perfect hashing, the system can simply hash a string of characters and pull up the corresponding weights — no metadata required.

Perfect-hashing algorithm
Our perfect-hashing algorithm relies on a family of conventional hash functions (h1, h2, etc.). If a function in the family produces a collision-free hash, we toggle the corresponding 0 in an array to 1. Then we repeat the process with different functions and smaller arrays, until every input value has a unique hash.

To produce a perfect hash, we assume that we have access to a family of conventional hash functions all of which produce random hashes. That is, each function in the family might hash “Hank Williams” to a different value, but that value tells you nothing about how the same function will hash any other string. In practice, we use the hash function MurmurHash, which can be seeded with a succession of different values.

Suppose that you have N input strings that you want to hash. We begin with an array of N 0’s. Then we apply our first hash function — call it Hash1 — to all N inputs. For every string that yields a unique hash value — no collisions — we change the corresponding 0 in the array to a 1.

Then we build a new array of 0’s, with entries for only the input strings that yielded collisions under Hash1. To those strings, we now apply a different hash function — say, Hash2 — and we again toggle the 0’s corresponding to collision-free hashes.

We repeat this process until every input string has a corresponding 1 in some array. Then we combine all the arrays into one giant array. The position of a 1 in the giant array indicates the unique memory location assigned to the corresponding input string.

Now, when the trained network receives an input, it applies Hash1 to each of the input’s substrings and, if it finds a 1 in the first array, it goes to the associated address. If it finds a 0, it applies Hash2 and repeats the process.

Calling successive hash functions for some inputs does incur a slight performance penalty. But it’s a penalty that’s paid only where a conventional hash function would yield a collision, anyway. In our paper, we include both a theoretical analysis and experimental results that demonstrate that this penalty is almost negligible. And it’s certainly a small price to pay for the drastic reduction in memory footprint that the method affords.

Acknowledgments: Kanthashree Mysore Sathyendra, Stanislav Peshterliev

Research areas

Related content

US, NY, New York
We are seeking an Applied Scientist to develop and optimize Visual Inertial Odometry (VIO) and sensor fusion systems for our intelligent robots. In this role, you will design, implement, and deploy state estimation and tracking algorithms that enable robots to understand their position and motion in real time, even in challenging and dynamic environments. You will own the full pipeline from algorithm development through embedded deployment, ensuring that perception systems run efficiently on resource-constrained robotic hardware. You will also leverage modern machine learning approaches to push the boundaries of classical perception methods, combining learned representations with geometric techniques to achieve robust, real-time performance. This is a deeply hands-on role. You will work directly with sensors, hardware, and real-world data, while prototyping, testing, and iterating in physical environments. The ideal candidate has strong foundations in VIO and sensor fusion, practical experience optimizing algorithms for embedded platforms, and familiarity with how modern deep learning is transforming perception. Key job responsibilities - Design and implement Visual Inertial Odometry algorithms for robust real-time state estimation on robotic platforms like Sprout - Develop multi-sensor fusion pipelines integrating cameras, IMUs, and other sensing modalities for accurate pose tracking - Optimize perception and tracking algorithms for deployment on embedded hardware (e.g., ARM, GPU-accelerated edge devices) under strict latency and power constraints - Apply modern ML-based perception techniques (learned features, depth estimation, neural odometry) to complement and improve classical geometric approaches - Build and maintain calibration, evaluation, and benchmarking infrastructure for perception systems - Collaborate with hardware, controls, and navigation teams to integrate perception outputs into the robot’s autonomy stack - Lead technical projects from research prototyping through production deployment
US, WA, Seattle
Innovators wanted! Are you an entrepreneur? A builder? A dreamer? This role is part of an Amazon Special Projects team that takes the company’s Think Big leadership principle to the limits. If you’re interested in innovating at scale to address big challenges in the world, this is the team for you. As an Applied Scientist on our team, you will focus on building state-of-the-art ML models for biology. Our team rewards curiosity while maintaining a laser-focus in bringing products to market. Competitive candidates are responsive, flexible, and able to succeed within an open, collaborative, entrepreneurial, startup-like environment. At the forefront of both academic and applied research in this product area, you have the opportunity to work together with a diverse and talented team of scientists, engineers, and product managers and collaborate with other teams. Key job responsibilities - Build, adapt and evaluate ML models for life sciences applications - Collaborate with a cross-functional team of ML scientists, biologists, software engineers and product managers
US, MA, Boston
MULTIPLE POSITIONS AVAILABLE Employer: AMAZON.COM SERVICES LLC Offered Position: Economist III Job Location: Boston, Massachusetts Job Number: AMZ9898444 Position Responsibilities: Mentor and guide the applied scientists and economists in our organization and hold us to a high standard of technical rigor and excellence in science. Design and lead roadmaps for complex science projects to help SP have a delightful selling experience while creating long term value for our shoppers. Work with our engineering partners and draw upon your experience to meet latency and other system constraints. Identify untapped, high-risk technical and scientific directions, and simulate new research directions that you will drive to completion and deliver. Be responsible for communicating our science innovations to the broader internal & external scientific community. Position Requirements: Ph.D. or foreign equivalent degree in Economics or a related field and two years of research or work experience in the job offered or a related occupation. Must have two years of research or work experience in the following skill(s): 1) experience in econometrics including experience with program evaluation, forecasting, time series, panel data, or high dimensional problems; 2) experience with economic theory and quantitative methods; and 3) coding in a scripting language such as R, Python, or similar. Amazon.com is an Equal Opportunity-Affirmative Action Employer – Minority / Female / Disability / Veteran / Gender Identity / Sexual Orientation. 40 hours / week, 8:00am-5:00pm, Salary Range $159,200/year to $215,300/year. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, visit: https://www.aboutamazon.com/workplace/employee-benefits.#0000
US, WA, Seattle
Applied Scientists in AWS Automated Reasoning are dedicated to making AWS the best computing service in the world for customers who require advanced and rigorous solutions for automated reasoning, privacy, and sovereignty. Key job responsibilities - Solve large or significantly complex problems that require deep knowledge and understanding of your domain and scientific innovation. - Own strategic problem solving, and take the lead on the design, implementation, and delivery for solutions that have a long-term quantifiable impact. - Provide cross-organizational technical influence, increasing productivity and effectiveness by sharing your deep knowledge and experience. - Develop strategic plans to identify fundamentally new solutions for business problems. - Assist in the career development of others, actively mentoring individuals and the community on advanced technical issues.
US, CA, San Francisco
Amazon is on a mission to redefine the future of automation — and we're looking for exceptional talent to help lead the way. We are building the next generation of advanced robotic systems that seamlessly blend cutting-edge AI, sophisticated control systems, and novel mechanical design to create adaptable, intelligent automation solutions capable of operating safely alongside humans in dynamic, real-world environments. At Amazon, we leverage the power of machine learning, artificial intelligence, and advanced robotics to solve some of the most complex operational challenges at a scale unlike anywhere else in the world. Our fleet of robots spans hundreds of facilities globally, working in sophisticated coordination to deliver on our promise of customer excellence — and we're just getting started. As a Sr. Scientist in Robot Navigation, you will be at the forefront of this transformation — architecting and delivering navigation systems that are intelligent, safe, and scalable. You will bring deep expertise in learning-based planning and control, a strong understanding of foundation models and their application to embodied agents, and as well as have in-depth understanding of control-theoretic approaches such as model predictive control (MPC)-based trajectory planning. You will develop navigation solutions that seamlessly blend data-driven intelligence with principled control-theoretic guarantees. Our vision is bold: to build navigation systems that allow robots to move fluidly and safely through dynamic environments — understanding context, anticipating change, and adapting in real time. You will lead research that bridges the gap between cutting-edge academic advances and production grade deployment, collaborating with world-class teams pushing the boundaries of robotic autonomy, manipulation, and human-robot interaction. Join us in building the next generation of intelligent navigation systems that will define the future of autonomous robotics at scale. Key job responsibilities - Design, develop, and deploy perception algorithms for robotics systems, including object detection, segmentation, tracking, depth estimation, and scene understanding - Lead research initiatives in computer vision, sensor fusion and 3D perception - Collaborate with cross-functional teams including robotics engineers, software engineers, and product managers to define and deliver perception capabilities - Drive end-to-end ownership of ML models — from data collection and labeling strategy to training, evaluation, and deployment - Mentor junior scientists and engineers; contribute to a culture of technical excellence - Define and track key metrics to measure perception system performance in real-world environments - Publish research findings in top-tier venues (CVPR, ICCV, ECCV, ICRA, NeurIPS, etc.) and contribute to patents A day in the life - Train ML models for deployment in simulation and real-world robots, identify and document their limitations post-deployment - Drive technical discussions within your team and with key stakeholders to develop innovative solutions to address identified limitations - Actively contribute to brainstorming sessions on adjacent topics, bringing fresh perspectives that help peers grow and succeed — and in doing so, build lasting trust across the team - Mentor team members while maintaining significant hands-on contribution to technical solutions About the team Our team is a group is a diverse group of scientists and engineers passionate about building intelligent machines. We value curiosity, rigor, and a bias for action. We believe in learning from failure and iterating quickly toward solutions that matter.
US, NY, New York
Prime Video is a first-stop entertainment destination offering customers a vast collection of premium programming in one app available across thousands of devices. Prime members can customize their viewing experience and find their favorite movies, series, documentaries, and live sports – including Amazon MGM Studios-produced series and movies; licensed fan favorites; and programming from Prime Video add-on subscriptions such as Apple TV+, Max, Crunchyroll and MGM+. All customers, regardless of whether they have a Prime membership or not, can rent or buy titles via the Prime Video Store, and can enjoy even more content for free with ads. Are you interested in shaping the future of entertainment and advertising? Prime Video's technology teams are creating best-in-class digital video experiences, and our Advertising Product & Technology organization is at the forefront of revolutionizing the streaming advertising landscape. The Prime Video Advertising team delivers ad tech solutions that power Prime Video's rapidly growing advertising business across video-on-demand (VOD), live streaming, and display ads—delivering value to both advertisers and viewers worldwide. We focus on critical areas including ad delivery, machine learning-driven optimization, experimentation, audience measurement, and generative AI-powered ad creative solutions. We are seeking a Senior Manager, Applied Science to lead a team of scientists and engineers building machine learning and AI solutions that directly impact Prime Video's advertising business. In this role, you will own the science strategy and execution for key workstreams including: - Ad Load Optimization – Balancing advertising revenue with viewer engagement through sophisticated ML models that determine optimal ad frequency, placement, and duration - Yield Optimization – Maximizing advertising revenue through intelligent allocation, pricing, and forecasting models - Experimentation & Metrics – Designing and scaling experimentation frameworks and causal inference methods to measure the impact of advertising decisions on both business outcomes and customer experience - Ad Creative Generation & Augmentation – Leveraging generative AI to create, personalize, and enhance ad creatives at scale As a leader of leaders, you will set the 3-5 year scientific vision for your organization, build and develop a high-performing team of senior scientists and managers, and drive large-scale ML/AI initiatives that inform strategic decisions for one of the world's largest streaming advertising platforms. You will collaborate closely with engineering, product, and business teams to translate complex scientific capabilities into measurable business impact during a period of rapid growth with a path to $10B in advertising revenue. This role offers the unique opportunity to shape the science strategy for a new and fast-growing business, working at the intersection of machine learning, generative AI, causal inference, and advertising technology at Internet scale.
US, WA, Seattle
Applied Scientists in AWS Automated Reasoning are dedicated to making AWS the best computing service in the world for customers who require advanced and rigorous solutions for automated reasoning, privacy, and sovereignty. Key job responsibilities The successful candidate will: - Solve large or significantly complex problems that require deep knowledge and understanding of your domain and scientific innovation. - Own strategic problem solving, and take the lead on the design, implementation, and delivery for solutions that have a long-term quantifiable impact. - Provide cross-organizational technical influence, increasing productivity and effectiveness by sharing your deep knowledge and experience. - Develop strategic plans to identify fundamentally new solutions for business problems. - Assist in the career development of others, actively mentoring individuals and the community on advanced technical issues. A day in the life This is a unique and rare opportunity to get in early on a fast-growing segment of AWS and help shape the technology, product and the business. You will have a chance to utilize your deep technical experience within a fast moving, start-up environment and make a large business and customer impact. About the team Diverse Experiences Amazon Automated Reasoning values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn't followed a traditional path, or includes alternative experiences, don't let it stop you from applying. Why Amazon Automated Reasoning? At Amazon, automated reasoning is central to maintaining customer trust and delivering delightful customer experiences. Our organization is responsible for creating and maintaining a high bar for automated reasoning across all of Amazon's products and services. We offer talented automated reasoning professionals the chance to accelerate their careers with opportunities to build experience in a wide variety of areas including cloud, devices, retail, entertainment, healthcare, operations, and physical stores. Inclusive Team Culture In Amazon Automated Reasoning, it's in our nature to learn and be curious. Ongoing DEI events and learning experiences inspire us to continue learning and to embrace our uniqueness. Addressing the toughest automated reasoning challenges requires that we seek out and celebrate a diversity of ideas, perspectives, and voices. Training & Career Growth We're continuously raising our performance bar as we strive to become Earth's Best Employer. That's why you'll find endless knowledge-sharing, training, and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why flexible work hours and arrangements are part of our culture. When we feel supported in the workplace and at home, there's nothing we can't achieve.
US, WA, Seattle
Applied Scientists in AWS Automated Reasoning are dedicated to making AWS the best computing service in the world for customers who require advanced and rigorous solutions for automated reasoning, privacy, and sovereignty. Key job responsibilities The successful candidate will: - Solve large or significantly complex problems that require deep knowledge and understanding of your domain and scientific innovation. - Own strategic problem solving, and take the lead on the design, implementation, and delivery for solutions that have a long-term quantifiable impact. - Provide cross-organizational technical influence, increasing productivity and effectiveness by sharing your deep knowledge and experience. - Develop strategic plans to identify fundamentally new solutions for business problems. - Assist in the career development of others, actively mentoring individuals and the community on advanced technical issues. A day in the life This is a unique and rare opportunity to get in early on a fast-growing segment of AWS and help shape the technology, product and the business. You will have a chance to utilize your deep technical experience within a fast moving, start-up environment and make a large business and customer impact. About the team Diverse Experiences Amazon Automated Reasoning values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn't followed a traditional path, or includes alternative experiences, don't let it stop you from applying. Why Amazon Automated Reasoning? At Amazon, automated reasoning is central to maintaining customer trust and delivering delightful customer experiences. Our organization is responsible for creating and maintaining a high bar for automated reasoning across all of Amazon's products and services. We offer talented automated reasoning professionals the chance to accelerate their careers with opportunities to build experience in a wide variety of areas including cloud, devices, retail, entertainment, healthcare, operations, and physical stores. Inclusive Team Culture In Amazon Automated Reasoning, it's in our nature to learn and be curious. Ongoing DEI events and learning experiences inspire us to continue learning and to embrace our uniqueness. Addressing the toughest automated reasoning challenges requires that we seek out and celebrate a diversity of ideas, perspectives, and voices. Training & Career Growth We're continuously raising our performance bar as we strive to become Earth's Best Employer. That's why you'll find endless knowledge-sharing, training, and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why flexible work hours and arrangements are part of our culture. When we feel supported in the workplace and at home, there's nothing we can't achieve.
US, WA, Bellevue
The Amazon Fulfillment Technologies (AFT) Science team is seeking an exceptional Applied Scientist with strong operations research and optimization expertise to develop production solutions for one of the most complex systems in the world: Amazon's Fulfillment Network. At AFT Science, we design, build, and deploy optimization, statistics, machine learning, and GenAI/LLM solutions that power production systems running across Amazon Fulfillment Centers worldwide. We tackle a wide range of challenges throughout the network, including labor planning and staffing, pick scheduling, stow guidance, and capacity risk management. Our mission is to develop innovative, scalable, and reliable science-driven production solutions that exceed the published state of the art, enabling systems to run optimally and continuously (from every few minutes to every few hours) across our large-scale network. Key job responsibilities As an Applied Scientist, you will collaborate with scientists, software engineers, product managers, and operations leaders to develop optimization-driven solutions that directly impact process efficiency and associate experience in the fulfillment network. Your key responsibilities include: - Develop deep understanding and domain knowledge of operational processes, system architecture, and business requirements - Dive deep into data and code to identify opportunities for continuous improvement and disruptive new approaches - Design and develop scalable mathematical models for production systems to derive optimal or near-optimal solutions for existing and emerging challenges - Create prototypes and simulations for agile experimentation of proposed solutions - Advocate for technical solutions with business stakeholders, engineering teams, and senior leadership - Partner with software engineers to integrate prototypes into production systems - Design and execute experiments to test new or incremental solutions launched in production - Build and monitor metrics to track solution performance and business impact About the team Amazon Fulfillment Technology (AFT) designs, develops, and operates end-to-end fulfillment technology solutions for all Amazon Fulfillment Centers (FCs). We harmonize the physical and virtual worlds so Amazon customers can get what they want, when they want it. The AFT Science team brings expertise in operations research, optimization, statistics, machine learning, and GenAI/LLM, combined with deep domain knowledge of operational processes within FCs and their unique challenges. We prioritize advancements that support AFT tech teams and focus areas rather than specific fields of research or individual business partners. We influence each stage of innovation from inception to deployment, which includes both developing novel solutions and improving existing approaches. Our production systems rely on a diverse set of technologies, and our teams invest in multiple specialties as the needs of each focus area evolve.
US, NY, New York
The Sponsored Products and Brands team at Amazon Ads is re-imagining the advertising landscape through generative AI technologies, revolutionizing how millions of customers discover products and engage with brands across Amazon.com and beyond. We are at the forefront of re-inventing advertising experiences, bridging human creativity with artificial intelligence to transform every aspect of the advertising lifecycle from ad creation and optimization to performance analysis and customer insights. We are a passionate group of innovators dedicated to developing responsible and intelligent AI technologies that balance the needs of advertisers, enhance the shopping experience, and strengthen the marketplace. If you're energized by solving complex challenges and pushing the boundaries of what's possible with AI, join us in shaping the future of advertising. About the team SPB Agent team's vision is to build a highly personalized and context-aware agentic advertiser guidance system that seamlessly integrates Large Language Models (LLMs) with sophisticated tooling, operating across all experiences. The SPB-Agent is the central agent that interfaces with advertisers across Ads Console, Selling Partner portals (Seller Central, KDP, Vendor Central), and internal Sales systems. We identify high-impact opportunities spanning from strategic product guidance to granular optimization and deliver them through personalized, scalable experiences grounded in state-of-the-art agent architectures, reasoning frameworks, sophisticated tool integration, and model customization approaches including fine-tuning, MCP, and preference optimization. This presents an exceptional opportunity to shape the future of e-commerce advertising through advanced AI technology at unprecedented scale, creating solutions that directly impact millions of advertisers.