Shrinking machine learning models for offline use

"Perfect hashing" is among the techniques that reduce the memory footprints of machine learning models by 94%.

Last week, the Alexa Auto team announced the release of its new Alexa Auto Software Development Kit (SDK), enabling developers to bring Alexa functionality to in-vehicle infotainment systems.


The initial release of the SDK assumes that automotive systems will have access to the cloud, where the machine-learning models that power Alexa currently reside. But in the future, we would like Alexa-enabled vehicles — and other mobile devices — to have recourse to some core functions even when they’re offline. That will mean drastically reducing the size of the underlying machine-learning models, so they can fit in local memory.

At the same time, third-party developers have created more than 45,000 Alexa skills, which expand on Alexa’s native capabilities, and that number is increasing daily. Even in the cloud, third-party skills are loaded into memory only when explicitly invoked by a customer request. Shrinking the underlying models would reduce load time, ensuring that Alexa customers continue to experience millisecond response times.

At this year’s Interspeech, my colleagues and I will present a new technique for compressing machine-learning models that reduces their memory footprints by 94% while leaving their performance almost unchanged. We report our results in a paper titled “Statistical model compression for small-footprint natural language understanding.”

Quantization

Alexa’s natural-language-understanding systems, which interpret free-form utterances, use several different types of machine-learning (ML) models, but they all share some common traits. One is that they learn to extract “features” — or strings of text with particular predictive value — from input utterances. An ML model trained to handle music requests, for instance, will probably become sensitized to text strings like “the Beatles”, “Elton John”, “Whitney Houston”, “Adele”, and so on. Alexa’s ML models frequently have millions of features.

Another common trait is that each feature has a set of associated “weights,” which determine how large a role it should play in different types of computation. The need to store multiple weights for millions of features is what makes ML models so memory intensive.

Our first technique for compressing an ML model is to quantize its weights. We take the total range of weights (say, -100 to 100) and divide it into even intervals (say, -100 to -90, -90 to -80, and so on). Then we simply round each weight to the nearest boundary value of its interval. In practice, we use 256 intervals, which allows us to represent every weight in the model with a single byte of data, with minimal effect on the model’s accuracy. This approach has the added benefit of automatically rounding weights that are close to zero down to exactly zero, so they can be discarded.
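The rounding step can be sketched in a few lines of Python. This is a minimal illustration rather than the code from our paper: the function names, the symmetric grid of signed one-byte codes (chosen so that zero is exactly representable), and the sample weights are all assumptions made for the example.

```python
from array import array

def quantize(weights, max_abs=100.0, levels=127):
    """Round each weight to the nearest point on an even grid over [-max_abs, max_abs].

    Assumption: codes are signed integers in [-levels, levels], so each one
    fits in a single byte and a code of 0 means a weight of exactly zero.
    """
    step = max_abs / levels
    codes = array("b")  # one signed byte per weight
    for w in weights:
        clipped = max(-max_abs, min(max_abs, w))
        codes.append(round(clipped / step))
    return codes, step

def dequantize(codes, step):
    """Recover approximate weights from their one-byte codes."""
    return [c * step for c in codes]

weights = [73.2, -0.2, 0.3, -100.0, 12.5]  # hypothetical weights
codes, step = quantize(weights)
restored = dequantize(codes, step)
print(list(codes))  # the two weights near zero both get code 0
```

Each restored weight differs from the original by at most half a grid step, and any weight whose code is 0 can be dropped from storage altogether.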

Perfect hashing

Our other compression technique is more elegant. If an Alexa customer says, “Alexa, play ‘Yesterday,’ by the Beatles,” we want our system to pull up the weights associated with the feature “the Beatles” — not the weights associated with “Adele”, “Elton John”, and the rest. This requires a means of mapping particular features to the memory locations of the corresponding weights.

The standard way to perform such mappings is through hashing. A hash function is a mathematical function that takes arbitrary inputs and scrambles them up (hashes them) in such a way that the outputs (1) are of fixed size and (2) bear no predictable relationship to the inputs. If the output size is fixed at 16 bits, for instance, there are 65,536 possible hash values, and similar inputs can land far apart: “Hank Williams” might map to value 1, while “Hank Williams, Jr.” maps to value 65,000.
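As a rough illustration of a fixed-size hash, the sketch below maps any string to a 16-bit value. It uses BLAKE2b from Python's standard library purely as a convenient stand-in; the helper name hash16 is hypothetical, and this is not the hash function our system uses.

```python
import hashlib

def hash16(text: str) -> int:
    """Map an arbitrary string to one of 65,536 values (a 16-bit hash)."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=2).digest()
    return int.from_bytes(digest, "big")

for name in ["Hank Williams", "Hank Williams, Jr."]:
    # Similar strings generally land on unrelated values.
    print(name, "->", hash16(name))
```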

Nonetheless, traditional hash functions sometimes produce collisions: Hank Williams, Jr. may not map to the same location as Hank Williams, but something totally arbitrary — the Bay City Rollers, say — might. In terms of runtime performance, this usually isn’t a big problem. If you hash the name “Hank Williams” and find two different sets of weights at the corresponding memory location, it doesn’t take that long to consult a metadata tag to determine which set of weights belongs to which artist.

In terms of memory footprint, however, this approach to collision resolution makes a substantial difference. With quantizing, the weights themselves will require just a few bytes of data; the metadata used to distinguish sets of weights could end up requiring more space in memory than the data it’s tagging.
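To make the overhead concrete, here is a small sketch of a conventional hash table that resolves collisions by storing each feature string next to its weights. The table layout, the function names, and the two-byte weight sets are assumptions made for illustration, not details of our system.

```python
import hashlib

def slot(key: str, num_slots: int) -> int:
    """Conventional hash: map a string to a slot index (stand-in hash function)."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_slots

def build_chained_table(weights_by_feature, num_slots):
    """Each slot holds a list of (feature tag, weights) pairs."""
    table = [[] for _ in range(num_slots)]
    for feature, weights in weights_by_feature.items():
        table[slot(feature, num_slots)].append((feature, weights))
    return table

def lookup(table, feature):
    """Compare stored tags to find the right weights within a slot."""
    for tag, weights in table[slot(feature, len(table))]:
        if tag == feature:
            return weights
    return None

# Hypothetical quantized weights: two bytes per feature.
model = {
    "the Beatles": bytes([12, 200]),
    "Adele": bytes([7, 31]),
    "Hank Williams": bytes([90, 3]),
}
table = build_chained_table(model, num_slots=2)  # 3 keys in 2 slots forces a collision

tag_bytes = sum(len(f.encode("utf-8")) for f in model)
weight_bytes = sum(len(w) for w in model.values())
print(tag_bytes, weight_bytes)  # bytes spent on tags vs. bytes spent on weights
```

In this toy example the tags take 29 bytes while the weights they disambiguate take only 6.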

We address this problem by using a more advanced hashing technique called perfect hashing, which maps a specific number of data items to the same number of memory slots but guarantees there will be no collisions. With perfect hashing, the system can simply hash a string of characters and pull up the corresponding weights — no metadata required.

Figure: Perfect-hashing algorithm. Our perfect-hashing algorithm relies on a family of conventional hash functions (h1, h2, etc.). For each input that a function in the family hashes without a collision, we toggle the corresponding 0 in an array to 1. Then we repeat the process on the remaining inputs with different functions and smaller arrays, until every input value has a unique hash.

To produce a perfect hash, we assume that we have access to a family of conventional hash functions all of which produce random hashes. That is, each function in the family might hash “Hank Williams” to a different value, but that value tells you nothing about how the same function will hash any other string. In practice, we use the hash function MurmurHash, which can be seeded with a succession of different values.

Suppose that you have N input strings that you want to hash. We begin with an array of N 0’s. Then we apply our first hash function — call it Hash1 — to all N inputs. For every string that yields a unique hash value — no collisions — we change the corresponding 0 in the array to a 1.

Then we build a new array of 0’s, with entries for only the input strings that yielded collisions under Hash1. To those strings, we now apply a different hash function — say, Hash2 — and we again toggle the 0’s corresponding to collision-free hashes.

We repeat this process until every input string has a corresponding 1 in some array. Then we combine all the arrays into one giant array. The position of a 1 in the giant array indicates the unique memory location assigned to the corresponding input string.

Now, when the trained model receives an input, it applies Hash1 to each of the input’s substrings and, if it finds a 1 in the first array, it goes to the associated address. If it finds a 0, it applies Hash2 and checks the second array, and so on through the remaining arrays.
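The construction and lookup described above can be sketched as follows. This is a simplified illustration under several assumptions: a salted BLAKE2b from Python's standard library stands in for seeded MurmurHash, the seed of each hash function is simply its level number, the input strings are distinct, and all function names are hypothetical.

```python
import hashlib

def seeded_hash(key: str, seed: int, size: int) -> int:
    """Stand-in for a seeded hash function: a position in [0, size)."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "big") % size

def build_levels(keys, max_levels=64):
    """Build one 0/1 array per hash function until every key has its own 1."""
    levels, remaining = [], list(keys)  # keys are assumed to be distinct
    while remaining:
        seed = len(levels)
        if seed == max_levels:
            raise RuntimeError("no perfect hash found within max_levels")
        size = len(remaining)
        buckets = {}
        for key in remaining:
            buckets.setdefault(seeded_hash(key, seed, size), []).append(key)
        bits, collided = [0] * size, []
        for position, bucket in buckets.items():
            if len(bucket) == 1:
                bits[position] = 1  # collision-free: toggle the 0 to a 1
            else:
                collided.extend(bucket)  # retry these with the next function
        levels.append(bits)
        remaining = collided
    return levels

def lookup_slot(levels, key):
    """Try each hash function in turn; return the position in the combined array."""
    offset = 0
    for seed, bits in enumerate(levels):
        position = seeded_hash(key, seed, len(bits))
        if bits[position]:
            return offset + position
        offset += len(bits)
    return None

keys = ["the Beatles", "Elton John", "Whitney Houston", "Adele",
        "Hank Williams", "Hank Williams, Jr."]
levels = build_levels(keys)
slots = {key: lookup_slot(levels, key) for key in keys}
print(slots)  # every key gets a distinct slot
```

Two caveats apply to this sketch. The combined array is longer than the number of keys, so a denser layout could instead use the count of 1's that precede a key's position as its index. And a string that was not part of the original set may still land on a 1, so the uniqueness guarantee covers only the strings the arrays were built from.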

Calling successive hash functions for some inputs does incur a slight performance penalty. But it’s a penalty that’s paid only where a conventional hash function would yield a collision, anyway. In our paper, we include both a theoretical analysis and experimental results that demonstrate that this penalty is almost negligible. And it’s certainly a small price to pay for the drastic reduction in memory footprint that the method affords.

Acknowledgments: Kanthashree Mysore Sathyendra, Stanislav Peshterliev
