Teaching neural networks to compress images

The combination of a new loss metric and a module that identifies high-importance image regions improves compression.

Virtually all the images flying over the Internet are compressed to save bandwidth, and the codecs (short for coder-decoders) that do the compression, such as JPEG, are usually hand crafted.

In theory, machine-learning-based codecs could provide better compression and higher image quality than hand-crafted codecs. But machine learning models are trained to minimize some loss metric, and existing loss metrics, such as PSNR and MS-SSIM, do not align well with human perception of similarity. 
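To make the mismatch concrete, here is a minimal sketch of PSNR, which depends only on the mean squared pixel error. The four-pixel "images" are made-up values for illustration, not data from the paper. Two very different distortions with the same mean squared error receive exactly the same PSNR, even though a viewer might judge them quite differently.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; higher means closer to the original."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / error)


# Toy 4-pixel "image" (illustrative values only).
original = [100, 100, 100, 100]
uniform_shift = [102, 102, 102, 102]  # every pixel slightly brighter
single_spike = [104, 100, 100, 100]   # one pixel clearly off

# Both distortions have a mean squared error of 4, so PSNR cannot tell them apart.
print(psnr(original, uniform_shift), psnr(original, single_spike))
```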

In January, at the IEEE Winter Conference on Applications of Computer Vision (WACV), we presented a perceptual loss function for learned image compression that addresses this issue. 

A comparison of the reconstructed images yielded by seven different compression schemes, both learned and hand crafted, at the same bit rate. Ours provides more faithful reconstruction of image details than the others and compares more favorably with the original (uncompressed) image.

We also describe how to incorporate saliency into a learned codec. Current image codecs, whether classical or learned, tend to compress all regions of an image equally. But most images have salient regions (say, faces and text) where faithful reconstruction matters more than in other regions (say, sky and background).

Compression codecs that assign more bits to salient regions than to low-importance regions tend to yield images that human viewers find more satisfying. Our model automatically learns from training data how to trade off the assignment of bits to salient and non-salient regions of an image.
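The intuition can be illustrated with a saliency-weighted error. This is only a sketch under stated assumptions: the weighting rule `1 + emphasis * saliency` and the `emphasis` parameter are hypothetical, and this is not the mechanism of our model, which learns the trade-off from training data. It shows only why the same pixel error should cost more in a salient region than in the background.

```python
def saliency_weighted_error(original, reconstructed, saliency, emphasis=4.0):
    """Weighted mean squared error.

    Each pixel's weight is 1 + emphasis * saliency, with saliency in [0, 1].
    The weighting rule is an illustrative assumption.
    """
    weights = [1.0 + emphasis * s for s in saliency]
    weighted = sum(
        w * (a - b) ** 2 for w, a, b in zip(weights, original, reconstructed)
    )
    return weighted / sum(weights)


# Toy 4-pixel image: the first two pixels are salient (say, part of a face).
original = [50, 50, 50, 50]
saliency = [1.0, 1.0, 0.0, 0.0]

error_in_face = [60, 50, 50, 50]  # same-size error, salient pixel
error_in_sky = [50, 50, 50, 60]   # same-size error, background pixel

# The salient-region error is penalized more heavily than the background error.
print(saliency_weighted_error(original, error_in_face, saliency))
print(saliency_weighted_error(original, error_in_sky, saliency))
```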


In our paper, we also report the results of two evaluation studies. One is a human-perception study in which subjects were asked to compare decompressed images from our codec to those of other codecs. The other study used compressed images in downstream tasks such as object detection and image segmentation.

In the first study, our method was the clear winner at bit rates below one bit per image pixel. In the second study, our method was the top performer across the board.

Model-derived losses

Several studies have shown that the loss functions used to train neural networks as compression codecs are inconsistent with human judgments of quality. For instance, of the four post-compression reconstructions in the image below, humans consistently pick the second from the right as the most faithful, even though it ranks only third according to the MS-SSIM loss metric.

A source image and four post-compression reconstructions of it, ranked, from left to right, in descending order by MS-SSIM values. Human evaluators, however, rank the second-lowest-scoring reconstruction (BPG) as the best.

It’s also been shown, however, that intermediate values computed by neural networks trained on arbitrary computer vision tasks — such as object recognition — accord better with human similarity judgments than conventional loss metrics. 

That is, a neural network trained on a computer vision task will generally produce a fixed-length vector representation of each input image, which is the basis for further processing. The distance between the values of that vector for two different images is a good predictor of human similarity judgments.

The architecture of the system we use to compute deep perceptual loss. F is the encoder learned from the image-ranking task. The downstream processing normalizes the encoder outputs and computes the distance between them.

We drew on this observation to create a loss function suitable for training image compression models. In other words, to train our image compression model, we used a loss function computed by another neural network. We call this deep perceptual loss.
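A minimal sketch of this idea follows: encode both images, normalize the resulting vectors, and measure the distance between them. The `toy_encoder` function is a hypothetical stand-in for the learned encoder F, and the choice of squared Euclidean distance is an assumption made for illustration.

```python
import math


def l2_normalize(vector, eps=1e-10):
    """Scale a feature vector to (approximately) unit length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / (norm + eps) for v in vector]


def perceptual_distance(encoder, image_a, image_b):
    """Squared Euclidean distance between normalized encoder outputs."""
    feats_a = l2_normalize(encoder(image_a))
    feats_b = l2_normalize(encoder(image_b))
    return sum((a - b) ** 2 for a, b in zip(feats_a, feats_b))


def toy_encoder(image):
    """Hypothetical stand-in for F: mean intensity and mean neighbor contrast."""
    mean = sum(image) / len(image)
    contrast = sum(abs(a - b) for a, b in zip(image, image[1:])) / (len(image) - 1)
    return [mean, contrast]


original = [10, 200, 10, 200]     # strongly textured toy "image"
close = [12, 198, 12, 198]        # texture preserved, small pixel changes
blurred = [105, 105, 105, 105]    # same mean, texture destroyed

print(perceptual_distance(toy_encoder, original, close))
print(perceptual_distance(toy_encoder, original, blurred))
```

With this toy encoder, the reconstruction that keeps the texture lands much closer to the original than the blurred one, which is the behavior a perceptual loss is meant to reward.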

First, we created a compression training set using the two-alternative forced-choice (2AFC) methodology. Annotators were presented with two versions of the same image, reconstructed by different compression methods (both classical and learned codecs), with the original image between them. They were asked to pick the version that was closer to the original. On average, the annotators spent 56 seconds on each sample.

We split this data into training and test sets and trained a network to predict which of each pair of reconstructed images human annotators preferred. Then we extracted the encoder that produces the vector representation of the input images and used it as the basis for a system that computes a similarity score (above).
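The training objective for such a preference predictor can be sketched as follows. The logistic link, the `scale` parameter, and the cross-entropy target are illustrative assumptions, not details taken from the paper. The inputs `dist_a` and `dist_b` stand for the learned distances from each reconstruction to the original.

```python
import math


def predict_preference(dist_a, dist_b, scale=1.0):
    """Probability that reconstruction B is judged closer to the original than A.

    A larger distance for A makes B the more likely choice. The logistic
    link and `scale` are illustrative assumptions.
    """
    return 1.0 / (1.0 + math.exp(-scale * (dist_a - dist_b)))


def preference_loss(dist_a, dist_b, human_fraction_b, eps=1e-12):
    """Binary cross-entropy against the fraction of annotators who chose B."""
    p = predict_preference(dist_a, dist_b)
    p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
    return -(
        human_fraction_b * math.log(p)
        + (1.0 - human_fraction_b) * math.log(1.0 - p)
    )


# All annotators chose B; a predictor that places B closer incurs a lower loss.
print(preference_loss(0.9, 0.1, human_fraction_b=1.0))  # agrees with annotators
print(preference_loss(0.1, 0.9, human_fraction_b=1.0))  # disagrees with annotators
```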

Our similarity measure approximates human judgment much better than its predecessors, with MS-SSIM and PSNR earning the lowest scores.

As the table shows, our approach (LPIPS-Comp VGG PSNR) comes closest (81.9) to human judgment (82.06) among the metrics compared. (The human-judgment score is less than 100 because human annotators sometimes disagree about the relative quality of images.) Note also that MS-SSIM and PSNR are the lowest-scoring metrics.
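To see why disagreement among annotators caps the achievable score, consider one common way of scoring a metric on 2AFC data: for each sample, the metric earns credit equal to the fraction of annotators who made the same choice. This scoring rule is an assumption here, not necessarily the exact rule behind the table, and the votes are hypothetical.

```python
def metric_agreement(human_fraction_b, metric_prefers_b):
    """Average credit a metric earns across 2AFC samples, as a percentage."""
    credits = [
        frac if prefers_b else 1.0 - frac
        for frac, prefers_b in zip(human_fraction_b, metric_prefers_b)
    ]
    return 100.0 * sum(credits) / len(credits)


def best_possible_agreement(human_fraction_b):
    """Score of an oracle that always sides with the majority of annotators."""
    credits = [max(frac, 1.0 - frac) for frac in human_fraction_b]
    return 100.0 * sum(credits) / len(credits)


# Hypothetical votes: fraction of annotators who preferred reconstruction B.
human_fraction_b = [1.0, 0.8, 0.6, 0.2]
metric_prefers_b = [True, True, False, False]

print(metric_agreement(human_fraction_b, metric_prefers_b))
print(best_possible_agreement(human_fraction_b))
```

Because the annotators split their votes on three of the four samples, even the majority-vote oracle falls short of 100 under this scoring rule.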

The compression model

Armed with a good perceptual-loss metric, we can train our neural codec. So that it can learn to exploit saliency judgments, our codec includes an off-the-shelf saliency model, trained on a 10,000-image data set in which salient regions have been annotated. The codec learns how to employ the outputs of the saliency model independently, based on the training data.

The architecture of our neural compression codec. The shorter of the two modules labeled bit string is the compressed version of the input. During training, the input is both compressed and decompressed, so that we can evaluate the network on the similarity between the original and reconstructed images, as measured by our new loss metric.
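Learned codecs are commonly trained on a rate-distortion objective that balances the size of the bit string against the reconstruction error. The sketch below assumes that standard form; the exact objective, the `tradeoff` weight, and the symbol probabilities are assumptions for illustration. The distortion term is where a perceptual loss would be plugged in.

```python
import math


def estimated_bits(symbol_probabilities):
    """Ideal code length in bits: the sum of -log2(p) over the coded symbols."""
    return sum(-math.log2(p) for p in symbol_probabilities)


def rate_distortion_loss(symbol_probabilities, distortion, num_pixels, tradeoff=0.01):
    """Rate in bits per pixel plus a weighted distortion term.

    `distortion` would be the perceptual distance between the original and
    the reconstruction; `tradeoff` is a hypothetical weighting parameter.
    """
    rate = estimated_bits(symbol_probabilities) / num_pixels
    return rate + tradeoff * distortion


# Two coded symbols with model probabilities 1/2 and 1/4 cost 1 + 2 = 3 bits.
print(estimated_bits([0.5, 0.25]))
print(rate_distortion_loss([0.5, 0.25], distortion=10.0, num_pixels=3, tradeoff=0.1))
```

Raising `tradeoff` pushes the codec toward more faithful reconstructions at the cost of more bits; lowering it does the opposite.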

In our paper, we report an extensive human-evaluation study that compared our approach to five other compression approaches at four different bits-per-pixel values (0.23, 0.37, 0.67, and 1.0). Subjects judged reconstructed images from our model as closest to the original at the three lowest bit rates. At a bit rate of 1.0 bits per pixel, the BPG method is the top performer.
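For a sense of scale, bits per pixel is the compressed size in bits divided by the number of pixels. The sketch below converts the four bit rates into byte budgets for a 768-by-512 image; that resolution is a hypothetical example, not a figure from the study.

```python
def bits_per_pixel(compressed_bytes, width, height):
    """Bit rate of a compressed image in bits per pixel."""
    return 8.0 * compressed_bytes / (width * height)


def byte_budget(target_bpp, width, height):
    """Compressed size in bytes that corresponds to a target bit rate."""
    return target_bpp * width * height / 8.0


# Byte budgets for a hypothetical 768 x 512 image at the four tested bit rates.
for bpp in (0.23, 0.37, 0.67, 1.0):
    print(bpp, round(byte_budget(bpp, 768, 512)))
```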

In a second experiment, we compressed images from the benchmark COCO data set using traditional and learned image compression approaches. We then used the reconstructed images in other tasks, such as instance segmentation (finding the boundaries of objects) and object recognition. The reconstructed images from our approach delivered superior performance across the board, since our approach better preserves the salient aspects of an image.

A compression algorithm that preserves important aspects of an image at various compression rates benefits Amazon customers in several ways, such as reducing the cost of cloud storage and speeding the download of images stored with Amazon Photos. Delivering those types of concrete results to our customers was the motivation for this work.
