The intersection of design and science
How a team of designers, scientists, developers, and engineers worked together to create a truly unique device in Echo Show 10.
During the prototyping stages of the journey that brought Echo Show 10 to life, the design, engineering, and science teams behind it encountered a surprise: one of their early assumptions was proving to be wrong.
The feature that most distinguishes the current generation from its predecessors is the way the device uses motion to automatically face users as they move around a room and interact with Alexa. This allows users to move around the kitchen while consulting a recipe, or to move freely during a video call, with the screen staying in view.
Naturally, or so the team thought, users would want the device to remain facing them, matching where they were at all times. “You walk from the sink to the fridge, say, while you're using the device for a recipe, the device moves with you,” said David Rowell, principal UX designer. Because no hardware existed, the team had to create a method of prototyping, so they turned to virtual reality (VR). That approach enabled Echo Show 10 teams to work together to test assumptions — including their assumption about how the screen should behave. In this case, what they experienced in VR made them change course.
“We had a paradigm that we thought worked really well, but once we tested it, we quickly discovered that we don't want to be one-to-one accurate,” said David Jara, senior UX motion designer. In fact, he said, the feedback led them to a somewhat unexpected conclusion: the device should actually lag behind the user. “Even though, from a pragmatic standpoint, you would think, ‘Well, this thing is too slow. Why can't it keep up?’, once you experienced it, the slowed down version was so much more pleasant.”
This was just one instance of the kind of assumption-changing feedback that required a team of designers, engineers, software developers, and scientists to constantly iterate and adapt. Those teams spent many months hypothesizing, experimenting, learning, iterating, and ultimately creating Echo Show 10, which was released Thursday. Amazon Science talked to some of those team members to find out how they collaborated to tackle the challenges of developing a motorized smart display that pairs sound localization technology with computer vision models.
From idea to iteration
“The idea came from the product team about ways we could differentiate Echo Show,” Rowell said. “The idea came up about this rotating device, but we didn't really know what we wanted to use it for, which is when design came in and started creating use cases for how we could take advantage of motion.”
The design team envisioned a device that moved with users in a way that was both smooth and provided utility.
That presented some significant challenges for the scientists involved in the project. “Adding motion to Echo Show was a really big undertaking,” said Dinesh Nair, an applied science manager in Emerging Devices. “There were a lot of challenges, including how do we make sure that the experience is natural, and not perceived as creepy by the user.”
Not only did the team have to account for creating a motion experience that felt natural, they had to do it all on a relatively small device. "Building state-of-the-art computer vision algorithms that were processed locally on the device was the greatest challenge we faced," said Varsha Hedau, applied science manager.
The multi-faceted nature of the project also prompted the teams to test the device in a fairly new way. “When the project came along, we decided that VR would be a great way to actually demonstrate Echo Show 10, particularly with motion,” Rowell noted. “How could it move with you? How does it frame you? How do we fine tune all the ways we want machine learning to move with the correct person?”
Behind each of those questions lay challenges for the design, science, and engineering teams. To identify and address those challenges, the far-flung teams collaborated regularly, even in the midst of a pandemic. “It was interesting because we’re spread over many different locations in the US,” Rowell said. “We had a lot of video calls and VR meant teams could very quickly iterate. There was a lot of sharing and VR was great for that.”
Clearing the hurdles
One of the first hurdles the teams had to clear was how to accurately and consistently locate a person.
“The way we initially thought about doing this was to use spatial cues from your voice to estimate where you are,” Nair said. “Using the direction given by Echo’s chosen beam, the idea was to move the device to face you, and then computer vision algorithms would kick in.”
That approach presented dual challenges. Current Echo devices form beams in multiple directions and then choose the best beam for speech recognition. “One of the issues with beam selection is that the accuracy is plus or minus 30 degrees for our traditional Echo devices,” Nair observed. “Another is issues with interference noise and sound reflections, for example if you place the device in a corner or there is noise near the person.” The acoustic reflections were particularly vexing since they interfere with the direct sound from the person speaking, especially when the device is playing music. Traditional sound source localization algorithms are also susceptible to these problems.
The Audio Technology team addressed these challenges to determine the direction of sound by developing a new sound localization algorithm. “By breaking down sound waves into their fundamental components and training a model to detect the direct sound, we can accurately determine the direction that sound is coming from,” said Phil Hilmes, director of audio technology. That, along with other algorithm developments, led the team to deliver a sound direction algorithm that was more robust to reflections and interference from noise or music playback, even when it is louder than the person’s voice.
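Amazon has not published the details of that algorithm, but the idea of estimating a sound's direction from frequency-domain components can be illustrated with a classical baseline: GCC-PHAT, which whitens the cross-spectrum of two microphone signals so the correlation peak marking the time difference of arrival stays sharp even under reverberation. The sketch below is that textbook method, not the team's proprietary model; the sample rate, microphone spacing, and speed of sound are example values.

```python
import numpy as np

def gcc_phat_direction(sig, ref, fs, mic_distance, speed_of_sound=343.0):
    """Estimate direction of arrival (degrees) from two microphone signals.

    Classical GCC-PHAT baseline, illustrative only: the phase transform
    whitens the cross-spectrum, making the correlation peak more robust
    to reflections than raw cross-correlation.
    """
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12              # phase transform (whitening)
    cc = np.fft.irfft(cross, n=n)
    # Only lags consistent with the microphone spacing are physical.
    max_shift = int(fs * mic_distance / speed_of_sound)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    tau = (np.argmax(np.abs(cc)) - max_shift) / fs   # time difference (s)
    # Far-field geometry: tau = mic_distance * sin(theta) / c
    sin_theta = np.clip(tau * speed_of_sound / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))
```

A production system would run this per frequency band and feed the result to a learned model that separates the direct path from reflections; this sketch shows only the geometric core.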
Rowell said, “When we originally conceived of the device, we envisioned it being placed in open space, like a kitchen island so you could use the device effectively from multiple rooms.” Customer feedback during beta testing showed this assumption ran into literal walls. “We found that people actually put the device closer to walls so the device had to work well in these positions.” In some of these more challenging positions, using only audio to find the direction is still insufficient for accurate localization and extra clues from other sensors are needed.
The design team worked with the science teams so the device relied not just on sound, but also on computer vision. Computer vision algorithms allow the device to locate humans within its field of view, helping it improve accuracy and distinguish people from sounds reflecting off walls, or coming from other sources. The teams also developed fusion algorithms for combining computer vision and sound direction into a model that optimized the final movement.
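One simple way to picture such a fusion is a confidence-weighted circular mean of the two bearing estimates, with a fallback to audio alone when vision sees no one. The weights and fallback logic below are assumptions for illustration, not the team's actual fusion model.

```python
import math

def fuse_directions(audio_deg, audio_conf, vision_deg, vision_conf):
    """Blend audio and vision bearing estimates (degrees).

    Illustrative sketch: a confidence-weighted circular mean, which
    handles the 359->0 degree wraparound correctly. The confidence
    weights are hypothetical inputs, not the device's real signals.
    """
    if vision_deg is None:          # no person detected: trust audio alone
        return audio_deg
    total = audio_conf + vision_conf
    x = (audio_conf * math.cos(math.radians(audio_deg)) +
         vision_conf * math.cos(math.radians(vision_deg))) / total
    y = (audio_conf * math.sin(math.radians(audio_deg)) +
         vision_conf * math.sin(math.radians(vision_deg))) / total
    return math.degrees(math.atan2(y, x))
```

Averaging unit vectors rather than raw angles is what makes the wraparound case work: the mean of 350° and 10° comes out as 0°, not 180°.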
That collaboration enabled the design team to work with the device engineers to limit the device’s rotation. “That approach prevented the device from turning and basically looking away from you or looking at the wall or never looking at you straight on,” Rowell said. “It really tuned in the algorithms and got better at working out where you were.”
The teams undertook a thorough review of every assumption made in the design phase and adapted based on actual customer interactions. That included the realization that the device’s tracking speed didn’t need to be slow so much as it needed to be intelligent.
“The biggest challenge with Echo Show 10 was to make motion work intelligently,” said Meeta Mishra, principal technical program manager for Echo Devices. “The science behind the device movement is based on fusion of various inputs like sound source, user presence, device placement, and lighting conditions, to name a few. The internal dog-fooding, coupled with the work from home situation, brought forward the real user environment for our testing and iterations. This gave us wider exposure of varied home conditions needed to formulate the right user experience that will work in typical households and also strengthened our science models to make this device a delight.”
Frame rates and bounding boxes
Responding to the user feedback about the preference for intelligent motion meant the science and design teams also had to navigate issues around detection. “Video calls often run at 24 frames a second,” Nair observed. “But a deep learning network that accurately detects where you are, those don't run as fast, they’re typically running at 10 frames per second on the device.”
That latency meant several teams had to find a way to bridge the difference between the frame rates. “We had to work with not just the design team, but also the team that worked on the framing software,” Nair said. “We had to figure out how we could give intermediate results between detections by tracking the person.”
Hedau and her team helped deliver the answer in the form of bounding boxes and Kalman filtering, an algorithm that provides estimates of some unknown variables given the measurements observed over time. That approach allows the device to, essentially, make informed guesses about a user’s movement.
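The article does not give the filter's exact form, but the standard version of the idea is a constant-velocity Kalman filter on the bounding-box center: predict at the display's frame rate, correct whenever a new detection arrives. The sketch below uses that textbook model with made-up noise values, not the device's tuning.

```python
import numpy as np

class CenterTracker:
    """Constant-velocity Kalman filter for a bounding-box center (1D).

    Illustrative sketch: predict() runs every display frame (~24 fps),
    update() only when the slower detector (~10 fps) produces a box,
    so the device can make informed guesses between detections.
    Noise parameters q and r are assumed values.
    """
    def __init__(self, x0, dt=1/24, q=50.0, r=25.0):
        self.x = np.array([x0, 0.0])                 # [position, velocity]
        self.P = np.eye(2) * 100.0                   # state covariance
        self.F = np.array([[1.0, dt], [0.0, 1.0]])   # motion model
        self.H = np.array([[1.0, 0.0]])              # we observe position only
        self.Q = np.eye(2) * q                       # process noise
        self.R = np.array([[r]])                     # measurement noise

    def predict(self):
        """Advance one display frame; call even when no detection arrived."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[0]

    def update(self, z):
        """Fold in a fresh detector measurement of the box center."""
        y = z - self.H @ self.x                      # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
        self.x = self.x + (K @ y).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P
```

A real tracker would run one such filter per box coordinate (center, width, height) and gate updates on detection confidence.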
During testing, the teams also discovered that the device would need to account for the manner in which a person interacted with it. “We found that when people are on a call, there are two use cases,” Rowell observed. “They're either very engaged with the call, where they’re close to the device and looking at the device and the other person on the other end, or they're multitasking.”
The solution was born, yet again, from collaboration. “We went through a lot of experiments to model which user experience really works the best,” Hedau said. Those experiments resulted in utilizing the device’s CV to determine the distance between a person and Echo Show 10.
“We have settings based on the distance that the customer is from the device, which is a way to roughly measure how engaged a customer is,” Rowell said. “When a person is really up close, we don't want the device to move too much because the screen just feels like it's fidgety. But if somebody is on a call and multitasking, they're moving a lot. In this instance, we want smoother transitions.”
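That behavior can be sketched as an exponential smoother whose gain depends on the estimated distance: barely moving when the user is close, following more readily when they are farther away and multitasking. All thresholds and gains below are hypothetical, chosen only to illustrate the shape of the idea.

```python
def smoothing_gain(distance_m, near=1.0, far=3.0, g_near=0.05, g_far=0.3):
    """Map user distance (meters) to a motion-smoothing gain.

    Hypothetical numbers: up close the screen barely moves (a low gain
    avoids the "fidgety" feel), while a distant, multitasking user
    gets faster following. Gain ramps linearly between near and far.
    """
    t = min(max((distance_m - near) / (far - near), 0.0), 1.0)
    return g_near + t * (g_far - g_near)

def follow(current_deg, target_deg, distance_m):
    """One step of exponential smoothing toward the user's bearing."""
    return current_deg + smoothing_gain(distance_m) * (target_deg - current_deg)
```

With these example values a user at arm's length moves the screen only 5% of the way toward them per step, while a user across the room moves it 30% of the way.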
Looking to the future
The teams behind the Echo Show 10 are, unsurprisingly, already pondering what’s next. Rowell suggested that, in the future, the Echo Show might show a bit of personality. "We can make the device more playful," Rowell said. "We could start to express a lot of personality with the hardware." [Editor’s note: Some of this is currently enabled via APIs; certain games can “take on new personality through the ability to make the device shake in concert with sound effects and on-screen animations.”]
Nair said his team will also focus on making the on-device processing even faster. “A significant portion of the overall on-device processing is CV and deep learning,” he noted. “Deep networks are always evolving, and we will keep pushing that frontier.”
“Our teams are working continuously to further push the performance of our deep learning models in corner cases such as multiple people, low lighting, fast motions, and more,” added Hedau.
Whatever route Echo Show goes next, the teams behind it already know one thing for certain: they can collaborate their way through just about anything. “With Echo Show 10, there were a lot of assumptions we had when we started, but we didn’t know which would prove true until we got there,” Jara said. “We were kind of building the plane as we were flying it.”