3 questions with Jeremy Holleman: How to design and develop ultra-low-power AI processors
Holleman, the chief scientist of Alexa Fund company Syntiant, explains why the company’s new architecture allows machine learning to be deployed practically anywhere.
Editor’s Note: This article is the latest installment in a series Amazon Science is publishing related to the science behind products and services from companies in which Amazon has invested. Syntiant, founded in 2017, has shipped more than 10 million units to customers worldwide, and has obtained $65 million in funding from leading technology companies, including the Amazon Alexa Fund.
In late July, Amazon held its Alexa Live event, where the company introduced more than 50 features to help developers and device makers build ambient voice-computing experiences, and drive the growth of voice computing.
The event included an Amazon Alexa Startups Showcase in which Syntiant, a semiconductor company founded in 2017, and based in Irvine, California, shared its vision for making voice the computing interface of the future.
In 2017, Kurt Busch, Syntiant’s chief executive officer, and Jeremy Holleman, Syntiant’s chief scientist, and a professor of electrical and computer engineering at the University of North Carolina at Charlotte, were focused on finding an answer to the question: How do you optimize the performance of machine learning models on power- and cost-constrained hardware?
According to Syntiant, they and other members of Syntiant’s veteran management team had the idea for a processor architecture that could deliver 200 times the efficiency and 20 times the performance of existing edge processors at half the cost. One key to their approach: optimizing for memory access, in contrast to traditional processors’ focus on logic.
This insight, and others, led them to the formation of Syntiant, which for the past four years has been designing and developing ultra-low-power, high-performance, deep neural network processors for computing at the network’s edge, helping to reduce latency, and increase the privacy and security of power- and cost-constrained applications running on devices as small as earbuds, and as large as automobiles.
Syntiant’s processors enable always-on voice (AOV) control for most battery-powered devices, from cell phones and earbuds, to drones, laptops and other voice-activated products. The company’s Neural Decision Processors (NDPs) provide highly accurate wake word, command word and event detection in a tiny package with near-zero power consumption.
Holleman is considered a leading authority on ultra-low-power integrated circuits, and directs the Integrated Silicon Systems Laboratory at the University of North Carolina at Charlotte, where he is an associate professor. He is also a coauthor of the book “Ultra Low-Power Integrated Circuit Design for Wireless Neural Interfaces,” which was first published in 2011.
Amazon Science asked Holleman three questions about the challenges of designing and developing ultra-low-power AI processors, and why he believes voice will become the predominant user interface of the future.
Q. You are one of 22 authors on a paper, "MLPerf Tiny Benchmark", which has been accepted to the NeurIPS 2021 Conference. What does this benchmark suite comprise, and why is it significant to the tinyML field?
The MLPerf Tiny Benchmark actually includes four tests meant to measure the performance and efficiency of very small devices on ML inference: keyword spotting, person detection, image recognition, and anomaly detection. For each test, there is a reference model, and code to measure the latency and power on a reference platform.
I try to think about the benchmark from the standpoint of a system developer – someone building a device that needs some local intelligence. They have to figure out, with a given energy budget and system requirements, what solution is going to work for them. So they need to understand the power consumption and speed of different hardware. When you look at most of the information available, everyone measures their hardware on different things, so it’s really hard to compare. The benchmark makes it clear exactly what is being measured and – in the closed division – every submission is running the exact same model, so it’s a clear apples-to-apples comparison.
Then the open division takes the same principle – every submission does the same thing – but allows for some different tradeoffs by just defining the problem and allowing submitters to run different models that may take advantage of particular aspects of their hardware. So you wind up with a Pareto surface of accuracy, power, and speed. I think this last part is particularly important in the “tiny” space because there is a lot of room to jointly optimize models, hardware, and features to get high-performing and high-efficiency end-to-end systems.
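As an illustration of the Pareto idea mentioned above, here is a minimal Python sketch. It assumes each submission is summarized by three numbers (accuracy, energy per inference, and latency) and keeps only the submissions that no other submission beats on every axis. The `Submission` type, the field names, and all of the numbers are hypothetical and are not taken from the benchmark or its results.

```python
from typing import List, NamedTuple


class Submission(NamedTuple):
    """Hypothetical summary of one open-division entry."""
    name: str
    accuracy: float    # higher is better
    energy_uj: float   # energy per inference in microjoules, lower is better
    latency_ms: float  # time per inference in milliseconds, lower is better


def dominates(a: Submission, b: Submission) -> bool:
    """True if a is at least as good as b on every axis and strictly better on one."""
    at_least_as_good = (
        a.accuracy >= b.accuracy
        and a.energy_uj <= b.energy_uj
        and a.latency_ms <= b.latency_ms
    )
    strictly_better = (
        a.accuracy > b.accuracy
        or a.energy_uj < b.energy_uj
        or a.latency_ms < b.latency_ms
    )
    return at_least_as_good and strictly_better


def pareto_front(subs: List[Submission]) -> List[Submission]:
    """Keep the submissions that are not dominated by any other submission."""
    return [s for s in subs if not any(dominates(o, s) for o in subs)]


# Made-up numbers, for illustration only.
subs = [
    Submission("A", 0.90, 50.0, 10.0),
    Submission("B", 0.92, 80.0, 12.0),  # more accurate than A, but costs more energy
    Submission("C", 0.89, 60.0, 11.0),  # worse than A on all three axes
    Submission("D", 0.85, 20.0, 5.0),   # least accurate, but cheapest and fastest
]
front = pareto_front(subs)
print([s.name for s in front])
```

In this toy data set, A, B, and D each represent a different tradeoff, while C drops out because A is better on all three axes.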
Q. What do you consider Syntiant’s key ingredients in your development and design of ultra-low-power AI processors, and how will your team’s work contribute to voice becoming the predominant user interface of the future?
I would say there are two major elements that have been key to our success. The first is, as I mentioned before, that edge ML requires tight coupling between the hardware and the algorithms. From the very beginning at Syntiant, we’ve had our silicon designers and our modelers working closely together. That shows up in office arrangement, with hardware and software groups all intermingled; in code and design reviews, really all across the company. And I think that’s paid off in outcomes. We see how easy it is to map a given algorithm to our hardware, because the hardware was designed to do all the hard work of coordinating memory access in a way that’s optimized for exactly the types of computation we see in ML workloads. And for the same reason, we see the benefits of that approach in power and performance.
The second big piece is that we realized that deep learning is still such a new field that the expertise required to deliver production-grade solutions is still very rare. It’s easy enough to download an MNIST or CIFAR demo, train it up and you think, “I’ve got this figured out!” But when you deploy a device to millions of people who interact with it on a daily basis, the job becomes much harder. You need to acquire data, validate it, debug models, and it’s a big job. We knew that for most customers, we couldn’t just toss a piece of silicon over the fence and leave the rest to them. That led us to put a lot of effort into building a complete pipeline addressing the data tasks, training, and evaluation, so we can provide a complete solution to customers who don’t have a ton of ML expertise in house.
Q. What in particular makes edge processing difficult?
On the hardware side, the big challenges are power and cost. Whether you’re talking about a watch, an earbud, or a phone, consumers have some pretty hard requirements for how long a battery needs to last – generally a day – and how much they will pay for something. And on the modeling side, edge devices find themselves in a tremendously diverse set of environments, so you need a voice assistant to recognize you not just in the kitchen or in the car, but on a factory floor, at a football game, and everywhere else you can imagine going.
Then those three things push against each other like the classical balloon analogy. If you push down cost by choosing a lower-end processor, it may not have the throughput to run the model quickly, so you run at a lower frame rate, under-sampling the input signal, and you miss events. Or you find a model that works well, and you run it fast enough, but then the power required to run it limits battery life. This tradeoff is especially difficult for features that are always on, like a wakeword detector, or person detection in a security camera. At Syntiant, we had to address all of these issues simultaneously, which is why it was so important to have all of our teams tightly connected, work through the use cases, and know how each piece affected all the other pieces.
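The tradeoff described above can be put into rough numbers. The following Python sketch is a simplified back-of-the-envelope model, not a description of any real device: it assumes inferences run back to back on a single processor, that the always-on feature is the only load on the battery, and that all of the figures (frame period, inference time, battery capacity, power draw) are made up for illustration.

```python
def frames_processed_fraction(inference_ms: float, frame_period_ms: float) -> float:
    """Fraction of incoming frames the processor can keep up with.

    If one inference takes longer than the time between frames,
    the remaining frames have to be skipped.
    """
    if inference_ms <= frame_period_ms:
        return 1.0
    return frame_period_ms / inference_ms


def battery_life_hours(battery_mwh: float, avg_power_mw: float) -> float:
    """Battery life if the always-on feature were the only load (a simplification)."""
    return battery_mwh / avg_power_mw


# Hypothetical numbers, chosen only to show the shape of the tradeoff.
# A slower, cheaper processor: 25 ms per inference against a 10 ms frame period.
coverage = frames_processed_fraction(inference_ms=25.0, frame_period_ms=10.0)

# A 200 mWh battery with two hypothetical always-on power draws.
life_at_10_mw = battery_life_hours(battery_mwh=200.0, avg_power_mw=10.0)
life_at_0_2_mw = battery_life_hours(battery_mwh=200.0, avg_power_mw=0.2)

print(f"frames processed: {coverage:.0%}")
print(f"battery life at 10 mW: {life_at_10_mw:.0f} h")
print(f"battery life at 0.2 mW: {life_at_0_2_mw:.0f} h")
```

Under these assumptions, the slower processor sees only a fraction of the input frames, and the always-on power draw alone decides whether the battery lasts less than a day or many weeks.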
Having done that work, the result is that you get the power of modern ML in tiny devices with almost no impact on the battery life. And the possibilities, especially for voice interfaces, are very exciting. We’ve all grown accustomed to interacting with our phone by voice and we’ve seen how often we want to do something but don’t have a free hand available for a tactile interface.
Syntiant’s technology is making it possible to bring that experience to smaller and cheaper devices with all of the processing happening locally. So many of the devices we use have useful information they can’t share with us because the interface would be too expensive. Imagine being able to say “TV remote, where are you?” or “Smoke alarm, why are you beeping?” and getting a clear and quick answer. We’ve forgotten that some annoying things we’ve gotten so used to can be fixed. And of course you don’t want all of the cost and the privacy concerns associated with sending all of that information to the cloud.
So we’re focused on putting that level of intelligence right in the device. To deliver that, we need all of these pieces to come together: the data pipeline, the models, and the hardware. Conventional general-purpose processors don’t have the efficiency to run strong models within the constraints that edge devices have. With our new architecture, powerful machine learning can be deployed practically anywhere for the first time.