How a passion for reinforcement learning guided Alexander Long’s trajectory
The field motivated him to pursue a PhD, which eventually led him to Amazon.
Alexander Long had his mind set on working in the oil and gas industry, following in his father’s footsteps. The sector is a big employer of electrical engineers in his home country of Australia, so it was a natural path after getting his bachelor’s degree at The University of Queensland (UQ).
In 2013, as Long was preparing to graduate, he became the first student selected for a collaboration between UQ and the Technical University of Munich (TUM). He spent two years in Germany, completing simultaneous master’s degrees in electrical engineering — both at UQ and at TUM. That’s when he heard about reinforcement learning (RL) for the first time — and he quickly realized he wanted to go deeper.
“Reinforcement learning is one way to frame the problem of making optimal actions,” Long explained. “Chess is a good example of a situation where you have an objective — winning the game — and you have to take a bunch of sequential steps to meet that objective. But you don’t get any concrete feedback until after you’ve made 20 or 30 moves.” The same framework can be used to solve a multitude of problems, from winning a game to optimizing a refinery or controlling a nuclear fusion reactor.
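A common way to handle the delayed feedback Long describes is to credit each move with the discounted sum of all the rewards that follow it. The sketch below is a minimal illustration under stated assumptions: a hypothetical 30-move game with a single +1 reward on the final, winning move, and an arbitrarily chosen discount factor `gamma`. The function name `discounted_returns` is ours, not from any particular library.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates all of the (discounted) reward after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical 30-move game: no feedback until the last move, which wins (+1).
episode_rewards = [0.0] * 29 + [1.0]
returns = discounted_returns(episode_rewards, gamma=0.9)
print(f"final move: {returns[-1]:.3f}, first move: {returns[0]:.3f}")
```

Because the win is only reported at the end, the earliest moves receive just a small, discounted share of the credit, even though they may have mattered a great deal.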
The widespread applications for reinforcement learning fascinated Long. But, he notes, the method has some significant drawbacks. “One of those is you need huge amounts of interactions with an environment before you can learn how to act well,” he explained.
After completing his master’s program, Long pursued a PhD in computer science at the University of New South Wales (UNSW). He wanted to tackle the challenge of making RL models more data efficient, so that they could learn to act well from fewer interactions.
The outcome was “Fast and Data Efficient Reinforcement Learning from Pixels via Non-Parametric Value Approximation”, a paper that was presented as part of an AAAI 2022 poster session.
The paper notes that previous advances in RL algorithm efficiency “have been achieved at the cost of increased sample, and computational complexity.” That added complexity “presents a major roadblock” for online, real-world settings. In their paper, the researchers presented “Nonparametric Approximation of Inter-Trace returns (NAIT), an algorithm that is both computation and sample efficient.”
“I was poking around that area, doing baseline work, and I found there was a very basic method that could be modernized by adding a couple of innovations, but nothing crazy, and that it worked extremely well,” he says. “It was very surprising; the algorithm was on par with all the best methods in terms of data efficiency, but it was about 100 times faster in terms of computation time.”
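The paper’s title refers to non-parametric value approximation. As a hedged illustration of that general idea only, and not of NAIT itself (whose details are not described here), one common nonparametric approach estimates a state’s value by averaging the returns previously observed from its nearest neighbours in an embedding space. The class name `KNNValueEstimator`, the Euclidean distance, and the choice of `k` are all assumptions made for this sketch.

```python
import math
from typing import List, Sequence


class KNNValueEstimator:
    """Generic nonparametric value estimate (hypothetical; not NAIT itself).

    Stores (state embedding, observed return) pairs and estimates the value of a
    new state as the mean return of its k nearest stored neighbours.
    """

    def __init__(self, k: int = 5) -> None:
        self.k = k
        self.keys: List[List[float]] = []   # state embeddings seen so far
        self.values: List[float] = []       # returns observed from those states

    def add(self, state: Sequence[float], observed_return: float) -> None:
        # "Learning" is just appending to memory; no gradient updates are involved.
        self.keys.append(list(state))
        self.values.append(float(observed_return))

    def estimate(self, state: Sequence[float]) -> float:
        if not self.keys:
            return 0.0  # no experience yet
        # Rank stored states by Euclidean distance to the query state.
        order = sorted(range(len(self.keys)), key=lambda i: math.dist(self.keys[i], state))
        nearest = order[: self.k]
        return sum(self.values[i] for i in nearest) / len(nearest)


estimator = KNNValueEstimator(k=2)
estimator.add([0.0, 0.0], 1.0)
estimator.add([0.0, 1.0], 0.0)
estimator.add([5.0, 5.0], 10.0)
print(estimator.estimate([0.0, 0.1]))  # mean of the two nearest stored returns
```

Since there is no value network to train, updating the estimate is cheap; a practical system would replace the linear scan with an approximate nearest-neighbour index.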
His drive to find solutions wasn’t limited to reinforcement learning, either. During his PhD, Long also gained entrepreneurial experience when he co-founded a start-up called Sigeion, taking a term of leave to participate in an accelerator program run by the venture capital firm Antler.
“Their approach is to take individuals, merge them together, and hope that companies come out of it,” he says. “Their logic is, if we get 80 good people, maybe we get three good companies that we can invest in. So, they make it this little Hunger Games, eight-week competition. It was quite intense and very high-pressure but pretty fun.”
Long and his cofounder worked on applying reinforcement learning to supply chain challenges. “One application of reinforcement learning is optimizing inventory levels and orders,” he said. “Currently this is solved in a very rudimentary fashion in many industries.” In the end, their start-up was one of eight companies to receive funding, but Long decided to continue pursuing his PhD.
When Long saw that Amazon was opening an office in Australia in 2021, he focused his energies on getting a job there. He did that by contacting his future boss, Anton van den Hengel, director of applied science at Amazon.
“I emailed him three times, pestering him for a job,” he recalled. Eventually, he landed an interview for an internship. His first interview didn’t lead to a role, but his second did.
As an intern, Long worked on two different projects related to product listings in the Amazon Store. The first addressed a gap: customers can often see a product’s characteristics in its images, but the structured data for those attributes, such as size, color, or style, is sometimes missing or incomplete. Filling in this data after the fact had proven challenging, partly because of the scale at which such a system must operate.
In earlier machine learning systems, images had to be labeled, meaning each one needed a categorical value associated with it.
“Recent work shows you can actually use freeform text, as long as it's natural language, pass it through a text encoder, train it with some joint objective and you have a measure of similarity between that text and whatever is in the image,” Long said. “We showed that you can use this to go back and fill in these attributes with just one single model. That’s significant because, previously, people were making models for each attribute.”
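The approach Long describes, scoring free-form text against an image in a shared embedding space, can be sketched as follows. This is a minimal illustration that assumes a jointly trained image-text encoder (in the spirit of models such as CLIP) has already produced the vectors; the tiny hand-written embeddings and the helper names `cosine` and `fill_attributes` are hypothetical stand-ins, not Amazon’s system. In practice, each candidate’s text embedding would come from encoding a phrase such as “a red shirt.”

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def fill_attributes(
    image_emb: Sequence[float],
    candidates: Dict[str, Dict[str, Sequence[float]]],
) -> Dict[str, str]:
    """For each attribute, pick the candidate text whose embedding best matches the image."""
    filled = {}
    for attribute, options in candidates.items():
        filled[attribute] = max(options, key=lambda value: cosine(image_emb, options[value]))
    return filled


# Hypothetical embeddings: one image vector and text vectors for each candidate value.
image_emb = [0.9, 0.1, 0.8, 0.2]
candidates = {
    "color": {"red": [1.0, 0.0, 0.0, 0.0], "blue": [0.0, 1.0, 0.0, 0.0]},
    "style": {"casual": [0.0, 0.0, 1.0, 0.0], "formal": [0.0, 0.0, 0.0, 1.0]},
}
print(fill_attributes(image_emb, candidates))
```

Every attribute reuses the same image embedding and the same text encoder, so supporting a new attribute only requires writing new candidate phrases rather than training another model.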
That led to a second project: combining the best properties of the existing single-attribute models with the broad, pretrained approach of his first project to address long-tailed classification. In this scenario, a few categories have plenty of labeled examples, but most categories contain only a few.
Long and his fellow researchers proposed a new method, presented in the paper “Retrieval augmented classification for long-tail visual recognition,” which was accepted by the Conference on Computer Vision and Pattern Recognition (CVPR).
The paper introduces Retrieval Augmented Classification (RAC) which, applied to the problem of long-tail classification, shows “a significant improvement over previous state-of-the-art … despite using only the training datasets themselves as the external information source.”
“When you don’t have much training data for a class, doing retrieval is better. But when you do have a lot of training data, classical supervised learning is better. One way to think about RAC is that it’s just a way to use both, although it unlocks a few other capabilities as well,” Long said.
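Long’s summary, retrieval for data-poor classes and supervised learning for data-rich ones, can be illustrated with a toy score fusion. The sketch below is hypothetical and simplified: it adds a nearest-neighbour vote over stored training examples to a classifier’s logits using a hand-picked `weight`. It is not the RAC architecture from the paper, and the memory contents, logits, and function names are invented for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def retrieval_scores(
    query: Sequence[float],
    memory: List[Tuple[Sequence[float], int]],
    num_classes: int,
    k: int = 3,
) -> List[float]:
    """Fraction of the k nearest stored training examples that carry each label."""
    nearest = sorted(memory, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return [votes[c] / len(nearest) for c in range(num_classes)]


def combined_prediction(
    classifier_logits: Sequence[float],
    retrieval: Sequence[float],
    weight: float = 1.0,
) -> int:
    """Add weighted retrieval evidence to the classifier's logits and take the argmax."""
    scores = [c + weight * r for c, r in zip(classifier_logits, retrieval)]
    return max(range(len(scores)), key=lambda i: scores[i])


# Toy memory of stored training examples as (embedding, label) pairs.
memory = [
    ([1.0, 0.0], 1), ([1.0, 0.1], 1), ([0.9, 0.0], 1),
    ([0.0, 1.0], 0), ([0.0, 0.9], 0), ([5.0, 5.0], 2),
]
query = [1.0, 0.05]
logits = [1.0, 0.8, 0.0]  # hypothetical classifier output that slightly favours class 0

retrieval = retrieval_scores(query, memory, num_classes=3)
print(combined_prediction(logits, [0.0, 0.0, 0.0]), combined_prediction(logits, retrieval))
```

Here the classifier alone would predict class 0, while the retrieved neighbours, all labeled class 1, shift the combined decision to class 1. When a class has abundant training data, the classifier logits would typically dominate instead.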
At the end of his internship, Long went through a set of interviews and presented the work he had done over that period to help secure a full-time position as an applied scientist. Van den Hengel said the decision to hire Long was easy. “He has great skills, and a strong publication record. More than that though, he demonstrated the ability to apply and extend the state of the art in ML research. That’s what we’re seeking.”
Looking back on his internship, Long said his startup experience led him to assume a big company like Amazon meant he wouldn’t have as much freedom and would be told exactly what to do.
“It was not like that at all,” he noted. “I was told to set my own direction, work at my own pace, and let’s see what you do at the end of six months.”
“The other exceptional thing about the internship was hanging out with some of the smartest people,” Long said. In his first weeks as an intern, he was in the process of getting his PhD paper published and shared a draft with one of his colleagues, who quickly suggested invaluable changes. “He knew all these little things that no one at my university knew. And you have interactions like that all the time.”
Long compares his experience at Amazon with his father’s in oil and gas, where small improvements in efficiency could have tens or hundreds of millions of dollars of business impact. “It’s awesome that one person or a group of people can sit down, think hard, and have a disproportionate effect on both customers and the business. There are very few places where that can occur.”