AWS scientist wins ICLR outstanding paper award
Ability to trade off parameter count against effectiveness could be “extremely useful” in reducing the size of deep-learning models.
An Amazon Web Services (AWS) senior applied scientist and collaborators learned last week that their research paper is one of eight to earn an Outstanding Paper Award for the forthcoming International Conference on Learning Representations (ICLR 2021), which is dedicated to the advancement of deep learning.
The award-winning paper, “Beyond Fully-Connected Layers With Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters”, is authored by Aston Zhang and six other researchers from Nanyang Technological University, ETH Zurich, and the University of Montreal.
Neural networks frequently include so-called fully connected layers, in which each node in one layer connects to all of the nodes in the next layer. The operations performed by fully connected layers are typically modeled as matrix multiplication. Recent work has shown that it’s possible to reduce the number of parameters necessary to represent a fully connected layer by using quaternions, four-dimensional generalizations of complex numbers.
A complex number combines a real number with a real multiple of the imaginary unit i, the square root of -1. By extension, a quaternion combines a real number with real multiples of three imaginary units, i, j, and k.
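To make the definition concrete, here is a minimal sketch of quaternion multiplication (the Hamilton product), assuming the standard rules i² = j² = k² = ijk = -1. The function name `hamilton_product` and the tuple layout are illustrative choices, not taken from the paper.

```python
def hamilton_product(p, q):
    """Multiply two quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    )


# The three imaginary units as (real, i, j, k) tuples.
I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)

print(hamilton_product(I, I))  # (-1, 0, 0, 0): i squared is -1
print(hamilton_product(I, J))  # (0, 0, 0, 1): i times j is k
print(hamilton_product(J, I))  # (0, 0, 0, -1): order matters
```

Unlike real or complex multiplication, the product depends on the order of the factors, which is why the last two lines differ in sign.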
Because they have four components, quaternions need only one-fourth as many parameters to represent the operations of a fully connected layer: multiplying by a quaternion reuses the same four real weights across all four output components, where an unconstrained real-valued layer would need 16 independent weights. Zhang and his collaborators’ paper explains how to extend this concept to even higher-dimensional hypercomplex numbers (with four imaginary components, or 20, or as many as you like), with even greater savings in parameter count.
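The one-fourth figure can be illustrated with a small sketch. Multiplying by a fixed quaternion is the same as multiplying by a 4-by-4 real matrix whose 16 entries are built from only four distinct weights. The helper names `quaternion_weight_matrix` and `matvec` are hypothetical and exist only for this example.

```python
def quaternion_weight_matrix(w):
    """Real 4x4 matrix that left-multiplies by the quaternion w = (a, b, c, d)."""
    a, b, c, d = w
    return [
        [a, -b, -c, -d],
        [b,  a, -d,  c],
        [c,  d,  a, -b],
        [d, -c,  b,  a],
    ]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


weights = (1, 2, 3, 4)   # four learnable values
inputs = (5, 6, 7, 8)    # one quaternion-valued input

W = quaternion_weight_matrix(weights)
print(matvec(W, inputs))  # [-60, 12, 30, 24]

entries = [value for row in W for value in row]
distinct = {abs(value) for value in entries}
print(len(entries), "matrix entries,", len(distinct), "distinct weights")
```

An ordinary fully connected layer mapping four inputs to four outputs would learn all 16 entries independently; here the quaternion structure ties them together, so only four values are learned.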
In developing a mathematical representation flexible enough to capture operations involving arbitrary hypercomplex numbers, Zhang and his collaborators found that the same representation could also capture real-numbered operations, such as matrix multiplication. They had found a way to subsume arbitrary hypercomplex numbers and real numbers under a single description.
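One way to picture such a unified description is the sketch below. It assumes the formulation usually associated with this paper, in which a layer's weight matrix is a sum of n Kronecker products of small "rule" matrices and smaller weight blocks; that detail is not spelled out in this article, and all function names here are hypothetical. The rule matrices are fixed by hand in this example, whereas the paper's point is that they can be learned from data.

```python
def kron(a, s):
    """Kronecker product of two matrices stored as nested lists."""
    return [[x * y for x in row_a for y in row_s] for row_a in a for row_s in s]


def phm_weight(rule_mats, weight_mats):
    """Sum of Kronecker products, one term per (rule, weight) pair."""
    total = None
    for a, s in zip(rule_mats, weight_mats):
        term = kron(a, s)
        if total is None:
            total = term
        else:
            total = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(total, term)]
    return total


def phm_param_count(d, k, n):
    """Assumed count: n rule matrices (n x n) plus n blocks of size (d/n) x (k/n)."""
    return n ** 3 + (d * k) // n


# n = 1: a single 1x1 rule matrix gives back an ordinary real weight matrix.
print(phm_weight([[[1]]], [[[1, 2], [3, 4]]]))  # [[1, 2], [3, 4]]

# n = 4: these fixed rule matrices encode quaternion multiplication.
QUATERNION_RULES = [
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
    [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]],
    [[0, 0, -1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, -1, 0, 0]],
    [[0, 0, 0, -1], [0, 0, -1, 0], [0, 1, 0, 0], [1, 0, 0, 0]],
]
blocks = [[[1]], [[2]], [[3]], [[4]]]  # 1x1 weight blocks for a 4-in, 4-out layer
for row in phm_weight(QUATERNION_RULES, blocks):
    print(row)

# Rough size comparison for a 512-in, 512-out layer under the assumed count.
print(512 * 512, "real weights vs", phm_param_count(512, 512, 4), "with n = 4")
```

With n = 1 the construction reduces to plain matrix multiplication, and with n = 4 and the rules above it reproduces a quaternion layer, which is the sense in which one description can cover both cases.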
“The paper’s reviewers helped us improve the paper,” Zhang says. “They were the ones who suggested we see how we could empirically learn predefined multiplication rules in different spaces, such as on artificial datasets.
“There exist multiplication rules in those predefined quaternion-numbered or real-numbered systems. However, relying only on them may restrict the architectural flexibility of deep learning.
“By learning multiplication rules from data, the dimensionality of hypercomplex numbers can be flexibly specified or tuned by users based on their own applications, even when such numbers or rules do not exist mathematically.”
Zhang’s collaborators on the paper are Yi Tay, Shuai Zhang, Alvin Chan, Anh Tuan Luu, Siu Cheung Hui, and Jie Fu. At Amazon, Zhang is currently completing the book “Dive into Deep Learning”, which he is co-authoring with three other primary authors, Zachary Lipton, Mu Li, and Alex Smola.
Conference organizers noted that 860 papers were submitted for this year’s program, and that a subset of them were submitted to the conference’s Outstanding Paper Committee for review. The eight winning papers will be presented during two Outstanding Paper sessions on May 5 and 6. To attend the event, individuals can register here.