Kernel Neural Architecture Search (KNAS), an efficient automatic machine learning (AutoML) system that can evaluate candidate architectures without training them, has been open-sourced by researchers from Alibaba Group and Peking University. KNAS uses a gradient kernel as a proxy for model quality and consumes an order of magnitude fewer computing resources than standard techniques.
Unlike many AutoML algorithms, KNAS can predict which model architectures will perform well without actually training them, potentially saving hours of computation. On the NAS-Bench-201 computer vision benchmark, KNAS achieved a 25x speedup over previous AutoML approaches while delivering “comparable” accuracy. On text classification tasks, the models KNAS produced outperformed a pre-trained RoBERTa baseline model.
Neural architecture search (NAS) is a branch of AutoML that aims to find the best deep-learning model architecture for a task: given a task dataset and a search space of possible architectures, NAS seeks the architecture that achieves the best performance metric on that dataset. However, this usually requires fully training each candidate model on the dataset. According to some data scientists, the computing power needed for such a search “emits as much carbon as five cars over their lifetime.” As a result, several researchers are investigating ways to improve search algorithms so that fewer models need to be trained.
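To make the cost concrete, here is a minimal sketch of the brute-force NAS loop that such efforts try to avoid; the search space entries and the `train_and_evaluate` callback are hypothetical placeholders, not part of any specific system.

```python
# Brute-force NAS: every candidate architecture gets a full training run
# before it can be scored, which is what makes naive search so expensive.
# SEARCH_SPACE and train_and_evaluate are illustrative placeholders.

SEARCH_SPACE = [
    {"cells": 5, "channels": 16},   # each dict describes one candidate architecture
    {"cells": 5, "channels": 32},
    {"cells": 8, "channels": 16},
]

def naive_nas(search_space, train_and_evaluate):
    best_arch, best_score = None, float("-inf")
    for arch in search_space:
        score = train_and_evaluate(arch)   # one full training run per candidate
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch
```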
Instead, the Alibaba researchers set out to determine whether architectures could be rated without any training at all. “Gradients can be utilised as a coarse-grained proxy of downstream training to evaluate randomly-initialized architectures,” the researchers hypothesized. They identified a gradient kernel, in this case the mean of the Gram matrix (MGM) of gradients, that correlates strongly with a model's final accuracy. The KNAS algorithm calculates the MGM for each candidate architecture, selects the few with the highest scores, trains and evaluates only those candidates, and returns the most accurate model as the final result.
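The two-stage procedure can be sketched roughly as follows, assuming PyTorch models. This is an illustrative approximation of the idea, not the authors' released implementation: the proxy here is the mean of pairwise inner products of per-example gradients on one mini-batch of a randomly-initialized model, and `train_and_evaluate` is a hypothetical placeholder for a full training run.

```python
import torch
import torch.nn.functional as F

def mgm_score(model, inputs, targets):
    """Mean of the Gram matrix of per-example gradients; no training required."""
    per_example_grads = []
    for x, y in zip(inputs, targets):
        model.zero_grad()
        loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        flat = torch.cat([p.grad.flatten() for p in model.parameters()
                          if p.grad is not None])
        per_example_grads.append(flat)
    g = torch.stack(per_example_grads)   # shape: (batch_size, num_params)
    gram = g @ g.t()                     # pairwise gradient inner products
    return gram.mean().item()

def knas_style_select(candidate_models, inputs, targets, train_and_evaluate, top_k=3):
    # Stage 1: rank all randomly-initialized candidates by the MGM proxy.
    ranked = sorted(candidate_models,
                    key=lambda m: mgm_score(m, inputs, targets),
                    reverse=True)
    # Stage 2: fully train only the top-k candidates and keep the most accurate.
    return max(ranked[:top_k], key=train_and_evaluate)
```

The key design point is that the expensive step, full training, is applied only to the handful of candidates that survive the cheap gradient-kernel screening.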
On NAS-Bench-201, the researchers compared KNAS's performance against several other NAS algorithms, including random search, reinforcement-learning (RL) search, evolutionary search, hyperparameter search, differentiable algorithms, and two other “training-free” methods. The team found that KNAS was faster than every approach except the two training-free algorithms. It also produced the most accurate model on one benchmark dataset and competitive results on the other two, exceeding the human-crafted ResNet baseline model in one case.
Several other research organizations have also investigated NAS methods that require little or no model training. Samsung recently released Zero-Cost-NAS, a model-evaluation technique that uses only a single mini-batch of data. A team from the University of Edinburgh published Neural Architecture Search Without Training, which examines the “overlap of activations between data points in untrained networks” to score models, and researchers from the University of Texas released Training-freE Neural Architecture Search (TE-NAS), which uses two training-free proxies for scoring models. InfoQ covered a Facebook algorithm late last year that leverages NAS to establish deep-learning model parameters without training.