With new architectures such as vision transformers (ViTs) finding their way into day-to-day applications, there is a clear demand for software and machine learning infrastructure that supports easy and extensible neural network architecture research in the field of vision.
Researchers from Google Brain have introduced SCENIC, an open-source JAX library with a focus on Transformer-based models for computer vision research. It has been successfully used to develop classification, segmentation, and detection models for images, videos, and other modalities, including multi-modal setups.
The SCENIC toolkit aims to facilitate rapid experimentation, prototyping, and research of new vision architectures and models. It offers optimised implementations of state-of-the-art research models spanning a wide range of modalities.
This open-source library offers a unified, all-in-one codebase for modeling needs, with implementations such as ViT, DETR, MLP-Mixer, ResNet, and U-Net.
SCENIC is developed in JAX and uses Flax as the neural network library. JAX is a simple-to-use library that allows automatic differentiation of native Python and NumPy functions. It supports multi-host and multi-device training on accelerators such as GPUs and TPUs, making it well suited to large-scale machine learning research.
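As a minimal illustration of the automatic differentiation described above (this is not code from SCENIC itself), the sketch below differentiates an ordinary Python function written with jax.numpy. The function name `mse_loss` and the toy data are assumptions made purely for the example.

```python
import jax
import jax.numpy as jnp


def mse_loss(w, x, y):
    """Mean squared error of a linear model y_hat = x @ w."""
    y_hat = x @ w
    return jnp.mean((y_hat - y) ** 2)


# Toy data: 3 examples with 2 features each (illustrative values only).
x = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = jnp.array([1.0, 2.0, 3.0])
w = jnp.zeros(2)

# jax.grad returns a new function that computes d(loss)/d(w),
# differentiating with respect to the first argument by default.
grad_fn = jax.grad(mse_loss)
grads = grad_fn(w, x, y)

# jax.jit compiles the same gradient function for whatever backend
# (CPU, GPU, or TPU) is available.
fast_grad_fn = jax.jit(grad_fn)
fast_grads = fast_grad_fn(w, x, y)
```

The same `grad_fn` works unchanged on CPU, GPU, or TPU, which is part of what makes JAX convenient for the kind of research code the library targets.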
SCENIC aims to make large-scale model prototyping faster. To keep the code simple to understand and extend, its design prefers forking and copy-pasting over adding complexity or increasing abstraction. Only when functionality proves to be widely useful across many models and tasks is it upstreamed to Scenic’s shared libraries.
Scenic is designed to offer different levels of abstraction, supporting projects that range from those that only require changing hyper-parameters through config files to those that need customisation of the input pipeline, model architecture, losses and metrics, and the training loop.
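To make the lightest of these levels concrete, here is a hedged sketch of a project that differs from its baseline only in hyper-parameters. It does not use Scenic's actual config format; `ExperimentConfig`, `get_config`, and every field name and value are hypothetical and chosen only for illustration.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical experiment config; field names are illustrative only."""
    model_name: str = "vit"
    learning_rate: float = 1e-3
    batch_size: int = 256
    num_epochs: int = 90


def get_config() -> ExperimentConfig:
    """Returns the (hypothetical) baseline configuration."""
    return ExperimentConfig()


# A new experiment that only changes hyper-parameters: no model,
# input-pipeline, or training-loop code is touched, only config values.
baseline = get_config()
variant = dataclasses.replace(baseline, learning_rate=3e-4, batch_size=512)
```

Projects that need more than this would, under the same idea, supply their own input pipeline, model, losses, or training loop instead of only overriding values.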
To make this happen, the code in Scenic is organised as either project-level code, which refers to customised code for specific projects or baselines, or library-level code, which refers to common functionalities and general patterns that are adopted by the majority of projects. The project-level code lives in the projects directory.
The team hopes that SCENIC will help researchers to efficiently test and scale ideas for developing new and superior neural network designs.