New trillion-parameter AI language model

A team of researchers from Google has revealed its new trillion-parameter AI language model.


The model surpasses OpenAI's GPT-3, one of the largest language models ever trained, which used around 175 billion parameters.


The Google Brain team developed techniques that allowed them to train the largest language model to date. The researchers state that large-scale training is an effective way to build such models: a simple architecture, backed by large datasets and parameter counts, can surpass far more complicated algorithms. Their Switch Transformer technique uses only a subset of a model's weights, the parameters that transform input data within the model, for any given input.
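To make the routing idea concrete, here is a minimal sketch of top-1 ("switch") routing in Python. It is an illustration only, assuming a toy NumPy setup; the dimensions, the names `router_w`, `expert_w_in`, and `expert_w_out`, and the ReLU feed-forward experts are invented for the example and do not reflect Google's actual implementation.

```python
# Toy sketch of switch-style routing: each token is scored against a set of
# experts and processed by ONLY the weights of its single chosen expert.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, num_experts, num_tokens = 8, 16, 4, 5

# Router: one matrix that scores each token against each expert.
router_w = rng.normal(size=(d_model, num_experts))

# Each expert is an independent feed-forward block (two matrices here).
expert_w_in = rng.normal(size=(num_experts, d_model, d_ff))
expert_w_out = rng.normal(size=(num_experts, d_ff, d_model))

tokens = rng.normal(size=(num_tokens, d_model))

def switch_layer(x):
    """Route each token to exactly one expert and apply only that expert."""
    logits = x @ router_w                                  # (tokens, experts)
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    chosen = probs.argmax(axis=-1)                         # top-1 expert per token
    out = np.zeros_like(x)
    for i, e in enumerate(chosen):
        hidden = np.maximum(x[i] @ expert_w_in[e], 0.0)    # ReLU feed-forward
        out[i] = probs[i, e] * (hidden @ expert_w_out[e])  # scale by router prob
    return out, chosen

out, chosen = switch_layer(tokens)
print("expert chosen for each token:", chosen)
```

The point of the sketch is that the total parameter count grows with the number of experts, while the work done per token stays roughly constant, since each token only touches one expert's weights.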


Moreover, the Switch Transformer is efficient because it leverages hardware designed for dense matrix multiplications, such as GPUs and TPUs. The researchers split the weights across different devices, so the total number of weights grows with the number of devices while the memory and computational footprint on each device stays manageable.
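The sharding idea can be summarized with a small hedged sketch: each expert's weights live on a different (here, simulated) device, so adding devices adds parameters without increasing what any single device must hold. Device names and sizes below are made up for illustration.

```python
# Simulated expert sharding: one expert per device, so total parameters
# scale with the device count while per-device memory stays constant.
import numpy as np

d_model, d_ff = 8, 16
devices = ["device:0", "device:1", "device:2", "device:3"]

experts = {
    dev: {
        "w_in": np.zeros((d_model, d_ff)),
        "w_out": np.zeros((d_ff, d_model)),
    }
    for dev in devices
}

per_device_params = d_model * d_ff + d_ff * d_model
total_params = per_device_params * len(devices)
print(f"{per_device_params} parameters per device, {total_params} total")
```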


Google said it aimed to design an architecture that is easy to understand, stable to train, and significantly more efficient than most dense models. The new model excels across a diverse set of natural language tasks and in different training regimes, such as multi-task training. With these advances, the researchers were able to train models with hundreds of billions of parameters and up to a trillion.


In the future, the Google researchers plan to apply the Switch Transformer to new and different modalities, including image and text.
