Glossary Term

BERT (language model)

Design and Pretraining

- BERT is an encoder-only transformer architecture.
- BERT consists of three modules: embedding, a stack of encoders, and un-embedding.
- The un-embedding module is needed for pretraining but is often unnecessary for downstream tasks.
- BERT uses WordPiece to convert English words into integer codes (tokenizer sketch below).
- BERT's vocabulary size is 30,000.
- BERT was pre-trained on two tasks: language modeling and next sentence prediction.
- Language modeling involved predicting selected tokens given their context; the selected tokens were replaced with [MASK] tokens or random word tokens before prediction (masking sketch below).
- Next sentence prediction involved determining whether two spans appeared consecutively in the training corpus.
- During pre-training, BERT learns latent representations of words and sentences in context.

Architecture details

- BERT was released in two sizes: BASE and LARGE (configuration summary below).
- The lowest layer of BERT is the embedding layer, which contains word_embeddings, position_embeddings, and token_type_embeddings.
- word_embeddings converts input tokens into vectors.
- position_embeddings performs absolute position embedding.
- token_type_embeddings distinguishes tokens appearing before and after the [SEP] token (embedding-layer sketch below).

Performance

- BERT achieved state-of-the-art performance on the GLUE task set, SQuAD, and SWAG.
- GLUE (General Language Understanding Evaluation) is a benchmark consisting of 9 tasks.
- SQuAD is the Stanford Question Answering Dataset.
- SWAG is Situations With Adversarial Generations.

Analysis

- The reasons for BERT's state-of-the-art performance are not yet well understood.
- Current research focuses on analyzing BERT's output, its internal vector representations, and its attention weights.
- BERT's bidirectional training contributes to its high performance: the model gains a deep understanding of context by attending to words on both the left and the right of each token.
- BERT's encoder-only architecture limits its ability to generate text.

Recognition

- The research paper describing BERT won the Best Long Paper Award at the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).
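
To make the WordPiece step concrete, the sketch below loads a pretrained BERT tokenizer via the Hugging Face transformers library and converts a sentence into integer codes. The bert-base-uncased checkpoint and the example sentence are illustrative choices, not part of the glossary definition above.

```python
# Sketch: WordPiece tokenization with a pretrained BERT tokenizer.
# Assumes the Hugging Face `transformers` package and the public
# bert-base-uncased checkpoint (whose vocabulary has 30,522 entries).
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

text = "BERT converts English words into integer codes."
tokens = tokenizer.tokenize(text)               # WordPiece subword tokens
ids = tokenizer.convert_tokens_to_ids(tokens)   # integer codes from the vocabulary

print(tokens)                 # e.g. ['bert', 'converts', 'english', 'words', ...]
print(ids)                    # corresponding vocabulary indices
print(tokenizer.vocab_size)   # 30522
```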
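
The masked-language-modeling objective described under Design and Pretraining can be sketched in plain Python. The 15% selection rate and the 80/10/10 split between [MASK], random-token, and unchanged replacements follow the original BERT paper; the function name, the -100 ignore label, and the example ids are illustrative assumptions.

```python
import random

def mask_for_mlm(token_ids, vocab_size, mask_id, select_prob=0.15):
    """Illustrative masking for BERT-style masked language modeling.

    Each token is selected for prediction with probability `select_prob`.
    A selected token is replaced by [MASK] 80% of the time, by a random
    vocabulary token 10% of the time, and left unchanged 10% of the time.
    The original ids are kept as labels; -100 marks unselected positions
    (a common "ignore in the loss" convention, assumed here).
    """
    inputs, labels = list(token_ids), []
    for i, tok in enumerate(token_ids):
        if random.random() < select_prob:
            labels.append(tok)                            # model must recover the original token
            r = random.random()
            if r < 0.8:
                inputs[i] = mask_id                       # replace with [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(vocab_size)  # replace with a random token
            # else: keep the original token unchanged
        else:
            labels.append(-100)                           # not selected; excluded from the loss
    return inputs, labels

# Example with made-up token ids; 103 is the [MASK] id in bert-base-uncased.
masked, targets = mask_for_mlm([2023, 2003, 1037, 7099], vocab_size=30522, mask_id=103)
```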
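
For the two released sizes mentioned under Architecture details, the commonly cited configurations are summarized below; the dictionary name is arbitrary and the parameter counts are approximate figures from the original paper.

```python
# Approximate configurations of the two released BERT sizes
# (layer counts, hidden sizes, and head counts as reported in the BERT paper).
BERT_CONFIGS = {
    "BASE":  {"layers": 12, "hidden_size": 768,  "attention_heads": 12, "parameters": "~110M"},
    "LARGE": {"layers": 24, "hidden_size": 1024, "attention_heads": 16, "parameters": "~340M"},
}
```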
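
Finally, the embedding layer described under Architecture details sums its three components at each position. The PyTorch sketch below shows that structure with BERT-BASE-like dimensions; the class name is made up, and the real implementation also applies layer normalization and dropout, omitted here for brevity.

```python
import torch
import torch.nn as nn

class BertLikeEmbeddings(nn.Module):
    """Simplified sketch of BERT's embedding layer: word, position, and
    token-type embeddings are summed elementwise per position."""
    def __init__(self, vocab_size=30522, hidden=768, max_pos=512, type_vocab=2):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, hidden)        # token id -> vector
        self.position_embeddings = nn.Embedding(max_pos, hidden)       # absolute position -> vector
        self.token_type_embeddings = nn.Embedding(type_vocab, hidden)  # before/after [SEP] -> vector

    def forward(self, input_ids, token_type_ids):
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        return (self.word_embeddings(input_ids)
                + self.position_embeddings(positions)      # broadcast over the batch
                + self.token_type_embeddings(token_type_ids))

# Example: one sequence of four token ids, all in segment 0.
emb = BertLikeEmbeddings()
out = emb(torch.tensor([[101, 7592, 2088, 102]]), torch.zeros(1, 4, dtype=torch.long))
print(out.shape)  # torch.Size([1, 4, 768])
```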