LLM Architectures
LLM architectures can be broadly categorized into encoders, decoders, and encoder-decoder models. Each type serves different purposes in natural language processing.
Key Characteristics
- Probabilistic Nature: LLMs assign a probability to each candidate next word given the context, then either pick the most probable word or sample from that distribution, depending on the decoding strategy (a minimal sketch of this follows the list below).
- Scale: The size of an LLM, defined by the number of parameters, impacts its ability to understand and generate language.
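To make the probabilistic view concrete, here is a minimal, self-contained sketch. The prompt, candidate words, and logit values are invented for illustration and are not output from any real model; the point is only how a score distribution becomes probabilities and how greedy versus sampled decoding differ.

```python
import math
import random

# Toy logits a (hypothetical) model might assign to candidate next words
# for the prompt "The cat sat on the". The values are invented for illustration.
logits = {"mat": 4.1, "sofa": 2.3, "roof": 1.9, "keyboard": 0.4}

# Softmax turns unnormalized scores into probabilities that sum to 1.
total = sum(math.exp(v) for v in logits.values())
probs = {word: math.exp(v) / total for word, v in logits.items()}

# Greedy decoding: always take the single most probable word.
greedy_choice = max(probs, key=probs.get)

# Sampling: draw a word in proportion to its probability, which is the basis
# of temperature- and top-p-style decoding.
sampled_choice = random.choices(list(probs), weights=list(probs.values()), k=1)[0]

print(probs)           # roughly {'mat': 0.77, 'sofa': 0.13, 'roof': 0.09, 'keyboard': 0.02}
print(greedy_choice)   # 'mat'
print(sampled_choice)  # usually 'mat', occasionally another word
```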
Encoders
Encoders transform a sequence of words into a vector representation, capturing the semantic meaning of the text. These models are essential for tasks like text embedding and classification.
- MiniLM: A lightweight encoder distilled from larger transformer models for efficient inference.
- Embed-light: Optimized for creating embeddings with reduced computational overhead.
- BERT (Bidirectional Encoder Representations from Transformers): Captures context from both directions to improve understanding.
- RoBERTa (Robustly Optimized BERT Pretraining Approach): Enhances BERT by optimizing training techniques.
- DistilBERT: A smaller, faster version of BERT with comparable performance.
- SBERT (Sentence-BERT): Tailored for generating sentence embeddings.
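As a rough illustration of how encoders are used for embedding, the sketch below assumes the sentence-transformers package and the publicly available all-MiniLM-L6-v2 checkpoint (an SBERT-style MiniLM encoder); the example sentences are invented.

```python
from sentence_transformers import SentenceTransformer, util

# Load a small encoder model (assumes the checkpoint can be downloaded).
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The invoice is due at the end of the month.",
    "Payment for the bill is expected by month's end.",
    "The hike up the mountain took six hours.",
]

# Each sentence becomes a fixed-size vector capturing its meaning.
embeddings = model.encode(sentences)

# Semantically similar sentences end up close together (high cosine similarity);
# the first two sentences should score much higher against each other than
# against the third.
similarities = util.cos_sim(embeddings, embeddings)
print(similarities)
```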
Decoders
Decoders generate text by predicting the next word based on the input sequence. They are integral to tasks such as text generation and completion.
- GPT-4: A state-of-the-art model known for its advanced text generation capabilities.
- Llama: Meta's family of open-weight models, designed for strong performance across a range of scales.
- BLOOM: An open, multilingual model from the BigScience collaboration, designed for diverse text generation tasks.
- Falcon: An open model from TII, optimized for generating coherent and contextually relevant text.
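A minimal sketch of decoder-style generation with the Hugging Face transformers library is shown below. It uses GPT-2 as a small, freely downloadable stand-in for the larger decoder models listed above; the prompt and generation settings are illustrative.

```python
from transformers import pipeline

# Load a small decoder-only model for text generation.
generator = pipeline("text-generation", model="gpt2")

prompt = "The key idea behind transformer decoders is"

# The model extends the prompt one predicted token at a time,
# sampling from its next-token distribution.
outputs = generator(prompt, max_new_tokens=40, do_sample=True, temperature=0.8)

print(outputs[0]["generated_text"])
```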
Encoder-Decoder Models
These models combine the functionalities of encoders and decoders, first encoding the input and then generating text based on the encoded information.
- T5 (Text-To-Text Transfer Transformer): Converts various NLP tasks into a text-to-text format.
- UL2 (Unifying Language Learning Paradigms): A model trained with a mixture of denoising objectives for both language understanding and generation.
- BART (Bidirectional and Auto-Regressive Transformers): Pairs a BERT-like bidirectional encoder with a GPT-like autoregressive decoder for enhanced text generation.
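The sketch below illustrates the encode-then-generate flow, assuming the Hugging Face transformers library and the publicly available t5-small checkpoint. T5 frames tasks as text-to-text, so translation is triggered simply by a task prefix in the input; the example text is invented.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# T5 uses a plain-text task prefix to select the task.
text = "translate English to German: The weather is nice today."

# The encoder reads the whole input; the decoder then generates the output
# sequence conditioned on that encoded representation.
input_ids = tokenizer(text, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=40)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```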
Historical Tasks and Model Types
Different architectures have historically suited different tasks, as summarized below:
| Task | Encoders | Decoders | Encoder-Decoder |
|---|---|---|---|
| Embedding text | Yes | No | No |
| Abstractive QA | No | Yes | Yes |
| Extractive QA | Yes | Maybe | Yes |
| Translation | No | Maybe | Yes |
| Creative writing | No | Yes | No |
| Abstractive summarization | No | Yes | Yes |
| Extractive summarization | Yes | Maybe | Yes |
| Chat | No | Yes | No |
| Forecasting | No | No | No |
| Code | No | Yes | Yes |
This categorization helps in selecting the appropriate model for specific NLP tasks.
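If this selection logic were encoded programmatically, it might look like the hypothetical helper below, which maps each task to the architecture families the table marks with a clear "Yes"; the function name and task keys are invented for illustration.

```python
# Architecture families the table above marks "Yes" for each task
# (the "Maybe" entries and unsupported tasks like forecasting are omitted).
SUITABLE_ARCHITECTURES = {
    "embedding": ["encoder"],
    "abstractive_qa": ["decoder", "encoder-decoder"],
    "extractive_qa": ["encoder", "encoder-decoder"],
    "translation": ["encoder-decoder"],
    "creative_writing": ["decoder"],
    "abstractive_summarization": ["decoder", "encoder-decoder"],
    "extractive_summarization": ["encoder", "encoder-decoder"],
    "chat": ["decoder"],
    "code": ["decoder", "encoder-decoder"],
}

def suggest_architectures(task: str) -> list[str]:
    """Return the architecture families typically used for the given task."""
    return SUITABLE_ARCHITECTURES.get(task, [])

print(suggest_architectures("translation"))  # ['encoder-decoder']
```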