ModelTerms

Foundations · beginner

Deep Learning

Deep learning is machine learning with neural networks that stack many layers; "deep" refers to that depth. It underlies most recent advances in AI, including LLMs and image generators.

Explanation

Before deep learning, machine learning depended on humans hand-engineering features (e.g., "count the number of edges in this image"). Deep networks instead learn useful features automatically across many layers: early layers capture simple patterns, and deeper layers compose them into higher-level concepts.
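To make the idea of stacked layers concrete, here is a minimal sketch of a forward pass in plain Python. The layer sizes, the ReLU activation, and the helper names (`dense`, `make_layer`, `forward`) are illustrative assumptions, and the weights are random rather than trained, so the code only shows how each layer consumes the previous layer's output, not what a trained network would learn.

```python
import random

def relu(values):
    # Keep positive values, zero out negative ones.
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    # One weight row per output unit: weighted sum of the inputs plus a bias.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def make_layer(n_in, n_out, rng):
    # Random, untrained weights (assumption: for illustration only).
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(inputs, layers):
    # Each layer transforms the output of the layer before it.
    activations = [inputs]
    for weights, biases in layers:
        activations.append(relu(dense(activations[-1], weights, biases)))
    return activations

rng = random.Random(0)
layer_sizes = [4, 8, 8, 2]  # hypothetical: 4 inputs, two hidden layers, 2 outputs
layers = [make_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]
activations = forward([0.5, -1.0, 2.0, 0.0], layers)

for depth, values in enumerate(activations):
    print(f"layer {depth}: {len(values)} values")
```

In a real network the weights would be set by training, and it is that training that makes early layers respond to simple patterns and later layers to combinations of them.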

The deep-learning revolution is usually dated to 2012, when AlexNet won the ImageNet image-classification challenge by a wide margin. The 2017 "Attention Is All You Need" paper introduced the transformer, the architecture behind nearly every modern LLM.

Deep learning is data-hungry and compute-hungry: bigger models trained on more data with more compute tend to keep improving in a predictable way, an empirical pattern described by scaling laws.
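Scaling laws are often summarized as a power law relating loss to compute, data, or model size. The sketch below assumes the simplest such form, loss = a * compute^(-b); the function name and the constants `a` and `b` are made up for illustration and are not fitted to any real model.

```python
def power_law_loss(compute, a=10.0, b=0.05):
    # Assumed form: loss falls as a power of compute. Constants are hypothetical.
    return a * compute ** (-b)

for compute in [1e18, 1e20, 1e22]:
    print(f"compute {compute:.0e} -> loss {power_law_loss(compute):.3f}")

# Under a power law, doubling compute multiplies the loss by the same factor
# (2 ** -b) no matter where you start.
ratio = power_law_loss(2e20) / power_law_loss(1e20)
print(f"loss ratio after doubling compute: {ratio:.4f}")
```

The constant ratio is the practical point: each doubling buys a similar relative improvement, which is why the gains are steady but expensive.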

Examples

  • Image recognition models like ResNet.
  • Speech recognition (Whisper).
  • Large language models like GPT-4 and Claude.

Frequently asked

What is an example of deep learning?

Image recognition models like ResNet.

How is deep learning related to neural networks?

Deep learning and neural networks are both Foundations concepts. A neural network is a stack of simple mathematical units ("neurons") that learn to transform inputs into outputs by adjusting numeric weights during training. Deep learning is the use of such networks with many layers.
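As a rough illustration of "adjusting numeric weights during training", the sketch below trains a single neuron with one weight and one bias by gradient descent on a mean squared error. The target function y = 2x + 1, the learning rate, and the name `train_neuron` are assumptions chosen to keep the example tiny.

```python
def train_neuron(data, lr=0.05, epochs=500):
    # Start from zero and repeatedly nudge the weight and bias to reduce error.
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:
            error = (w * x + b) - y       # prediction minus target
            grad_w += 2 * error * x / n   # d(mean squared error)/dw
            grad_b += 2 * error / n       # d(mean squared error)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (hypothetical target).
data = [(x, 2.0 * x + 1.0) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]
w, b = train_neuron(data)
print(f"learned w={w:.3f}, b={b:.3f}")
```

A deep network applies the same idea to many weights at once, with the gradients computed layer by layer.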

Is deep learning considered a beginner topic?

Yes. Deep learning is generally treated as beginner-level material in the AI and LLM space.

Neural Network · Foundations

A neural network is a stack of simple mathematical units ("neurons") that learn to transform inputs into outputs by adjusting numeric weights during training.

Transformer · Architecture

The transformer is the neural network architecture behind virtually every modern large language model. It uses self-attention to model relationships between all positions in a sequence in parallel.
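The self-attention step can be sketched in a few lines of plain Python. This is a simplified version under stated assumptions: queries, keys, and values are the input vectors themselves (a real transformer applies learned projections first), there is a single head, and the loop runs position by position, whereas practical implementations compute all positions at once with matrix operations.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    # Simplification: queries, keys and values are all the raw input vectors.
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Scaled dot product of this position against every position.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # Output is the attention-weighted average of all value vectors.
        mixed = [sum(w * value[i] for w, value in zip(weights, vectors))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical 2-d token vectors
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` shows how strongly one position attends to every position in the sequence, including itself.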

Machine Learning · Foundations

Machine learning is the branch of AI in which models learn patterns from data instead of being explicitly programmed. The training process adjusts model parameters to reduce error on examples.

Pretraining · Training

Pretraining is the initial training phase where an LLM learns to predict the next token on trillions of tokens of general text. It produces a base model that can be adapted later.
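Next-token prediction is typically scored with a cross-entropy loss: the model assigns a score to every token in the vocabulary, and the loss is low when the true next token gets a high probability. The sketch below uses a four-word vocabulary and hand-picked scores, both hypothetical, to show the calculation for one position.

```python
import math

def next_token_loss(logits, target_index):
    # Cross-entropy: negative log of the softmax probability of the true token.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[target_index]

vocab = ["the", "cat", "sat", "mat"]
# Hypothetical model scores for the token that follows "the cat".
logits = [0.1, 0.2, 3.0, 0.5]

loss_if_sat = next_token_loss(logits, vocab.index("sat"))
loss_if_mat = next_token_loss(logits, vocab.index("mat"))
print(f"loss when the true next token is 'sat': {loss_if_sat:.3f}")
print(f"loss when the true next token is 'mat': {loss_if_mat:.3f}")
```

Pretraining averages a loss of this kind over every position in the training text and adjusts the weights to reduce it.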

Foundation Model · Foundations

A foundation model is a single large model pretrained on broad data that can be adapted to many downstream tasks. LLMs are the most common type.
