Glossary
Brief definitions for technical terms that recur across the site. Each one is meant to be friendly before it is exhaustive.
- Little world
- A bounded form that lets the mind see a larger order: a syllable, token, equation, diagram, parable, model, or controlled example.
- Microcosm
- A small world that helps disclose a larger one. It abstracts away complexity so that a pattern can be examined.
- Macrocosm
- The larger order that a microcosm helps us approach: language, creation, culture, reason, mathematics, relation, and the Logos.
- Token
- A small unit of text used by a language model. A token may be a word, part of a word, punctuation mark, or other text fragment.
- Probability distribution
- A mathematical assignment of likelihoods across possible outcomes. For LLMs, it often means assigning probabilities to possible next tokens.
- Embedding
- A learned numerical representation of a word, token, sentence, or concept, usually stored as a vector so that items with similar meanings sit close together.
- Vector
- A list of numbers that can represent a point or direction in space. In LLMs, vectors often represent learned linguistic features.
- Matrix
- A rectangular grid of numbers. Matrix multiplication is one of the core operations behind neural networks.
- Parameter
- A number inside the model that is adjusted during training.
- Loss function
- A mathematical way of measuring how wrong the model's prediction was.
- Gradient
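- A list of numbers showing how the loss changes as each parameter changes. The gradient points toward the steepest increase in loss, so stepping in the opposite direction reduces error.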
- Gradient descent
- An optimization method that improves a model by repeatedly moving in the direction that reduces loss.
- Attention
- A mechanism that lets a model weigh which tokens matter for interpreting other tokens.
- Transformer
- A neural network architecture built around attention mechanisms, and the foundation of many modern LLMs.
- Scaling law
- An empirical relationship, often close to a power law, describing how model performance changes with model size, data size, and compute.
- Scale
- The movement from many small operations into larger powers. In LLMs, scale links tokens and probabilities to broad linguistic behavior.
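
Several of the terms above (parameter, probability distribution, loss function, gradient, gradient descent) meet in one small loop. The sketch below is only an illustration, not how any particular LLM is built: the four-token vocabulary, the starting scores, the learning rate, and the function names are all assumptions made up for this example. It turns raw scores into next-token probabilities, measures how wrong they are, and nudges the scores against the gradient.

```python
import math

# Hypothetical four-token vocabulary; the raw scores are this toy's only parameters.
vocab = ["the", "cat", "sat", "mat"]
logits = [0.0, 0.0, 0.0, 0.0]
target = vocab.index("sat")  # pretend the correct next token is "sat"

def softmax(scores):
    """Turn raw scores into a probability distribution that sums to 1."""
    highest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def loss_fn(probs, target_index):
    """Cross-entropy loss: large when the correct token gets low probability."""
    return -math.log(probs[target_index])

def gradient(probs, target_index):
    """Derivative of the loss with respect to each raw score."""
    return [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]

def train(params, target_index, learning_rate=0.5, steps=50):
    """Gradient descent: repeatedly step against the gradient."""
    history = []
    for _ in range(steps):
        probs = softmax(params)
        history.append(loss_fn(probs, target_index))
        grad = gradient(probs, target_index)
        params = [w - learning_rate * g for w, g in zip(params, grad)]
    return params, history

trained_logits, losses = train(logits, target)
final_probs = softmax(trained_logits)

print("loss before:", round(losses[0], 3), "loss after:", round(losses[-1], 3))
for token, p in zip(vocab, final_probs):
    print(token, round(p, 3))
```

The loss starts at the value for a uniform guess over four tokens and falls as the probability assigned to "sat" rises. A real model does the same thing over far more parameters and a far larger vocabulary, which is where scale enters.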