Glossary

Brief definitions for technical terms that recur across the site. Each one is meant to be friendly before it is exhaustive.

Little world
A bounded form that lets the mind see a larger order: a syllable, token, equation, diagram, parable, model, or controlled example.
Microcosm
A small world that helps disclose a larger one. It abstracts away complexity so pattern can be examined.
Macrocosm
The larger order that a microcosm helps us approach: language, creation, culture, reason, mathematics, relation, and the Logos.
Token
A small unit of text used by a language model. A token may be a word, part of a word, a punctuation mark, or another text fragment.
Probability distribution
A mathematical assignment of likelihoods across possible outcomes. For LLMs, it often means assigning probabilities to possible next tokens.
Embedding
A learned numerical representation of a word, token, sentence, or concept, usually stored as a vector so that items with similar meaning end up close together.
Vector
A list of numbers that can represent a point or direction in space. In LLMs, vectors often represent learned linguistic features.
Matrix
A rectangular grid of numbers. Matrix multiplication is one of the core operations behind neural networks.
Parameter
A number inside the model that is adjusted during training.
Loss function
A mathematical way of measuring how wrong the model's prediction was.
Gradient
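A list of slopes, one per parameter, showing how the loss changes as each parameter changes. The gradient points toward increasing loss, so moving the parameters the opposite way reduces error.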
Gradient descent
An optimization method that improves a model by repeatedly adjusting its parameters in small steps in the direction that reduces loss.
Attention
A mechanism that lets a model weigh which tokens matter for interpreting other tokens.
Transformer
A neural network architecture built around attention mechanisms, and the foundation of many modern LLMs.
Scaling law
An empirical relationship, often a power law, describing how model performance changes with model size, data size, and compute.
Scale
The movement from many small operations into larger powers. In LLMs, scale links tokens and probabilities to broad linguistic behavior.
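
Several of the terms above (probability distribution, parameter, loss function, gradient, gradient descent) can be seen working together in a toy example. The following is a minimal sketch, not how a real LLM is trained: it assumes a hypothetical three-token vocabulary and treats one score per token as the only parameters. The function names are illustrative, and the loss used here is cross-entropy, one common choice among several.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution that sums to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    """Loss function: large when the correct token gets low probability."""
    return -math.log(probs[target])

def gradient(probs, target):
    """Gradient of the cross-entropy loss with respect to each score."""
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]

def train(scores, target, learning_rate=0.5, steps=50):
    """Gradient descent: repeatedly step against the gradient."""
    losses = []
    for _ in range(steps):
        probs = softmax(scores)
        losses.append(cross_entropy(probs, target))
        grads = gradient(probs, target)
        scores = [s - learning_rate * g for s, g in zip(scores, grads)]
    return scores, losses

vocab = ["cat", "dog", "."]   # hypothetical three-token vocabulary
start = [0.0, 0.0, 0.0]       # parameters: one score per token (an assumption)
final_scores, losses = train(start, target=vocab.index("dog"))
final_probs = softmax(final_scores)
```

At the start every token is equally likely. After training, most of the probability has moved to the target token and the recorded loss has fallen, which is the whole loop of "measure how wrong, find the slope, step the other way" in miniature.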