History
Coleman Dimensional Encoding is the result of asking myself one question:
Why are computers still so slow in 2026?
I've been programming since I was six, starting with BASIC and 6502 assembly. Decades later I'm playing XCOM 2, and the AI takes longer to decide its turn than ZORK took to load on my Commodore 64. What was the loading screen for if every turn computes everything from scratch? Cover, elevation, sight lines, and threat levels are all known, all structured, and all recomputed for every unit, for every turn. My rig generates waste heat answering the same questions it has answered over and over, funneling electrons into the same answers because of an unexamined assumption: this is the way it is, and computers are just slow at these sorts of problems. But the machine isn't slow; the work is redundant. The structure was never designed to answer the questions being asked of it, and not only can we fix that, we can codify that structure using dimensional thinking.
The idea that dimensions hide things in plain sight has been with me as long as I can remember. I grew up watching Quantum Leap every evening, Sam Beckett leaping through time trying to make right what once went wrong (cue the theme music). The premise was that time is not a wall but a corridor, something you can actually move through and aren't bound by. It's a dimension, and dimensions are things you can navigate.
Then I read Flatland just a few years later, a story about a square living in two dimensions who is unable to see the sphere passing through his world. Not because the sphere isn't there, but because the square doesn't have the axis to perceive it. The information exists, but the dimension to see it does not.
Then I read about Kaluza-Klein theory, and the intuition became physics. In 1921 Theodor Kaluza took Einstein's field equations for general relativity, which describe gravity as curvature in four-dimensional spacetime, and extended them to five dimensions. He didn't add new forces or new particles. He just added one more axis to the geometry.
When he worked out the math, Maxwell's equations for electromagnetism appeared automatically as components of the five-dimensional metric. Five years later, Oskar Klein showed how this fifth dimension could be compactified, curled into a circle so small it's unobservable, while its effects remain everywhere. Electromagnetism wasn't a separate force bolted onto gravity. It was curvature in a dimension that had always been there, one that the four-dimensional model simply couldn't represent.
Framework
The pattern is always the same: information that looks missing from one perspective is already there, encoded in a dimension that the perspective doesn't include. What if data systems worked the same way? What if you could add the right dimensions to a dataset so that every query followed a geodesic, the shortest possible path through the structure to the answer?
WASP: Workload-Aware Sufficient Placement
Defines the problem. Every true answer is found, every false positive is filtered, work scales with the answer rather than the dataset, and no dimension is wasted.
CDE: Coleman Dimensional Encoding
Solves it. Analyze the workload, encode each record as a coordinate, build the index, translate queries into bounded probes.
MSS: Minimally Sufficient Statistics
Keeps it honest. Every claim is classified as a definition, guarantee, assumption, or unknown. Nothing is stated without knowing which category it belongs to.
Formal Definitions
WASP
Let $D$ be a finite set of records and $Q$ a set of queries, where each query is a predicate $q: D \to \{0, 1\}$. Let the workload be $W \subseteq Q$. A WASP instance consists of five components:
- $k \in \mathbb{N}$: number of encoding dimensions
- $E: D \to \mathbb{Z}^k$: maps each record to a coordinate
- $I: \mathbb{Z}^k \to \mathcal{P}(D)$: maps each coordinate to the records stored there
- $T: Q \to \mathcal{P}(\mathbb{Z}^k)$: maps each query to coordinates to probe
- $F: Q \times D \to \{0, 1\}$: per-query filter that removes false positives
Index well-formedness: $\forall r \in D: r \in I(E(r))$.
Five building blocks. $k$ sets how many dimensions the space has. $E$ places each record at a coordinate. $I$ retrieves records from a coordinate. $T$ converts a query into coordinates to check. $F_q$ removes false positives from the results. The well-formedness constraint says that every record is retrievable from the coordinate it was assigned to.
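A minimal sketch in Python may make the five components concrete. Everything here is hypothetical and chosen only for illustration: a toy dataset of (name, year) pairs, a single "year" dimension, and queries of the form "year equals some value".

```python
from collections import defaultdict

# Toy dataset: records are (name, year) pairs. Every name here is
# hypothetical, chosen only to make the five components concrete.
D = [("alpha", 1999), ("beta", 2004), ("gamma", 2004), ("delta", 2011)]

k = 1  # one encoding dimension: the year

def E(r):
    """E: D -> Z^k, place each record at a coordinate (a 1-tuple here)."""
    return (r[1],)

# I: Z^k -> P(D), built so that r in I(E(r)) for every r (well-formedness).
I = defaultdict(list)
for r in D:
    I[E(r)].append(r)

def T(q_year):
    """T: Q -> P(Z^k), the coordinates to probe for 'year == q_year'."""
    return [(q_year,)]

def F(q_year, r):
    """F: Q x D -> {0, 1}, the per-query filter over candidates."""
    return 1 if r[1] == q_year else 0
```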
These define three quantities for any query $q$:
$$\mathrm{Ans}(q) = \{r \in D : q(r) = 1\}$$

$$\mathrm{Cand}(q) = \bigcup_{c \in T(q)} I(c)$$

$$\mathrm{Work}(q) = |T(q)| + |\mathrm{Cand}(q)|$$

$\mathrm{Ans}(q)$ is the perfect answer, every record that truly matches. $\mathrm{Cand}(q)$ is what the index returns; it may include extras. $\mathrm{Work}(q)$ is the total cost, coordinates probed plus records examined. Evaluating $F_q(r)$ is assumed O(1) per candidate.
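Continuing the toy scheme from the sketch above, the three quantities are a few lines each; the assertion at the end is exactly the Sufficiency containment for this one query.

```python
q_year = 2004
q = lambda r: 1 if r[1] == q_year else 0             # the query as a predicate

Ans  = {r for r in D if q(r) == 1}                   # every true match
Cand = {r for c in T(q_year) for r in I.get(c, [])}  # what the index returns
Work = len(T(q_year)) + len(Cand)                    # probes + records examined

assert Ans <= Cand                                   # Sufficiency for this query
print(Ans, Work)  # {('beta', 2004), ('gamma', 2004)}, Work == 3
```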
A valid solution satisfies four properties:
Sufficiency
$$\mathrm{Ans}(q) \subseteq \mathrm{Cand}(q)$$

Equivalently: $\forall q \in W, \forall r \in D: q(r) = 1 \Rightarrow E(r) \in T(q)$.
Exactness (axiom)
$$\forall q \in W, \forall r \in \mathrm{Cand}(q): F_q(r) = q(r)$$

Theorem (from Sufficiency + Exactness): $\mathrm{Ans}(q) = \{r \in \mathrm{Cand}(q) : F_q(r) = 1\}$.
Bounded work
$$\exists\,\gamma, \beta \ge 0 \text{ such that } \forall q \in W: \mathrm{Work}(q) \le \gamma\,|\mathrm{Ans}(q)| + \beta$$

Minimality

Removing dimension $i$ means projecting $\pi_i: \mathbb{Z}^k \to \mathbb{Z}^{k-1}$ and deriving $E', I', T'$ on $\mathbb{Z}^{k-1}$. Minimality holds when $\forall i \in \{1, \ldots, k\}$, the projected scheme violates at least one of Sufficiency, Exactness, or Bounded work on $W$.
You never miss a correct answer. The filter agrees with the query on every candidate. Effort grows with the answer size, not the dataset size. Every dimension earns its keep; remove one, project the coordinate space down one axis, rederive the scheme, and a guarantee breaks.
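On small instances the first three properties can be checked by brute force. The sketch below assumes the query representation from the earlier snippets (a predicate paired with the parameter that `T` and `F` consume); Minimality requires re-deriving the scheme per dropped dimension, so it is only noted here, not checked.

```python
def check_wasp(D, W, I, T, F, gamma, beta):
    """Brute-force check of Sufficiency, Exactness, and Bounded work.

    W is a list of (q, p) pairs: the predicate q and the parameter p
    that T and F consume. Minimality is out of scope for this sketch:
    it requires projecting out each dimension and re-deriving E', I', T'.
    """
    for q, p in W:
        ans  = {r for r in D if q(r) == 1}
        cand = {r for c in T(p) for r in I.get(c, [])}
        work = len(T(p)) + len(cand)
        assert ans <= cand,                        "Sufficiency violated"
        assert all(F(p, r) == q(r) for r in cand), "Exactness violated"
        assert work <= gamma * len(ans) + beta,    "Bounded work violated"
    return True

# The toy scheme probes one coordinate and admits no false positives,
# so Work(q) = 1 + |Ans(q)| and gamma = 1, beta = 1 suffice.
check_wasp(D, [(q, q_year)], I, T, F, gamma=1, beta=1)
```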
CDE
Given a workload $W$, construct the five WASP components in four phases:
- Workload analysis: derive candidate dimensions from $W$
- Coordinate encoding: define discretizers per dimension, compute $E(r)$
- Index construction: build $I(c)$ from observed coordinates
- Query translation: implement $T(q)$ and $F_q$ so the four properties hold
Study the questions to discover natural axes. Assign each record a position on those axes. Build the lookup from positions to records. Convert incoming queries into bounded coordinate scans.
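Here is a sketch of the four phases for the simplest possible workload, equality queries on known fields. Everything in it is illustrative: `workload_fields` stands in for a real phase-1 analysis, and the identity discretizer is a deliberate simplification.

```python
from collections import defaultdict

def build_cde(D, workload_fields):
    """Four CDE phases, specialized to equality queries on dict records."""
    # Phase 2: coordinate encoding. One axis per field the workload
    # filters on; the identity function serves as the discretizer.
    def E(r):
        return tuple(r[f] for f in workload_fields)

    # Phase 3: index construction, over observed coordinates only.
    I = defaultdict(list)
    for r in D:
        I[E(r)].append(r)

    # Phase 4: query translation. Equality on every indexed field
    # probes exactly one coordinate; F re-checks each candidate.
    def T(values):
        return [tuple(values[f] for f in workload_fields)]

    def F(values, r):
        return all(r[f] == v for f, v in values.items())

    return E, I, T, F

# Phase 1 is assumed already done: the workload only ever filters on "day".
D = [{"user": "u1", "day": 5}, {"user": "u2", "day": 5}, {"user": "u3", "day": 9}]
E, I, T, F = build_cde(D, ["day"])
query = {"day": 5}
hits = [r for c in T(query) for r in I.get(c, []) if F(query, r)]  # two records
```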
MSS
Given a statement set $S$, define a labeling function $L: S \to \{\text{Def}, \text{Gua}, \text{Asm}, \text{Unk}\}$ such that:
- Partition: every $s \in S$ gets exactly one label
- Traceability: every Guarantee is derivable from Definitions and Assumptions (not from Unknowns)
- Independence: no Assumption is derivable from the other Assumptions and Definitions
- No laundering: no Unknown is used as if it were a Guarantee
A description is minimally sufficient when $L$ satisfies all four criteria.
Every sentence in the system has a label. Definitions are choices we made. Guarantees follow from those choices. Assumptions are bets; if one is wrong, every guarantee that depends on it breaks. Unknowns are honest gaps. The description is minimally sufficient when nothing is mislabeled and nothing is redundant.
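A sketch of how the labeling could be machine-checked. The `Statement` class and its field names are hypothetical; Independence requires showing an Assumption is not derivable from the rest, a proof obligation this sketch can only record, not discharge.

```python
from dataclasses import dataclass, field

DEF, GUA, ASM, UNK = "Def", "Gua", "Asm", "Unk"   # the four MSS labels

@dataclass
class Statement:
    text: str
    label: str                                    # Partition: one label each
    derived_from: list = field(default_factory=list)

def check_mss(statements):
    """Check Traceability and No-laundering; Independence is left as an
    obligation, since it needs a real derivability check."""
    for s in statements:
        # No laundering: an Unknown never supports anything downstream.
        assert all(p.label != UNK for p in s.derived_from), \
            f"unknown laundered into: {s.text!r}"
        if s.label == GUA:
            # Traceability: Guarantees rest on Defs, Asms, and prior Guas.
            assert all(p.label in (DEF, ASM, GUA) for p in s.derived_from), \
                f"untraceable guarantee: {s.text!r}"
    return True
```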
Repositories
Working implementations. Each one applies the same dimensional thinking to a different constraint.
Coda
The symbol at the top of this page is the nabla, ∇. It is the gradient operator. Applied to a scalar field, it returns a vector pointing in the direction of steepest ascent. Applied to a dataset with the right dimensions, it does the same thing; the shortest path to the answer reveals itself.
The shape of the symbol is not incidental. A T-shaped person has one deep vertical specialty and a horizontal bar of broad but shallow knowledge. The nabla is the next step. It is the shape of a polyglot, a converging geometry where many domains, languages, and perspectives flow inward and focus into a single, decisive direction of movement.
This is the Coleman Dimensional Encoding framework.
Not faster hardware. Not cleverer algorithms acting on the same flat structures. Just the precise dimensions, derived from the questions themselves, encoded directly into the geometry of the data.
When the dimensions are right, the gradient does not need to be forced. The steepest path simply reveals itself, not because the answer was ever hidden, but because the question finally has the shape to find it.