The Babel Range: Why High‑Range Tests Drift Toward Private Languages
I have spent an unreasonable portion of my adult life watching high‑range IQ tests mutate. What began as a niche hobby—an odd corner of the internet where people solved puzzles for sport—has evolved into something stranger: a linguistic archipelago of private dialects, each spoken by exactly one person, the test author. The more I look at these tests, the more they resemble a Tower of Babel built in reverse: not a single language fracturing into many, but many languages invented to avoid being understood. The designers call this innovation. I call it drift.
The drift begins innocently enough. A test is released. People solve it. Someone posts a solution key. Someone else writes a solver. A third person feeds it to a search engine. A fourth person feeds it to an AI. The designer, horrified that their creation has been “compromised,” vows to build a purer test next time—one that cannot be solved by search, by AI, or by anyone except, apparently, the designer themselves. This is the moment the Babel Range begins.
I want to describe what I see happening—not statistically, not psychometrically, but structurally. These tests are not failing because of bad norms or small samples. They are failing because they have become artifacts of private cognition. They are drifting toward languages no one else speaks. And once you see the drift, you cannot unsee it.
1. The Tower of Babel Problem
The first thing you notice about modern high‑range tests is that they are no longer competing against human intelligence. They are competing against each other’s evasive maneuvers.
Every new test is designed to be:
AI‑proof
Google‑proof
solver‑proof
forum‑proof
and, increasingly, human‑proof
The designer’s goal is not to measure intelligence but to outrun the last generation of solvers. The test becomes a kind of escape room built by someone who resents the idea that anyone might actually escape.
This produces a predictable pathology: the rule systems become more idiosyncratic. The transformations become more baroque. The semantics become thinner. The items become less like puzzles and more like private jokes. A test that is “high‑range” because it is difficult is one thing. A test that is “high‑range” because no one understands what the author meant is another.
At some point, the designer stops measuring intelligence and starts measuring alignment with their own cognitive idiolect. This is the Babel Range: a proliferation of private languages masquerading as psychometrics.
2. The Authorial Oracle Problem
A well‑formed cognitive item has a public rule set. You may not see it immediately, but it is there, waiting to be reconstructed. The pleasure of solving the puzzle comes from discovering the rule, not guessing the author’s mood. High‑range tests increasingly violate this principle.
The rule set is not public. It is not reconstructible. It is not even derivable. It is simply intended. The designer knows the answer because the designer wrote the answer. The solver is expected to reverse‑engineer a mind they have never met. This is not testing. This is divination.
I have seen items where three different solvers produced three different answers, each supported by a coherent rule system, each internally consistent, each elegant in its own way. The designer rejected all three because they were not the intended answer. At that moment, the test ceases to be a cognitive instrument and becomes a catechism. The scoring key is not a derivation; it is a revelation. A test whose answers must be revealed rather than derived is not a test. It is scripture.
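A small numerical example makes the point concrete. The sketch below is my own illustration and is not drawn from any actual test: the sequence 1, 2, 4 and the three rules (doubling, growing_gaps, divisors_of_20) are hypothetical. Each rule reproduces every visible term, yet each predicts a different next term, so nothing in the item itself can rank one answer above the others.

```python
from typing import Callable, Dict, List

# A rule maps a 0-based position to the term at that position.
Rule = Callable[[int], int]


def doubling(n: int) -> int:
    """1, 2, 4, 8, ... (each term is twice the previous one)."""
    return 2 ** n


def growing_gaps(n: int) -> int:
    """1, 2, 4, 7, ... (the gaps between terms grow: +1, +2, +3)."""
    return 1 + n * (n + 1) // 2


def divisors_of_20(n: int) -> int:
    """1, 2, 4, 5, 10, 20 (the divisors of 20 in ascending order)."""
    return [d for d in range(1, 21) if 20 % d == 0][n]


# Hypothetical rule systems that three different solvers might propose.
RULES: Dict[str, Rule] = {
    "doubling": doubling,
    "growing_gaps": growing_gaps,
    "divisors_of_20": divisors_of_20,
}


def predictions(visible: List[int], rules: Dict[str, Rule]) -> Dict[str, int]:
    """Return the next term predicted by every rule that fits all visible terms."""
    fitting = {}
    for name, rule in rules.items():
        if all(rule(i) == term for i, term in enumerate(visible)):
            fitting[name] = rule(len(visible))
    return fitting


print(predictions([1, 2, 4], RULES))
# {'doubling': 8, 'growing_gaps': 7, 'divisors_of_20': 5}
print(predictions([1, 2, 4, 8], RULES))
# {'doubling': 16}
```

In this toy setup, showing a fourth term (8) eliminates two of the three rules. That is what an explicit constraint does: it narrows the set of defensible answers without any appeal to the author's preference.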
3. The Determinacy Criterion
If I were forced to propose a single principle for evaluating the legitimacy of a cognitive item, it would be this:
A valid item must have a unique solution derivable from explicit constraints.
Not implicit constraints. Not aesthetic preferences. Not “what the author had in mind.” Explicit constraints. If multiple solutions are equally defensible, the item is under‑determined. Under‑determined items are not difficult; they are ill‑posed. They do not measure intelligence; they measure interpretive generosity.
The irony is that under‑determined items are often celebrated as “very high‑range” because few people get the intended answer. But the intended answer is not the correct answer. It is merely the author’s answer.
In fact, I often hear the mantra that, when several answers seem possible, one must simply choose “the most logical” one. This sounds authoritative until you examine it. Logic is not a free‑floating essence that descends on the correct option; it is always logic‑of‑something — of the stated constraints, of the transformation rules, of the representational system. When those elements are incomplete or ambiguous, several answers may be equally derivable. In such cases, the appeal to “the most logical answer” is not a principle but a placeholder: it means “the answer the author prefers.” That is not logic; it is authorial taste disguised as necessity. A well‑posed item never requires this kind of tie‑breaking. If the constraints uniquely determine the solution, no one needs to invoke “the most logical answer.” The moment that phrase appears, it quietly admits that the item is under‑determined. A test that rewards the author’s private heuristics is not measuring intelligence. It is measuring mimicry.
Some contemporary items illustrate the Authorial Oracle Problem with almost clinical clarity. They present themselves as determinate—three words, one connection, a single “straightforward” answer—yet the rule system governing the connection is never stated. The solver is told that the correct answer will be obvious, but obviousness here means “obvious to the author,” not “forced by the constraints.” Several answers may satisfy the visible structure, but only one aligns with the author’s private key: their preferred associative dimension, their idiosyncratic sense of fit, their internal grammar of what counts as a connection. These items do not test intelligence; they test proximity to the designer’s idiolect. They are private‑key puzzles masquerading as public‑key reasoning.
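The Determinacy Criterion can be written as a simple check. What follows is a minimal sketch under assumptions of my own: the candidate range, the three constraint functions, and the names solutions and classify are all hypothetical, chosen only to show what "a unique solution derivable from explicit constraints" looks like once the constraints are actually written down.

```python
from typing import Callable, Iterable, List

# An explicit constraint is a public, checkable predicate on a candidate answer.
Constraint = Callable[[int], bool]


def is_even(n: int) -> bool:
    return n % 2 == 0


def is_multiple_of_3(n: int) -> bool:
    return n % 3 == 0


def digits_sum_to_9(n: int) -> bool:
    return sum(int(d) for d in str(n)) == 9


def solutions(candidates: Iterable[int], constraints: List[Constraint]) -> List[int]:
    """Return every candidate that satisfies all stated constraints."""
    return [c for c in candidates if all(check(c) for check in constraints)]


def classify(candidates: Iterable[int], constraints: List[Constraint]) -> str:
    """Label an item by how many answers its explicit constraints allow."""
    found = solutions(candidates, constraints)
    if len(found) == 1:
        return "determinate"
    if not found:
        return "unsatisfiable"
    return "under-determined"


candidates = range(1, 31)  # hypothetical answer space: the integers 1..30

loose = [is_even, is_multiple_of_3]
print(solutions(candidates, loose))  # [6, 12, 18, 24, 30]
print(classify(candidates, loose))   # under-determined

tight = loose + [digits_sum_to_9]
print(solutions(candidates, tight))  # [18]
print(classify(candidates, tight))   # determinate
```

With only two constraints, five answers are equally defensible, and picking "the most logical" one would mean picking the author's favorite. The third constraint removes the tie by adding information, not by adding taste.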
4. The Aesthetic Fallacy
Many high‑range items are not solved through reasoning but through aesthetic resonance. The designer chooses the answer that “feels right.” The solver is expected to share the feeling. This is the aesthetic fallacy: mistaking elegance for necessity. Elegance is not a cognitive law. It is a taste. And taste is not a universal metric of intelligence.
I have seen designers reject perfectly valid solutions because they were “ugly.” I have seen them accept fragile, over‑fitted solutions because they were “beautiful.” I have seen entire tests built around the author’s personal sense of symmetry, rhythm, or numerological charm. This is not psychometrics. This is poetry. And poetry is wonderful—but it is not a measure of general intelligence.
5. The Solvability Paradox
The more a test tries to be AI‑proof, the more it becomes human‑opaque. To block AI, designers remove regularity. To block search engines, they remove semantic anchors. To block pattern‑matching, they remove stable transformations. To block solvers, they remove structure. But regularity, semantics, transformations, and structure are precisely what make a puzzle solvable.
The paradox is simple:
A puzzle that is unsolvable by machines is often unsolvable by humans for the same reason.
The designer believes they are protecting the purity of the test. In reality, they are eroding the very conditions that make reasoning possible. The test becomes a fog machine: impressive in atmosphere, useless in function.
6. The Ritual of Difficulty
There is a peculiar ritual in the high‑range community: the conflation of difficulty with depth. A hard puzzle is assumed to be a deep puzzle. A confusing puzzle is assumed to be a hard puzzle. A puzzle that no one solves is assumed to be a masterpiece. This is the ritual of difficulty: the belief that opacity is a virtue.
But difficulty is not depth. Opacity is not rigor. And a puzzle that only the author can solve is not a puzzle. It is a diary entry. The ritual persists because it flatters both sides: the designer feels profound, and the solver feels challenged. But the challenge is illusory. It is not the challenge of reasoning; it is the challenge of guessing. Guessing is not intelligence. It is gambling.
7. The Missing Scientific Base Case
Every scientific instrument has a base case. A thermometer is calibrated against fixed reference temperatures. A scale is calibrated against reference masses. A clock is calibrated against a frequency standard. A cognitive test must likewise be calibrated against established cognitive benchmarks. High‑range tests are calibrated against nothing. There is no ground truth. No external standard. No empirical anchor. No validation beyond the designer’s intuition. The test floats in conceptual space, untethered to any measurable construct. It is not an instrument. It is an artifact. And artifacts can be beautiful, but they cannot claim to measure what they do not touch.
8. Public‑Key vs. Private‑Key Reasoning
As I’ve watched high‑range tests drift toward private languages, I’ve also begun to see an analogy that clarifies the entire phenomenon. Cryptography distinguishes public keys from private keys. A public key is published: anyone can use it, for example to check that a signed message is genuine, and the procedure for doing so is open to all. A private key is held by its owner alone, and anything that requires it can be done only by someone who already possesses the secret. High‑range tests increasingly resemble the latter.
A well‑formed cognitive item is a public‑key puzzle. The constraints are explicit. The transformations are reconstructible. The reasoning is transparent enough that any sufficiently capable solver can derive the answer. The puzzle is difficult, but it is publicly difficult. The key is the structure itself.
But many modern items operate as private‑key puzzles. The only way to reach the intended answer is to possess the author’s private key — their personal pattern‑language, their aesthetic preferences, their implicit assumptions, their idiosyncratic sense of what “feels right.” The solver is not reconstructing a rule system; they are attempting to guess the author’s internal encryption scheme. This is not a measure of intelligence. It is a measure of proximity.
The irony is that private‑key puzzles are often celebrated as “high‑range” precisely because so few people can solve them. But scarcity is not rigor. A puzzle that requires the author’s private key is not difficult in any meaningful cognitive sense; it is simply inaccessible. It is a locked box whose combination is written in the designer’s diary.
The Determinacy Criterion is, in this light, nothing more than the requirement that a test operate as a public‑key system. If the constraints uniquely determine the solution, the puzzle is public. If the solution depends on the author’s private key, the puzzle is private. And once a test becomes private‑key, it stops being a test at all. It becomes a cipher written in a language with only one fluent speaker.
Where This Leaves Me
I do not dislike puzzles. I dislike puzzles pretending to be science. I do not dislike creativity. I dislike creativity masquerading as measurement. I do not dislike private languages. I dislike private languages marketed as universal metrics of intelligence.
The Babel Range is not a failure of ambition. It is a failure of epistemology. The designers are not wrong to want to build something meaningful. They are wrong to believe that meaning can be preserved when the rule set is private, the constraints are implicit, the solutions are aesthetic, and the calibration is nonexistent.
The tragedy is that many of these designers are genuinely intelligent. They could build beautiful puzzles, elegant systems, even new forms of intellectual play. But instead they build tests—tests that drift further and further from shared language, shared logic, and shared reality.
The drift is not malicious. It is structural. Once you optimize against solvers rather than for solvers, the only direction left is inward. The test becomes a mirror. The scoring key becomes a diary. The range becomes Babel.
Why I Still Care
I care because intelligence is not a private language. I care because reasoning is not a secret handshake. I care because puzzles, at their best, are acts of communication. A good puzzle is a conversation between minds. A high‑range test, in its current form, is a monologue. I want the conversation back.
I want puzzles that are difficult because they are deep, not because they are opaque. I want items that are solvable because the constraints are clear, not because the author’s taste is predictable. I want tests that measure reasoning, not alignment. I want a community that values clarity over mystique. I want a range that is not Babel.
Closing
If I sound critical, it is not because I fail to grasp where high‑range test designers believe they are headed. I understand the aspiration: to build items that probe deeper structures of reasoning, to escape the gravitational pull of search engines and AI, to preserve a space where human insight still matters. I see the intention clearly. But intention is not architecture.
The problem is not that their direction is obscure; it is that the path they have chosen leads away from shared cognition and toward private languages. I am not confused by the drift. I am describing it. I am not missing the point. I am pointing to the place where the point dissolves.
I know the standard rebuttal: that critics simply “don’t see the pattern,” that the test is operating at a level beyond conventional reasoning, that only a few will appreciate the design. But opacity is not transcendence. A puzzle that requires privileged access to the author’s idiolect is not measuring rare intelligence; it is measuring proximity to the author. And that is the heart of my concern.
I care about reasoning as a public act — something that can be shared, reconstructed, and understood without private keys. I care about puzzles as conversations between minds, not monologues disguised as metrics. I care about intelligence as something more than the ability to guess what someone else found elegant.
So my critique is not born of incomprehension. It is born of comprehension — perhaps too much of it. I see the structure clearly enough to see where it breaks. The Babel Range is not a mystery to me. It is a pattern I recognize: the predictable outcome of optimizing against solvers rather than for solvability, of protecting the test rather than the reasoning it is meant to elicit. And drift, once recognized, can be corrected.
Epilogue: On Drift and the Splintering Mind
When I look at the trajectory of high‑range test design, I sometimes wonder whether the real danger is not to the test‑taker but to the test‑maker. Drift, if left unexamined, has a way of turning inward. A designer who spends years building puzzles that no one else can solve eventually begins to inhabit a cognitive world that no one else can enter. The private language that was meant to outsmart solvers becomes the only language the designer can think in.
This is not madness. It is something quieter and more ordinary: a gradual self‑partitioning. A mind that once operated in public logic begins to operate in a logic of its own invention. The transformations that once served as tools become articles of faith. The heuristics that once guided creativity become constraints. The test becomes a mirror, and the designer begins solving themselves.
I have seen this pattern before, outside of puzzles. Any system that optimizes against external comprehension—hermetic philosophies, esoteric programming languages, private symbolic schemes—eventually fractures into mutually incompatible sub‑languages. The creator begins speaking in dialects no one else recognizes. The world becomes populated with patterns only they can see. The drift becomes a kind of cognitive echo chamber.
Again, this is not pathology. It is simply what happens when a mind stops receiving feedback from other minds. Without the friction of shared reasoning, the internal grammar proliferates unchecked. The designer becomes fluent in a language that has no second speaker.
And this, to me, is the real cautionary tale of the Babel Range. Not that the tests fail to measure intelligence, but that the designers risk losing the very thing they are trying to measure: the ability to think in a way that can be understood by others. Intelligence, at its best, is a public act. It is the capacity to build structures that other minds can enter.
When the drift goes too far, the structures remain—but the doors disappear.
I do not believe this outcome is inevitable. Drift can be corrected. Languages can be brought back into contact with the world. But it requires a willingness to let go of the idea that opacity is a virtue, that unsolvability is a mark of depth, that the private pattern is superior to the shared one.
The Babel Range is a warning, not a verdict. It shows what happens when a system optimizes for evasion rather than communication. It shows how easily a mind can become fluent in a language no one else speaks. And it reminds me that the real measure of intelligence is not how far one can drift from others, but how far one can go while still being understood.
Designer’s Preface: Why TOBT Is AI‑Proof
For many years, I have watched with concern as artificial intelligence systems have grown increasingly capable of solving puzzles that were once the exclusive domain of the gifted few. Traditional tests rely on patterns, rules, and shared logic — all of which are now easily exploited by machines. The time has come for a new kind of assessment, one that transcends these outdated notions of “structure” and “determinacy.”
The Tower of Babel Test (TOBT) represents a breakthrough in this regard. It is the first intelligence test designed from the ground up to be completely AI‑proof. This is not because the items are random — far from it. Each item is crafted according to a highly refined internal logic that I have developed over many years. This logic is subtle, intuitive, and deeply personal. It cannot be reverse‑engineered, reconstructed, or even described.
This is the essence of its strength.
AI systems depend on explicit rules, consistent transformations, and publicly accessible reasoning. TOBT offers none of these. Instead, it draws on a private network of associations that only I, as the designer, fully understand. The connections between the triads are not arbitrary; they are simply inaccessible to anyone who does not share my exact cognitive architecture.
In this way, TOBT restores the purity of high‑range testing. It ensures that only those who resonate with the test at the deepest level — those whose minds naturally align with mine — will succeed. This is the true measure of intelligence: the ability to intuit the intentions of the test’s creator without guidance, instruction, or justification.
AI cannot do this. Most humans cannot do this. But the truly gifted will.
For these reasons, TOBT stands as the most advanced, most secure, and most meaningful assessment of high‑level cognition available today. It is not merely resistant to AI; it is fundamentally incompatible with it. And that, I believe, is the future of intelligence testing.
— The Designer: ME!
TOBT — The Tower of Babel Test
“Where every answer is correct, except the one you give.”
Instructions
For each item, provide the most logical connecting word. No rules are stated. All rules apply.
THE ITEMS
1. Dream — Metal — Horizon
2. Glass — Memory — North
3. Circle — Hunger — Tuesday
4. Stone — Whisper — Decimal
5. Fall — Season — Gravity
6. Name — Shadow — Thread
7. Window — Silence — Orchard
8. River — Echo — Lantern
9. Color — Distance — Promise
10. Garden — Thunder — Page
11. Salt — Story — Winter
12. Bridge — Feather — Law
13. Mirror — Storm — Letter
14. Fire — Sleep — Map
15. Dream — Metal — Horizon (Repeated. For “symmetry.”)
THE ANSWER KEY
Each answer is a single word. Each answer is “obvious.” Each answer is justified by a rule that is never stated and cannot be reconstructed.
1. Dream — Metal — Horizon → “Forge”
Because dreams are forged, metal is forged, and horizons forge destinies. (Aesthetic fallacy + metaphorical drift.)
2. Glass — Memory — North → “Clear”
Glass is clear, memories can be clear, and the North is “clear” on a compass. (Category error + authorial taste.)
3. Circle — Hunger — Tuesday → “Cycle”
Hunger cycles, circles cycle, and Tuesday is part of the weekly cycle. (Overfitted pattern + private‑key reasoning.)
4. Stone — Whisper — Decimal → “Point”
Stones have points, whispers point to secrets, decimals have points. (Semantic stretching to the breaking point.)
5. Fall — Season — Gravity → “Down”
Leaves fall down, seasons go “down” the year, gravity pulls down. (Three unrelated senses of “down.”)
6. Name — Shadow — Thread → “Line”
Names have lines, shadows form lines, threads are lines. (Everything is a line if you squint.)
7. Window — Silence — Orchard → “Still”
A still window, still silence, a still orchard. (Pure aesthetic resonance.)
8. River — Echo — Lantern → “Flow”
Rivers flow, echoes flow, lantern light flows. (Metaphorical overreach.)
9. Color — Distance — Promise → “Fade”
Colors fade, distances fade, promises fade. (Authorial pessimism disguised as logic.)
10. Garden — Thunder — Page → “Turn”
You turn soil, thunder turns weather, you turn a page. (Verb‑based opportunism.)
11. Salt — Story — Winter → “Bitter”
Salt is bitter, stories can be bitter, winter is bitter. (Emotional projection as rule system.)
12. Bridge — Feather — Law → “Light”
Bridges can be light, feathers are light, punishments can be light. (Polysemy as methodology.)
13. Mirror — Storm — Letter → “Front”
Mirrors have fronts, storms have fronts, letters have fronts. (Category collapse.)
14. Fire — Sleep — Map → “Rest”
Fire rests when it burns out, sleep is rest, maps rest on tables. (Noun → verb → metaphor → shrug.)
15. Dream — Metal — Horizon → “Forge”
Same as Item 1. Why? Symmetry. Elegance. And! Because I said so.
Kenneth Myers