The Voynich Paradox – When Evidence Points Both Ways

Evidence suggests the Voynich Manuscript is a meaningless hoax. Yet statistical analysis proves its text behaves like a real language. This investigation isn't about deciphering the book, but exploring the paradox at its heart.

Filed Under: Paradox Files
Primary Topic: Ancient Manuscripts
Connections: Logic Failure, Official Record Anomaly, Symbolic Systems
Investigation: The Manuscript That Broke Itself
Date Published: July 28, 2025
Last Updated: October 8, 2025

A folio from the Voynich Manuscript showing a botanical illustration of an impossible plant with five red star-shaped flowers and one blue one, next to a block of unreadable, handwritten script.

In the 1940s, the codebreakers who had cracked Japan’s PURPLE cipher during World War Two tried to decipher the Voynich Manuscript. They failed. In 2013, computational linguists confirmed that the text follows Zipf’s Law, a statistical regularity found in every known human language. A text that defeats master cryptographers shouldn’t behave like a language. A text that behaves like a language shouldn’t be unbreakable.

The Unreadable Book

The Voynich Manuscript sits in Yale’s Beinecke Library, catalogued as MS 408. It’s 240 pages of vellum covered in an unknown script that nobody has verifiably deciphered. Not a single word.

The book’s known history starts in Prague. Georg Baresch, an alchemist, owned it around 1639. He sent letters to Jesuit scholar Athanasius Kircher, begging for help with translation. After Baresch died, his friend Johannes Marcus Marci inherited the manuscript and sent it to Kircher in 1666. Marci’s letter claimed that Emperor Rudolf II had purchased it for 600 gold ducats, believing it to be the work of Roger Bacon. That attribution has never been proven.

The manuscript then vanished into Jesuit hands until 1912, when the antiquarian book dealer Wilfrid Voynich bought it from the Jesuits at the Villa Mondragone near Frascati, where part of the Collegio Romano’s library had been stored. He gave the book its modern name and launched it into public scrutiny.

Here’s the hard evidence.

In 2009, researchers at the University of Arizona radiocarbon dated the vellum to between 1404 and 1438. That single result acts as a brutal filter, invalidating any theory that requires a person or a technology that did not exist in the early 15th century. Edward Kelley couldn’t have forged it; he wasn’t born until 1555. John Dee couldn’t have commissioned it. Any technology or concept invented after 1438 becomes irrelevant to its creation.

The manuscript is divided into sections based on its illustrations. The herbal section shows plants. The astronomical section depicts zodiac symbols and celestial diagrams. The biological section contains what appear to be human figures in pools connected by tubes. The pharmaceutical section shows containers and roots. The text fills every available space around these drawings in a script scholars call ‘Voynichese’.

That script consists of roughly 20 to 30 distinct characters, depending on how you count variants. The text runs left to right. Words are clearly separated by spaces. Paragraphs begin with decorated letters. It looks exactly like a medieval European manuscript should look, except nobody can read it.

The central problem is simple enough.

Over a century of analysis by top cryptanalysts, linguists, and now artificial intelligence has produced no verified translation. William Friedman, who led the team that broke Japan’s PURPLE cipher, spent years on it after the war. His conclusion? Either it was written in a synthetic language or it required a codebook that no longer exists.

A typical two-page spread from the manuscript's 'herbal' section, showing detailed but unidentifiable plant illustrations, each accompanied by paragraphs of the unknown script. While appearing scientific, most of the plants have been identified by experts as composite inventions, stylistically similar to the symbolic 'alchemical herbals' of the 15th century rather than genuine botanical records.

The Case for a Hoax

The fantastical illustrations provide the first clue that something’s off.

Botanical experts have tried to identify the plants in the herbal section. They can’t. Most appear to be inventions or composites. Leaves from one species attach to roots from another. Some roots look more like sea creatures than plant parts. Stems grow back into themselves in ways that defy nature.

Sergio Toresella, a historian specialising in medieval herbals, placed these illustrations in context. They match a specific genre popular in Northern Italy between the late 1300s and mid-1500s: the alchemical herbal. These manuscripts deliberately featured impossible plants for symbolic or esoteric purposes. They were never meant as botanical guides. They were artistic creations mixing real and imaginary elements.

In 2004, computer scientist Gordon Rugg proposed a mechanical method for creating the text. His hypothesis centred on the Cardan grille, a tool invented by Girolamo Cardano in 1550 for hiding messages.

The process works like this.

First, create a large table divided into columns. Fill each column with syllables that can only appear in certain positions. Put prefixes in the first column, word stems in the middle columns, and suffixes in the final column. This mirrors how Voynichese words appear to follow strict rules about which characters can appear where.

Next, cut holes in a piece of cardboard to create a grille. When you place this grille over your table, it reveals one syllable from each column. Write these syllables together to form a word. Slide the grille to a new position and repeat. The method automatically creates words with consistent structure because each syllable stays locked to its grammatical position.

Rugg demonstrated this could generate text with Voynichese-like properties. He estimated one person could produce the entire manuscript in three months using this method. The text would look complex but carry no meaning whatsoever.
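A minimal Python sketch of the table-and-grille steps described above. The syllables, the six-row table and the hole positions are hypothetical placeholders rather than Rugg’s actual table, and the grille wraps around the bottom of the table purely for simplicity. It illustrates only the core point: sliding a fixed mask over a positional table yields words with a consistent prefix, stem and suffix structure.

```python
import random

# Hypothetical syllable table: each row holds [prefix, stem, suffix].
# These syllables are invented placeholders, not Rugg's real table.
TABLE = [
    ["qo", "ke", "dy"],
    ["o", "tee", "in"],
    ["cho", "dai", "ol"],
    ["sho", "ke", "aiin"],
    ["d", "che", "y"],
    ["", "o", "ar"],
]

# Hypothetical grille: one hole per column at staggered row offsets,
# so each placement exposes one prefix, one stem and one suffix.
GRILLE = [(0, 0), (1, 1), (2, 2)]  # (row offset, column) of each hole


def read_through_grille(table, grille, top_row):
    """Place the grille with its top edge at `top_row` and read the exposed cells."""
    n = len(table)
    # Simplification: wrap around the bottom of the table so every placement is valid.
    parts = [table[(top_row + dr) % n][col] for dr, col in grille]
    return "".join(parts)


def generate_words(count, seed=0):
    """Slide the grille to a random row for each new word."""
    rng = random.Random(seed)
    return [read_through_grille(TABLE, GRILLE, rng.randrange(len(TABLE)))
            for _ in range(count)]


print(" ".join(generate_words(8)))
```

Every output word ends in one of the suffix-column syllables, which is the kind of positional regularity the method is meant to explain. Nothing in the procedure carries meaning.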

The theory has a glaring chronological problem.

Cardano invented his grille over a century after the manuscript’s vellum was created. Rugg argues someone must have invented a similar table-based method earlier, but there’s no evidence for this. More importantly, his simple version of the method fails a critical test. Whilst it produces the right kind of word structures, it can’t replicate the manuscript’s more complex statistical patterns.

The manuscript itself shows signs consistent with rapid, mechanical production. The entire 240 pages contain almost no corrections or erasures. Medieval scribes copying genuine texts made mistakes. They crossed out words, scraped away errors, and inserted corrections. This manuscript flows as if someone wrote it in one smooth pass. Exactly what you’d expect from someone using a mechanical generation method, reading syllables through holes in a card.

Rugg's Table-and-Grille Method

A mechanical process for generating structured but meaningless text.

1. The Syllable Table: a table contains all possible word-parts, organised by position into Prefix, Infix and Suffix columns (sample entries: cho, dai, sho).

2. The Cardan Grille: a mask with holes is placed over the table, revealing one syllable from each column.

3. The Result: the revealed syllables are combined to form a new, structured "word". The grille is then moved to create the next word.

The Case for a Language

The strongest evidence against the hoax theory comes from computational analysis. Multiple studies confirm the Voynich text follows Zipf’s Law. This empirical regularity, named after the linguist George Kingsley Zipf, describes how word frequencies are distributed in every human language studied. In any substantial text, the most common word appears roughly twice as often as the second most common word, three times as often as the third, and so on.
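To make the rank-frequency claim concrete, here is a minimal Python sketch (the function names are illustrative, not taken from any cited study). It counts word frequencies and fits a straight line on log-log axes. Under an ideal Zipf distribution, f(r) = C / r, the fitted slope is -1.

```python
from collections import Counter
import math


def rank_frequency(tokens):
    """Return word counts sorted from most to least frequent (rank 1 first)."""
    return [count for _, count in Counter(tokens).most_common()]


def zipf_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    Under an ideal Zipf law f(r) = C / r the slope is exactly -1.
    Needs at least two distinct ranks.
    """
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


# Usage: rank_frequency(text.split()) gives the counts; zipf_slope() summarises them.
```

A least-squares fit on log-log axes is only a rough check; careful analyses use large corpora and more robust estimators.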

Random text reproduces Zipf’s Law only crudely: strings of random letters split by randomly placed spaces yield a stepped, Zipf-like curve, not the smooth distribution of a real vocabulary. Simple mechanical methods fall short too. As Marcelo Montemurro pointed out in his 2013 study, creating a Zipfian distribution requires complex generative forces. The odds of accidentally producing it are virtually zero.

The manuscript’s information entropy tells the same story.

Entropy measures how predictable or random a text is. Random gibberish has high entropy because any character could follow any other. Real languages have lower entropy because grammar and meaning constrain what can come next. The Voynich text has low entropy, just like genuine languages.
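Both ideas can be measured directly. This sketch (illustrative names, plain plug-in estimates with no bias correction) computes first-order character entropy and the conditional entropy of the next character given the current one. The second quantity is the one that captures how strongly each character constrains what follows. For the Voynich text the numbers depend on which transliteration alphabet is used, so treat this as a starting point only.

```python
from collections import Counter
import math


def unigram_entropy(text):
    """Shannon entropy in bits per character of the character distribution."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def conditional_entropy(text):
    """H(next char | current char) in bits, estimated from adjacent pairs.

    Lower values mean the next character is easier to predict
    from the current one.
    """
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    n = len(text) - 1
    h = 0.0
    for (a, _b), c in pairs.items():
        p_pair = c / n                   # joint probability of the pair
        p_next_given_a = c / firsts[a]   # probability of the next char given a
        h -= p_pair * math.log2(p_next_given_a)
    return h
```

A perfectly alternating string such as "abababab" has one bit of first-order entropy per character but zero conditional entropy, because each character fully determines the next.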

But Montemurro’s team found something even more significant. They discovered the manuscript exhibits ‘thematic clustering’ or ‘burstiness’. In plain English, related words appear together.

Think about any technical book. In a geology textbook, words like ‘sedimentary’ and ‘igneous’ cluster in chapters about rock types. They’re rare or absent in chapters about fossils. This happens because meaning drives word choice. When you write about rocks, you use rock words.

Montemurro’s analysis found exactly this pattern in the Voynich Manuscript.

Words that carry the most information tend to burst, appearing frequently in specific sections, then vanishing. More remarkably, these clusters map directly onto the manuscript’s illustrated themes. ‘Herbal’ words concentrate in the herbal section. ‘Astronomical’ words cluster around the zodiac diagrams.
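A simplified sketch of this kind of measurement, loosely in the spirit of Montemurro’s approach rather than a reproduction of it: split the text into equal parts, measure how evenly a word’s occurrences spread across them, and compare with the same text randomly shuffled. The function names, the default of 8 parts and the 200 shuffles are illustrative assumptions.

```python
from collections import Counter
import math
import random


def part_entropy(tokens, word, n_parts):
    """Entropy (bits) of how occurrences of `word` spread over n_parts equal chunks."""
    size = math.ceil(len(tokens) / n_parts)
    counts = Counter(i // size for i, t in enumerate(tokens) if t == word)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def clustering_score(tokens, word, n_parts=8, n_shuffles=200, seed=0):
    """Average part entropy after shuffling minus the actual part entropy.

    Positive scores mean the word is more concentrated in particular
    sections than chance alone would produce.
    """
    rng = random.Random(seed)
    shuffled = list(tokens)
    baseline = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        baseline += part_entropy(shuffled, word, n_parts)
    baseline /= n_shuffles
    return baseline - part_entropy(tokens, word, n_parts)
```

A word spread evenly through the text scores at or slightly below zero; a word confined to one section scores well above zero.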

The team used these patterns to construct semantic networks, showing which words tend to appear together. These networks revealed the manuscript’s thematic structure using nothing but statistical analysis of the text. No guesswork about what words might mean. Just pure mathematical patterns.
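One simple way to turn such patterns into a network, assumed here for illustration and not necessarily the construction the team used, is to describe each word by its counts across sections and link words whose profiles point in similar directions (cosine similarity). The 0.8 threshold and all names are arbitrary choices.

```python
from collections import Counter
from itertools import combinations
import math


def section_profiles(sections, words):
    """Map each word to its vector of counts across the given sections."""
    counts = [Counter(sec) for sec in sections]
    return {w: [c[w] for c in counts] for w in words}


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_network(sections, words, threshold=0.8):
    """Return edges (w1, w2, similarity) for word pairs with similar section profiles."""
    profiles = section_profiles(sections, words)
    edges = []
    for w1, w2 in combinations(words, 2):
        sim = cosine(profiles[w1], profiles[w2])
        if sim >= threshold:
            edges.append((w1, w2, round(sim, 3)))
    return edges
```

Given sections of tokens, words that occur together in the same sections end up linked, so clusters in the resulting graph trace the text’s thematic divisions without any assumption about what the words mean.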

This finding devastates simple hoax theories.

A 15th-century forger would need to track word frequencies across hundreds of pages to create this effect. They’d need to ensure certain words appeared frequently in some sections whilst avoiding them in others. They’d need to understand statistical concepts that weren’t formalised until the 20th century.

Statistical Profile: The Central Paradox

Statistical Property	What It Measures	Voynich MS	Natural Language	Rugg's Hoax Text	Random Gibberish
Zipf's Law adherence	A regularity of word frequency found in all known languages	Strong	Strong	Weak / Partial	None
Information entropy	A measure of randomness; low entropy implies structure and predictability	Low (language-like)	Low	High	Very High
Thematic clustering	The tendency for related keywords to appear in 'bursts' within specific sections	Strong	Strong	None	None

Conclusion: The Voynich Manuscript's statistical profile matches a natural language, which directly contradicts the evidence for a simple hoax.

The Cipher That Isn’t a Cipher

The manuscript has defeated every codebreaker who tried it.

The list reads like a who’s who of cryptanalysis. Athanasius Kircher, the 17th century’s greatest polymath, failed. So did William Friedman, whose team broke the Japanese PURPLE cipher that helped win World War Two. Modern computer analysis using AI and machine learning has produced nothing verifiable.

This seems to support the cipher theory. Surely only an incredibly sophisticated encryption could resist such expertise. But the manuscript’s properties make this explanation paradoxical.

Good ciphers aim for high entropy. They try to look random because patterns help codebreakers. The Voynich text does the opposite. It’s highly structured, repetitive, and predictable. These are exactly the features cryptanalysts exploit to break codes.

Simple substitution ciphers have been definitively ruled out. In these systems, each symbol represents a letter in a known language. Frequency analysis, the basic tool for breaking such ciphers, should work. It doesn’t. The text’s statistical properties don’t match any known language when you assume simple substitution.
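The reason frequency analysis should work is that a simple substitution only renames symbols. A minimal sketch (hypothetical helper names) shows that the sorted frequency profile survives encryption unchanged, so a cryptanalyst can compare a ciphertext’s profile against candidate languages without knowing the key.

```python
from collections import Counter
import random
import string


def frequency_profile(text):
    """Relative symbol frequencies sorted in descending order, ignoring whitespace."""
    symbols = [ch for ch in text if not ch.isspace()]
    counts = Counter(symbols)
    return sorted((c / len(symbols) for c in counts.values()), reverse=True)


def substitute(text, key):
    """Apply a one-to-one symbol mapping, leaving unmapped characters unchanged."""
    return "".join(key.get(ch, ch) for ch in text)


def profile_distance(p, q):
    """Sum of absolute differences between two sorted profiles, padded with zeros."""
    n = max(len(p), len(q))
    p = p + [0.0] * (n - len(p))
    q = q + [0.0] * (n - len(q))
    return sum(abs(a - b) for a, b in zip(p, q))


# Build a random substitution key over the lowercase alphabet.
rng = random.Random(42)
letters = list(string.ascii_lowercase)
shuffled = letters[:]
rng.shuffle(shuffled)
key = dict(zip(letters, shuffled))

plain = "the quick brown fox jumps over the lazy dog"
cipher = substitute(plain, key)
print(profile_distance(frequency_profile(plain), frequency_profile(cipher)))  # 0.0
```

The Voynich result is the reverse case: its profile does not sit close to that of any known language, which is why a simple substitution is ruled out.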

The only cipher explanations that remain are essentially unfalsifiable.

Maybe it uses a codebook where each Voynichese word corresponds to a word in a lost dictionary. Without the codebook, the cipher becomes unbreakable.

Maybe it uses a one-time pad with a random key as long as the message itself.

These explanations can’t be disproven, but they can’t be tested either. They’re dead ends for investigation.

Friedman himself concluded it probably wasn’t a traditional cipher at all, and the synthetic-language explanation was the one he favoured. His team’s failure carries weight. These were some of the most accomplished codebreakers of the war. If they couldn’t break it using cryptographic methods, perhaps it isn’t a cipher.

The solution is not a simple cipher... The basis of the system is a much more primitive type of cryptography, one that leads to an artificial, or synthetic, language.

William F. Friedman, Lead Cryptanalyst, US Signal Intelligence Service (Post-WWII Analysis)

Investigating the Gaps

Several theories attempt to bridge the gap between hoax and language. Each tries to explain how the text can be both meaningless and statistically complex.

An Early Constructed Language?

Artificial languages existed before the Voynich Manuscript. The most famous example is Hildegard of Bingen’s Lingua Ignota from the 12th century. The German abbess created her own alphabet and vocabulary for mystical purposes. This proves the concept wasn’t anachronistic.

But early constructed languages were primitive. Hildegard’s creation was essentially a word list of about a thousand terms, almost all nouns, with no original grammar. She wrote Latin sentences and substituted her invented words for some of the nouns. It was a naming system, not a true language.

The philosophical languages of the 17th century were the first to attempt complete grammatical systems. John Wilkins’ Real Character tried to categorise all human knowledge and create a logical language to express it. His system was, in cryptanalyst John Tiltman’s words, ‘much too systematic’ compared to the Voynich text’s ‘cumbersome mixture’.

If the Voynich Manuscript is a constructed language, it’s centuries ahead of its time in sophistication. It would need not just vocabulary but complex grammar capable of producing Zipfian distributions and thematic clustering. No known medieval constructed language comes close.

A Lost Natural Language?

This theory fits the statistical evidence perfectly.

The manuscript behaves exactly like a text written in a genuine human language because it is one. We simply don’t know which one.

But decades of comparison to every known language and language family have found no convincing matches. The text’s rigid internal structure and extreme repetition are also unusual. Natural languages tend towards flexibility and variation. The Voynich text feels constrained in ways that natural languages typically aren’t.

Asemic Writing or Glossolalia?

This theory reframes the question entirely.

What if the text looks meaningful because it was created through a meaning-like process, but one that didn’t involve actual language? Asemic writing creates the appearance of text without semantic content. Glossolalia, or speaking in tongues, produces speech-like sounds without linguistic meaning.

The bizarre illustrations support this interpretation. They’re not failed attempts at depicting real plants or astronomical phenomena. They’re creative expressions, perhaps religious or visionary in nature. The lack of corrections suggests someone writing in a flow state, not carefully copying or encoding.

The theory’s weakness is the manuscript’s global structure.

Glossolalia might produce local patterns, phonological rules that make syllables sound language-like. But could it create thematic clustering across 240 pages? Could an unconscious process ensure certain words appeared primarily in the herbal section, whilst others concentrated around the zodiac diagrams? Critics argue this requires too much unconscious organisation.

Statistical analysis reveals another complication.

The manuscript contains at least two distinct ‘dialects’, labelled Currier A and B (also called Voynich A and B) after the cryptologist Prescott Currier, who identified them in the 1970s. The text’s statistical properties differ between the two groups of pages. This suggests either multiple authors or a change in method. Neither fits easily with the glossolalia hypothesis, which assumes a single person’s consistent unconscious expression.
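A basic way to quantify how far apart two groups of pages are is to compare their word-frequency distributions. The sketch below uses Jensen-Shannon divergence, an assumed choice of metric rather than the one Currier or later researchers necessarily used; the token lists would come from whichever transliteration and A/B page assignment one adopts.

```python
from collections import Counter
import math


def distribution(tokens):
    """Relative frequency of each word in a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits: 0 = identical, 1 = no shared words."""
    words = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in words}

    def kl(a):
        # Kullback-Leibler divergence of distribution a from the mixture m.
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Usage: jensen_shannon(distribution(tokens_a), distribution(tokens_b))
```

On its own the number means little. A fairer test compares it with the divergence between random splits of pages drawn from within a single dialect.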

No single theory survives contact with all the evidence. The hoax theory is contradicted by the text's deep statistical structure. The language theory fails to explain the nonsensical illustrations. The cipher theory is invalidated by its own predictable, low-entropy patterns.

Veriarch Investigative Analysis (Evaluation Matrix Summary)

The Unanswered Questions

The Voynich Manuscript isn’t a simple puzzle waiting for the right key. It’s a paradox where evidence for meaninglessness and evidence for meaning are both strong and mutually exclusive.

The core finding of this investigation is the paradox itself.

Gordon Rugg showed how simple mechanical methods could generate meaningless text with Voynichese-like properties. But Marcelo Montemurro proved the actual manuscript has complex statistical structures these simple methods can’t create. The text fails as a competent cipher, yet resists all cryptanalysis. It shows the hallmarks of natural language whilst matching no known tongue.

Future research needs to target specific gaps:

Can a sophisticated hoax algorithm replicate all properties? Rugg’s table-and-grille method is too simple. Could a more complex version, perhaps using multiple tables for different sections, generate both the local word structure and global thematic clustering? This is a testable hypothesis that computer scientists could explore.
What are the statistical profiles of historical constructed languages? Nobody has subjected Hildegard’s Lingua Ignota or other early artificial languages to the same analysis Montemurro applied to the Voynich Manuscript. Do they show Zipfian distributions? Thematic clustering? This baseline data would help evaluate the constructed language theory.
What do the two dialects mean? The split between Voynich A and B needs systematic investigation. Do the statistical differences match known patterns of dialectal variation? Or do they suggest something else entirely, perhaps a change in generation method or authorship?
Can glossolalia create long-range structure? Modern instances of glossolalia could be recorded, transcribed, and analysed using Montemurro’s methods. Can unconscious processes create thematic clustering across long texts? Without this empirical data, the glossolalia hypothesis remains untested.
What was the actual context of alchemical herbals? Toresella identified the genre, but we need deeper investigation. Who created these manuscripts? For what purpose? Understanding the cultural context might explain why someone would create 240 pages of unreadable text with impossible plants.
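The first of these questions can be prototyped cheaply. This hypothetical extension of the table method (all syllables and the 30% section-specific rate are invented) draws most stems from a shared table but some from a table reserved for each section, so certain words appear only in ‘their’ section: a crude mechanical route to thematic clustering. Whether anything like this could also match the manuscript’s Zipf and entropy profile is precisely what remains untested.

```python
import random

# Hypothetical syllable tables: a shared stem pool plus one extra pool per section.
# All syllables are invented placeholders for illustration.
PREFIXES = ["qo", "o", "cho", "sho", "d"]
SUFFIXES = ["dy", "in", "ol", "aiin", "y"]
SHARED_STEMS = ["ke", "te", "che"]
SECTION_STEMS = {
    "herbal": ["pal", "kal"],
    "astronomical": ["ftar", "ptal"],
}


def generate_section(section, n_words, rng, p_special=0.3):
    """Build words as prefix + stem + suffix, taking the stem from the
    section's own table with probability p_special, else from the shared table."""
    words = []
    for _ in range(n_words):
        if rng.random() < p_special:
            stem = rng.choice(SECTION_STEMS[section])
        else:
            stem = rng.choice(SHARED_STEMS)
        words.append(rng.choice(PREFIXES) + stem + rng.choice(SUFFIXES))
    return words


rng = random.Random(7)
book = {name: generate_section(name, 500, rng) for name in SECTION_STEMS}
```

Feeding output like this into the clustering and entropy measurements used on the real manuscript would show how much of its structure such a generator can and cannot reproduce.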

The real mystery isn’t what the Voynich Manuscript says. It’s how it can simultaneously be a meaningless hoax and a meaningful text. Every theory explaining one aspect contradicts evidence supporting another. The manuscript has created an investigative paradox as elaborate as its fantastical illustrations.

That paradox, not any imagined hidden message, is what makes the Voynich Manuscript genuinely intriguing. It exposes the limits of our categories. Hoax or language? Cipher or gibberish? The manuscript suggests these divisions might be too simple.

Whatever process created those 240 pages of vellum, it produced something that transcends our normal frameworks for understanding text. The Voynich Manuscript remains unread not because we haven’t found the right key, but because we haven’t understood what kind of lock we’re looking at.

Sources

Sources include: peer-reviewed papers in Cryptologia and PLoS ONE on mechanical text generation and information-theoretic analysis; codicological analysis of Beinecke Library MS 408; University of Arizona radiocarbon dating reports; declassified post-WWII analysis from US cryptanalyst William F. Friedman; art-historical studies on the genre of 15th-century Northern Italian alchemical herbals; 17th-century correspondence from the archives of Athanasius Kircher; comparative studies of early constructed languages, including Hildegard of Bingen’s Lingua Ignota and John Wilkins’s Real Character; and foundational research in computational linguistics on Zipf’s Law, information entropy, and semantic network mapping.
