Why AI Texts Use So Many Em Dashes — and What That Reveals About Language Models
AI-generated writing is often recognizable by its punctuation. The em dash — that long, dramatic line — appears everywhere. Discover why language models use it so often and what this means for writers and businesses.
Why AI Texts Use So Many Em Dashes
Many people can spot AI-generated text by one simple clue: unusual punctuation. When a sentence contains a striking dash, readers often think, “this feels like AI.” That’s not nitpicking; it shows how small stylistic details can influence the way we read and how much we trust a text. But why exactly do language models use so many em dashes?
This article explains, in plain terms, which theories exist, which ones are less convincing, and why the change in training data — the digitization of older books — seems the most likely explanation.
One Sentence, Two Punctuation Marks — A World of Difference
With an em dash: She loved silence — and he loved music.
With a comma: She loved silence, and he loved music.
The difference is small, but the rhythm and tone shift instantly.
To many readers, the version with ‘—’ feels immediately different — more “AI-like” or even “literary.”
That subtle effect is exactly what stands out — and why punctuation plays such a big role in how we recognize AI writing style.
The Three Main Explanations for AI’s Em Dash Obsession
1. Structural Theory: “AI Finds Em Dashes Convenient”
Some argue that predictive models (like LLMs) find em dashes ‘useful’ because they keep sentence options open or save tokens. An em dash can indicate both a continuation and a pause.
That sounds reasonable, but it’s not convincing. Other punctuation marks are equally flexible — and older models (like GPT-3.5) used em dashes far less. So why would only newer models suddenly develop this preference?
2. RLHF Theory: Human Reviewers and Style Preferences
During the final training phase, RLHF (Reinforcement Learning from Human Feedback), human testers evaluate model responses and reward those that read well.
A popular theory suggests these reviewers — often based in English-speaking countries with lower living costs — unconsciously introduce their own local English style into the model. That could explain why AI tends to use more em dashes.
However, the data doesn’t fully support this. Studies of Nigerian and other African English corpora show no higher frequency of em dashes. So while human preferences might play a small role (for instance, in tone), they are likely not the main cause.
3. Training Data Theory: Old Books Full of Em Dashes
The most convincing explanation: AI models are learning to write from old books.
Between 2022 and 2024, the datasets used to train LLMs changed significantly. They previously consisted mostly of internet text and modern sources. But as AI labs sought higher-quality data, they began digitizing books — many from the 19th and early 20th centuries.
During that time, em dashes were extremely popular: studies show a peak around 1860, when roughly 0.26% of all characters were em dashes.
When models train on such texts, they implicitly learn that the em dash signals “quality” or “literary” writing. That habit then persists in newer models.
In short:
If a large portion of your “high-quality” training data comes from books full of em dashes, the model learns: “An em dash = a mark of good writing.”
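To make the character-level figure above concrete, here is a minimal Python sketch of how such a rate could be measured on any text. The function name em_dash_share is hypothetical, and counting every character (spaces and punctuation included) in the denominator is an assumption; the studies mentioned above may have counted differently.

```python
EM_DASH = "\u2014"  # the em dash character


def em_dash_share(text: str) -> float:
    """Return em dashes as a percentage of all characters in `text`.

    Assumption: every character, including whitespace and punctuation,
    counts toward the total.
    """
    if not text:
        return 0.0
    return 100.0 * text.count(EM_DASH) / len(text)


if __name__ == "__main__":
    sample = "She loved silence \u2014 and he loved music."
    print(f"{em_dash_share(sample):.2f}% of characters are em dashes")
```

Running the same function over a digitized 19th-century novel and over a modern web article would be one rough way to compare the two kinds of training text.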
Why Other Theories Are Weaker
Token efficiency: The idea that em dashes are more efficient is unconvincing; ordinary punctuation often does the same job with fewer words.
Errors in training rules: Mistakes in training rules or platform conventions may confuse dashes with em dashes, but they don’t explain why AI uses them so frequently.
AI trained on AI output: AI systems increasingly train on the output of other models. This may reinforce em dash use, but it doesn’t explain why the jump from rare to frequent use happened so abruptly.
What This Means for Writers and Businesses
AI really does use em dashes more often than humans do. That’s not inherently bad, but it’s something to watch out for when publishing text under your organization’s name.
Three practical tips:
- Review AI output. Overusing em dashes can leave an “AI fingerprint.”
- Personalize your AI models. Adjust your settings so the model writes “clearly and without em dashes.”
- Use AI as an assistant, not an author. A human editor ensures your tone remains consistent and trustworthy.
Quick tip:
Settings > Personalization > Custom Instructions
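The first tip, reviewing AI output, can be partly automated. The sketch below is one possible approach, not a standard tool: the function names and the threshold of 5 em dashes per 1,000 words are assumptions chosen for illustration, and the sentence splitting is deliberately naive.

```python
import re
from typing import List

EM_DASH = "\u2014"  # the em dash character


def flag_em_dash_sentences(text: str) -> List[str]:
    """Return the sentences in `text` that contain at least one em dash.

    Sentences are split naively after '.', '!' or '?' followed by whitespace.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if EM_DASH in s]


def needs_review(text: str, max_per_1000_words: float = 5.0) -> bool:
    """Return True if em dash density exceeds the (hypothetical) threshold.

    Words are whitespace-separated tokens, so a spaced em dash counts as one.
    """
    words = text.split()
    if not words:
        return False
    rate = 1000.0 * text.count(EM_DASH) / len(words)
    return rate > max_per_1000_words


if __name__ == "__main__":
    draft = "She loved silence \u2014 and he loved music. He left early."
    if needs_review(draft):
        for sentence in flag_em_dash_sentences(draft):
            print("Check:", sentence)
```

A check like this only points an editor to the sentences worth a second look; whether to keep or rewrite each dash remains a human decision.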
Conclusion: Old Books, New Habits
The strongest theory is that the rise of em dashes is a side effect of changing training data — more digitized books from older eras, when the em dash was simply part of normal writing.
That explains why earlier models used them less — and why modern AI models suddenly seem more “stylish” in their punctuation choices.
Still, this remains partly speculative. There are strong indicators and examples (such as the high number of em dashes in Moby-Dick), but no confirmed explanation from inside the major AI labs — yet.
🔍 Want to learn more about AI language and writing style? Visit lumans.ai/blog to explore how AI can enhance your communication in practical, reliable ways.