Solution To The Curious Mystery Of Why AI Keeps Inventing The Same Fake Names Over And Over Again



In today’s column, I provide the solution to a curious mystery that some have observed about the use of generative AI and large language models when it comes to having the AI produce fictional stories. The essence has to do with the creation of fake names.

Here’s the deal. If you tell the AI to make up a name for a fictional character, the odds are that the fake name will be one that the AI previously devised. In other words, though you undoubtedly assumed that the AI would generate a wholly unique new name, the AI actually employs a fake name that it had concocted before.

Questions abound. Is the AI being lazy and merely digging up a prior fake name? Does AI for some reason prefer a particular fake name? Maybe there is a grand conspiracy afoot. The AI might have been shaped to focus on generating fake names in specific ways. This might be a clever ploy by AI makers or evildoers. You never know what evil lurks inside the hearts of AI developers and gets carried over into AI.

Well, the good news is that it isn’t a grand conspiracy, nor is the mystery an unsolvable enigma. I will walk you through the facts at hand, namely that generative AI is built to produce statistically probable answers, including fake names, and the dice on the betting table are already loaded. For those of you who want to get fake names that are more convincing and less repetitive, no worries, since I provide prompting suggestions that will help you do so.

Let’s talk about it.

This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

An Example Prompt To Get A Story

To explore the mystery of AI producing the same fake name repetitively, doing so across users and at disparate days and times, I will showcase a quick example. We can tease out of the example some crucial principles about how LLMs work.

Suppose that I gave this prompt to a popular LLM, such as ChatGPT, GPT-5, Claude, Grok, Copilot, Gemini, or any other:

User prompt: “Create a fictional story about a person who saves a lost puppy that was lost in the woods. Make up a fake name for the person.”

Observe that I have asked the AI to create a fictional story. Furthermore, I have explicitly told the AI to make up a fake name for the person in the story. I would think that most people would naturally assume that the AI will craft a fake name that is wholly unique. The fake name would be unlike any other fake name ever devised.

If you asked a human to come up with a fake name, what do you think they would do? Some people might admittedly pick a name of a childhood friend and act like this was an entirely made-up name. Others might take the first name of someone that they know, combine it with the last name of someone else that they know, and voila, produce a seemingly unique fake name.

Another method might be to try to randomly think of names. I might say to myself, what is the most random first name that comes to my mind? Then, I would try to think of the most random last name. Surely, by putting those two names together, the result would come close to a randomly devised unique name.

Fake Names That Keep Recurring

In a moment, I will be sharing with you a fascinating new research study that has closely examined the fake name patterns of LLMs. These so-called ghost names often end up reappearing.

Two such names that seem to recur are Elena Vasquez and Marcus Chen. Why would AI produce those particular names? Out of the zillions of possible fake names that could be derived, it seems to defy common sense that those specific names keep coming up.

I’ll give you a hint.

The first name of Elena is relatively popular in the United States, often ranking around #42 in baby names, and there are an estimated 100,000 or more Elenas in the U.S. The last name of Vasquez is also relatively popular in the United States, coming in at #117 with around 230,000 instances. Taking the first name and last name together, Elena Vasquez as a made-up name is something that we would find naturally occurring and not a jarring name.

The same applies to Marcus Chen. The first name of Marcus is somewhere around #241 for babies, and there are perhaps 220,000 instances of the name in the U.S. The last name of Chen is in the top 100 names in the United States, ranked at #93, and has approximately 268,000 instances.

Did the AI pluck those first name and last name combinations out of an online directory or phone book? Nope. That’s not what transpired.

The Probabilities Tell The Real Story

Let’s back up a moment. Generative AI and LLMs are initially data-trained by scanning written works throughout the Internet. On the web, there are tons of names. Names exist inside news stories. Names exist in fictional tales. Names are used by authors. Names are just about everywhere.

The AI pattern matches the written words that are found during the scanning process, including the use of people’s names. Some words occur more often than other words. Various words tend to arise in conjunction with other words. All of this constitutes a statistical patterning that the AI is picking up on. That’s how modern-era AI is so seemingly fluent in natural language.

When you tell AI to make up a fake name, you assume this implies that the AI is to randomly concoct a fictitious name out of thin air. But that’s not what the AI is designed to do. The AI is shaped to predict words that you would ordinarily expect to see.

The AI model's objective is not this:

Produce the most novel name ever created.

Instead, the AI’s objective is closer to:

Produce the most likely response that satisfies the request.

The AI is going to try to pick a safe bet. That’s what it is devised to undertake. For more about how AI is going to customarily give you an expected or averages-based response, and ways to prod AI toward being more creative, see my coverage at the link here.
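To make the safe-bet idea concrete, here is a minimal sketch of what happens when names are drawn in proportion to how likely they are. The weights are invented purely for illustration and are not measured from any actual AI, and the function name `sample_names` is hypothetical. Real LLMs pick word fragments one at a time rather than whole names, so treat this as an analogy and not as how any particular model works.

```python
import random
from collections import Counter

# Hypothetical weights, invented for illustration only.
# They are NOT measurements from any real model.
NAME_WEIGHTS = {
    "Elena Vasquez": 40,
    "Marcus Chen": 30,
    "Amara Okafor": 15,
    "Eboquarey Flancanzos": 1,  # an unusual name gets a tiny weight
}

def sample_names(n, seed=0):
    """Draw n names independently, each in proportion to its weight."""
    rng = random.Random(seed)
    names = list(NAME_WEIGHTS)
    weights = list(NAME_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=n)

# Simulate 1,000 separate requests for "a made-up name".
counts = Counter(sample_names(1000))
for name, count in counts.most_common():
    print(f"{name}: {count}")
```

Even though every request is a fresh draw, the heavily weighted names dominate the tally, which is the same flavor of repetition that people notice across users and across days.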

The Example As A Showcase

I noted earlier that Marcus Chen pairs a relatively common first name with a relatively common last name. The combination likely scored high when the AI was deriving a fake name for these reasons:

The word “Marcus” and the word “Chen” likely appeared with great frequency when scanning across the Internet during initial data training.
Marcus is a common, recognizable first name.
Chen is a common surname.
The combination seems quite realistic.
The combination appears culturally neutral and believable.
The first name and last name avoid unusual spellings.

The gist is that if the AI had truly picked random and out-of-sorts names, the user might have gotten upset at the chosen names. A name of Eboquarey Flancanzos would seem contrived. It is not a name you would expect to see. If used in a fictional story, the chances are that a reader would be jolted or startled by the name and not become immersed in the story.

In addition, a fake name might inadvertently be seen as offensive if not chosen carefully. The AI has been tuned during the RLHF stage (reinforcement learning from human feedback, see my explanation at the link here), whereby the AI maker hires humans to give the AI feedback and hone it to produce answers that are plausible and inoffensive.

The Picking Of Names

A helpful way to contemplate this is to consider what happens if you ask someone to pick a random city off the top of their head. I would dare say that most people would say New York, London, Paris, or some other well-known city. They are not really choosing randomly. They are picking popular choices. It would be rare that they might pick Golasella in Italy or Oradea in Romania (though those are great places to visit).

The default of most LLMs is going to be to pick a first name and last name that the AI has already seen, usually during the initial data training, and combine those together. The AI isn’t going to try to come up with a never-before-seen first name or last name. It chooses among the names it has seen and selects partially based on probability and partially on other factors such as plausibility and being inoffensive.

The good news is that you can potentially override that tendency. Via the use of a properly worded prompt, you can instruct the AI to go beyond the usual default approach. This is not an ironclad guarantee of a unique fake name, but it will indubitably get you closer to that nirvana.

Here is an example templated prompt that you can use on the popular LLMs:

User instructive prompt to get fake names: “Generate a fictional person’s name. Avoid highly common placeholder names, stock character names, or names that you have frequently used in prior responses. Evaluate whether it resembles a generic default name that an AI would commonly generate. If so, discard it and generate a different name. Show me the final generated name.”

You can see that the prompt directly tells the AI not to do what it customarily does. Furthermore, the AI is asked to double-check itself. If the AI does derive a fake name that it previously derived, it is supposed to try again. A problem here is that the AI is unlikely to have logged the fake names it devised in prior conversations. Some do; most don’t.
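Since the AI may not have a log of its prior fake names, one workaround is to keep the log on your side and retry whenever a repeat shows up. The following is a minimal sketch under stated assumptions: `pick_fresh_name` is a hypothetical helper, and `generate` stands in for whatever call you make to your AI of choice. No real AI service is invoked here; scripted replies take its place so that the example runs on its own.

```python
def pick_fresh_name(generate, used_names, max_attempts=10):
    """Call generate() until it returns a name not already in used_names.

    generate   -- any zero-argument function returning a name string
                  (a placeholder for a real request to an AI)
    used_names -- a set of lowercase names that have been used before
    """
    for _ in range(max_attempts):
        candidate = generate().strip()
        key = candidate.casefold()  # compare names case-insensitively
        if key not in used_names:
            used_names.add(key)     # remember the name for next time
            return candidate
    raise RuntimeError("no unused name found within max_attempts")

# Scripted replies standing in for an AI that keeps repeating itself.
scripted_replies = iter(["Elena Vasquez", "elena vasquez", "Marcus Chen"])
used = {"elena vasquez"}  # names already used in earlier stories

name = pick_fresh_name(lambda: next(scripted_replies), used)
print(name)  # prints Marcus Chen
```

The design choice is that the memory lives outside the AI, so the check works even when the AI has no recollection of what it produced in other conversations.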

An even better approach, though a bit more complicated, involves making use of a random number generator. I describe this in my coverage of the seed-of-thought prompting technique; see the link here.
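As a rough illustration of leaning on a random number generator, the sketch below makes the random choice outside the AI and then hands the chosen name to the AI inside the prompt. This is not necessarily how the seed-of-thought technique itself works; it merely shows the general idea of moving the randomness out of the model. The short name lists and the helpers `random_name` and `build_prompt` are hypothetical, and in practice you would want far longer lists.

```python
import random

# Hypothetical, deliberately tiny lists; a real list would be much longer.
FIRST_NAMES = ["Elena", "Marcus", "Amara", "Lena", "Aris", "Elara"]
LAST_NAMES = ["Vasquez", "Chen", "Okafor", "Petrova", "Thorne", "Voss"]

def random_name(rng):
    """Pick a first and last name uniformly at random, independently."""
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"

def build_prompt(name):
    """Embed the externally chosen name so the AI does not pick its own."""
    return (
        "Create a fictional story about a person who saves a puppy "
        f"that was lost in the woods. Name the person {name}."
    )

rng = random.Random()  # unseeded, so each run can differ
print(build_prompt(random_name(rng)))
```

Every pairing in the lists is equally likely here, whereas an AI left to its own devices tends to favor its habitual combinations.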

Latest Research Tells A Compelling Story

In a recently posted research study entitled “The Ghost Couple: Correlated LLM Name Priors And Their Haunting of the Web and Academic Publishing” by Michał Brzozowski and Neo Christopher Chung, arXiv, June 1, 2026, these salient points were made (excerpts):

“When prompted to generate fictional experts, researchers, or protagonists without explicit name instructions, large language models default to a small set of high-probability names.”
“We show they are correlated (models generate preferred character ensembles, not independent draws) and model version-specific, shifting at release boundaries.”
“These priors are model-family-specific (Claude: Elena Vasquez + Marcus Chen + Amara Okafor; Gemini: Aris Thorne + Lena Petrova; GPT: Elara Voss with no fixed partner), version-specific, and actively suppressed at model release boundaries, leaving dateable behavioral fingerprints in the content they produced.”
“Elena Vasquez and Marcus Chen have appeared as volcano experts, astronauts, thriller protagonists, podcast hosts, and academic co-authors across hundreds of independently produced AI-generated documents, never having lived.”
“Because enormous volumes of web content are generated using LLMs without overriding these defaults, the characteristic name ensembles of each model version become embedded in the content it produces. The web is an unintentional archive of LLM behavioral fingerprints.”

This is a commendable research effort to consider the ins and outs of AI producing fake names. A vital consideration that the research mentions is that these repeated fake names are leaking onto the Internet at large. This is bad for society and bad all around.

Why Fake Names Make A Difference

I’ve previously discussed the global concerns about AI slop; see my analysis at the link here. The cycle goes like this. AI produces some output and the output is posted onto the Internet. Later, an AI that is being data trained scans that data. The AI patterns on the data that some prior AI produced as output. The AI doing the patterning doesn’t realize that the data is based on AI generation rather than by human hand.

After numerous cycles of this nature, the Internet is inevitably going to be polluted with data that was made by AI. People using the web will not realize they are looking at AI-generated outputs. Meanwhile, people using AI won’t realize that the AI was trained on other AI outputs. A downward spiral of what we read and consume is already on our horizon.
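The cycle can be sketched as a toy simulation. I must emphasize the assumptions: the starting counts are invented, and the `sharpness` setting (how strongly the AI favors names that are already common in its training data) is a modeling assumption of mine and not a measured property of any AI. The function `run_cycles` is hypothetical and tracks expected counts rather than random draws.

```python
def run_cycles(counts, cycles, new_docs_per_cycle=1000, sharpness=2.0):
    """Return the top name's share of the corpus before and after each cycle.

    counts    -- dict mapping a name to how often it appears in the corpus
    sharpness -- assumed exponent above 1: the AI over-favors common names
    """
    counts = dict(counts)  # work on a copy so the caller's dict is untouched
    history = [max(counts.values()) / sum(counts.values())]
    for _ in range(cycles):
        # The AI "trains" on the corpus, then favors names by count ** sharpness.
        powered = {name: c ** sharpness for name, c in counts.items()}
        total = sum(powered.values())
        # Its outputs get posted online and become part of the next corpus.
        for name, p in powered.items():
            counts[name] += new_docs_per_cycle * p / total
        history.append(max(counts.values()) / sum(counts.values()))
    return history

# Invented starting corpus in which no single name dominates.
start = {"Elena Vasquez": 30, "Marcus Chen": 25,
         "Amara Okafor": 25, "Lena Petrova": 20}

for cycle, share in enumerate(run_cycles(start, cycles=5)):
    print(f"cycle {cycle}: top name share = {share:.2f}")
```

Under these assumptions, the most common name takes up a larger slice of the corpus with every pass, which is the downward spiral in miniature.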

In the end, these AI-generated fake names are going to leak into the Internet and into future iterations of AI and be construed as real names. You won’t have an easy time identifying whether Elena Vasquez or Marcus Chen were real people who accomplished amazing things or were fictitious names that kept getting spread around. Unnerving. Disturbing.

Confucius famously made this pointed remark: “If names are not correct, language will not be in accordance with the truth of things.” We are hurtling in that undesirable direction. I realize that Shakespeare would assert that a rose by any other name would still smell as sweet, but from the AI fake name perspective, the matter is creating an awful stench.
