What Names Can't Tell You

March 19, 2026

If you're evaluating the API, this is probably the most useful thing on the site. I'd rather you know the failure modes before signing up than discover them in production.

Names are statistics, not facts

Query "Maria Garcia" and the nationality endpoint returns something like 30% Mexico, 20% Brazil, 15% USA, plus a tail of other countries. That isn't indecision. Maria Garcia really is common in all those places. The response is a distribution over populations, not a guess about one person.

A confident prediction means the distribution is peaked. "Satoshi Yamamoto" concentrates heavily on Japan. "Maria Garcia" spreads across a dozen countries. Both responses are correct descriptions of where those names appear. Neither tells you where a specific individual is from.
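One way to put a number on "peaked" versus "spread" is to look at the top share and the normalized entropy of a distribution. This is a rough sketch, not something the API returns: the dict-of-probabilities shape, the country codes, the peakedness helper, and every number beyond the three Maria Garcia shares quoted above are made up for illustration.

```python
import math

def peakedness(distribution):
    """Return (top_share, normalized_entropy) for a {country: probability} dict.

    normalized_entropy is 0.0 when all mass sits on one country and 1.0 when
    the mass is spread evenly over every listed country.
    """
    probs = [p for p in distribution.values() if p > 0]
    total = sum(probs)
    probs = [p / total for p in probs]  # renormalize in case of rounding
    top_share = max(probs)
    if len(probs) == 1:
        return top_share, 0.0
    entropy = -sum(p * math.log(p) for p in probs)
    return top_share, entropy / math.log(len(probs))

# Illustrative numbers only. The tail is lumped into "other" for brevity.
maria_garcia = {"MX": 0.30, "BR": 0.20, "US": 0.15, "ES": 0.10,
                "CO": 0.10, "AR": 0.08, "other": 0.07}
satoshi_yamamoto = {"JP": 0.95, "US": 0.03, "BR": 0.02}

spread_top, spread_entropy = peakedness(maria_garcia)
peaked_top, peaked_entropy = peakedness(satoshi_yamamoto)
```

The peaked name has a high top share and low entropy, the spread name the reverse. Neither number says anything about a specific person.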

What confidence actually means

A confidence of 0.6 doesn't mean "slightly better than a coin flip." We calibrate confidence scores using isotonic regression against validation data. If we report 0.6, it means that roughly 60% of predictions at that confidence level were correct in our held-out set. The score tracks real accuracy, not just model certainty.

That said, 0.3 is genuinely weak: by the same calibration, roughly 70% of predictions at that level were wrong on our held-out set. We flag anything below 0.6 as low_confidence in the response. If you're building downstream logic on our predictions, filter on that flag. Don't treat all results equally.

But here's what the confidence score doesn't capture: it tells you how reliable the prediction is given the model. It doesn't tell you how inherently predictable the name is. A high-confidence wrong answer is still wrong. Our calibration is only as good as our validation data, and that data has its own gaps.
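If you have labeled data of your own, you can check whether the calibration holds for your population by bucketing predictions by confidence and measuring accuracy per bucket. A minimal sketch, assuming you have already reduced each prediction to a (confidence, was_correct) pair. The reliability_table helper and the sample records are hypothetical, not part of the API and not real validation data.

```python
from collections import defaultdict

def reliability_table(records, n_bins=10):
    """Map each confidence bucket's lower edge to the observed accuracy in it.

    records: iterable of (confidence, was_correct) pairs, confidence in [0, 1].
    """
    bins = defaultdict(lambda: [0, 0])  # bucket index -> [correct, total]
    for confidence, was_correct in records:
        # The small epsilon guards against float error at bucket edges;
        # min() puts a confidence of exactly 1.0 into the top bucket.
        idx = min(int(confidence * n_bins + 1e-9), n_bins - 1)
        bins[idx][0] += int(was_correct)
        bins[idx][1] += 1
    return {idx / n_bins: correct / total
            for idx, (correct, total) in sorted(bins.items())}

# Made-up sample: five predictions in the 0.6 bucket, one in the 0.9 bucket.
sample = [(0.62, True), (0.65, True), (0.61, False),
          (0.68, True), (0.66, False), (0.95, True)]
table = reliability_table(sample)
```

If the scores are well calibrated for your data, the accuracy in each bucket should land near that bucket's confidence range. Large gaps suggest your names differ from the validation set. With only a handful of records per bucket the numbers are noise, so use a real sample.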

Name ordering will bite you

Vietnamese names put the family name first. "Nguyen Tran" means family name Nguyen, given name Tran. If your system splits on whitespace and passes the first token as forename and the second as surname, you get it backwards. The same applies to Chinese, Japanese, Korean, and Hungarian names.

The API does not detect or correct name ordering. That's on you. Getting it wrong doesn't just degrade results a little. It can flip the prediction entirely. If you're processing names from East or Southeast Asia (or Hungary), sort this out before calling us.
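A minimal sketch of handling this on your side, assuming each record carries a locale you trust (from the source system, not guessed from the name). The FAMILY_FIRST_LOCALES set, the split_name helper, and the forename/surname dict keys are all hypothetical names for illustration.

```python
# Locales where the family name conventionally comes first (assumed list,
# covering the languages named above).
FAMILY_FIRST_LOCALES = {"vi", "zh", "ja", "ko", "hu"}

def split_name(full_name, locale):
    """Split a full name into forename and surname using the record's locale.

    Simplification: in family-first locales everything after the first token
    is treated as the forename, so middle names end up in the forename.
    """
    tokens = full_name.split()
    if len(tokens) < 2:
        return {"forename": full_name.strip(), "surname": ""}
    if locale in FAMILY_FIRST_LOCALES:
        surname, forename = tokens[0], " ".join(tokens[1:])
    else:
        forename, surname = " ".join(tokens[:-1]), tokens[-1]
    return {"forename": forename, "surname": surname}

vietnamese = split_name("Nguyen Tran", "vi")
english = split_name("John Smith", "en")
```

This only works if the locale reflects how the name was actually written. Names from these countries are often already flipped to given-name-first order in English-language forms, and no whitespace rule can detect that.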

Transliteration chaos

The Russian name Михаил gets romanized as Mikhail, Mihail, Michail, or Mikail depending on the country and transliteration scheme. Each spelling may route to different countries in the training data, because each one really is more common in a different place (Mihail skews Romanian, Mikail skews Turkish).

We partially handle this through n-gram fallback. Mikhail and Mikail share enough character n-grams that their Jaccard similarity is high, so the model doesn't treat them as completely different names. But this is a patch. It helps with close spellings and fails on distant ones. "Jianwei" vs. "Chien-wei" won't match at all despite being the same Chinese name in different romanization systems.
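To see why close spellings overlap and distant ones don't, here is a small sketch of Jaccard similarity over character n-grams. It illustrates the general technique, not our exact implementation: the bigram size, the lowercasing, and the lack of padding are assumptions, and char_ngrams and jaccard are hypothetical helper names.

```python
def char_ngrams(name, n=2):
    """Set of character n-grams of a lowercased name (no padding)."""
    s = name.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def jaccard(a, b, n=2):
    """Jaccard similarity of two names: shared n-grams / all n-grams."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)

close = jaccard("Mikhail", "Mikail")       # 4 shared bigrams out of 7
distant = jaccard("Jianwei", "Chien-wei")  # 2 shared bigrams out of 12
```

With bigrams, the two Mikhail spellings share most of their n-grams, while the two romanizations of the Chinese name share only the ones from the "wei" ending. Any reasonable match threshold separates the two cases.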

Single-source predictions are weaker

Sending only a forename or only a surname loses the agreement signal between the two. "Giovanni" alone is strongly Italian. But "Smith" alone could be US, UK, Australia, or Canada, and the model has no forename to help disambiguate.

Our confidence formula multiplies three factors: coverage, distribution peak, and agreement between forename and surname predictions. With one name missing, agreement defaults to 1.0 (no penalty, but no boost), and coverage drops by roughly half. You'll see noticeably lower confidence scores on single-name queries. That's accurate, not a bug.
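The arithmetic is easy to see in a sketch. This only shows the multiplicative structure described above. The input values are made up, the confidence function is a hypothetical stand-in, and how each factor is actually computed is not covered here.

```python
LOW_CONFIDENCE_THRESHOLD = 0.6  # the cutoff for the low_confidence flag

def confidence(coverage, peak, agreement=1.0):
    """Product of the three factors. agreement defaults to 1.0, which is
    the value used when only one name is supplied."""
    return coverage * peak * agreement

# Illustrative values only: same peak, coverage halved for the single-name query.
full_query = confidence(coverage=0.8, peak=0.7, agreement=1.0)
single_name = confidence(coverage=0.4, peak=0.7)

single_is_flagged = single_name < LOW_CONFIDENCE_THRESHOLD
```

Because the factors multiply, halving coverage halves the final score. In this made-up case that is enough to push the single-name query well under the low_confidence cutoff.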

Ethnicity is US-only

The ethnicity prediction uses US Census 2010 data. It returns six categories: white, black, hispanic, API (Asian/Pacific Islander), AIAN (American Indian/Alaska Native), and two-or-more-races. These categories are specific to American demographic classification. They don't map to how Brazil, India, or Nigeria thinks about ethnicity.

The data is also from 2010. That's 16 years old now. Demographic distributions shift, immigration changes name pools, and mixed-race identification has grown substantially since then. We use what's available, but you should know the vintage.

Age prediction is narrow

Age estimation comes from SSA (Social Security Administration) baby name popularity trends. "Jennifer" peaked in the 1970s, so the model guesses the bearer is roughly 50. This works for names with clear temporal peaks in English-speaking countries. It works poorly for everything else.

Names that have been consistently popular for centuries produce wide, unhelpful distributions. "John" and "Maria" have been given to babies every decade for hundreds of years. The model returns a broad age range that's technically correct and practically useless. Age prediction also only uses the forename. The surname contributes nothing.
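A rough sketch of why a peaked name gives a usable age and a steady one doesn't: treat births per year as a distribution and look at its mean and spread. The counts below are invented toy numbers, not SSA data, the age_estimate helper is hypothetical, and the sketch ignores mortality, which a real estimate would need to account for.

```python
def age_estimate(births_by_year, current_year):
    """Return (mean_age, std_dev_years) from a {birth_year: count} dict."""
    total = sum(births_by_year.values())
    mean_year = sum(year * n for year, n in births_by_year.items()) / total
    variance = sum(n * (year - mean_year) ** 2
                   for year, n in births_by_year.items()) / total
    return current_year - mean_year, variance ** 0.5

# Toy counts: one name with a sharp 1970s peak, one given steadily for 120 years.
peaked_name = {1965: 10, 1970: 40, 1975: 60, 1980: 30, 1985: 10}
steady_name = {year: 20 for year in range(1900, 2021, 20)}

peaked_age, peaked_spread = age_estimate(peaked_name, current_year=2026)
steady_age, steady_spread = age_estimate(steady_name, current_year=2026)
```

The peaked name lands near 50 with a spread of a few years. The steady name produces a mean too, but the spread is several decades wide, which is the "technically correct and practically useless" case.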

When to trust it anyway

For aggregate analysis over thousands of names, the per-name errors largely average out. That's where this kind of tool works best. For individual-level decisions, treat every prediction as a probability and filter on the low_confidence flag.
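In code, the two modes look different. For aggregates, sum the probabilities instead of counting top guesses. For individual decisions, drop flagged results first. A minimal sketch, assuming each result is a dict with a distribution of country probabilities and a boolean low_confidence field. That shape, the sample data, and both helper names are assumptions for illustration.

```python
from collections import Counter

def expected_counts(results):
    """Aggregate use: expected number of people per country, summing
    probabilities across all results rather than taking each top guess."""
    totals = Counter()
    for result in results:
        for country, probability in result["distribution"].items():
            totals[country] += probability
    return dict(totals)

def confident_only(results):
    """Individual-level use: keep only results not flagged low_confidence."""
    return [r for r in results if not r["low_confidence"]]

# Made-up results for three names.
results = [
    {"distribution": {"MX": 0.5, "US": 0.5}, "low_confidence": True},
    {"distribution": {"JP": 1.0}, "low_confidence": False},
    {"distribution": {"MX": 0.25, "US": 0.75}, "low_confidence": False},
]
totals = expected_counts(results)
kept = confident_only(results)
```

Summing probabilities keeps the uncertainty in the aggregate: the ambiguous first name contributes half a person to each country instead of a whole person to whichever one happened to rank first.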

Some of these limitations I can fix with better data or smarter fallbacks. Others are baked into the approach. A name is a weak signal about a person. Useful in aggregate, unreliable for any individual. I think being clear about that is more important than pretending otherwise.