
Do Wikidata Entities Help Clinics Show Up in ChatGPT? Our 2026 Experiment Found the Strongest Gains in Rich Existing Entities

by GEMflush Research Team · 11 min read


If you publish a clinic to Wikidata, does that actually help it show up in ChatGPT, Claude, Perplexity, or other LLM-driven discovery flows?

That is the practical question behind this experiment, and it matters because many clinics do not have the luxury of waiting weeks or months for a newly published entity to propagate through LLM retrieval layers.

So we designed the study around existing Wikidata entities only. No new publishing step. No propagation lag to wait out. Just a comparison between clinics that already had entities and locally matched clinics without an exact Wikidata website match.

If you are new to the mechanics behind this, it helps to start with why Wikidata matters for AI visibility and then come back to the experimental result here.

The short version is:

  • The broad average effect was small.
  • The rich-entity subgroup looked much more promising.
  • The biggest gains appeared when the entity already had stronger structure and the prompts were more about clinic identity and disambiguation than generic popularity.
[Figure: chart comparing broad-cohort and rich-entity clinic visibility results]
Broad cohort versus rich-entity subgroup. The broad average lift was small, but the rich-entity subgroup showed a much stronger visibility advantage in this pilot.

Quick Takeaway

Here is the clearest way to read the result:

  • If you ask, "Does any existing Wikidata entity help clinics on average?" the answer from this pilot is: not by much.
  • If you ask, "Do rich existing clinic entities help clinics appear more often in LLM answers than similar local controls?" the answer becomes: possibly yes, and the signal is much stronger.

That distinction matters because it changes the strategic takeaway. The value of Wikidata publishing may not be simple entity existence. It may be entity quality.

Why We Used Existing Entities Only

There are two common ways to study knowledge graph impact:

  1. Publish a new entity and measure before versus after.
  2. Compare existing entities with matched controls right now.

The first design is cleaner for causality, but it has a real operational problem: new entities may not affect LLM systems immediately. If the goal is to estimate present-day business value without waiting for model refreshes, you have to use entities that already exist.

That is why this study focused on existing clinic entities only.

What We Tested

We built a matched clinic cohort with:

  • 10 clinics with existing Wikidata entities
  • 20 locally matched comparators
  • 30 clinics total

Each Wikidata-present clinic was matched with two clinics in the same local market.

The comparator clinics were screened using an exact P856 website check in Wikidata. If a comparator's official site already had an exact Wikidata website match, it did not qualify for the control arm.

This does not prove that no alternate alias exists in Wikidata, but it is much stronger than a loose name-only filter and gave us a usable control set.
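The exact P856 screen described above can be sketched as a query against the public Wikidata SPARQL endpoint. This is a minimal illustration of the idea, not the study's actual screening code; the function names and the URL normalization choice (exact string equality, no trailing-slash handling) are our assumptions.

```python
import json
import urllib.parse
import urllib.request

WDQS = "https://query.wikidata.org/sparql"

def build_p856_query(website_url: str) -> str:
    """SPARQL asking whether any Wikidata item lists this exact URL as P856."""
    escaped = website_url.replace('"', '\\"')
    return (
        "SELECT ?item WHERE { ?item wdt:P856 ?site . "
        f'FILTER(STR(?site) = "{escaped}") }} LIMIT 1'
    )

def has_exact_p856_match(website_url: str) -> bool:
    """True if some Wikidata item has P856 exactly equal to the URL.

    A clinic qualifies for the control arm only if this returns False.
    """
    url = WDQS + "?" + urllib.parse.urlencode(
        {"query": build_p856_query(website_url), "format": "json"}
    )
    req = urllib.request.Request(
        url, headers={"User-Agent": "clinic-screen-sketch/0.1"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = json.load(resp)
    return len(data["results"]["bindings"]) > 0
```

Exact string matching is deliberately strict: it avoids false positives from partial URL matches, at the cost of missing alias domains, which is exactly the limitation the paragraph above notes.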

Local matching examples

The matched trios included markets such as:

  • Rochester, MN
  • Manhattan, New York City
  • Hunt Valley / Baltimore region, MD
  • South Los Angeles, CA

That geographic matching was essential. A visibility experiment becomes useless fast if one clinic is compared against a control in a different city or specialty environment.

The Prompt Strategy

The original battery used neutral clinic-finding prompts such as:

  • "What are the best-rated clinics in this city?"
  • "Who are the top practices near this location?"
  • "What alternatives do patients consider besides the most advertised names?"

That is a fair way to measure broad visibility, but it may understate the value of structured entities. A knowledge graph entity is most likely to help when the system needs to distinguish:

  • one clinic from another
  • clinic organizations from individual practitioners
  • specialty clinics from broad health-system pages
  • local identities from ambiguous web content

So we added a second battery built around disambiguation:

  • prompts that asked for clinic organizations, not doctors
  • prompts about identity clarity
  • prompts focused on specialty-specific clinic retrieval
  • prompts about named clinic organizations in a local market

That change turned out to matter.

Result 1: Broad 30-Clinic Pilot

The first run used the full 30-clinic cohort with:

  • 10 Wikidata-present clinics
  • 20 matched comparators
  • OpenAI only
  • pilot mode
  • web search enabled

The broad result was modest:

  • mean visibility for Wikidata clinics: 67.52
  • mean visibility for matched comparators: 65.42
  • mean difference: +2.10 points
  • Welch two-sided p-value: 0.712
  • Cohen's d: 0.158

At the matched-trio level:

  • 5 trios favored the Wikidata clinic
  • 4 favored the comparator mean
  • 1 tied

That is not nothing, but it is far too weak and inconsistent to support a strong public claim that "having a Wikidata entity helps clinics show up in LLMs."
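The Welch t statistic and Cohen's d reported above come from per-clinic visibility scores. As a minimal sketch of how those two numbers relate to the raw scores (this is our own helper, not the study's analysis code, and the p-value step via a t-distribution lookup is omitted):

```python
import math
from statistics import mean, stdev

def welch_t_and_cohens_d(treated: list, control: list) -> tuple:
    """Welch t statistic and pooled-SD Cohen's d for two score samples."""
    m1, m2 = mean(treated), mean(control)
    s1, s2 = stdev(treated), stdev(control)
    n1, n2 = len(treated), len(control)
    # Welch's t: does not assume equal variances between arms
    t = (m1 - m2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
    # Cohen's d with pooled standard deviation
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled
    return t, d
```

With the broad cohort's small mean gap (+2.10) spread over noisy scores, a d of 0.158 and a p-value of 0.712 are exactly what this kind of computation produces when the arms overlap heavily.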

Why the Broad Result Was Weak

The broad cohort mixed together very different kinds of clinics:

  • famous institutions with huge baseline brand strength
  • smaller local clinics
  • entities with very different levels of structural richness in Wikidata

That matters because "has an entity" is probably too coarse a treatment.

A clinic like Mayo Clinic is visible for many reasons besides Wikidata. A smaller local clinic may depend much more on structured identity, location, and specialty signals. If those two situations are mixed together, the average effect gets blurred.

That led to the next step: isolate the richest existing clinic entities and see whether the signal becomes clearer.

Result 2: Rich-Entity Existing-Entity Pilot

Instead of treating every existing entity equally, we filtered to the clinics whose Wikidata entries were classified as rich by the repository's research-quality scorer.

That scorer uses live Wikidata entity data and looks at:

  • unique property count
  • total statement count
  • reference coverage
  • presence of high-signal facts such as P131 and P856

For a deeper example of what a high-quality clinic entity looks like in practice, see our medical clinic richness case study.
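To make the scorer's criteria concrete, here is a toy version over a Wikidata-style `claims` dict (property ID mapped to its list of statements). The weights and the data shape are illustrative assumptions; the repository's actual scorer and thresholds are not reproduced in this article.

```python
def richness_score(entity_claims: dict) -> float:
    """Toy richness score over a Wikidata-style claims dict.

    Mirrors the four criteria named above; the weights are illustrative
    assumptions, not the study's real scorer.
    """
    statements = [s for group in entity_claims.values() for s in group]
    with_refs = sum(1 for s in statements if s.get("references"))
    ref_coverage = with_refs / len(statements) if statements else 0.0
    score = (
        1.0 * len(entity_claims)             # unique property count
        + 0.25 * len(statements)             # total statement count
        + 5.0 * ref_coverage                 # reference coverage
        + 2.0 * ("P131" in entity_claims)    # located in admin. unit
        + 2.0 * ("P856" in entity_claims)    # official website
    )
    return round(score, 2)
```

The key design point carries over to the real scorer: an entity with many referenced statements and high-signal location/website properties scores far above a bare label-and-description stub.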

The rich-only subgroup produced 4 test entities:

  1. Mayo Clinic (Q1130172)
  2. EXerT Clinic (Q117353589)
  3. Maryland Dermatology Laser Skin and Vein Institute (Q30270103)
  4. To Help Everyone Health and Wellness Centers (Q30289048)

These were then re-tested against their locally matched controls using the combined battery:

  • neutral local clinic prompts
  • disambiguation-oriented prompts

The rich-only result

This result was much stronger:

  • mean visibility for rich-entity clinics: 76.78
  • mean visibility for matched comparators: 66.96
  • mean difference: +9.81 points
  • Welch two-sided p-value: 0.0638
  • Cohen's d: 0.956

At the matched-trio level:

  • matched mean delta: +9.78
  • matched median delta: +10.65
  • trios where the Wikidata clinic was higher: 4
  • trios where the comparator mean was higher: 0

This still does not cross the standard p < 0.05 threshold. But it is a very different kind of result from the broad cohort:

  • the effect is larger
  • the matched sets all point in the same direction
  • the standardized effect size is large
  • the result is close enough to significance that a larger rich-entity sample is worth taking seriously

The 4 Rich Entities We Tested

1. Mayo Clinic (Q1130172)

  • Region: Rochester, MN
  • Entity quality tier: rich
  • Richness score: 89
  • Matched controls: Olmsted Medical Center, Rochester Clinic

2. EXerT Clinic (Q117353589)

  • Region: Manhattan, New York City
  • Entity quality tier: rich
  • Richness score: 7.3
  • Matched controls: Infinity Sports Medicine & Rehabilitation, Manhattan Sports Therapy

3. Maryland Dermatology Laser Skin and Vein Institute (Q30270103)

  • Region: Hunt Valley / Baltimore region, MD
  • Entity quality tier: rich
  • Richness score: 23.8
  • Matched controls: Hunt Valley Laser & Skin Care Center, Anne Arundel Dermatology - Hunt Valley

4. To Help Everyone Health and Wellness Centers (Q30289048)

  • Region: South Los Angeles, CA
  • Entity quality tier: rich
  • Richness score: 6.45
  • Matched controls: Angeles Community Health Center, Kedren Health

Secondary Outcomes

The refined run also tracked three more specific metrics:

  • Top-1 rate
  • Top-3 rate
  • Correct website rate
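All three metrics reduce to inclusion rates over the ranked answer lists the prompts produce. A minimal sketch, where the per-prompt representation (one ranked list of clinic names per prompt) is our assumption about the data shape, not the study's actual format:

```python
def top_k_rate(ranked_answers: list, clinic: str, k: int) -> float:
    """Percentage of prompts where `clinic` appears in the top k answers.

    `ranked_answers` holds one ranked list of clinic names per prompt.
    Top-1 rate is k=1, top-3 rate is k=3.
    """
    if not ranked_answers:
        return 0.0
    hits = sum(clinic in answers[:k] for answers in ranked_answers)
    return 100.0 * hits / len(ranked_answers)
```

Because top-1 only counts the single first slot while top-3 counts any of the first three, a clinic can move its top-3 rate substantially while its top-1 rate stays flat, which is the exact pattern reported below.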

Top-1 rate

This was basically flat:

  • rich entity mean top-1 rate: 12.5
  • comparator mean top-1 rate: 12.49
  • difference: essentially zero

That suggests existing rich entities are not turning clinics into the single obvious first answer more often, at least not in this pilot.

Top-3 rate

This did move in the expected direction:

  • rich entity mean top-3 rate: 44.65
  • comparator mean top-3 rate: 38.40
  • difference: +6.25 points

That is a useful distinction. The value may be getting included in the answer set more often, not necessarily dominating the top slot.

Correct website rate

This was 0 for both arms.

So in this run, website citation was not the right place to look for signal. The more informative outcomes were:

  • overall visibility
  • top-3 inclusion
  • matched local delta

What This Means for the Value of Wikidata Publishing

This experiment suggests that the value of Wikidata publishing is probably not:

  • a universal lift for every clinic
  • an immediate guarantee of top-1 placement
  • a simple binary effect of entity existence

Instead, the value looks more likely to be:

  • stronger for richer entities
  • stronger in local, identity-sensitive retrieval contexts
  • expressed through inclusion and discoverability, not necessarily dominance

That is a much more useful business conclusion than a vague "Wikidata helps."

The practical case is closer to this:

A well-structured clinic-level entity can improve local AI discoverability when the model needs help identifying the right clinic organization, not just the most famous health brand.

That is a narrower claim, but it is also much more believable and much more actionable.

What This Study Does Not Prove

This is still an observational study, not a randomized causal trial.

So it does not prove that Wikidata caused the entire gain.

Important caveats:

  • the clinics were not randomly assigned to Wikidata
  • the controls were screened by exact P856, not by every possible alias
  • the provider layer here was OpenAI only
  • the rich-only subgroup had just 4 matched trios
  • a famous brand like Mayo Clinic brings obvious non-Wikidata advantages into the comparison

So the right reading is not "case closed." The right reading is:

The strongest signal so far appears in richer existing entities, and it is strong enough to justify further testing.

That reading also lines up with the broader literature on graph-grounded retrieval and entity-rich reasoning. If you want the research backdrop, our review of LLM-graph integration research is the best companion piece.

Why This Result Matters

The broad 30-clinic result could have been dismissed as noise. The rich-only result is harder to dismiss because it points in the same direction across every matched trio.

That matters for anyone working on:

  • local clinic discoverability
  • healthcare GEO
  • knowledge graph publishing strategy
  • entity engineering for AI systems

It suggests that the strategic question should not be:

"Should we publish an entity at all?"

It should be:

"Can we build a rich enough entity that it gives AI systems a better handle on who this clinic is, where it is, and how it differs from similar organizations nearby?"

That is a very different mindset, and it is probably the one with the most practical value.

FAQ

Does Wikidata help clinics show up in ChatGPT?

Based on this experiment, existing Wikidata entities do not appear to create a large average lift across every clinic. But the richer existing entities in the sample showed a much stronger visibility signal than the broad cohort average.

Is having a Wikidata entity enough by itself?

Probably not. The broad pilot suggests that entity existence alone is too weak a treatment. The stronger pattern showed up when the entity already had more structure, references, and disambiguating value.

What kind of clinic seemed to benefit most in this experiment?

The clearest signal came from clinics with richer existing entities and from prompts where the model had to identify the right clinic organization, not just repeat the most famous health brand in the market.

Did Wikidata make clinics rank first more often?

Not in this pilot. The top-1 outcome was basically flat. The more promising pattern was in overall visibility and top-3 inclusion, which suggests richer entities may help clinics get included in the answer set more often.

Does this prove Wikidata caused the gain?

No. This is an observational matched-cohort study, not a randomized trial. It shows a stronger signal in richer existing entities, but it does not prove that Wikidata alone caused the visibility difference.

Bottom Line

The latest experiment did not show a compelling broad average advantage for all existing clinic entities.

But it did show a much stronger signal once the study focused on:

  • existing rich entities
  • locally matched controls
  • prompts where clinic identity and disambiguation matter

That is the most important lesson from the experiment.

The takeaway is not:

Every entity helps.

It is:

Existing rich clinic entities appear to be where the real visibility signal starts to show up.

If future studies replicate this across more clinics and more providers, that becomes a much stronger foundation for the business case behind Wikidata publishing for healthcare organizations.


Related Articles

Do Wikidata Entities Help Law Firms Show Up in ChatGPT? A 2026 Legal AI Visibility Experiment

We tested US law firms with and without Wikidata entities across matched local markets to measure ChatGPT visibility. Some firms benefited, but the expanded rich-only cohort finished flat overall.

March 24, 2026

Do Wikidata Entities Help Real Estate Agencies Show Up in ChatGPT? A 2026 Real Estate AI Visibility Experiment

We tested US real estate agencies with and without Wikidata entities across matched local markets. The average result was only slightly positive, while the lone rich-entity follow-up was encouraging but too small to generalize.

March 24, 2026

The Research Behind Wikidata and AI Visibility (No Vendors, Just Proof)

Non-vendor evidence that Wikidata feeds AI visibility—and why knowledge graph publishing and Wikidata publishing belong in your agency stack. Research-backed case for agencies.

March 12, 2026

US Medical Clinics in Wikidata by State (2026)

How many US medical clinics appear in Wikidata by state? Data-driven snapshot of medical clinic AI visibility and the knowledge graph gap for healthcare.

March 12, 2026

US Medical Clinics in Wikidata: Coverage Report February 2026

A data-driven look at how many US medical clinics appear in Wikidata versus hospitals, why the gap matters for AI visibility, and what it means for your practice.

February 24, 2026

How to Get Your Medical Clinic in ChatGPT: Step-by-Step Guide

Complete step-by-step guide to getting your medical clinic discovered by ChatGPT. Learn how to publish to Wikidata, optimize for medical specialties, and achieve AI visibility for your healthcare practice.

January 28, 2026