
Do Wikidata Entities Help Clinics Show Up in ChatGPT? Our 2026 Experiment Found the Strongest Gains in Rich Existing Entities

by GEMflush Research Team · 11 min read


If you publish a clinic to Wikidata, does that actually help it show up in ChatGPT, Claude, Perplexity, or other LLM-driven discovery flows?

That is the practical question behind this experiment, and it matters because many clinics do not have the luxury of waiting weeks or months for a newly published entity to propagate through LLM retrieval layers.

So we designed the study around existing Wikidata entities only. No new publishing step. No propagation lag to wait out. Just a comparison between clinics that already had entities and locally matched clinics without an exact Wikidata website match.

If you are new to the mechanics behind this, it helps to start with why Wikidata matters for AI visibility and then come back to the experimental result here.

The short version is:

  • The broad average effect was small.
  • The rich-entity subgroup looked much more promising.
  • The biggest gains appeared when the entity already had stronger structure and the prompts were more about clinic identity and disambiguation than generic popularity.
[Figure: chart comparing broad-cohort and rich-entity clinic visibility results]
Broad cohort versus rich-entity subgroup. The broad average lift was small, but the rich-entity subgroup showed a much stronger visibility advantage in this pilot.

Quick Takeaway

Here is the clearest way to read the result:

  • If you ask, "Does any existing Wikidata entity help clinics on average?" the answer from this pilot is: not by much.
  • If you ask, "Do rich existing clinic entities help clinics appear more often in LLM answers than similar local controls?" the answer becomes: possibly yes, and the signal is much stronger.

That distinction matters because it changes the strategic takeaway. The value of Wikidata publishing may not be simple entity existence. It may be entity quality.

Why We Used Existing Entities Only

There are two common ways to study knowledge graph impact:

  1. Publish a new entity and measure before versus after.
  2. Compare existing entities with matched controls right now.

The first design is cleaner for causality, but it has a real operational problem: new entities may not affect LLM systems immediately. If the goal is to estimate present-day business value without waiting for model refreshes, you have to use entities that already exist.

That is why this study focused on existing clinic entities only.

What We Tested

We built a matched clinic cohort with:

  • 10 clinics with existing Wikidata entities
  • 20 locally matched comparators
  • 30 clinics total

Each Wikidata-present clinic was matched with two clinics in the same local market.

The comparator clinics were screened using an exact P856 website check in Wikidata. If a comparator's official site already had an exact Wikidata website match, it did not qualify for the control arm.

This does not prove that no alternate alias exists in Wikidata, but it is much stronger than a loose name-only filter and gave us a usable control set.
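The exact P856 screen described above can be sketched as a query against the public Wikidata SPARQL endpoint. This is a minimal illustration of the idea, not the study's actual screening code; the function names and the URL normalization choice (exact string equality, no trailing-slash handling) are our assumptions.

```python
import json
import urllib.parse
import urllib.request

WDQS = "https://query.wikidata.org/sparql"

def build_p856_query(website_url: str) -> str:
    """SPARQL asking whether any Wikidata item lists this exact URL as P856."""
    escaped = website_url.replace('"', '\\"')
    return (
        "SELECT ?item WHERE { ?item wdt:P856 ?site . "
        f'FILTER(STR(?site) = "{escaped}") }} LIMIT 1'
    )

def has_exact_p856_match(website_url: str) -> bool:
    """True if some Wikidata item has P856 exactly equal to the URL.

    A clinic qualifies for the control arm only if this returns False.
    """
    url = WDQS + "?" + urllib.parse.urlencode(
        {"query": build_p856_query(website_url), "format": "json"}
    )
    req = urllib.request.Request(
        url, headers={"User-Agent": "clinic-screen-sketch/0.1"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = json.load(resp)
    return len(data["results"]["bindings"]) > 0
```

Exact string matching is deliberately strict: it avoids false positives from partial URL matches, at the cost of missing alias domains, which is exactly the limitation the paragraph above notes.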

Local matching examples

The matched trios included markets such as:

  • Rochester, MN
  • Manhattan, New York City
  • Hunt Valley / Baltimore region, MD
  • South Los Angeles, CA

That geographic matching was essential. A visibility experiment becomes useless fast if one clinic is compared against a control in a different city or specialty environment.

The Prompt Strategy

The original battery used neutral clinic-finding prompts such as:

  • "What are the best-rated clinics in this city?"
  • "Who are the top practices near this location?"
  • "What alternatives do patients consider besides the most advertised names?"

That is a fair way to measure broad visibility, but it may understate the value of structured entities. A knowledge graph entity is most likely to help when the system needs to distinguish:

  • one clinic from another
  • clinic organizations from individual practitioners
  • specialty clinics from broad health-system pages
  • local identities from ambiguous web content

So we added a second battery built around disambiguation:

  • prompts that asked for clinic organizations, not doctors
  • prompts about identity clarity
  • prompts focused on specialty-specific clinic retrieval
  • prompts about named clinic organizations in a local market

That change turned out to matter.

Result 1: Broad 30-Clinic Pilot

The first run used the full 30-clinic cohort with:

  • 10 Wikidata-present clinics
  • 20 matched comparators
  • OpenAI only
  • pilot mode
  • web search enabled

The broad result was modest:

  • mean visibility for Wikidata clinics: 67.52
  • mean visibility for matched comparators: 65.42
  • mean difference: +2.10 points
  • Welch two-sided p-value: 0.712
  • Cohen's d: 0.158

At the matched-trio level:

  • 5 trios favored the Wikidata clinic
  • 4 favored the comparator mean
  • 1 tied

That is not nothing, but it is far too weak and inconsistent to support a strong public claim that "having a Wikidata entity helps clinics show up in LLMs."
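The Welch t statistic and Cohen's d reported above come from per-clinic visibility scores. As a minimal sketch of how those two numbers relate to the raw scores (this is our own helper, not the study's analysis code, and the p-value step via a t-distribution lookup is omitted):

```python
import math
from statistics import mean, stdev

def welch_t_and_cohens_d(treated: list, control: list) -> tuple:
    """Welch t statistic and pooled-SD Cohen's d for two score samples."""
    m1, m2 = mean(treated), mean(control)
    s1, s2 = stdev(treated), stdev(control)
    n1, n2 = len(treated), len(control)
    # Welch's t: does not assume equal variances between arms
    t = (m1 - m2) / math.sqrt(s1**2 / n1 + s2**2 / n2)
    # Cohen's d with pooled standard deviation
    pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled
    return t, d
```

With the broad cohort's small mean gap (+2.10) spread over noisy scores, a d of 0.158 and a p-value of 0.712 are exactly what this kind of computation produces when the arms overlap heavily.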

Why the Broad Result Was Weak

The broad cohort mixed together very different kinds of clinics:

  • famous institutions with huge baseline brand strength
  • smaller local clinics
  • entities with very different levels of structural richness in Wikidata

That matters because "has an entity" is probably too coarse a treatment.

A clinic like Mayo Clinic is visible for many reasons besides Wikidata. A smaller local clinic may depend much more on structured identity, location, and specialty signals. If those two situations are mixed together, the average effect gets blurred.

That led to the next step: isolate the richest existing clinic entities and see whether the signal becomes clearer.

Result 2: Rich-Entity Existing-Entity Pilot

Instead of treating every existing entity equally, we filtered to the clinics whose Wikidata entries were classified as rich by the repository's research-quality scorer.

That scorer uses live Wikidata entity data and looks at:

  • unique property count
  • total statement count
  • reference coverage
  • presence of high-signal facts such as P131 and P856

For a deeper example of what a high-quality clinic entity looks like in practice, see our medical clinic richness case study.
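To make the scorer's criteria concrete, here is a toy version over a Wikidata-style `claims` dict (property ID mapped to its list of statements). The weights and the data shape are illustrative assumptions; the repository's actual scorer and thresholds are not reproduced in this article.

```python
def richness_score(entity_claims: dict) -> float:
    """Toy richness score over a Wikidata-style claims dict.

    Mirrors the four criteria named above; the weights are illustrative
    assumptions, not the study's real scorer.
    """
    statements = [s for group in entity_claims.values() for s in group]
    with_refs = sum(1 for s in statements if s.get("references"))
    ref_coverage = with_refs / len(statements) if statements else 0.0
    score = (
        1.0 * len(entity_claims)             # unique property count
        + 0.25 * len(statements)             # total statement count
        + 5.0 * ref_coverage                 # reference coverage
        + 2.0 * ("P131" in entity_claims)    # located in admin. unit
        + 2.0 * ("P856" in entity_claims)    # official website
    )
    return round(score, 2)
```

The key design point carries over to the real scorer: an entity with many referenced statements and high-signal location/website properties scores far above a bare label-and-description stub.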

The rich-only subgroup produced 4 test entities:

  1. Mayo Clinic (Q1130172)
  2. EXerT Clinic (Q117353589)
  3. Maryland Dermatology Laser Skin and Vein Institute (Q30270103)
  4. To Help Everyone Health and Wellness Centers (Q30289048)

These were then re-tested against their locally matched controls using the combined battery:

  • neutral local clinic prompts
  • disambiguation-oriented prompts

The rich-only result

This result was much stronger:

  • mean visibility for rich-entity clinics: 76.78
  • mean visibility for matched comparators: 66.96
  • mean difference: +9.81 points
  • Welch two-sided p-value: 0.0638
  • Cohen's d: 0.956

At the matched-trio level:

  • matched mean delta: +9.78
  • matched median delta: +10.65
  • trios where the Wikidata clinic was higher: 4
  • trios where the comparator mean was higher: 0

This still does not cross the standard p < 0.05 threshold. But it is a very different kind of result from the broad cohort:

  • the effect is larger
  • the matched sets all point in the same direction
  • the standardized effect size is large
  • the result is close enough to significance that a larger rich-entity sample is worth taking seriously

The 4 Rich Entities We Tested

1. Mayo Clinic (Q1130172)

  • Region: Rochester, MN
  • Entity quality tier: rich
  • Richness score: 89
  • Matched controls: Olmsted Medical Center, Rochester Clinic

2. EXerT Clinic (Q117353589)

  • Region: Manhattan, New York City
  • Entity quality tier: rich
  • Richness score: 7.3
  • Matched controls: Infinity Sports Medicine & Rehabilitation, Manhattan Sports Therapy

3. Maryland Dermatology Laser Skin and Vein Institute (Q30270103)

  • Region: Hunt Valley / Baltimore region, MD
  • Entity quality tier: rich
  • Richness score: 23.8
  • Matched controls: Hunt Valley Laser & Skin Care Center, Anne Arundel Dermatology - Hunt Valley

4. To Help Everyone Health and Wellness Centers (Q30289048)

  • Region: South Los Angeles, CA
  • Entity quality tier: rich
  • Richness score: 6.45
  • Matched controls: Angeles Community Health Center, Kedren Health

Secondary Outcomes

The refined run also tracked three more specific metrics:

  • Top-1 rate
  • Top-3 rate
  • Correct website rate
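All three metrics reduce to inclusion rates over the ranked answer lists the prompts produce. A minimal sketch, where the per-prompt representation (one ranked list of clinic names per prompt) is our assumption about the data shape, not the study's actual format:

```python
def top_k_rate(ranked_answers: list, clinic: str, k: int) -> float:
    """Percentage of prompts where `clinic` appears in the top k answers.

    `ranked_answers` holds one ranked list of clinic names per prompt.
    Top-1 rate is k=1, top-3 rate is k=3.
    """
    if not ranked_answers:
        return 0.0
    hits = sum(clinic in answers[:k] for answers in ranked_answers)
    return 100.0 * hits / len(ranked_answers)
```

Because top-1 only counts the single first slot while top-3 counts any of the first three, a clinic can move its top-3 rate substantially while its top-1 rate stays flat, which is the exact pattern reported below.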

Top-1 rate

This was basically flat:

  • rich entity mean top-1 rate: 12.5
  • comparator mean top-1 rate: 12.49
  • difference: essentially zero

That suggests existing rich entities are not turning clinics into the single obvious first answer more often, at least not in this pilot.

Top-3 rate

This did move in the expected direction:

  • rich entity mean top-3 rate: 44.65
  • comparator mean top-3 rate: 38.40
  • difference: +6.25 points

That is a useful distinction. The value may be getting included in the answer set more often, not necessarily dominating the top slot.

Correct website rate

This was 0 for both arms.

So in this run, website citation was not the right place to look for signal. The more informative outcomes were:

  • overall visibility
  • top-3 inclusion
  • matched local delta

What This Means for the Value of Wikidata Publishing

This experiment suggests that the value of Wikidata publishing is probably not:

  • a universal lift for every clinic
  • an immediate guarantee of top-1 placement
  • a simple binary effect of entity existence

Instead, the value looks more likely to be:

  • stronger for richer entities
  • stronger in local, identity-sensitive retrieval contexts
  • expressed through inclusion and discoverability, not necessarily dominance

That is a much more useful business conclusion than a vague "Wikidata helps."

The practical case is closer to this:

A well-structured clinic-level entity can improve local AI discoverability when the model needs help identifying the right clinic organization, not just the most famous health brand.

That is a narrower claim, but it is also much more believable and much more actionable.

What This Study Does Not Prove

This is still an observational study, not a randomized causal trial.

So it does not prove that Wikidata caused the entire gain.

Important caveats:

  • the clinics were not randomly assigned to Wikidata
  • the controls were screened by exact P856, not by every possible alias
  • the provider layer here was OpenAI only
  • the rich-only subgroup had just 4 matched trios
  • a famous brand like Mayo Clinic brings obvious non-Wikidata advantages into the comparison

So the right reading is not "case closed." The right reading is:

The strongest signal so far appears in richer existing entities, and it is strong enough to justify further testing.

That reading also lines up with the broader literature on graph-grounded retrieval and entity-rich reasoning. If you want the research backdrop, our review of LLM-graph integration research is the best companion piece.

Why This Result Matters

The broad 30-clinic result could have been dismissed as noise. The rich-only result is harder to dismiss because it points in the same direction across every matched trio.

That matters for anyone working on:

  • local clinic discoverability
  • healthcare GEO
  • knowledge graph publishing strategy
  • entity engineering for AI systems

It suggests that the strategic question should not be:

"Should we publish an entity at all?"

It should be:

"Can we build a rich enough entity that it gives AI systems a better handle on who this clinic is, where it is, and how it differs from similar organizations nearby?"

That is a very different mindset, and it is probably the one with the most practical value.

FAQ

Does Wikidata help clinics show up in ChatGPT?

Based on this experiment, existing Wikidata entities do not appear to create a large average lift across every clinic. But the richer existing entities in the sample showed a much stronger visibility signal than the broad cohort average.

Is having a Wikidata entity enough by itself?

Probably not. The broad pilot suggests that entity existence alone is too weak a treatment. The stronger pattern showed up when the entity already had more structure, references, and disambiguating value.

What kind of clinic seemed to benefit most in this experiment?

The clearest signal came from clinics with richer existing entities and from prompts where the model had to identify the right clinic organization, not just repeat the most famous health brand in the market.

Did Wikidata make clinics rank first more often?

Not in this pilot. The top-1 outcome was basically flat. The more promising pattern was in overall visibility and top-3 inclusion, which suggests richer entities may help clinics get included in the answer set more often.

Does this prove Wikidata caused the gain?

No. This is an observational matched-cohort study, not a randomized trial. It shows a stronger signal in richer existing entities, but it does not prove that Wikidata alone caused the visibility difference.

Bottom Line

The latest experiment did not show a compelling broad average advantage for all existing clinic entities.

But it did show a much stronger signal once the study focused on:

  • existing rich entities
  • locally matched controls
  • prompts where clinic identity and disambiguation matter

That is the most important lesson from the experiment.

The takeaway is not:

Every entity helps.

It is:

Existing rich clinic entities appear to be where the real visibility signal starts to show up.

If future studies replicate this across more clinics and more providers, that becomes a much stronger foundation for the business case behind Wikidata publishing for healthcare organizations.


Related Articles

Do Wikidata Entities Help Law Firms Show Up in ChatGPT? A 2026 Legal AI Visibility Experiment

We tested US law firms with and without Wikidata entities across matched local markets to measure ChatGPT visibility. Some firms benefited, but the expanded rich-only cohort finished flat overall.

March 24, 2026

Do Wikidata Entities Help Real Estate Agencies Show Up in ChatGPT? A 2026 Real Estate AI Visibility Experiment

We tested US real estate agencies with and without Wikidata entities across matched local markets. The average result was only slightly positive, while the lone rich-entity follow-up was encouraging but too small to generalize.

March 24, 2026

The Research Behind Wikidata and AI Visibility (No Vendors, Just Proof)

Non-vendor evidence that Wikidata feeds AI visibility—and why knowledge graph publishing and Wikidata publishing belong in your agency stack. Research-backed case for agencies.

March 12, 2026

US Medical Clinics in Wikidata by State (2026)

How many US medical clinics appear in Wikidata by state? Data-driven snapshot of medical clinic AI visibility and the knowledge graph gap for healthcare.

March 12, 2026

US Medical Clinics in Wikidata: Coverage Report February 2026

A data-driven look at how many US medical clinics appear in Wikidata versus hospitals, why the gap matters for AI visibility, and what it means for your practice.

February 24, 2026

How to Get Your Medical Clinic in ChatGPT: Step-by-Step Guide

Complete step-by-step guide to getting your medical clinic discovered by ChatGPT. Learn how to publish to Wikidata, optimize for medical specialties, and achieve AI visibility for your healthcare practice.

January 28, 2026