From Homepage to Knowledge Graph: How We Enriched Real Showcase Businesses on Wikidata
Your website hero section is where you put your best proof: real brands, real outcomes. For GEMflush, that same bar applies to the knowledge graph. We did not just talk about Wikidata publishing for business; we enriched real entities tied to our showcase—clinics, real estate agencies, and law firms across several countries—using the same discipline we recommend to agencies: research first, cite everything that matters, publish through the real Wikidata API, and treat each entity as its own quality gate.
This post walks through how that enrichment worked and why it is worth doing for generative engine optimization (GEO) and long-term AI visibility.
What “enrichment” means here
A Wikidata item is more than a name. It is a bundle of statements (properties like official website, address, industry, location) plus references (URLs and retrieval dates) that explain where each fact came from.
Enrichment means taking an item that already existed or was thin, and making it more complete and more machine-usable: correct administrative location, a full street address where appropriate, a properly formatted phone number, industry alignment, and third-party URLs that support notability and verification.
That is different from spraying random fields into an infobox. The goal is to match how Wikidata acts as premier knowledge graph infrastructure for retrieval and reasoning: typed links (cities, countries, industries) and traceable claims.
The process: one entity at a time
We worked one QID at a time, on purpose.
Rushing batch edits across unrelated businesses is how mistakes slip in: wrong city homonyms, outdated addresses, or “official” claims that are actually directory scrapes. For each showcase entity we:
- Pulled the live item from Wikidata (wbgetentities) and catalogued what was already there—labels, descriptions, and existing statements.
- Grounded contact and location facts in primary sources (official sites, imprint/contact/terms pages) wherever possible.
- Added independent references where they strengthened verification—registries, reputable directories, news, or professional listings—without replacing the official record.
- Published through the MediaWiki Action API (wbeditentity) with clear edit summaries, using structured claim payloads built in our codebase so shapes stay consistent with Wikidata’s expected JSON model.
That rhythm—inspect, source, publish—keeps quality high and makes regressions easy to spot in revision history.
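The "inspect" step can be sketched in a few lines. This is a minimal illustration, not our production code: the QID is hypothetical, and the sample response is a trimmed stand-in for the JSON shape wbgetentities actually returns.

```python
import json
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"
QID = "Q42"  # hypothetical QID, for illustration only

# Read-only wbgetentities call: request labels, descriptions, and claims.
params = {
    "action": "wbgetentities",
    "ids": QID,
    "props": "labels|descriptions|claims",
    "format": "json",
}
url = f"{API}?{urlencode(params)}"

# A trimmed sample of the JSON shape wbgetentities returns.
sample = {
    "entities": {
        QID: {
            "labels": {"en": {"language": "en", "value": "Example Firm"}},
            "claims": {"P856": [{"mainsnak": {"snaktype": "value"}}]},
        }
    }
}

def catalogue(entity: dict) -> dict:
    """Summarize what the item already has before deciding what to add."""
    return {
        "labels": sorted(entity.get("labels", {})),
        "properties": sorted(entity.get("claims", {})),
    }

print(catalogue(sample["entities"][QID]))
# → {'labels': ['en'], 'properties': ['P856']}
```

Cataloguing first means every later edit is a deliberate addition, not a blind overwrite.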
Properties we actually cared about
Different verticals need different nuance, but a common enrichment spine showed up again and again:
- Location and jurisdiction: country (P17), administrative entity such as city or district (P131), and headquarters (P159) when we could align them to the right Wikidata items (not just string-matched names).
- Contact and presence: street address in a structured monolingual field (P6375) and phone number (P1329) when policy and sources supported it.
- Industry: P452 where it helped classify the organization consistently (for example, aligning law firms with the same industry item we use across comparable entities).
- Described at URL (P973): selective third-party pages that document the business and support statements—always as an addition to, not a substitute for, the official site (P856).
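A claim builder for that property spine might look like the sketch below. The helper names and the target QIDs are illustrative assumptions; what matters is that typed links (item values) and monolingual text are emitted in Wikidata's expected datavalue shapes, not as bare strings.

```python
def item_claim(prop: str, qid: str) -> dict:
    """Claim whose value is another Wikidata item (a typed link, not a string)."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {
                "value": {"entity-type": "item", "numeric-id": int(qid[1:])},
                "type": "wikibase-entityid",
            },
        },
        "type": "statement",
        "rank": "normal",
    }

def monolingual_claim(prop: str, text: str, language: str) -> dict:
    """Claim for monolingual text, e.g. a street address (P6375)."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {
                "value": {"text": text, "language": language},
                "type": "monolingualtext",
            },
        },
        "type": "statement",
        "rank": "normal",
    }

def string_claim(prop: str, value: str) -> dict:
    """Claim for string-like values, e.g. a phone number (P1329)."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {"value": value, "type": "string"},
        },
        "type": "statement",
        "rank": "normal",
    }

# Hypothetical bundle; real edits must resolve the correct QIDs first.
claims = [
    item_claim("P17", "Q30"),                        # country: United States
    monolingual_claim("P6375", "123 Main St", "en"), # street address (fictional)
    string_claim("P1329", "+1-212-555-0100"),        # phone (reserved test number)
]
```

Keeping builders per datatype is what makes "one entity at a time" cheap: the shapes never drift between verticals.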
If you are building a GEO practice, this overlaps directly with the hub-node idea: you want your clients attached to the right geographic and type nodes so they appear in the graph slices query engines and assistants rely on. Our hub nodes and local business AI visibility piece explains why those links matter beyond “having a QID.”
Lessons from the trenches (the unglamorous part)
A few constraints showed up repeatedly—exactly the kind of detail agencies underestimate when they first open a Wikidata account.
Phone numbers and validators. Wikidata’s community tools and filters expect internationally recognizable formats. For U.S. numbers, that often means an explicit country prefix and a consistent hyphenation pattern (for example +1-…). Getting that wrong can block a save even when the underlying fact is right.
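A small normalizer heads this problem off before the save is attempted. This is a sketch of the idea, assuming 10-digit US numbers; real validation should also check area-code rules.

```python
import re

def format_us_phone(raw: str) -> str:
    """Normalize a US number to the +1-XXX-XXX-XXXX hyphenation pattern."""
    digits = re.sub(r"\D", "", raw)           # strip punctuation and spaces
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                    # drop a leading country code
    if len(digits) != 10:
        raise ValueError(f"not a 10-digit US number: {raw!r}")
    return f"+1-{digits[:3]}-{digits[3:6]}-{digits[6:]}"

print(format_us_phone("(212) 555-0100"))  # → +1-212-555-0100
```

Normalizing before publishing turns a validator rejection into a local exception you can fix at source time.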
Historic versus operating businesses. Some showcase-linked items describe historic buildings or former sites rather than a current clinic storefront. In those cases we enriched with heritage and archival references instead of inventing a modern “official website” or pretending the entity is today’s walk-in location. The graph should reflect reality, or downstream AI will confidently cite the wrong reality.
API publishing versus “almost production.” Tooling that assumes a full application environment—a database connection, for instance—fails the moment you run it as a one-off script; for live edits, the reliable path is a dedicated authenticated session plus a claim builder that emits valid wbeditentity JSON. That is also how you keep edits repeatable for the next client cohort.
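The write path reduces to a predictable request shape. Below is a minimal sketch: login and CSRF-token retrieval are elided, and the bot name and contact address are hypothetical placeholders—the point is the wbeditentity payload and the responsible User-Agent header.

```python
import json

API = "https://www.wikidata.org/w/api.php"
# Hypothetical identifier; a real script should name the operator and a contact.
USER_AGENT = "ExampleEnrichmentBot/0.1 (ops@example.com)"

def build_edit_request(qid: str, claims: list, summary: str, csrf_token: str) -> dict:
    """Assemble the pieces of an authenticated wbeditentity POST."""
    return {
        "url": API,
        "headers": {"User-Agent": USER_AGENT},
        "data": {
            "action": "wbeditentity",
            "id": qid,
            "data": json.dumps({"claims": claims}),  # claims built and validated upstream
            "summary": summary,                       # clear edit summary, per item
            "token": csrf_token,                      # obtained from the logged-in session
            "format": "json",
        },
    }

req = build_edit_request("Q42", [], "add referenced address and phone", "dummy-token")
print(req["data"]["action"])  # → wbeditentity
```

The same builder serves both a dry run (log the payload) and the live POST, which is what makes the edits auditable and repeatable.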
User-Agent and good citizenship. Read-only calls to Wikimedia endpoints should identify the bot or script responsibly. It is a small line in code and a large signal that you respect shared infrastructure.
The value proposition: why this work moves the needle
1. AI systems consume structured entity data. Assistants and RAG stacks do not “read your homepage” the way a human does on every query. They intersect text with graphs and retrieval indices. A well-formed Wikidata item is a durable, language-agnostic anchor for who you are, where you operate, and what kind of organization you are—the same primitives our Wikidata + SPARQL visibility playbook is built around.
2. References turn opinions into evidence. Every serious GEO report eventually faces the question, “Says who?” Referenced statements answer that inside the graph itself. That matters for editors, for compliance-minded clients, and for any future system that weights provenance.
3. Completeness is competitive. Coverage is uneven across local businesses; many entities are stubs. Systematic enrichment is how a serious brand—or an agency portfolio—pulls ahead of the median in the very datasets used for benchmarking and retrieval experiments (see our research-oriented posts on legal, real estate, and related coverage work).
4. Process scales when discipline is fixed. The showcase run was manual and careful by design, but the pattern scales: repeatable property bundles per vertical, shared reference rules, API-based publishing, and monitoring. That is the same operational backbone behind knowledge graph publishing for AI visibility: not one heroic edit, but a system.
Takeaways
- Enrichment is not vanity metadata; it is precision engineering on a public graph used by humans, machines, and research pipelines.
- One entity at a time with strong sourcing beats bulk edits that confuse cities, eras, or phone formats.
- Showcase work is a promise: we apply the same rigor to highlighted clients that we expect agencies to apply at scale.
If you are an agency building a GEO practice, the next step is not to memorize property IDs—it is to adopt a publish-and-measure loop: structured publishing, then proof in AI surfaces. AI visibility for SEO agencies is where we connect Wikidata discipline to multi-client monitoring, and our methodology documents how we tie the graph to measurable outcomes.
We will keep publishing research and field notes from real publishing runs—because in the age of generative search, the brands that win are the ones whose facts are findable, linked, and defensible.