Every agency client we talk to in 2026 asks the same question: "can you get me cited by ChatGPT?" The honest answer requires understanding what gets cited in the first place. We spent six weeks sampling LLM responses to commercial queries — across ChatGPT, Claude, Perplexity, and Gemini — and looking for patterns in what each model named, linked, or attributed.
Here's what we found.
The methodology, briefly
We sampled 400 queries across 8 commercial verticals (B2B SaaS, e-commerce, fintech, health, legal, real estate, education, agency services). For each query we ran the same prompt across all four models with retrieval enabled, captured the responses, and tagged every source mention by:
- Source type — owned site, Wikipedia/Wikidata, news outlet, review site, social, forum, government, other
- Position — first cited, secondary, footnote-only
- Citation form — linked, named (no link), summarized (not named)
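For readers who want to replicate the tagging, the scheme above maps to a small data model. This is an illustrative sketch, not the schema we used internally; the field and enum names are our own.

```python
from dataclasses import dataclass
from enum import Enum

class SourceType(Enum):
    OWNED = "owned site"
    WIKI = "Wikipedia/Wikidata"
    NEWS = "news outlet"
    REVIEW = "review site"
    SOCIAL = "social"
    FORUM = "forum"
    GOVERNMENT = "government"
    OTHER = "other"

class Position(Enum):
    FIRST = "first cited"
    SECONDARY = "secondary"
    FOOTNOTE = "footnote-only"

class CitationForm(Enum):
    LINKED = "linked"
    NAMED = "named (no link)"
    SUMMARIZED = "summarized (not named)"

@dataclass
class SourceMention:
    model: str          # "ChatGPT", "Claude", "Perplexity", or "Gemini"
    query: str          # the commercial query that was sampled
    source_type: SourceType
    position: Position
    form: CitationForm

# One tagged mention from a hypothetical Perplexity response:
mention = SourceMention("Perplexity", "best crm for small agencies",
                        SourceType.OWNED, Position.FIRST, CitationForm.LINKED)
```

Tagging every mention into a structure like this is what makes the per-model and per-source-type percentages below countable.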
This isn't a perfect methodology. LLM responses are non-deterministic, retrieval varies by region, and our sample skews toward English-language commercial intent. But the patterns are stable enough to act on.
Finding 1: structured sources dominate.
Across all four models, Wikipedia and Wikidata appeared at least once in roughly 62% of responses. That's higher than any other source type. Government and regulatory pages came in second (28%). Major news outlets third (23%). Brand-owned commercial pages were cited in about 19% of responses, and almost never as the primary source.
Practical takeaway: if your brand or product doesn't have a defensible Wikidata entry, you're competing with one hand tied behind your back. We help clients earn both Wikidata and (where eligible) Wikipedia presence as part of foundational AEO work. It's harder than building a backlink. It's also more durable.
Finding 2: claim density beats word count.
Among the brand-owned pages that earned citations, the highest citation rates didn't go to the longest pages. They went to the most claim-dense pages: short paragraphs, one definitive assertion per sentence, specific numbers, and named sources.
Two examples from our sample:
- A 1,200-word page with 18 specific claims (numbers, dates, named sources) got cited in 9 of 50 sampled queries.
- A 4,200-word page on the same topic with mostly narrative content got cited in 2 of 50 queries — despite ranking #1 on Google for the head term.
The models reward extractable claims. Long-form storytelling content still wins on classic SEO, but it's at a disadvantage on the AEO surface.
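One rough way to operationalize "claim density" is sentences containing a specific figure per hundred words. This heuristic is our own illustration, not a metric from the study, and a digit-match is obviously a crude proxy for a verifiable claim.

```python
import re

def claim_density(text: str) -> float:
    """Sentences containing a specific figure (number, %, $, year)
    per 100 words. A crude proxy for extractable claims."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    specific = sum(1 for s in sentences if re.search(r"\d", s))
    words = len(text.split())
    return 100.0 * specific / words if words else 0.0

dense = "Churn fell 14% in Q3 2025. We surveyed 412 buyers. Median deal size was $18k."
fluffy = "Our journey began with a simple idea. We believed in better outcomes for everyone."
assert claim_density(dense) > claim_density(fluffy)
```

Scoring drafts this way before publication is cheap, and it pushes writers toward the short, assertion-per-sentence style the models extract from.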
Finding 3: author signals matter more than expected.
Pages with explicit author bylines, especially when the author had Person schema linked to a recognizable entity (LinkedIn, Wikidata, ORCID for academics), were cited 2.3× more often than equivalent-content pages with no byline.
This was the surprise of the study. We expected authority signals to matter, but not at this magnitude. The models clearly weight "this content is associated with a real, identifiable expert" heavily when deciding which source to name.
If you publish anonymous brand content with no named author, you're sending the model a signal that nobody at your company is willing to put their name on the claim. The model takes the hint.
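Concretely, the byline fix is a small piece of JSON-LD. The sketch below builds a minimal schema.org Person entry in Python; the name and `sameAs` URLs are placeholders, while the property names (`@type`, `sameAs`, `jobTitle`) are standard schema.org vocabulary.

```python
import json

author_schema = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Example",                            # placeholder author
    "jobTitle": "Head of Research",
    "sameAs": [  # links that tie the byline to a recognizable entity
        "https://www.linkedin.com/in/jane-example",    # placeholder profile
        "https://www.wikidata.org/wiki/Q00000000",     # placeholder Wikidata item
    ],
}

# Embed on the page as a JSON-LD script tag:
snippet = ('<script type="application/ld+json">\n'
           + json.dumps(author_schema, indent=2)
           + "\n</script>")
```

The `sameAs` links are doing the real work here: they connect the byline on your page to an entity the model can resolve.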
Finding 4: recency matters — but less than you'd think.
For non-news commercial queries, citation rates didn't strongly favor newer content. A 2-year-old reference page with high claim density and strong entity signals beat 3-month-old marketing pages in 78% of head-to-head comparisons.
What did matter was a visible "last updated" date. Pages with an explicit last-updated stamp (in the visible content, not just `lastmod` in the sitemap) got cited more often than equivalent pages with no date at all. The model wants to know the content is still valid — but doesn't require it to be brand new.
Finding 5: the four models disagree more than you'd expect.
We saw substantial overlap in citation patterns across models, but also genuine divergence:
- ChatGPT leans heavily on Wikipedia and well-established news sources. Slightly conservative.
- Claude cites a wider range of sources, more willing to surface specialized blogs and original-research pages.
- Perplexity cites the most aggressively — almost every claim has a link. Best surface for owned-content citations because it's actively pulling from live results.
- Gemini heavily favors Google-indexed pages with strong classic SEO signals. Closest to a "search engine that talks back" model.
Strategic implication: optimizing only for one model is risky. The portfolio of LLM surfaces will keep diversifying. What earns citations across all four is the underlying claim quality + entity strength — not platform-specific tricks.
What we'd do tomorrow
If you're starting AEO from scratch and want to move citation share over the next 90 days, here's the order:
- Audit your About page + Person schema. If your founders don't have explicit Person markup linked to LinkedIn / Wikidata, fix this first. It's free and high-leverage.
- Add Wikidata entries for your brand + key people. Not Wikipedia — Wikidata first. Easier to earn, foundational.
- Pick your top 5 commercial query targets and rewrite them for claim density. Short paragraphs, specific numbers, named sources, FAQ block at the end.
- Add author bylines to every published page, linked to Person schema, with credentials visible.
- Sample LLM responses to your target queries monthly. You can't optimize what you can't observe.
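The monthly sampling step can start as simple as the sketch below. `ask_model` is a stand-in for whatever client call you make per model (it is not a real API), and the domain match is deliberately naive; it still gives you a citation-share number you can track month over month.

```python
import re
from typing import Callable

def citation_share(queries: list[str],
                   ask_model: Callable[[str], str],
                   our_domain: str) -> float:
    """Fraction of sampled responses that mention our domain.
    ask_model(query) returns the model's full text response."""
    hits = 0
    for q in queries:
        response = ask_model(q)
        if re.search(re.escape(our_domain), response, re.IGNORECASE):
            hits += 1
    return hits / len(queries) if queries else 0.0

# Stubbed model for illustration; swap in a real client call.
def fake_model(query: str) -> str:
    if "tools" in query:
        return "According to example.com, the top tools are ..."
    return "No sources cited."

share = citation_share(["best tools for x", "pricing for y"],
                       fake_model, "example.com")
# share == 0.5 for this stub
```

Run the same query list against each of the four models and log the shares; the trend line matters more than any single month's number.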
Wider takeaway: AEO isn't a separate discipline. It's classic content quality + entity discipline + structured markup, applied with rigor. The brands that win are the ones who treat their own data as carefully as Wikipedia treats its citations.
The blue link will be a smaller share of the surface every year. The citation in the answer will be a bigger one. Build for both.