I scanned 100 top DTC stores for AI-shopping readiness

Shopify turned on Agentic Storefronts for every eligible US merchant on March 24, 2026. Adobe reports AI-referred retail traffic grew 693% YoY and converts 31% better than non-AI traffic. I wanted to know: of the stores people actually talk about, which ones are set up to show up in ChatGPT Shopping, Perplexity, and Google AI Mode?

Scan run 2026-04-22 · 5 public-data signals · 100 points max · download raw CSV →

100 stores scanned

40 no public product feed (invisible)

60 still expose a public feed

39 scored 100/100 on the basics

10 on Shopify but leaking discoverability

The headline: 40 of the top 100 DTC brands don't expose a public /products.json feed at all — many of them haven't left Shopify, they've moved to a headless front-end (Hydrogen on Vercel, Next.js + Storefront API, S3/CloudFront with a custom router) that hides the endpoint AI shoppers read. The back-end is still Shopify; the AI-readable surface is gone.

What I scored (5 signals · 100 pts)

These are the floor — the signals AI shoppers literally cannot work around. A store missing any of them is leaking discoverability on every agent request.

Public /products.json feed — 25 pts. Shopify's open product feed. The primary endpoint AI agents ingest for price, title, images, variants. Missing or blocked → you're invisible to anyone scraping at scale.
Product schema.org JSON-LD on PDPs — 30 pts. The single biggest discovery signal. ChatGPT Shopping, Perplexity, Google AI Mode all parse <script type="application/ld+json">{"@type":"Product"…} blocks for price, availability, brand, GTIN, reviews. Without it, agents have to reverse-engineer your HTML.
Valid sitemap.xml — 15 pts. Tells crawlers what to read. A missing or malformed sitemap means AI bots only know about pages they stumble onto via links.
Open Graph on homepage — 15 pts. og:title + og:description + og:image. What AI assistants render when they surface your store as a card in a response.
Open robots.txt — 15 pts. No Disallow: / or Disallow: /products for User-agent: *. One wrong line here makes every other signal moot.
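
One way to picture how the five weights combine is the minimal scoring sketch below. The function and key names are hypothetical, and the per-tag split of the Open Graph points (5 pts for each of the three tags) is an assumption made for illustration; the list above only gives the 15-point total.

```python
# Point weights as listed above; they sum to 100.
WEIGHTS = {
    "products_json": 25,
    "product_json_ld": 30,
    "sitemap": 15,
    "open_graph": 15,
    "robots_open": 15,
}

# The three Open Graph properties checked on the homepage.
OG_TAGS = ("og:title", "og:description", "og:image")


def floor_score(products_json, product_json_ld, sitemap, og_tags_present, robots_open):
    """Return a 0-100 floor score from the five signal results.

    og_tags_present is the set of Open Graph properties found on the homepage.
    Assumption (not stated in the scan rules): partial Open Graph coverage
    earns an equal share of the 15 points per tag found.
    """
    score = 0
    if products_json:
        score += WEIGHTS["products_json"]
    if product_json_ld:
        score += WEIGHTS["product_json_ld"]
    if sitemap:
        score += WEIGHTS["sitemap"]
    if robots_open:
        score += WEIGHTS["robots_open"]
    found = sum(1 for tag in OG_TAGS if tag in og_tags_present)
    score += WEIGHTS["open_graph"] * found // len(OG_TAGS)
    return score
```

Under these assumptions, a store with a public feed, a valid sitemap, and an open robots.txt but no Product JSON-LD and no Open Graph tags gets `floor_score(True, False, True, set(), True)`, which is 55.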

The headline numbers

Of the 60 brands that still expose a public product feed:

Bucket	Count	% of public-feed group
Perfect 100/100	39	65%
Score 80–99	11	18%
Score 60–79	7	12%
Score below 60	3	5%

Average 92.8/100 · median 100/100 · 83% scored 80 or above.

That's the good news: the brands that haven't gone headless mostly have the basic infrastructure in place. Shopify's default output gives most of these signals for free.

The bad news comes in the next section.

The 10 that are leaking (Shopify · score < 80)

These brands still expose /products.json — they're on Shopify, agents can find their catalog — but they're missing one or more of the other four floor signals. Most common miss: Product JSON-LD on PDPs.

Store	Score	Missing
birddogs.com	55	JSON-LD, Open Graph
mackweldon.com	55	JSON-LD, Open Graph
roka.com	55	JSON-LD, Open Graph
gymshark.com	60	JSON-LD, partial OG
tentree.com	60	JSON-LD, partial OG
banditrunning.com	70	JSON-LD
bokksu.com	70	JSON-LD
bollandbranch.com	70	JSON-LD
liquiddeath.com	70	JSON-LD
oliveandjune.com	70	JSON-LD

Scores re-verified 2026-04-22 (all 10 unchanged since the initial scan).
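
Since Product JSON-LD is the most common miss, here is a rough sketch of what a check for it could look like on a single PDP's HTML. This is not the scanner's actual code: `has_product_json_ld` and the helper names are made up for illustration, it uses only the Python standard library, and it treats a `Product` node found anywhere inside a JSON-LD block (including under `@graph`) as a pass.

```python
import json
from html.parser import HTMLParser


class _LdJsonCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_ld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_ld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_ld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld:
            self.blocks.append("".join(self._buf))
            self._in_ld = False


def _has_type(node, wanted):
    """Recursively look for a JSON-LD node whose @type includes `wanted`."""
    if isinstance(node, list):
        return any(_has_type(item, wanted) for item in node)
    if isinstance(node, dict):
        declared = node.get("@type")
        types = declared if isinstance(declared, list) else [declared]
        if wanted in types:
            return True
        return any(_has_type(value, wanted) for value in node.values())
    return False


def has_product_json_ld(html):
    """True if any JSON-LD block in the page parses and contains a Product."""
    collector = _LdJsonCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block: treat as absent rather than crash
        if _has_type(data, "Product"):
            return True
    return False
```

A stricter version would also check that `offers`, `brand`, and a GTIN field are populated, since a bare `{"@type": "Product"}` passes this sketch but tells an agent very little.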

The headless trap

Here's the finding that surprised me. I expected the "40 invisible" bucket to be mostly non-Shopify stores — Big Cartel, Squarespace, custom builds. It wasn't.

It's brands like Bombas (Vercel + Next.js against Shopify Storefront API), Kendra Scott (AWS S3 + CloudFront with a custom router), Goop (custom), and Glossier (headless on a custom stack). The back-end is still Shopify. The PDPs still render. The checkout still works. But /products.json returns 404 or a static HTML page, and with it goes the primary way AI shoppers ingest catalog data at scale.

Going headless is fine. Going headless without re-implementing the canonical catalog endpoints and JSON-LD on the new front end is an own goal that cuts you out of a channel Adobe says is growing 693% YoY.
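
To check whether your own front end still serves the feed, something like the sketch below would do. The function names and result labels are hypothetical, the `{"products": [...]}` shape is what a stock Shopify /products.json returns, and the probe reuses the `CatalogScanBot/0.1` User-Agent mentioned in the caveats purely as an example.

```python
import json
import urllib.error
import urllib.request


def classify_feed_response(status, body):
    """Classify a /products.json response by HTTP status and body text."""
    if status != 200:
        return "http_error"       # e.g. the 404 many headless fronts return
    try:
        payload = json.loads(body)
    except ValueError:
        return "not_json"         # e.g. a static HTML page served at the path
    if isinstance(payload, dict) and isinstance(payload.get("products"), list):
        return "public_feed"
    return "unexpected_json"      # valid JSON, but not the Shopify feed shape


def probe_feed(store_domain, timeout=10):
    """Fetch /products.json?limit=5 from a store and classify the result."""
    url = f"https://{store_domain}/products.json?limit=5"
    request = urllib.request.Request(url, headers={"User-Agent": "CatalogScanBot/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", "replace")
            return classify_feed_response(response.status, body)
    except urllib.error.HTTPError as err:
        return classify_feed_response(err.code, "")
    except urllib.error.URLError:
        return "unreachable"      # DNS failure, timeout, TLS error, and so on
```

Keeping the classification separate from the network call makes the rule easy to test without hitting a live store.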

Whose headless build is actually AI-readable? I'd love to know. Send nominations and I'll re-scan them.

Why this is only the floor: what the 5-signal scan misses

The 5 signals here are the ones every store has to get right. But a store that scores 100/100 on the floor can still be invisible to ChatGPT when ranking against a direct competitor. The actual ranking spread comes from these 13 deeper signals — which is what the full CatalogScan checks:

GTIN / barcode coverage across variants
Google Product Category metafield depth
Product type taxonomy (native vs freeform)
Image alt-text coverage (for text-first agent responses)
Description length and boilerplate detection
Review schema (AggregateRating JSON-LD)
Availability and shipping schema
Brand JSON-LD on PDP
Canonical URL hygiene
Hreflang for multi-region stores
Structured data validation errors (what Google's Rich Results test catches)
Crawl-budget waste (parameterized URLs not canonicalized)
Mobile JSON-LD parity (no fields missing vs desktop)

A store at 100/100 on the 5 floor signals can still fail half of these — which is what actually decides whether an agent surfaces you or a competitor.

AI agents are already here (live · from our own logs)

CatalogScan has been online for a handful of days. In that time, these are the AI-shopping and search crawlers that fetched pages from catalogscan.com — classified by User-Agent from the Caddy access logs, one bucket per bot family.

AI / search crawler	Visits	Last seen
GPTBot	220	2026-04-29
YandexBot	134	2026-06-01
Applebot	80	2026-06-02
ClaudeBot	62	2026-05-01
SemrushBot	38	2026-06-01
OAI-SearchBot	26	2026-06-01
AhrefsBot	24	2026-04-29
Googlebot	20	2026-05-30
GoogleOther	6	2026-05-01
facebookexternalhit	2	2026-04-29

If ClaudeBot, GPTBot, and Applebot are crawling a brand-new catalog-observability site, they're crawling yours too. The question is whether they can read what they find, which is exactly what the scan on the home page tells you.
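
If you want the same view of your own traffic, the bucketing is simple substring matching on the User-Agent header. The sketch below is an assumption about how such a classifier could work, not the code behind the table: `bot_family` and `count_bot_visits` are hypothetical names, and it expects the User-Agent strings to have been pulled out of your access logs already.

```python
from collections import Counter

# Bot families from the table above. GoogleOther is listed before Googlebot
# so the more specific token is tried first.
BOT_FAMILIES = (
    "GPTBot",
    "OAI-SearchBot",
    "ClaudeBot",
    "Applebot",
    "YandexBot",
    "SemrushBot",
    "AhrefsBot",
    "GoogleOther",
    "Googlebot",
    "facebookexternalhit",
)


def bot_family(user_agent):
    """Return the first known bot family named in the User-Agent, or None."""
    lowered = user_agent.lower()
    for family in BOT_FAMILIES:
        if family.lower() in lowered:
            return family
    return None


def count_bot_visits(user_agents):
    """Count requests per bot family, ignoring unrecognized User-Agents."""
    families = (bot_family(ua) for ua in user_agents)
    return Counter(family for family in families if family is not None)
```

User-Agent strings are trivially spoofed, so counts like these are an upper bound unless you also verify the source IP ranges the bot operators publish.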

Caveats

5 product handles per store. I test JSON-LD on the first product returned by /products.json?limit=5. A store could ship JSON-LD on some PDPs and not others; this misses that. The full scan checks every PDP.

Single homepage Open Graph snapshot. A store could have OG tags on the home but not on PDPs or collections. Full scan walks the tree.

Bot UA could be stripped. I send CatalogScanBot/0.1. A store with aggressive bot-detection middleware could strip structured data when it sees the UA and score falsely low. Spot checks of the bottom 10 + top brands (Allbirds, Rothy's) showed full payloads served, so I don't think any of the results above are affected.

"No public feed" isn't always a downgrade. Some of the 40 off-feed brands are on Hydrogen / Storefront API stacks that may expose equivalent endpoints under different paths. The scan doesn't know about those. Future scan pass should run platform-agnostic checks for this group.

Run the same scan on your store (free · no login)

Paste your store URL on the home page. You get the same 5 floor signals plus the 13 deeper ones, a 0–100 readiness score, and the top 5 fixes ranked by impact.

Is your store invisible to ChatGPT?

2-minute scan. No Shopify permissions needed.
