CatalogScan

SEO Guide · 2026

Shopify Headless Commerce SEO: What You Lose and How to Rebuild It

Going headless — with Shopify Hydrogen, Next.js Commerce, or a custom React storefront — gives you full design control. It also silently removes four things AI shopping agents depend on: your /products.json feed, Product JSON-LD, a compliant robots.txt, and your sitemap. None of these break loudly. Here's how each breaks and the exact rebuild for each.

TL;DR: Headless Shopify stores lose /products.json, auto-generated Product JSON-LD, the standard robots.txt, and the auto-generated sitemap.xml. All four come from Shopify's hosted Online Store storefront and its Liquid themes, which a headless build bypasses entirely. Each requires an explicit rebuild in your headless framework to restore AI agent visibility.

What breaks when you go headless

| Lost signal | What it was | AI agent impact | Rebuild effort |
| --- | --- | --- | --- |
| /products.json | Shopify's built-in product feed endpoint (Products JSON API), available on any store at yourstore.com/products.json | Primary feed source for ChatGPT Shopping and Perplexity's catalog indexing. Without it, agents must crawl individual product detail pages (PDPs), which is much slower and less complete. | Medium: proxy the Storefront API or Admin API response at that path in your headless framework |
| Product JSON-LD | The <script type="application/ld+json"> block that Shopify's Liquid theme injects into every PDP's <head> | Without Product JSON-LD, agents can't read GTIN, AggregateRating, ProductGroup, or Offer details from the page. They get raw HTML only, which they parse unreliably. | Medium: generate the JSON-LD and render it server-side in your React/Next.js framework |
| robots.txt | Shopify generates a standard robots.txt at /robots.txt that allows all crawlers by default | Many headless frameworks return a 404 at /robots.txt by default. Standards-compliant crawlers (RFC 9309) treat a 404 as "no restrictions" but treat a 5xx error at that path as a full disallow, so an app that errors instead of serving the file can block crawling entirely. | Low: add an explicit robots.txt file in your headless app's public directory |
| sitemap.xml | Shopify generates a complete sitemap at /sitemap.xml covering all products, collections, and pages | Without a sitemap, AI crawlers must discover URLs by following links from your homepage, and many product pages are never found. | Medium-high: generate dynamically from the Storefront API and serve at /sitemap.xml |
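
The robots.txt and sitemap.xml rebuilds are mostly string generation, so they can be sketched without any framework. The example below is a minimal sketch, not a definitive implementation: the function names (`buildRobotsTxt`, `buildSitemapXml`), the `SitemapEntry` shape, and the `shop.example` origin are all hypothetical, and it assumes you have already fetched your product, collection, and page URLs from the Storefront API.

```typescript
// Hypothetical entry shape: one absolute URL plus an optional
// last-modified date in YYYY-MM-DD form.
interface SitemapEntry {
  loc: string;
  lastmod?: string;
}

// Escape the five characters that are not allowed raw in XML text.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a sitemap document in the sitemaps.org 0.9 format.
// Note: the protocol caps a single file at 50,000 URLs; larger
// catalogs need a sitemap index, which this sketch does not cover.
function buildSitemapXml(entries: SitemapEntry[]): string {
  const urls = entries.map((entry) => {
    const lastmod = entry.lastmod
      ? `<lastmod>${escapeXml(entry.lastmod)}</lastmod>`
      : "";
    return `  <url><loc>${escapeXml(entry.loc)}</loc>${lastmod}</url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}

// Build a permissive robots.txt that allows all crawlers and
// advertises the sitemap location. siteOrigin has no trailing slash.
function buildRobotsTxt(siteOrigin: string): string {
  return [
    "User-agent: *",
    "Allow: /",
    "",
    `Sitemap: ${siteOrigin}/sitemap.xml`,
    "",
  ].join("\n");
}
```

The robots.txt output can be written to a static file in the public directory. The sitemap is better served from a route that regenerates it on a schedule, since it changes whenever products are added or removed.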

Rebuilding /products.json for a headless store

Shopify's standard /products.json endpoint returns paginated product data (up to 250 per page) from the storefront. Once a custom domain points at a headless app, requests to yourdomain.com/products.json reach that app rather than Shopify's storefront, so the endpoint is no longer served unless you rebuild it.

The rebuild options, in order of effort:

  1. Proxy the Shopify endpoint. In your Next.js app, add a route at /products.json that proxies requests to yourstore.myshopify.com/products.json. Lowest effort and accurate data, but every request adds a round trip to Shopify and is subject to Shopify's request throttling, so caching the response helps. This only works while the myshopify.com URL serves the feed directly rather than redirecting to your custom domain.
  2. Generate a static feed at build time. Pull from the Storefront API at build time, generate a products.json, and serve it as a static file. Fastest to serve, but data goes stale between builds, so it suits stores whose inventory changes infrequently.
  3. Use a product feed app. Apps like Flexify Feed Manager or LitCommerce generate and host a continuously updated product feed that AI agents can index without depending on your headless app.
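
As an illustration of option 1, here is a minimal sketch of a proxy handler built on the standard `Request` and `Response` Web APIs. The names `SHOP_DOMAIN`, `buildUpstreamUrl`, and `handleProductsJson` are hypothetical, as are the cache durations. It assumes the feed accepts `limit` and `page` query parameters, and that in a Next.js App Router project you would place this in a route file such as `app/products.json/route.ts` and export the handler as `GET`.

```typescript
// Hypothetical shop domain; replace with your own store's
// myshopify.com hostname.
const SHOP_DOMAIN = "yourstore.myshopify.com";

// Translate an incoming request URL into the upstream Shopify URL,
// forwarding only the pagination parameters and dropping the rest.
function buildUpstreamUrl(
  requestUrl: string,
  shopDomain: string = SHOP_DOMAIN
): string {
  const incoming = new URL(requestUrl);
  const upstream = new URL(`https://${shopDomain}/products.json`);
  for (const key of ["limit", "page"]) {
    const value = incoming.searchParams.get(key);
    if (value !== null) upstream.searchParams.set(key, value);
  }
  // Default to the largest page size the endpoint returns.
  if (!upstream.searchParams.has("limit")) {
    upstream.searchParams.set("limit", "250");
  }
  return upstream.toString();
}

// Fetch the upstream feed and pass it through with cache headers.
// fetchImpl is injectable so the handler can be exercised with a stub.
async function handleProductsJson(
  request: Request,
  fetchImpl: typeof fetch = fetch
): Promise<Response> {
  const upstream = await fetchImpl(buildUpstreamUrl(request.url), {
    headers: { accept: "application/json" },
    // Do not follow redirects: if the myshopify.com URL redirects back
    // to the custom domain, following it would loop into this handler.
    redirect: "manual",
  });
  if (!upstream.ok) {
    return new Response(
      JSON.stringify({ error: "upstream feed unavailable" }),
      { status: 502, headers: { "content-type": "application/json" } }
    );
  }
  return new Response(await upstream.text(), {
    status: 200,
    headers: {
      "content-type": "application/json; charset=utf-8",
      // Assumed values: let a CDN cache for 5 minutes to limit
      // per-request latency and pressure on Shopify's throttling.
      "cache-control": "public, s-maxage=300, stale-while-revalidate=600",
    },
  });
}
```

Forwarding only the pagination parameters keeps tracking parameters out of the cache key, so the CDN stores one copy of each feed page.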

Rebuilding Product JSON-LD in Next.js / Hydrogen

In a Next.js headless storefront, add JSON-LD to every product page by including a <script> tag in your PDP component. It must be server-rendered, not injected client-side with useEffect, so that crawlers see it in the raw HTML response.

Minimum viable Product JSON-LD for AI agent visibility:
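
One possible shape, as a sketch under stated assumptions rather than a definitive implementation: the `ProductInput` type and the function names `buildProductJsonLd` and `renderJsonLdScript` are hypothetical, and the input fields stand in for whatever your Storefront API query returns. The property names in the output (`offers`, `gtin`, `aggregateRating`, and so on) are standard schema.org vocabulary.

```typescript
// Hypothetical input shape; map your Storefront API fields onto it.
interface ProductInput {
  title: string;
  description: string;
  url: string;
  imageUrls: string[];
  sku: string;
  price: string; // decimal string, e.g. "19.00"
  currency: string; // ISO 4217 code, e.g. "USD"
  inStock: boolean;
  gtin?: string;
  brand?: string;
  rating?: { value: number; count: number };
}

// Build a schema.org Product object with a single Offer.
// Optional properties are only emitted when the data exists.
function buildProductJsonLd(p: ProductInput): Record<string, unknown> {
  const jsonLd: Record<string, unknown> = {
    "@context": "https://schema.org",
    "@type": "Product",
    name: p.title,
    description: p.description,
    image: p.imageUrls,
    sku: p.sku,
    offers: {
      "@type": "Offer",
      url: p.url,
      price: p.price,
      priceCurrency: p.currency,
      availability: p.inStock
        ? "https://schema.org/InStock"
        : "https://schema.org/OutOfStock",
    },
  };
  if (p.gtin) jsonLd["gtin"] = p.gtin;
  if (p.brand) jsonLd["brand"] = { "@type": "Brand", name: p.brand };
  if (p.rating) {
    jsonLd["aggregateRating"] = {
      "@type": "AggregateRating",
      ratingValue: p.rating.value,
      reviewCount: p.rating.count,
    };
  }
  return jsonLd;
}

// Serialize to a script tag for server-side rendering. Escaping "<"
// prevents product text containing "</script>" from closing the tag.
function renderJsonLdScript(jsonLd: Record<string, unknown>): string {
  const json = JSON.stringify(jsonLd).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

In a React server component the same escaped JSON string can be passed to a `<script>` element through `dangerouslySetInnerHTML`, which keeps it in the initial HTML response.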

FAQ

Does Shopify Hydrogen (Remix-based) handle JSON-LD automatically?

Hydrogen provides React components for rendering structured data via the @shopify/hydrogen package, but it doesn't generate ProductGroup JSON-LD by default — you must add it. The Product component generates basic Product schema; you need to augment it with ProductGroup and hasVariant manually.
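
To show what that augmentation might look like, here is a framework-independent sketch that builds a schema.org ProductGroup with one Product per variant. The `VariantInput` type and `buildProductGroupJsonLd` function are hypothetical and not part of the @shopify/hydrogen package, and the example assumes variants differ only by size.

```typescript
// Hypothetical variant shape; map your variant data onto it.
interface VariantInput {
  sku: string;
  title: string;
  size: string;
  price: string; // decimal string, e.g. "19.00"
  currency: string; // ISO 4217 code
  url: string;
  inStock: boolean;
}

// Build a schema.org ProductGroup whose hasVariant array holds one
// Product (with its own Offer) for each purchasable variant.
function buildProductGroupJsonLd(
  groupId: string,
  name: string,
  variants: VariantInput[]
) {
  return {
    "@context": "https://schema.org",
    "@type": "ProductGroup",
    name: name,
    productGroupID: groupId,
    // Assumption: size is the only property that varies.
    variesBy: ["https://schema.org/size"],
    hasVariant: variants.map((v) => ({
      "@type": "Product",
      name: v.title,
      sku: v.sku,
      size: v.size,
      offers: {
        "@type": "Offer",
        url: v.url,
        price: v.price,
        priceCurrency: v.currency,
        availability: v.inStock
          ? "https://schema.org/InStock"
          : "https://schema.org/OutOfStock",
      },
    })),
  };
}
```

The resulting object is serialized and server-rendered the same way as any other JSON-LD block on the product page.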

Will AI agents crawl my myshopify.com domain instead of my custom headless domain?

Only if you've set up canonical URLs pointing there. Agents typically crawl the primary domain (your custom headless domain). If your headless store doesn't serve the AI-required endpoints, agents won't find them at myshopify.com either — that URL is typically redirected to the custom domain.

My store scores low on CatalogScan. How do I know if it's the headless setup or something else?

CatalogScan distinguishes headless-specific failures from standard catalog gaps in the scan report. Signals like "products.json not found" and "Product JSON-LD missing or malformed" are typical headless indicators. Run the free scan — the top-5 findings will tell you whether you're hitting a headless-origin gap or a standard catalog hygiene issue.

Check if your headless store is AI-visible

CatalogScan scans your public storefront endpoints — no Shopify login required. See exactly which signals are missing and get a fix priority list.
