
Headless Shopify: the four signals you silently lose when you leave the standard storefront

When you migrate to Hydrogen or build a custom Next.js storefront, Shopify quietly stops generating four things that AI shopping agents depend on. No build error. No 500. Everything looks fine in the browser. CatalogScan shows a score that is 25–40 points lower than an identical store on a standard theme — and the diff is entirely in signals the headless build never knew it had to produce.

Published 2026-05-31 · ~13 min read · By the CatalogScan team

TL;DR: Standard Shopify themes auto-generate four machine-readable outputs: (1) /products.json — the bulk product feed AI agents use for catalog indexing; (2) Product JSON-LD — the structured data block injected into every PDP <head>; (3) robots.txt — a compliant crawl policy that allows known AI bots; and (4) canonical URLs — stable, single-domain URLs in JSON-LD that tell agents which URL represents the product. Go headless and you opt out of all four. None of them fail loudly. The four verification commands and fix recipes below restore coverage without rebuilding your storefront architecture.

4 · Signals auto-generated by Shopify themes that headless builds must produce themselves

~40 pts · Typical CatalogScan score gap between a standard-theme store and an equivalent headless build with default settings

0 · Build errors or console warnings when any of these four signals go missing

In this guide

Why standard Shopify “just works” for AI agents — and headless doesn’t
Signal 1 — /products.json: the bulk product feed
Signal 2 — Product JSON-LD: the structured data block on every PDP
Signal 3 — robots.txt: framework defaults that disallow everything
Signal 4 — canonical URL fragmentation across origin, staging, and preview
Verification: how to confirm which signals you’re missing

Why standard Shopify “just works” for AI agents — and headless doesn’t

Shopify’s standard storefront (themes like Dawn, Debut, Sense) is a server-rendered Rails application that has been generating machine-readable catalog data since 2009. When you install a theme, Shopify’s backend automatically exposes:

A JSON product feed at /products.json, /collections/{handle}.json, and /search.json
A /sitemap.xml with all products, collections, blogs, and pages
A /robots.txt with sensible defaults (allowing Googlebot, disallowing internal search URLs)
JSON-LD Product structured data in the <head> of every product page, generated by the theme from Shopify Liquid objects
Canonical <link rel="canonical"> tags pointing to your primary domain

These aren’t optional theme features. They’re part of how Shopify’s backend generates pages. Your theme calls {{ product | json }} and {{ canonical_url }}; Shopify fills in the values from its internal data model.

When you go headless, you replace Shopify’s server-rendered layer with your own application. Your Hydrogen app or Next.js app fetches product data from the Shopify Storefront API (GraphQL) and renders HTML with whatever structure you choose. Shopify no longer controls the output. The Storefront API returns clean product data — but it doesn’t inject JSON-LD, generate robots.txt, or serve a product feed at /products.json. Those are rendering-layer concerns. Your rendering layer doesn’t know to produce them unless you explicitly build that output.

The silent failure pattern: A headless Shopify store looks identical to a standard store in a browser. The home page, collection pages, and product pages all render. Checkout works. Analytics fire. The only way to discover the missing signals is to check your store the way AI agents check it — with HTTP clients that fetch specific paths and read response bodies. Your developers won’t notice because no component throws an error. Your marketing team won’t notice because Google Analytics still works. CatalogScan notices because we fetch /products.json, read the <head> for JSON-LD, curl /robots.txt with a GPTBot user agent, and check canonical URL consistency.

The four signals below are the ones that appear most frequently as failures in CatalogScan scans of headless stores. They’re ordered by how many points they affect and how easy they are to fix without rebuilding your architecture.

Signal 1 — `/products.json`: the bulk product feed

Shopify product feed (/products.json) · Up to 15 pts in feed-open signal · ~78% headless fail rate

On a standard Shopify storefront, https://yourstore.com/products.json returns a paginated JSON array of all products with their variants, images, GTINs, metafields, and structured data fields. It’s a machine-readable bulk catalog that AI shopping agents (and CatalogScan) use to read your entire inventory without crawling individual product pages. On a headless store, this URL returns a 404, an HTML error page, or your app’s generic “not found” component.

Why AI agents depend on it

ChatGPT Shopping, Perplexity Shopping, and Google AI Mode each have crawlers that make an early request to /products.json when indexing a new store. It’s faster than crawling individual PDPs: a few paginated requests return every product with its GTINs, variants, prices, and availability. If that endpoint returns a 404, the crawler falls back to sitemap-based crawling, a much slower process that also depends on your sitemap being complete. Stores missing the product feed get indexed last in a crawl queue that processes the feed-available stores first.

What headless breaks

In a standard Shopify storefront, the /products.json route is handled by Shopify’s own server, not your theme. When you go headless, your custom app intercepts all routes. The /products.json path has no route defined in Hydrogen or a Next.js + Storefront API setup, so the framework returns a 404. Your CDN may then cache that 404, making recovery even slower after you add the route.

Fix: add a `/products.json` route that proxies the Storefront API

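In Hydrogen, add a file at app/routes/products[.]json.tsx. The [.] escapes the dot, which Remix’s flat-route convention would otherwise treat as a path separator (/products/json):

// app/routes/products[.]json.tsx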
import {json, type LoaderFunctionArgs} from '@shopify/remix-oxygen';

export async function loader({request, context}: LoaderFunctionArgs) {
  const url = new URL(request.url);
  // The Storefront API paginates by cursor, so a ?page= parameter is not supported here.
  const limit = Math.min(Number(url.searchParams.get('limit') ?? 250), 250);
  const cursor = url.searchParams.get('cursor') ?? null;

  const {products} = await context.storefront.query(PRODUCTS_QUERY, {
    variables: {first: limit, after: cursor},
  });

  // Mirror Shopify's standard /products.json shape so AI crawlers
  // recognise the response format
  const shaped = products.nodes.map((p: any) => ({
    id: p.id,
    title: p.title,
    handle: p.handle,
    body_html: p.descriptionHtml,
    vendor: p.vendor,
    product_type: p.productType,
    created_at: p.createdAt,
    tags: p.tags,
    variants: p.variants.nodes.map((v: any) => ({
      id: v.id,
      title: v.title,
      price: v.price.amount,
      sku: v.sku,
      barcode: v.barcode,   // GTIN lives here in standard Shopify JSON
      available: v.availableForSale,
    })),
    images: p.images.nodes.map((img: any) => ({src: img.url, alt: img.altText})),
  }));

  return json(
    {products: shaped},
    {headers: {'Cache-Control': 'public, max-age=3600, stale-while-revalidate=86400'}},
  );
}

const PRODUCTS_QUERY = `#graphql
  query ProductFeed($first: Int!, $after: String) {
    products(first: $first, after: $after) {
      nodes {
        id title handle descriptionHtml vendor productType createdAt tags
        variants(first: 100) {
          nodes { id title sku barcode availableForSale price { amount } }
        }
        images(first: 5) { nodes { url altText } }
      }
    }
  }
`;

In a Next.js App Router setup, add app/products.json/route.ts:

// app/products.json/route.ts
import {NextRequest, NextResponse} from 'next/server';

export async function GET(req: NextRequest) {
  const {searchParams} = new URL(req.url);
  const limit = Math.min(Number(searchParams.get('limit') ?? 250), 250);

  const res = await fetch(
    `https://${process.env.SHOPIFY_STORE_DOMAIN}/api/2024-10/graphql.json`,
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-Shopify-Storefront-Access-Token': process.env.SHOPIFY_STOREFRONT_TOKEN!,
      },
      body: JSON.stringify({query: PRODUCTS_QUERY, variables: {first: limit}}),
      next: {revalidate: 3600},
    },
  );

  const {data} = await res.json();
  const products = data.products.nodes.map((p: any) => ({
    id: p.id, title: p.title, handle: p.handle,
    body_html: p.descriptionHtml, vendor: p.vendor,
    product_type: p.productType, tags: p.tags,
    variants: p.variants.nodes.map((v: any) => ({
      id: v.id, title: v.title, sku: v.sku,
      barcode: v.barcode, price: v.price.amount,
      available: v.availableForSale,
    })),
    images: p.images.nodes.map((img: any) => ({src: img.url, alt: img.altText})),
  }));

  return NextResponse.json({products}, {
    headers: {'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate=86400'},
  });
}

const PRODUCTS_QUERY = `
  query ProductFeed($first: Int!) {
    products(first: $first) {
      nodes {
        id title handle descriptionHtml vendor productType tags
        variants(first: 100) {
          nodes { id title sku barcode availableForSale price { amount } }
        }
        images(first: 5) { nodes { url altText } }
      }
    }
  }
`;

Verify with: curl https://yourstore.com/products.json | python3 -m json.tool | head -30. You should see a {"products": [...]} response. CatalogScan’s shopify-feed signal checks this path and the barcode/GTIN field population rate.
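To spot-check the barcode population rate yourself, you can count how many variants in the feed carry a non-empty barcode. The helper below is a minimal sketch: the name barcodeCoverage and the Feed types are illustrative assumptions based on the feed shape shown above, not CatalogScan’s scoring code.

```typescript
// Hypothetical helper: fraction of variants in a /products.json payload
// that carry a non-empty barcode. Types cover only the fields used here.
interface FeedVariant { barcode?: string | null }
interface FeedProduct { variants: FeedVariant[] }
interface Feed { products: FeedProduct[] }

function barcodeCoverage(feed: Feed): number {
  let total = 0;
  let withBarcode = 0;
  for (const product of feed.products) {
    for (const variant of product.variants) {
      total += 1;
      // Treat null, undefined, and whitespace-only strings as missing.
      if (typeof variant.barcode === 'string' && variant.barcode.trim() !== '') {
        withBarcode += 1;
      }
    }
  }
  // An empty feed has no coverage to report; return 0 rather than NaN.
  return total === 0 ? 0 : withBarcode / total;
}
```

Feed it the parsed JSON from your own endpoint; a result well below 1 means many variants reach the feed without a GTIN.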

Common mistake: exposing the Storefront API directly instead of mirroring the shape

Some teams proxy their GraphQL endpoint to /products.json and return the Storefront API’s own GraphQL shape (with edges and nodes wrappers). AI crawlers that parse the standard Shopify product feed shape — top-level products array with flat variants, barcode, and price fields — won’t extract data correctly from a GraphQL-shaped response. Mirror the standard REST shape, even though your internal data comes from GraphQL.
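The difference between the two shapes is easiest to see in code. This sketch assumes a Storefront API response that uses edges/node wrappers and flattens it into the top-level products array described above. The type names and the toFeedShape function are hypothetical and cover only a few fields.

```typescript
// A GraphQL connection that exposes its items as edges[].node.
interface Connection<T> { edges: Array<{ node: T }> }

interface GqlVariant {
  id: string;
  sku: string | null;
  barcode: string | null;
  price: { amount: string };
}
interface GqlProduct {
  id: string;
  title: string;
  handle: string;
  variants: Connection<GqlVariant>;
}

interface FlatVariant { id: string; sku: string | null; barcode: string | null; price: string }
interface FlatProduct { id: string; title: string; handle: string; variants: FlatVariant[] }

// Strip the edges/node wrapper and return the plain items.
function unwrap<T>(connection: Connection<T>): T[] {
  return connection.edges.map((edge) => edge.node);
}

// Convert the GraphQL-shaped response into the flat feed shape:
// a top-level products array with flat variants and a string price.
function toFeedShape(products: Connection<GqlProduct>): { products: FlatProduct[] } {
  return {
    products: unwrap(products).map((p) => ({
      id: p.id,
      title: p.title,
      handle: p.handle,
      variants: unwrap(p.variants).map((v) => ({
        id: v.id,
        sku: v.sku,
        barcode: v.barcode,
        price: v.price.amount,
      })),
    })),
  };
}
```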

Signal 2 — Product JSON-LD: the structured data block on every PDP

Product JSON-LD (@type: "Product" in <head>) · Up to 30 pts across ProductGroup + GTIN + schema signals · ~71% headless fail rate

On a standard Shopify storefront, every product page <head> contains a <script type="application/ld+json"> block with a Product schema object. Shopify’s Liquid engine generates it from the product model. On a headless store, no framework generates this automatically — you have to write the component yourself. Most headless builds ship without any Product JSON-LD at all, or ship with an incomplete schema that omits GTINs, AggregateRating, and ProductGroup (the three highest-value signals for AI shopping agents).

What the correct schema shape looks like

AI shopping agents need four specific fields in your Product JSON-LD that most developers don’t include by default. Here’s the minimum correct schema for a product with variants:

{
  "@context": "https://schema.org",
  "@type": "ProductGroup",
  "name": "Running Shoe Model X",
  "url": "https://yourstore.com/products/running-shoe-model-x",
  "brand": {
    "@type": "Brand",
    "name": "YourBrand"
  },
  "description": "Product description text...",
  "image": "https://cdn.shopify.com/...",
  "hasVariant": [
    {
      "@type": "Product",
      "name": "Running Shoe Model X — Size 10 / Black",
      "sku": "SKU-001",
      "gtin13": "0012345678901",
      "offers": {
        "@type": "Offer",
        "price": "89.99",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
        "url": "https://yourstore.com/products/running-shoe-model-x?variant=123456789"
      },
      "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.7",
        "reviewCount": "312"
      }
    }
  ]
}

The four fields that most headless builds omit:

@type: "ProductGroup" at the root, with hasVariant listing each variant as a nested Product. Standard Shopify themes generate a flat Product with offers (acceptable) but AI agents prefer ProductGroup + hasVariant because it lets them match variant-specific queries (“size 10 in black”) without a separate page crawl. See our post on ProductGroup JSON-LD on Shopify for why 60% of stores leave 18 points on the table here.
gtin13 (or gtin12 / gtin8) on each variant. This is the barcode field from the Storefront API. AI shopping agents use GTINs to de-duplicate products across merchants and to verify product authenticity. A product without a GTIN is treated as anonymous in the shopping index. See Shopify GTIN requirements for AI shopping agents.
aggregateRating. Most headless review integrations inject star ratings as client-side UI without adding them to JSON-LD. The AggregateRating in your structured data must come from a server-rendered component; a review widget that injects stars via JavaScript after page load is invisible to crawlers. See AggregateRating on Shopify: per-app fix recipes.
Consistent url pointing to your primary domain. More on this in Signal 4 below.

Fix: add a server-rendered JSON-LD component

In Hydrogen, build the JSON-LD object in your product route and render it with a plain <script> tag so it ships in the server-rendered HTML. The sketch below assumes your loader returns product, reviews, and a canonicalBase string read from context.env on the server:

// app/routes/products.$handle.tsx
import {useLoaderData} from '@remix-run/react';

export default function ProductPage() {
  // Assumption: the route's loader returns these three values. Read the
  // domain in the loader (context.env); process.env is not available in
  // the browser bundle, so the component must not touch it.
  const {product, reviews, canonicalBase} = useLoaderData<any>();
  const productUrl = `${canonicalBase}/products/${product.handle}`;

  const jsonLd = {
    "@context": "https://schema.org",
    "@type": "ProductGroup",
    "name": product.title,
    "url": productUrl,
    "brand": {"@type": "Brand", "name": product.vendor},
    "description": product.description,
    "image": product.featuredImage?.url,
    "hasVariant": product.variants.nodes.map((v: any) => ({
      "@type": "Product",
      "name": `${product.title} — ${v.title}`,
      "sku": v.sku,
      ...(v.barcode ? {gtin13: v.barcode} : {}),
      "offers": {
        "@type": "Offer",
        "price": v.price.amount,
        "priceCurrency": v.price.currencyCode,
        "availability": v.availableForSale
          ? "https://schema.org/InStock"
          : "https://schema.org/OutOfStock",
        // Variant IDs are GIDs; keep only the trailing numeric part.
        "url": `${productUrl}?variant=${v.id.split('/').pop()}`,
      },
      ...(reviews ? {
        "aggregateRating": {
          "@type": "AggregateRating",
          "ratingValue": reviews.average.toFixed(1),
          "reviewCount": String(reviews.count),
        }
      } : {}),
    })),
  };

  return (
    <>
      {/* dangerouslySetInnerHTML stops React from HTML-escaping the JSON quotes */}
      <script
        type="application/ld+json"
        dangerouslySetInnerHTML={{__html: JSON.stringify(jsonLd)}}
      />
      {/* ... rest of your PDP */}
    </>
  );
}

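In Next.js App Router, render the <script> tag directly in your page component (a Server Component by default) so it is part of the server-rendered HTML. fetchProduct, fetchReviews, and buildProductGroupJsonLd stand in for your own helpers:

// app/products/[handle]/page.tsx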
export default async function ProductPage({params}: any) {
  const product = await fetchProduct(params.handle);
  const reviews = await fetchReviews(params.handle);

  const jsonLd = buildProductGroupJsonLd(product, reviews);

  return (
    <>
      <script
        type="application/ld+json"
        dangerouslySetInnerHTML={{__html: JSON.stringify(jsonLd)}}
      />
      {/* ... rest of your PDP */}
    </>
  );
}

The key constraint: the JSON-LD must be in the server-rendered HTML response, not injected by client-side JavaScript after hydration. Most AI crawlers read the raw HTTP response and don’t execute JavaScript. If your JSON-LD only appears after a useEffect or client-side data fetch, assume it is invisible to AI shopping agents.
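One way to test this constraint is to run the same check a non-rendering crawler would: take the raw HTML string from the HTTP response and look for a Product or ProductGroup block without executing any scripts. The sketch below uses a regular expression rather than a real HTML parser, so treat it as a rough check. The function names are hypothetical.

```typescript
// Pull every application/ld+json block out of a raw HTML string.
// Regex-based on purpose: it sees only what is in the response body,
// which is what a crawler that does not run JavaScript sees.
function extractJsonLd(html: string): unknown[] {
  const pattern =
    /<script\b[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      // Malformed JSON-LD is skipped; a stricter audit could report it.
    }
  }
  return blocks;
}

// True when at least one top-level block is a Product or ProductGroup.
function hasProductJsonLd(html: string): boolean {
  return extractJsonLd(html).some((block) => {
    if (block === null || typeof block !== 'object') return false;
    const type = (block as Record<string, unknown>)['@type'];
    return type === 'Product' || type === 'ProductGroup';
  });
}
```

Pass it the text of a fetch() response for a product page. A false result on HTML that shows structured data in the browser’s dev tools usually means the block is being injected client-side.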

Common mistake: including `gtin13: null` or `gtin13: ""`

Some builds include the barcode field unconditionally, resulting in "gtin13": null or "gtin13": "" in the JSON-LD for variants without barcodes. This is worse than omitting the field: schema validators flag it as malformed, and some AI indexers treat an explicit null GTIN as evidence the product is deliberately anonymous, lowering its trustworthiness score. Use a conditional spread (...(v.barcode ? {gtin13: v.barcode} : {})) to omit the field entirely when the barcode is absent.
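Barcodes are not always 13 digits, so the conditional spread can also pick the matching schema.org property (gtin8, gtin12, gtin13, or gtin14) from the barcode’s length. The gtinField helper below is a hypothetical sketch. It checks only that the value is all digits and of a known length; it does not validate the check digit.

```typescript
// Return the schema.org GTIN property that matches the barcode length,
// or an empty object so that spreading it adds nothing to the JSON-LD.
function gtinField(barcode: string | null | undefined): Record<string, string> {
  const digits = (barcode ?? '').trim();
  // Reject empty strings and anything that is not purely numeric.
  if (!/^\d+$/.test(digits)) return {};
  switch (digits.length) {
    case 8:
      return {gtin8: digits};
    case 12:
      return {gtin12: digits};
    case 13:
      return {gtin13: digits};
    case 14:
      return {gtin14: digits};
    default:
      return {};
  }
}

// Usage inside a variant mapping (other fields omitted):
const exampleVariant = {sku: 'SKU-001', ...gtinField('0012345678901')};
```

Because the helper returns an empty object for missing or non-numeric barcodes, the field never appears as null or an empty string.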

Signal 3 — `robots.txt`: framework defaults that disallow everything

robots.txt crawl policy · Up to 15 pts (robots-open signal, the floor signal) · ~63% headless fail rate

Standard Shopify generates /robots.txt automatically. The default allows Googlebot and all other crawlers, disallowing only internal paths such as the admin. Headless builds have no such default. Hydrogen uses Remix, which doesn’t generate a robots.txt by default. Next.js 13.3+ has a robots.ts metadata convention, but it’s not populated automatically. Many headless Shopify builds go live with no robots.txt at all (HTTP 404) or with a framework default that blocks all crawlers.

The three robots.txt failure modes in headless builds

Mode A: 404 (no robots.txt at all). CatalogScan fetches /robots.txt and gets a 404. Per the Google robots.txt spec, a 404 response means “no restrictions,” so the crawler proceeds. But AI shopping agents that follow this rule still note the missing file as a quality signal — a store with no robots.txt is one that’s less likely to have its catalog signals correctly configured.

Mode B: blocking default from the hosting platform. Vercel, Netlify, and similar platforms sometimes inject a default robots.txt for preview deployments (User-agent: *\nDisallow: /) to prevent preview URLs from being indexed. If this default is set at the platform level and your production deployment doesn’t override it, your production store has a robots.txt that tells every crawler to stay out. This is the most damaging failure mode — CatalogScan sees the robots-open signal at 0/15.

Mode C: outdated allow-list that misses new AI crawlers. Some teams manually wrote a robots.txt that allows Googlebot and a handful of other bots from 2021–2022. GPTBot (launched 2023), OAI-SearchBot (2024), PerplexityBot (2023), ClaudeBot (2023), and Google-Extended (2023) aren’t in the list. Depending on how the file is written, these crawlers may be covered by a generic User-agent: * allow rule, or they may be blocked by a Disallow: / for all non-listed agents.
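To tell Mode B and Mode C apart from a healthy file, it helps to evaluate the robots.txt body the way a crawler would. The sketch below is a simplified evaluator, not a full implementation of the Robots Exclusion Protocol. It matches user-agent tokens exactly (case-insensitive), falls back to the * group, applies longest-prefix matching with Allow winning ties, and ignores wildcard patterns (* and $) in paths. The function names are hypothetical.

```typescript
interface Rule { allow: boolean; path: string }

// Group rules by lower-cased user-agent token. Consecutive User-agent
// lines share the rules that follow them.
function parseRobots(text: string): Map<string, Rule[]> {
  const groups = new Map<string, Rule[]>();
  let agents: string[] = [];
  let lastWasRule = false;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, '').trim(); // drop comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (key === 'user-agent') {
      if (lastWasRule) agents = []; // a rule line closed the previous group
      const agent = value.toLowerCase();
      agents.push(agent);
      if (!groups.has(agent)) groups.set(agent, []);
      lastWasRule = false;
    } else if (key === 'allow' || key === 'disallow') {
      // An empty path (e.g. "Disallow:") restricts nothing, so skip it.
      if (value !== '') {
        for (const agent of agents) {
          groups.get(agent)!.push({allow: key === 'allow', path: value});
        }
      }
      lastWasRule = true;
    }
  }
  return groups;
}

// userAgent is the product token (e.g. "GPTBot"), not the full UA string.
function isAllowed(robotsTxt: string, userAgent: string, path = '/'): boolean {
  const groups = parseRobots(robotsTxt);
  const rules = groups.get(userAgent.toLowerCase()) ?? groups.get('*') ?? [];
  let best: Rule | undefined;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    const longer = !best || rule.path.length > best.path.length;
    const tieAllow = !!best && rule.path.length === best.path.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best ? best.allow : true; // no matching rule means allowed
}
```

Running isAllowed against your live robots.txt for each crawler token you care about shows at a glance whether an allow-list written years ago still covers them.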

Fix: write an explicit robots.txt that names AI shopping crawlers

In Hydrogen (Remix), add app/routes/robots[.]txt.tsx:

// app/routes/robots[.]txt.tsx
import {type LoaderFunctionArgs} from '@shopify/remix-oxygen';

export function loader({request}: LoaderFunctionArgs) {
  const host = new URL(request.url).host;
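  // Disallow rules only take effect inside a User-agent group, so they sit
  // under "*". Named groups do not inherit them; repeat the Disallow lines
  // under a named crawler if you want them to apply there too.
  const body = `User-agent: *
Allow: /
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkouts
Disallow: /account

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /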

Sitemap: https://${host}/sitemap.xml
`.trim();

  return new Response(body, {
    headers: {
      'Content-Type': 'text/plain',
      'Cache-Control': 'public, max-age=3600',
    },
  });
}

In Next.js App Router, use the robots.ts metadata API in app/robots.ts:

// app/robots.ts
import {MetadataRoute} from 'next';

export default function robots(): MetadataRoute.Robots {
  const base = `https://${process.env.NEXT_PUBLIC_STORE_DOMAIN}`;
  return {
    rules: [
      {userAgent: '*', allow: '/', disallow: ['/admin', '/cart', '/checkouts', '/orders', '/account']},
      {userAgent: 'GPTBot', allow: '/'},
      {userAgent: 'OAI-SearchBot', allow: '/'},
      {userAgent: 'PerplexityBot', allow: '/'},
      {userAgent: 'ClaudeBot', allow: '/'},
      {userAgent: 'Google-Extended', allow: '/'},
      {userAgent: 'Applebot-Extended', allow: '/'},
    ],
    sitemap: `${base}/sitemap.xml`,
  };
}

Verify with: curl -si -A "GPTBot/1.0" https://yourstore.com/robots.txt | head -10. You should see HTTP/2 200 and the User-agent: * or User-agent: GPTBot rule with Allow: /. If you’re behind Cloudflare, also confirm that Cloudflare itself isn’t blocking the request; see our guide on Cloudflare settings that silently block AI shopping agents for the three settings to check.

Common mistake: leaving preview environment robots.txt in production

Vercel’s preview deployments inject X-Robots-Tag: noindex headers by default. On a standard Vercel setup, this only applies to preview URLs (*.vercel.app), not your custom domain. But if production traffic reaches your app through a deployment URL rather than a custom domain assigned to the production deployment (for example, behind your own reverse proxy), you may be serving noindex headers on production. Check the response headers at every layer (Caddy, nginx, Vercel) for X-Robots-Tag in addition to the robots.txt content.

Signal 4 — canonical URL fragmentation across origin, staging, and preview

Canonical URL consistency · Affects deduplication in all shopping indexes · ~55% headless fail rate

Standard Shopify stores have exactly one URL namespace: your primary custom domain. Every page has a single canonical URL injected by the theme. Headless builds routinely have three or more URL surfaces for the same product: the custom domain (yourstore.com), the Shopify origin (yourstore.myshopify.com), the Vercel/Oxygen deployment URL (your-project.vercel.app or your-project.oxygen.myshopify.com), and any staging or preview branches. If any of these surfaces is crawlable and returns different or missing canonical tags, AI shopping agents split their product graph across multiple URLs for the same product, reducing the effective citation count and authority for your primary domain.

The three ways canonical fragmentation happens

1. JSON-LD url and offers.url using an environment variable that points to the wrong host. A common Hydrogen/Next.js pattern is to read the store domain from process.env.PUBLIC_STORE_DOMAIN or NEXT_PUBLIC_STORE_DOMAIN and use it to construct canonical URLs. If that environment variable is not set correctly in production, product URLs in JSON-LD point to localhost:3000, a staging host, or the .myshopify.com domain. AI shopping agents index the url field in JSON-LD as the authoritative product URL.

2. Shopify’s own .myshopify.com storefront remaining publicly accessible. Standard Shopify stores are accessible at both yourstore.com and yourstore.myshopify.com. On a standard storefront, Shopify automatically sets <link rel="canonical"> to the primary domain on both. In a headless build, the .myshopify.com origin may still serve the old Shopify storefront with canonical URLs pointing to .myshopify.com — creating a second, crawlable version of every product page that AI agents treat as a separate entity.

3. Preview deployments with open robots.txt. Vercel preview deployments are publicly accessible by default. If a preview deployment renders real product data (not mocked) and doesn’t block crawlers, AI shopping agents index preview URLs as real product pages. A store with 500 products that has been actively deployed for 6 months may have dozens of indexed preview URLs for each product, all with different canonical tags, split across multiple .vercel.app subdomains.

Fix: enforce canonical URLs at the source

Three changes that prevent canonical fragmentation:

Step 1: Pin the primary domain used in JSON-LD. Don’t construct product URLs from a general-purpose domain variable that differs between environments. Set a dedicated CANONICAL_DOMAIN environment variable explicitly in your production deployment and use it only for JSON-LD and canonical tags:

# In your production environment (Vercel, Oxygen, etc.)
CANONICAL_DOMAIN=https://yourstore.com

// In your app code, for JSON-LD only:
const canonicalBase = process.env.CANONICAL_DOMAIN;
// Use this for JSON-LD url fields, NOT for internal links or API calls

Step 2: Block the .myshopify.com storefront from being indexed. Log in to your Shopify Admin → Online Store → Preferences and look for “Password protection.” Enable password protection on the .myshopify.com storefront. This returns HTTP 401 for all non-authenticated crawlers without affecting your headless storefront (which doesn’t route through the Shopify storefront layer). Alternatively, customize the robots.txt.liquid template in the Shopify theme that still serves .myshopify.com so that every user-agent group gets Disallow: /.

Step 3: Block preview deployments at the platform level. On Vercel, set VERCEL_ENV-conditional robots.txt rules: if VERCEL_ENV !== 'production', return Disallow: / for all crawlers. On Oxygen, use Shopify’s deployment environment variables to do the same:

// app/robots.ts (Next.js) or app/routes/robots[.]txt.tsx (Hydrogen)
const isProduction = process.env.VERCEL_ENV === 'production'
  || process.env.SHOPIFY_APP_ENV === 'production';

if (!isProduction) {
  // All non-production environments block all crawlers
  return {rules: [{userAgent: '*', disallow: '/'}]};
}
// ... production robots.txt as above
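The snippet above returns the object shape that Next.js expects. A Hydrogen loader has to return plain text instead, so the same production switch can be expressed as a small string builder. This is a sketch under assumptions: buildRobotsTxt is a hypothetical helper, and how you compute isProduction depends on which environment variables your platform exposes.

```typescript
const AI_CRAWLERS = [
  'GPTBot',
  'OAI-SearchBot',
  'PerplexityBot',
  'ClaudeBot',
  'Google-Extended',
  'Applebot-Extended',
];
const PRIVATE_PATHS = ['/admin', '/cart', '/checkouts', '/orders', '/account'];

// Build the robots.txt body as plain text.
// origin is the scheme plus host, e.g. "https://yourstore.com".
function buildRobotsTxt(isProduction: boolean, origin: string): string {
  // Every non-production environment blocks all crawlers.
  if (!isProduction) return 'User-agent: *\nDisallow: /\n';

  const lines: string[] = [
    'User-agent: *',
    'Allow: /',
    ...PRIVATE_PATHS.map((path) => `Disallow: ${path}`),
    '',
  ];
  // Name each AI crawler explicitly, one group per crawler.
  for (const bot of AI_CRAWLERS) {
    lines.push(`User-agent: ${bot}`, 'Allow: /', '');
  }
  lines.push(`Sitemap: ${origin}/sitemap.xml`);
  return lines.join('\n') + '\n';
}
```

In a loader, the result would go straight into new Response(body, {headers: {'Content-Type': 'text/plain'}}).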

How to audit your current canonical state

# Check JSON-LD url field on a product page
curl -s https://yourstore.com/products/your-product-handle \
  | python3 -c "
import sys, re, json
html = sys.stdin.read()
for m in re.findall(r'<script type=\"application/ld\+json\">(.*?)</script>', html, re.S):
  try:
    d = json.loads(m)
    if d.get('@type') in ('Product', 'ProductGroup'):
      print('url:', d.get('url', 'MISSING'))
      if d.get('hasVariant'):
        print('first variant url:', d['hasVariant'][0].get('offers', {}).get('url', 'MISSING'))
  except: pass
"

The url field must be https://yourstore.com/products/.... If it shows localhost, a .myshopify.com domain, a .vercel.app URL, or is MISSING, you have a canonical fragmentation problem.
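The same audit can be written in TypeScript if you want to run it in CI against parsed JSON-LD rather than by hand. The sketch below walks a JSON-LD object, collects every string-valued url field, and reports the ones whose origin differs from the canonical one. Both function names are hypothetical.

```typescript
// Recursively collect every string value stored under a "url" key.
function collectUrlFields(node: unknown, found: string[] = []): string[] {
  if (Array.isArray(node)) {
    for (const item of node) collectUrlFields(item, found);
  } else if (node !== null && typeof node === 'object') {
    for (const [key, value] of Object.entries(node as Record<string, unknown>)) {
      if (key === 'url' && typeof value === 'string') found.push(value);
      else collectUrlFields(value, found);
    }
  }
  return found;
}

// Return the URLs that do not live on the canonical origin.
// Unparseable values are reported too, since they are never canonical.
function offCanonicalUrls(urls: string[], canonicalOrigin: string): string[] {
  const expected = new URL(canonicalOrigin).origin;
  return urls.filter((candidate) => {
    try {
      return new URL(candidate).origin !== expected;
    } catch {
      return true;
    }
  });
}
```

An empty result from offCanonicalUrls(collectUrlFields(jsonLd), 'https://yourstore.com') means every url field in that block points at the primary domain.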

Verification: how to confirm which signals you’re missing

Run these four checks in order. Each takes under two minutes. All four need to pass before a headless Shopify store reaches the same AI-shopping baseline as a standard-theme store.

| Signal | Command / Check | Pass condition |
| --- | --- | --- |
| Product feed | `curl -s https://yourstore.com/products.json \| python3 -c "import json,sys; d=json.load(sys.stdin); print(len(d['products']), 'products')"` | Prints a count > 0. No 404, no HTML error page. |
| Product JSON-LD | `curl -s https://yourstore.com/products/YOUR_HANDLE \| grep -o '"@type":"Product[^"]*"'` | Returns `"@type":"Product"` or `"@type":"ProductGroup"`. Not empty. |
| robots.txt (GPTBot) | `curl -si -A "GPTBot/1.0 (+https://openai.com/gptbot)" https://yourstore.com/robots.txt \| head -5` | First line is `HTTP/2 200` (or `HTTP/1.1 200`). Body starts with `User-agent:`, not `<!DOCTYPE`. |
| Canonical URL | `curl -s https://yourstore.com/products/YOUR_HANDLE \| python3 -c "import sys,re; [print(m) for m in re.findall(r'\"url\"\s*:\s*\"([^\"]+)\"', sys.stdin.read())]" \| head -5` | All `url` values start with `https://yourstore.com`. No `localhost`, `.myshopify.com`, or `.vercel.app` URLs. |

If all four pass at the command line, run a free CatalogScan scan on your store. CatalogScan checks additional sub-fields within each signal — GTIN coverage per variant, AggregateRating presence, sitemap completeness, and 11 other signals — that the curl tests above don’t catch. The full 18-signal report shows the specific fields missing and the point impact of each fix.

Typical headless baseline (before fixes)

Product feed: 404
JSON-LD: missing or flat Product without variants
robots.txt: 404 or Disallow: /
Canonical: localhost or .vercel.app URLs

Score: ~30–45 / 100

After implementing all four fixes

Product feed: 200, correct shape, GTIN in barcode field
JSON-LD: ProductGroup + hasVariant + gtin13 + aggregateRating
robots.txt: 200, named AI crawlers allowed
Canonical: all urls point to primary domain

Score: ~70–85 / 100

The remaining 15–30 points after the four fixes typically come from content signals: metafield coverage (shopify.color-pattern, shopify.subtitle, mm-google-shopping.age_group), product description quality, and image alt text — all of which are addressable through the Shopify Admin API without touching your storefront code.

Is your headless store invisible to AI shopping agents?

Free 2-minute scan. We fetch /products.json, read the Product JSON-LD in your PDP <head>, curl your /robots.txt with each major AI-agent UA, and check canonical URL consistency — plus 14 other signals. Headless stores often score 30–40 points below their standard-theme equivalents. The report shows exactly which signals are missing and what fixing each one is worth.
