CatalogScan

Shopify Structured Data Testing: How to Validate JSON-LD for AI Agents

AI shopping agents read your product pages differently than a browser does. Testing structured data only in a browser is the most common reason a merchant believes their schema is correct while ChatGPT and Perplexity see missing or malformed data. Here are the four tools that show you what AI agents actually see.

TL;DR: Use four tools in combination: (1) Google Rich Results Test for human-readable error reports, (2) Schema.org validator for spec compliance, (3) curl with a bot user-agent to check what AI crawlers receive before JavaScript executes, (4) CatalogScan for an automated 18-signal audit of your whole catalog. The most common Shopify failures are a missing aggregateRating, an empty gtin, and a truncated description.

Why browser testing misses AI agent failures

Shopify themes typically render Product JSON-LD server-side (it appears in the initial HTML response, before any JavaScript runs). But some themes and apps inject structured data via JavaScript — review scores, variant-specific data, or app-added schema blocks. AI crawlers like GPTBot and ClaudeBot often do not execute JavaScript, or execute it with lower priority than Googlebot does.

If your JSON-LD depends on JavaScript to render, your browser shows a valid structured data block, but the AI agent receives HTML with no JSON-LD or only part of it. Testing with curl, which fetches the raw HTML without running JavaScript, reveals the gap.
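To make the raw-HTML check concrete, here is a minimal Python sketch that pulls JSON-LD blocks out of an HTML string the way a non-JavaScript client would see them. The class and function names are illustrative, not part of any tool mentioned here, and the check only looks at top-level objects whose @type is exactly "Product" (it ignores @graph wrappers and list-valued @type).

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def extract_jsonld(html):
    """Return parsed JSON-LD objects found in raw HTML; unparseable blocks are skipped."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    parsed = []
    for text in parser.blocks:
        try:
            parsed.append(json.loads(text))
        except json.JSONDecodeError:
            continue
    return parsed


def has_product(html):
    """True if any top-level JSON-LD object in the raw HTML has @type Product."""
    return any(
        isinstance(obj, dict) and obj.get("@type") == "Product"
        for obj in extract_jsonld(html)
    )
```

Feeding this the HTML returned by curl answers the question directly: if has_product is False for the raw response but your browser's inspector shows a Product block, the block is being injected by JavaScript.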

A second common gap: Cloudflare Bot Fight Mode or WAF rules challenge bot user-agents. Your browser (with a real IP and browser fingerprint) passes through. GPTBot gets served a JavaScript challenge page. The challenge page has no JSON-LD. Result: the AI agent sees no structured data even though your theme implements it correctly.

4 testing tools and when to use each

1. Google Rich Results Test
Single page · human-readable · official Google format

Go to search.google.com/test/rich-results, enter a product page URL, and run the test. Google renders the page (including JavaScript) and parses all structured data blocks. The output shows:

  • Which rich result types were detected (Product, FAQPage, BreadcrumbList)
  • Warnings for recommended-but-missing fields (like aggregateRating and brand)
  • Errors for required fields that Google considers necessary for rich results eligibility

Best for: initial audit of a single product page, confirming a fix worked, sharing a screenshot with a developer. Limitation: uses Googlebot rendering (full JavaScript) — does not reveal what GPTBot/ClaudeBot see on a no-JS page load.

2. Schema.org Validator
Single page · spec compliance · catches non-Google errors

Go to validator.schema.org and enter your product page URL. This validator checks compliance with the Schema.org specification directly — independent of Google's implementation. It catches issues that Google's tool tolerates but that other consumers (AI agents, semantic web tools) may reject.

Common issues that Google tolerates but stricter consumers may act on: an offers.price value that is not a plain decimal (for example "$19.99" or "1,299.00" instead of "19.99"), a missing offers.priceCurrency, and a bare string for brand where Schema.org expects a Brand or Organization entity.
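A short Python sketch of these field checks follows, run against a Product dict you have already parsed from the page. The function name and messages are illustrative. Schema.org itself accepts price as either Number or Text, so the sketch only flags values that do not parse as a plain decimal rather than flagging every string.

```python
def check_offer_and_brand(product):
    """Return a list of human-readable issues for a parsed Product JSON-LD dict."""
    issues = []

    offers = product.get("offers") or {}
    if isinstance(offers, list):
        # Only the first offer is inspected in this sketch.
        offers = offers[0] if offers else {}

    price = offers.get("price")
    if price is None:
        issues.append("offers.price is missing")
    else:
        try:
            float(price)  # "19.99" and 19.99 pass; "$19.99" and "1,299.00" do not
        except (TypeError, ValueError):
            issues.append(f"offers.price is not a plain decimal: {price!r}")

    if not offers.get("priceCurrency"):
        issues.append("offers.priceCurrency is missing")

    brand = product.get("brand")
    if isinstance(brand, str):
        issues.append("brand is a bare string, not a Brand or Organization entity")
    elif not isinstance(brand, dict) or not brand.get("name"):
        issues.append("brand is missing or has no name")

    return issues
```

An empty list means none of the three issues were found; it does not mean the block is valid in every other respect.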

Best for: confirming spec compliance beyond Google's requirements. Run this after Rich Results Test passes — it catches the next layer of issues.

3. curl with AI crawler user-agent
Bot view · reveals Cloudflare blocks · tests raw HTML response

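Fetch a product page with a GPTBot user-agent to see the raw HTML an AI crawler receives: no JavaScript execution, no browser fingerprint, just the server's response to a request that identifies itself as GPTBot. The string below is a shortened form of the user-agent OpenAI publishes, which is enough to trigger most user-agent-based rules; rules that verify crawler IP addresses may treat your request differently from the real bot.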

curl -sA "GPTBot/1.0" "https://yourdomain.com/products/your-product" | grep -A 50 'application/ld+json'

A clean result shows your <script type="application/ld+json"> block inline in the HTML. A Cloudflare challenge looks like:

<!DOCTYPE html>
<!--[if lt IE 7]><html class="no-js ie6 oldie" ...>
Just a moment... Cloudflare
<!-- Checking your browser before accessing -->

If you see the challenge page, GPTBot cannot read your structured data. Fix: In Cloudflare dashboard → Security → Bots, disable "Bot Fight Mode" or add GPTBot, OAI-SearchBot, ClaudeBot, and PerplexityBot to your allowlist. See the Cloudflare AI crawlers guide for the exact settings.

Also test with ClaudeBot and PerplexityBot user-agents — some WAF rules are user-agent-specific:

curl -sA "ClaudeBot/1.0" "https://yourdomain.com/products/your-product" | grep 'ld+json' | head -5
curl -sA "PerplexityBot/1.0" "https://yourdomain.com/products/your-product" | grep 'ld+json' | head -5
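The same comparison can be scripted so every bot user-agent is checked in one pass. The sketch below uses only the Python standard library; the function names are illustrative, the user-agent strings are the short tokens from the curl commands above, and the challenge markers are taken from the sample challenge page shown earlier. Treat the classification as a heuristic, not a definitive detector.

```python
import urllib.error
import urllib.request

# Short tokens from the curl commands above; real crawlers send longer strings.
BOT_USER_AGENTS = ["GPTBot/1.0", "ClaudeBot/1.0", "PerplexityBot/1.0"]


def fetch_html(url, user_agent, timeout=15):
    """Fetch a URL with the given User-Agent and return the body as text (no JavaScript runs)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # Blocked requests may come back with an error status; keep the body for inspection.
        return error.read().decode("utf-8", errors="replace")


def classify_response(html):
    """Label a raw HTML response as jsonld-present, challenge-page, or no-jsonld."""
    lowered = html.lower()
    if "application/ld+json" in lowered:
        return "jsonld-present"
    if "just a moment" in lowered or "checking your browser" in lowered:
        return "challenge-page"
    return "no-jsonld"


def compare_bots(url, fetch=fetch_html):
    """Return {user_agent: classification} for each bot user-agent."""
    return {ua: classify_response(fetch(url, ua)) for ua in BOT_USER_AGENTS}
```

Calling compare_bots("https://yourdomain.com/products/your-product") returns one label per user-agent. Differing labels across bots point to a user-agent-specific rule. The fetch parameter exists so the function can be exercised without network access.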

4. CatalogScan automated scan
Whole catalog · 18 signals · prioritized fix list

CatalogScan scans your entire store and scores 18 structured data and accessibility signals in under 2 minutes. Unlike the single-page tools above, it crawls multiple product pages, checks the bulk product feed at /products.json, reads your robots.txt, and aggregates signal coverage across the catalog.

The scan uses real bot user-agents, so it detects Cloudflare blocks, missing JSON-LD, partial GTIN coverage, empty aggregateRating, and truncated descriptions — the same issues an AI agent would hit. Results include a 0–100 score and a prioritized list of the top 5 fixes that will move your score the most.
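To illustrate what "signal coverage across the catalog" means, the sketch below computes the share of products that have a given field filled in. This is not CatalogScan's scoring method, which is not described here; the function name, the field names chosen, and the sample data are assumptions for illustration only.

```python
def coverage(products, field):
    """Percentage of Product dicts where `field` is present and non-empty."""
    if not products:
        return 0.0
    filled = sum(1 for p in products if p.get(field) not in (None, "", [], {}))
    return round(100.0 * filled / len(products), 1)


# Hypothetical Product JSON-LD dicts collected from four product pages.
catalog = [
    {"name": "Mug", "gtin13": "0012345678905", "aggregateRating": {"ratingValue": 4.6, "reviewCount": 31}},
    {"name": "Bowl", "gtin13": ""},
    {"name": "Plate", "gtin13": "4006381333931"},
    {"name": "Vase"},
]

gtin_coverage = coverage(catalog, "gtin13")
rating_coverage = coverage(catalog, "aggregateRating")
```

A single page can pass every validator while catalog-level numbers like these remain low, which is why a per-page test and a catalog-wide scan answer different questions.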

Most common Shopify JSON-LD errors and how to fix them

"aggregateRating" is recommended but missing
Cause: Shopify themes include Product JSON-LD but most don't connect review app data to the structured data output. Fix: In your review app settings (Judge.me, Yotpo, Okendo, Stamped.io), look for a "Rich snippets," "Schema.org," or "Structured data" toggle and enable it. Some apps require adding a Liquid snippet to your theme; check the app's documentation for "enable rich snippets."

"gtin" / "gtin13" field is empty or missing
Cause: Shopify's default Product JSON-LD reads from the variant's barcode field. If barcode is empty, the gtin field is either omitted or emitted as an empty string. Fix: Populate barcode fields for all variants. For bulk update instructions, see the GTIN guide.

"description" field is truncated or empty
Cause: Some Shopify themes use product.description | strip_html | truncate: 160 in their JSON-LD Liquid template. This cuts the description to 160 characters, which leaves an AI agent little to quote. Fix: In your theme's product JSON-LD block (usually in a section file such as sections/main-product.liquid or sections/product-template.liquid, since templates/product.json holds no Liquid), change the Liquid filter to product.description | strip_html without truncation.

"brand" is a plain string, not an entity
Cause: Some themes output "brand": "Acme" as a bare string. Schema.org expects a Brand or Organization entity: "brand": { "@type": "Brand", "name": "Acme" }. Fix: Update the theme's JSON-LD Liquid to wrap the brand in an object: "brand": {"@type": "Brand", "name": {{ product.vendor | json }}}

Multiple JSON-LD blocks with conflicting "@type": "Product"
Cause: A review app or SEO app adds its own Product JSON-LD block in addition to the theme's. With two Product blocks on the same page, a parser may pick either one, including the less complete one. Fix: Identify which app is adding the duplicate (view source and search for application/ld+json; you'll see two blocks). Disable the duplicate in the app settings, or modify the theme to output a single merged block.
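The five errors above can be checked together once you have the parsed JSON-LD objects for a page. The following Python sketch is one way to do that; the function name and messages are illustrative, and the truncation check is a heuristic that assumes truncated text ends in an ellipsis, as Liquid's truncate filter produces by default.

```python
GTIN_KEYS = ("gtin", "gtin8", "gtin12", "gtin13", "gtin14")


def _has_gtin(node):
    """True if a dict carries any non-empty GTIN property."""
    return isinstance(node, dict) and any(node.get(key) for key in GTIN_KEYS)


def diagnose_page(jsonld_objects):
    """Return the common Shopify JSON-LD issues found among a page's parsed JSON-LD objects."""
    products = [
        obj for obj in jsonld_objects
        if isinstance(obj, dict) and obj.get("@type") == "Product"
    ]
    if not products:
        return ["no Product JSON-LD found"]

    issues = []
    if len(products) > 1:
        issues.append(f"{len(products)} Product blocks on one page")

    # The remaining checks look at the first Product block only.
    product = products[0]
    if not product.get("aggregateRating"):
        issues.append("aggregateRating missing")

    offers = product.get("offers") or []
    if isinstance(offers, dict):
        offers = [offers]
    if not (_has_gtin(product) or any(_has_gtin(offer) for offer in offers)):
        issues.append("gtin empty or missing")

    description = (product.get("description") or "").strip()
    if not description:
        issues.append("description empty")
    elif description.endswith(("...", "\u2026")):
        issues.append("description looks truncated")

    if isinstance(product.get("brand"), str):
        issues.append("brand is a bare string")

    return issues
```

Because only the first Product block is inspected, a page with duplicates can report "aggregateRating missing" even when the second block carries the rating. That mirrors the risk described above: a consumer that reads only one block may miss data that lives in the other.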

Testing checklist

Use this sequence when auditing a Shopify store's structured data:

  1. Run Rich Results Test on your best-selling product page. Note any errors and warnings.
  2. Run Schema.org Validator on the same URL. Compare findings — Schema.org often catches brand entity format and offer field type issues that Rich Results Test misses.
  3. Run curl with GPTBot user-agent on the same URL. Confirm the JSON-LD block is in the raw HTML response (not JavaScript-rendered). Confirm no Cloudflare challenge page.
  4. Run CatalogScan to get catalog-wide coverage data — GTIN coverage %, aggregateRating coverage %, and which specific products are missing key fields.
  5. After fixing: Re-run all four. Rich Results Test and CatalogScan confirm fixes immediately. Google Search Console rich results report updates within 1–2 weeks of re-crawl.

Common questions

My JSON-LD validates fine in Rich Results Test. Why is CatalogScan showing errors?

Rich Results Test validates one page as Googlebot (with JavaScript). CatalogScan tests as a bot user-agent without JavaScript, covering a different failure mode. It's common to pass Rich Results Test and fail CatalogScan if your JSON-LD is JavaScript-rendered, or if your product feed (/products.json) has missing GTIN fields that are separate from the page-level JSON-LD. The two tools catch different issues — running both is more complete than either alone.

How often should I re-test structured data?

After any Shopify theme update, app install/uninstall, or Cloudflare configuration change — these are the most common causes of structured data regressions. Monthly re-runs catch drift in GTIN coverage (new products added without barcodes) and review app configuration changes. CatalogScan scans are free and take under 2 minutes.

My product pages have no JSON-LD at all — how is this possible?

Three common causes: (1) You switched to a headless front-end (Hydrogen, Next.js, custom React) and the new front-end doesn't include a structured data layer — the JSON-LD was in your old Shopify Liquid theme and didn't migrate. (2) You installed an app that inadvertently removed the theme's JSON-LD block from the layout. (3) You are using a minimal or custom theme that never included Product JSON-LD. Fix: add a <script type="application/ld+json"> block to your product template with at minimum name, description, image, offers, and brand.
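For headless front-ends that render pages outside Liquid, the minimum block can be generated in code. Below is a hedged Python sketch of a builder that produces the five fields named above; the function name, parameters, and the in_stock flag are assumptions for illustration, and a real storefront would populate them from its own product data.

```python
import json


def build_product_jsonld(name, description, image_url, price, currency, brand_name, in_stock=True):
    """Return a <script type="application/ld+json"> tag with a minimal Product block."""
    availability = "https://schema.org/InStock" if in_stock else "https://schema.org/OutOfStock"
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "image": image_url,
        "brand": {"@type": "Brand", "name": brand_name},
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
            "availability": availability,
        },
    }
    # Escape "</" so a description containing "</script>" cannot close the tag early.
    payload = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'
```

Serializing with json.dumps rather than string concatenation keeps quotes and newlines in product descriptions from producing invalid JSON, which is the same job the json filter does in a Liquid theme.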

Does Shopify's built-in JSON-LD pass validation out of the box?

The default Shopify JSON-LD (generated by most themes) passes Rich Results Test at a basic level but typically triggers warnings for missing aggregateRating, missing or empty gtin, and truncated description. None of these are hard errors in Google's validator (the rich result still appears), but they are meaningful quality signals for AI agents that go beyond schema validity. A passing Rich Results Test with 0 errors but 3 warnings is better than nothing, but each warning marks a field an AI agent cannot read, which gives it less to go on when deciding whether to recommend the product.

Is ProductGroup JSON-LD separate from Product JSON-LD?

Yes. Product JSON-LD describes a single product, which is what most Shopify themes generate. ProductGroup JSON-LD describes a product family with multiple variants (colors, sizes) and links them together, so a consumer can show a single product card instead of one card per variant. Most Shopify themes do not generate ProductGroup JSON-LD by default. Adding it is a theme customization that can improve how multi-variant products appear in ChatGPT Shopping and Google AI Mode panels. See the ProductGroup JSON-LD guide for implementation.
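As a rough picture of the shape involved, the Python sketch below assembles a ProductGroup with its variants nested under hasVariant. The productGroupID, variesBy, and hasVariant properties are defined by Schema.org; the function name, the variant dict keys, and the sample values are hypothetical, and this is a starting point rather than the layout the guide prescribes.

```python
import json


def build_product_group(name, group_id, variants):
    """Return a ProductGroup dict whose variants differ by color."""
    return {
        "@context": "https://schema.org",
        "@type": "ProductGroup",
        "name": name,
        "productGroupID": group_id,
        "variesBy": ["https://schema.org/color"],
        "hasVariant": [
            {
                "@type": "Product",
                "name": f"{name} ({variant['color']})",
                "sku": variant["sku"],
                "color": variant["color"],
                "offers": {
                    "@type": "Offer",
                    "price": variant["price"],
                    "priceCurrency": variant["currency"],
                },
            }
            for variant in variants
        ],
    }


# Hypothetical two-color product.
group = build_product_group(
    "Stoneware Mug",
    "mug-001",
    [
        {"sku": "MUG-BLU", "color": "Blue", "price": "19.99", "currency": "USD"},
        {"sku": "MUG-GRN", "color": "Green", "price": "19.99", "currency": "USD"},
    ],
)
group_json = json.dumps(group, indent=2)
```

Each variant stays a complete Product with its own sku and offer, so a consumer that ignores ProductGroup can still read the individual items.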
