How to audit Shopify structured data: the exact 3-tool workflow to verify your JSON-LD is working for AI shopping agents

CatalogScan, June 5, 2026. Tags: Technical SEO, Structured Data, AI Shopping

Most Shopify merchants who've added JSON-LD to their theme believe their structured data is working. Most of the time, it isn't — not in the form AI shopping agents can actually parse and trust. Here's the exact audit workflow that surfaces what's broken.

67% of Shopify stores with JSON-LD have at least one critical parse error (CatalogScan corpus, 2026)

5 silent failure patterns that break structured data without throwing a visible page error

3 tools needed for a complete structured data audit, and only one requires Search Console access

In this guide

The gap between "I added JSON-LD" and "AI agents are reading it"
Tool 1: Rich Results Test — per-product verification
Tool 2: Schema.org Validator — JSON-LD syntax and property validation
Tool 3: Google Search Console — catalog-wide coverage errors
Bonus: manual curl verification for crawlability
5 silent failure patterns and how to fix them
How CatalogScan's automated audit fits into the workflow
10-step structured data audit checklist
FAQ

The gap between "I added JSON-LD" and "AI agents are reading it"

When Shopify's default theme outputs structured data, it includes a <script type="application/ld+json"> block on product pages with a basic Product schema: name, description, image, offers. Many merchants stop there and assume the work is done.

But AI shopping agents (ChatGPT Shopping, Perplexity Commerce, Google AI Mode, Shopify's own Global Catalog feed) don't give partial credit. If a required property is missing, incorrectly typed, or rendered in a form the parser doesn't recognize, the entire product may be excluded from the agent's consideration set for that query. The failure is invisible: your page still renders and your products still sell through human-initiated search, but you're absent from AI-mediated recommendations.

The audit gap has three common sources:

Rendering failures: Liquid template variables that produce empty strings, null values, or raw Liquid syntax in the JSON-LD output when the product is missing a metafield or variant attribute
Type mismatches: Price as a formatted string ("$29.99") rather than a bare number (29.99), or availability as a custom string ("available") rather than a schema.org URI ("https://schema.org/InStock")
Scope problems: Structured data present on the homepage or collection pages but absent or malformed on the individual product page URLs that AI agents actually crawl

None of these trigger a 500 error. None produce a broken page. They only show up when you test the JSON-LD output directly — which is what the three-tool audit is for.
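Once you have the raw JSON-LD text from a product page, the rendering failures described above can be checked mechanically. The following is a minimal sketch in Python, not part of any tool named in this guide; the function name `find_render_failures` and the sample payloads are hypothetical, and it only looks for leaked Liquid delimiters, unparseable JSON, and empty or null values.

```python
import json
import re

# Liquid output/tag delimiters that leaked into rendered markup.
# Valid JSON never contains "{{" or "{%" outside a string value.
LIQUID_LEAK = re.compile(r"\{\{.*?\}\}|\{%.*?%\}")


def find_render_failures(raw_jsonld: str) -> list:
    """Return human-readable problems found in one JSON-LD block."""
    problems = []
    if LIQUID_LEAK.search(raw_jsonld):
        problems.append("raw Liquid syntax in output")
    try:
        data = json.loads(raw_jsonld)
    except json.JSONDecodeError as exc:
        problems.append(f"unparseable JSON: {exc.msg}")
        return problems

    def walk(node, path=""):
        # Recurse through dicts and lists, recording empty leaves.
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}" if path else key)
        elif isinstance(node, list):
            for index, value in enumerate(node):
                walk(value, f"{path}[{index}]")
        elif node is None or node == "":
            problems.append(f"empty value at {path}")

    walk(data)
    return problems
```

Calling it on a block with `"description": ""` and `"price": null` reports both paths, which is the kind of output none of these failures produce on the rendered page itself.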

Tool 1: Rich Results Test — per-product verification

Rich Results Test: search.google.com/test/rich-results

No login required — works on any public URL

Google's Rich Results Test renders your page the way Googlebot would, executing JavaScript on the HTML that Shopify has already rendered from Liquid, and then parses every structured data block it finds. For Shopify, this means it sees the actual JSON-LD that Googlebot sees, not the Liquid template source.

What to test:

Your highest-traffic product page (the one most likely to have complete data)
A product with no metafields set (the "worst case" — likely to expose Liquid rendering gaps)
A product that has multiple variants at different price points (tests price range rendering)
A product that is currently out of stock (tests availability enum rendering)

What to look for in the results:

Detected items: Should show "Product" — if it shows nothing, your JSON-LD is either absent, malformed to the point of being unparseable, or placed after the closing </body> tag
Errors vs. Warnings: Errors block rich result eligibility. Warnings are advisory. For AI shopping agent purposes, treat both as blocking — agents are stricter than Google's rich-result eligibility rules
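Missing required fields: Google requires "name", and merchant listings also require "image" and "offers"; any missing required field is an error. "description" is a recommended field for Google, so its absence shows as a warning, but treat it as required for AI shopping agents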
Offer block: Check that "price", "priceCurrency", and "availability" are all present and showing real values — not empty strings or Liquid syntax leaking through

Reading a Rich Results Test output for a Shopify product

When you paste a product URL and click "Test URL," the tool shows you a parsed tree of every JSON-LD block it found. For Shopify's default theme output, you'll typically see a Product item with nested Offer items. The critical things to check in each Offer:

Property	Expected value	Common broken state
`price`	29.99	Empty string, `"$29.99"` (currency symbol included), or `"29.99"` (string not number — technically allowed in JSON-LD but flags a warning)
`priceCurrency`	USD	Empty string (metafield not set), `"$"` (symbol instead of ISO code), missing entirely
`availability`	https://schema.org/InStock	`"InStock"` (bare string), `"available"` (custom string), empty string when Shopify returns `nil` for out-of-stock variants
`url`	https://store.com/products/handle	Relative URL (`/products/handle`), missing entirely, or variant-scoped URL on a product with no variants
`image`	https://cdn.shopify.com/...	Empty array `[]` (product has no images), CDN URL without scheme (`//cdn.shopify.com/...`)

Tip: Run the Rich Results Test on your canonical product URL (without variant query parameters), then again on a variant URL like /products/handle?variant=12345. Some Shopify themes output different JSON-LD depending on which variant is selected — and the default (no variant) URL may render an empty price if the theme uses JavaScript to populate it dynamically.
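The expected values from the Offer table can be turned into a quick local check. This is a rough sketch under stated assumptions: `check_offer` is a hypothetical helper, it takes an Offer that has already been parsed into a Python dict, and the currency check only tests the three-letter shape, not whether the code is a real ISO 4217 entry. The `image` row is left out because it usually sits on the Product rather than the Offer.

```python
import re

ISO_4217_SHAPE = re.compile(r"^[A-Z]{3}$")
SCHEMA_URI = re.compile(r"^https?://schema\.org/[A-Za-z]+$")


def check_offer(offer: dict) -> dict:
    """Map each failing Offer property to a short description of the problem."""
    issues = {}

    price = offer.get("price")
    if price in (None, ""):
        issues["price"] = "missing or empty"
    else:
        try:
            float(price)  # accepts 29.99 and "29.99", rejects "$29.99"
        except (TypeError, ValueError):
            issues["price"] = "not a bare number"

    if not ISO_4217_SHAPE.match(str(offer.get("priceCurrency", ""))):
        issues["priceCurrency"] = "not a three-letter currency code"

    if not SCHEMA_URI.match(str(offer.get("availability", ""))):
        issues["availability"] = "not a schema.org URI"

    if not str(offer.get("url", "")).startswith(("https://", "http://")):
        issues["url"] = "missing or relative URL"

    return issues
```

An empty result means the four properties match the "Expected value" column; each key in a non-empty result corresponds to one row of the "Common broken state" column.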

Tool 2: Schema.org Validator — JSON-LD syntax and property validation

Schema.org Validator: validator.schema.org

No login required — accepts URL or paste-in markup

While the Rich Results Test validates against Google's rich-result eligibility requirements, the Schema.org Validator checks compliance against the full schema.org specification. It catches property names that Google silently ignores (but AI agents may rely on), incorrect type nesting, and deprecated properties that were replaced in more recent schema.org releases.

What it catches that Rich Results Test misses:

Properties on the wrong type (gtin13 on an Organization instead of a Product)
Deprecated property names (e.g., offers.seller syntax changes between schema.org versions)
Type mismatches where a property expects a URL type but receives a plain string
Missing @context or incorrect context URL
GTIN format validation — a gtin13 that isn't 13 digits will pass the Rich Results Test but fail the Schema.org Validator

The paste-in workflow for Shopify

The Schema.org Validator's URL mode doesn't execute JavaScript, so it sees the raw server-rendered HTML. For Shopify stores using JavaScript-injected structured data (rare but possible with some headless setups), use the paste-in mode instead:

Open a product page in your browser, right-click → "View Page Source" (not Inspect — you want the raw HTML, not the post-JavaScript DOM)
Search for application/ld+json — copy the entire content of the script tag (the JSON object inside)
Paste into the Schema.org Validator's "Validate by Direct Input" tab
Note every red error (blocking) and yellow warning (advisory)

For Shopify Online Store 2.0 themes, the JSON-LD is server-rendered by Liquid, so the URL mode works correctly. The paste-in approach is only necessary if your theme uses a custom storefront or injects structured data via a JavaScript app.
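If you repeat the paste-in steps often, the "find the script tag and copy its contents" part can be scripted. The sketch below uses only Python's standard library; `JsonLdExtractor` and `extract_jsonld` are illustrative names, and it assumes you already have the page source as a string (for example, saved from View Page Source).

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text content of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = None  # None while outside a JSON-LD script tag

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        # Script content can arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def extract_jsonld(html: str) -> list:
    """Return every JSON-LD block in the page, parsed into Python objects."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [json.loads(block) for block in parser.blocks]
```

`json.loads` raises an exception on a malformed block, which is itself a useful signal: that block would not parse for a crawler either.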

Comparing the two tools' error sets

It's worth running both tools on the same URL and comparing the error lists. They use different validation rule sets and will often catch non-overlapping issues. A clean Rich Results Test result does not mean clean Schema.org validation — and for AI shopping agents that implement the full schema.org spec rather than just Google's rich-result subset, the Schema.org Validator errors are the ones that matter.

Tool 3: Google Search Console — catalog-wide coverage errors

Google Search Console (Enhancements › Shopping tab): search.google.com/search-console

Requires GSC access and property verification

The Rich Results Test and Schema.org Validator check individual URLs. Google Search Console's structured data report shows you errors across your entire product catalog — and clusters them by error type so you can fix one template problem that's affecting 400 products simultaneously.

Where to find it: In Search Console, go to Shopping in the left nav (under Enhancements). This shows Product-type structured data errors across all crawled pages. If you don't see the Shopping section, your domain hasn't had any Product JSON-LD crawled yet (or the structured data is sufficiently broken that Googlebot couldn't classify it).

Reading the Search Console structured data report for Shopify

The report groups errors into three categories:

Errors: Properties present but with invalid values, or required properties missing. Each error shows an "Affected URLs" count — click through to see which specific product pages are affected.
Warnings: Recommended properties missing. These don't block rich result eligibility but do reduce the quality score that AI shopping agents use when ranking product matches.
Valid with warnings: Pages where the structured data is technically parseable but has advisory issues.

The most common Search Console errors seen in Shopify catalogs:

Error message	What it means for Shopify	Typical cause
"Missing field 'price'"	Offer block is present but price is empty or missing	Product has no active variants; Liquid `variant.price` returns nil
"Invalid value for field 'availability'"	Availability string not a recognized schema.org URI	Theme outputs bare string (`InStock`) instead of full URI
"Missing field 'priceCurrency'"	Offer block has price but no currency	Theme hardcodes USD or uses `shop.currency` which returns empty in some market configurations
"Invalid value for field 'image'"	Image URL is protocol-relative or returns 404	CDN URL missing https: scheme, or deleted product image still referenced in Liquid
"Missing field 'description'"	Product description empty or stripped to empty string	Product has no description set in Shopify admin; Liquid outputs empty string with no fallback

Important: Search Console data reflects Googlebot's crawl, which can lag 1–3 weeks behind your current theme state. After fixing a template-level structured data error, use the URL Inspection tool in Search Console to fetch the current live version of a specific URL and immediately see whether the fix resolved the error for that URL, without waiting for the next crawl.
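The clustering idea behind the report, one error type mapped to many affected URLs, is easy to reproduce for your own per-URL findings. This is a small illustrative sketch, not Search Console's implementation; `cluster_by_error` and the input shape (a dict of URL to list of error messages) are assumptions made for the example.

```python
from collections import defaultdict


def cluster_by_error(findings: dict) -> list:
    """Group URLs under each error message, most widespread error first."""
    affected = defaultdict(set)
    for url, errors in findings.items():
        for error in errors:
            affected[error].add(url)
    # Sort by number of affected URLs (descending), then by message for stability.
    ranked = sorted(affected.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(error, sorted(urls)) for error, urls in ranked]
```

Sorting by affected-URL count puts template-level problems at the top, since a single broken Liquid block tends to show up on every product that uses it.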

Bonus: manual curl verification for crawlability

Before any structured data can be read, the crawler has to be able to reach your pages. A quick curl check tells you whether AI shopping agent crawlers are being blocked, redirected, or served different content than what your browser sees.

Check that JSON-LD is present in the server-rendered HTML

curl -s https://your-store.com/products/your-product-handle | grep -c "application/ld+json"

Should return a non-zero number. Zero means no JSON-LD is server-rendered at all — the structured data is either absent or injected client-side (invisible to crawlers).

Check that AI crawlers aren't blocked by Cloudflare or a WAF

# Simulate OAI-SearchBot (ChatGPT Shopping crawler)
curl -s -o /dev/null -w "%{http_code}" \
  -H "User-Agent: OAI-SearchBot/1.0 (+https://openai.com/searchbot)" \
  https://your-store.com/products/your-product-handle

# Simulate PerplexityBot
curl -s -o /dev/null -w "%{http_code}" \
  -H "User-Agent: PerplexityBot/1.0 (+https://docs.perplexity.ai/bots)" \
  https://your-store.com/products/your-product-handle

Both should return 200. A 403 or 429 means your CDN or WAF is blocking AI crawlers by user agent — a common misconfiguration documented in detail in our Cloudflare AI crawler guide. A 301 or 302 redirect chain on product URLs can also cause crawlers to abandon indexing if the redirect target is slow or returns a different content type.
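If you want to run the same user-agent check across many product URLs, a scripted version may be easier than repeating curl by hand. This is a hedged sketch using Python's standard library: the user-agent strings are copied from the curl commands above, while `fetch_status` and `classify_status` are hypothetical helper names. One difference from curl without `-L`: `urllib` follows redirects automatically, so `fetch_status` reports the final status, and the 301/302 branch in `classify_status` only applies to codes you collected with curl.

```python
import urllib.error
import urllib.request

# User-agent strings taken from the curl examples in this section.
CRAWLER_USER_AGENTS = {
    "OAI-SearchBot": "OAI-SearchBot/1.0 (+https://openai.com/searchbot)",
    "PerplexityBot": "PerplexityBot/1.0 (+https://docs.perplexity.ai/bots)",
}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for url when requested as user_agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx responses are raised as exceptions


def classify_status(code: int) -> str:
    """Translate a status code into the outcomes described in this section."""
    if code == 200:
        return "ok"
    if code in (403, 429):
        return "blocked or rate-limited"
    if code in (301, 302):
        return "redirect"
    return "unexpected"
```

A typical use is to loop over `CRAWLER_USER_AGENTS.items()` for each product URL and print `classify_status(fetch_status(url, agent))`.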

Extract the raw JSON-LD for inspection

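# Join lines first: grep matches line by line, and theme JSON-LD usually spans several lines
curl -s https://your-store.com/products/your-product-handle \
  | tr -d '\n' \
  | grep -o '<script type="application/ld+json">[^<]*</script>' \
  | sed 's/<[^>]*>//g'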

This prints the raw JSON to your terminal. Pipe it to a JSON formatter (| node -e "process.stdin.resume();let d='';process.stdin.on('data',c=>d+=c);process.stdin.on('end',()=>console.log(JSON.stringify(JSON.parse(d),null,2)))") to check for Liquid syntax leaking through, empty strings in critical fields, or malformed JSON that would cause a parse error.

5 silent failure patterns and how to fix them

These patterns are "silent" because they produce no visible page error — your store works fine for human visitors. But they reliably cause AI shopping agents to either skip your product or index it with degraded signals.

Liquid variable rendering as empty string in JSON-LD

When a product is missing a metafield, variant attribute, or description, Shopify Liquid outputs an empty string — and your JSON-LD ends up with "price": "" or "description": "". This fails schema validation even though the JSON is syntactically valid. The product will either be excluded from AI recommendation sets or demoted due to incomplete signals.

How to detect: The Rich Results Test shows "Invalid value for field 'price'" on the affected products.

Fix: Add Liquid conditionals around each property so it is emitted only when the value is non-empty: {%- if product.price > 0 -%}"price": {{ product.price | money_without_currency | remove: "," }}{%- endif -%}. Keep the separating comma inside the conditional so that skipping the property doesn't leave a dangling comma and invalid JSON. For description, use a fallback chain: product description, then product type, then a static default that describes the product category.
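The "emit only when non-empty" rule and the description fallback chain are easier to see outside of template syntax. The sketch below expresses the same logic in Python purely as an illustration; it is not how a Shopify theme renders JSON-LD, `build_product_jsonld` is a hypothetical name, and the static default text is a placeholder you would replace with your own category description.

```python
import json


def build_product_jsonld(name, description, product_type, price):
    """Build a Product dict, omitting properties whose values are empty."""
    # Fallback chain: description, then product type, then a static default.
    resolved_description = description or product_type or "Product from our catalog"
    candidate = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": resolved_description,
    }
    # Only attach an Offer when there is a real, positive price.
    if price is not None and price > 0:
        candidate["offers"] = {"@type": "Offer", "price": price}
    # Drop any remaining empty values instead of emitting "" or null.
    return {key: value for key, value in candidate.items() if value not in (None, "")}


if __name__ == "__main__":
    print(json.dumps(build_product_jsonld("Mug", "", "Drinkware", 0), indent=2))
```

Building the object first and serializing it in one step sidesteps the dangling-comma problem that hand-written conditionals in a template have to manage.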

Availability value not a recognized schema.org URI

In Shopify Liquid, product.available is a boolean, and many themes turn it into a short custom string ("in stock", "out of stock", "preorder") or a bare enum name and output that directly into the JSON-LD availability field. The schema.org spec expects a full URI like https://schema.org/InStock. AI shopping agents that implement the full spec will reject the bare string and treat the product as having no availability signal, which often means they won't recommend it for availability-sensitive queries ("in stock now", "ships today").

How to detect: The Schema.org Validator shows "Invalid value for property availability", or the Rich Results Test shows "Invalid value for field 'availability'".

Fix: Map product.available to a schema.org URI with a Liquid conditional block:

{%- if product.available -%}https://schema.org/InStock{%- else -%}https://schema.org/OutOfStock{%- endif -%}

For pre-order products, add a metafield check: if product.metafields.availability.is_preorder == true, output https://schema.org/PreOrder instead.
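When auditing existing output, it can help to see which availability values could be mapped to a schema.org URI and which are unrecognizable. The following is a minimal sketch in Python; `availability_uri` mirrors the Liquid conditional above, while `normalize_availability` and its lookup table are assumptions for illustration that cover only the three states mentioned in this section.

```python
from typing import Optional

SCHEMA_ORG = "https://schema.org/"
KNOWN_STATES = {"instock": "InStock", "outofstock": "OutOfStock", "preorder": "PreOrder"}


def availability_uri(available: bool, is_preorder: bool = False) -> str:
    """Pick the schema.org URI from boolean flags, as the Liquid block does."""
    if is_preorder:
        return SCHEMA_ORG + "PreOrder"
    return SCHEMA_ORG + ("InStock" if available else "OutOfStock")


def normalize_availability(value: str) -> Optional[str]:
    """Map a bare or custom availability string to a URI, or None if unknown."""
    # Keep only the last path segment so full URIs pass through unchanged.
    key = value.rsplit("/", 1)[-1].replace(" ", "").replace("-", "").lower()
    name = KNOWN_STATES.get(key)
    return SCHEMA_ORG + name if name else None
```

A `None` result, as for a custom string like "available", marks a value that needs a template fix rather than a simple rewrite.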

HTML entities encoded inside JSON-LD strings

Some Shopify themes run product names and descriptions through Liquid's HTML escaping filter before embedding them in JSON-LD. The result is product names like "Women&#39;s Running Shoes" inside the JSON string. JSON-LD parsers don't decode HTML entities. They treat &#39; as literal text, so your product name is stored as "Women&#39;s Running Shoes" in the AI agent's index. This corrupts the name, breaks keyword matching, and makes your product appear in AI recommendations with garbled metadata.

How to detect: Curl the product page, extract the raw JSON-LD, and search for &amp;, &quot;, &#39;, or &lt; inside string values.

Fix: In Shopify Liquid, use product.title | strip_html (not | escape or | xml_escape) when embedding values into JSON-LD. The escape filter is for HTML attribute contexts. For JSON strings, use | strip_html | replace: '"', '\"' to handle embedded quotation marks without HTML-encoding the ampersands.
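The difference between HTML escaping and JSON escaping is the core of this failure, and a few lines of Python make it concrete. This is an illustrative sketch only; `find_entities` and `clean_for_jsonld` are hypothetical helpers, and the regular expression is a simple approximation of numeric and named entity syntax.

```python
import html
import json
import re

# Numeric (&#39;, &#x27;) and named (&amp;, &quot;) HTML entities.
ENTITY = re.compile(r"&(?:#\d+|#x[0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def find_entities(value: str) -> list:
    """Return every HTML entity found in a JSON-LD string value."""
    return ENTITY.findall(value)


def clean_for_jsonld(value: str) -> str:
    """Decode HTML entities, then apply JSON escaping via json.dumps."""
    return json.dumps(html.unescape(value))
```

For the input `Women&#39;s 6&quot; Boots`, `find_entities` reports both entities, and `clean_for_jsonld` produces a JSON string in which the apostrophe is literal and only the double quote is backslash-escaped, which is what a JSON-LD parser expects.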

Price formatted as a currency string instead of a number

Shopify Liquid's money filter outputs a human-formatted price string like "$29.99" or "USD 29.99". When this is embedded directly into the JSON-LD price property, the value is a string containing a currency symbol — not a number. The schema.org spec defines price as a Number or a string representation of a number without currency symbols. Some AI shopping agents accept formatted strings; many reject the currency symbol prefix and either exclude the price signal or fail to parse the offer entirely.

How to detect: Run the Schema.org Validator and look for a warning such as "The 'price' property has a value that does not look like a number". Or curl the product and grep for the JSON-LD price value; if it starts with a currency symbol, it's malformed.

Fix: Use Shopify Liquid's money_without_currency filter and strip the thousands comma: {{ variant.price | money_without_currency | remove: "," }}. This outputs 29.99, a bare decimal with no currency symbol, which schema.org parsers accept as a valid price value whether or not the template wraps it in quotes.
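To check many extracted price values at once, the same cleanup can be approximated in a script. This is a rough sketch with a hypothetical `parse_price` helper; like the Liquid fix, it assumes "." is the decimal mark and "," is only a thousands separator, so it would misread prices from stores that format decimals with a comma.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_price(raw) -> Optional[Decimal]:
    """Extract a bare decimal from a formatted price, or None if impossible."""
    # Drop currency symbols, currency codes, spaces, and thousands commas.
    cleaned = re.sub(r"[^0-9.]", "", str(raw))
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        # Nothing numeric was left, or the digits don't form one number.
        return None
```

Values that come back as `None` are the ones to look at first, because they contain no usable number at all rather than just an extra symbol.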

Structured data present on homepage but absent from product pages

AI shopping agents crawl product page URLs, not the homepage — that's where the actual product data lives. Some Shopify themes (especially older Debut-era themes) output rich structured data on the homepage (Organization, WebSite, perhaps a FeaturedCollection), but product pages only get a minimal or malformed Product block. The homepage test passes; the product pages fail. Since most merchants test their homepage first, this is one of the most common reasons an audit shows "structured data working" while the actual catalog is invisible to AI agents.

How to detect: Test three different URLs with the Rich Results Test: the homepage, a product page, and a collection page. Compare the detected item types. A healthy Shopify store should return "Product" on product pages with a complete Offer block.

Fix: Check your theme's product.liquid template (in Shopify Online Store 2.0, sections/main-product.liquid) for a <script type="application/ld+json"> block. If it's missing, add a Product schema block to the template. Do not rely on app-injected structured data in the page footer; render the JSON-LD from the theme itself, ideally in the <head>, so it is always present in the server-rendered HTML of every product page.

How CatalogScan's automated audit fits into the workflow

The three-tool manual workflow above is thorough but time-consuming — especially for catalogs with hundreds or thousands of products where the Rich Results Test and Schema.org Validator require individual URL checks. CatalogScan's automated scan addresses the catalog-scale problem: it crawls your entire product feed the way AI shopping agents do, extracts the JSON-LD from every product page, and scores each of the 18 AI-agent-critical signals across your full catalog.

The output maps directly to what the three-tool workflow would find if you ran it on every product:

Manual tool	What it checks	CatalogScan equivalent
Rich Results Test	Per-product Google parse validity + required field presence	JSON-LD parse score across all products + signal-level pass/fail breakdown
Schema.org Validator	Full spec compliance, type correctness, deprecated properties	Property-level validation including availability URI format, price number type, GTIN digit count
Search Console	Catalog-wide error clustering, affected URL counts	"Top 5 fixes" report — errors ranked by how many products are affected and estimated AI visibility impact
curl + grep	Crawler accessibility, server-rendered JSON-LD presence	Crawlability check included in scan; flags bot-block patterns by user agent

The manual workflow is still valuable for two reasons: it lets you verify CatalogScan's findings independently before making theme changes, and it gives you the context to explain specific errors to a developer in terms of the exact field, tool, and error message they'll see when they test the fix.

Recommended workflow: Run a CatalogScan scan to identify which products and which specific error types are affecting your catalog most broadly. Then use the Rich Results Test and Schema.org Validator to verify the top 3 error types on representative product pages before and after making theme changes. Use Search Console to confirm the fix has propagated across the crawled catalog 1–2 weeks after deployment.

10-step structured data audit checklist

Rich Results Test: test highest-traffic product URL — "Product" detected, no errors in Offer block
Rich Results Test: test a product with no metafields set — no empty-string values in required properties
Rich Results Test: test an out-of-stock product — availability shows full schema.org URI, not bare string
Schema.org Validator: paste raw JSON-LD from a product page — no red errors, especially around gtin, availability, or price type
Schema.org Validator: confirm priceCurrency is ISO 4217 code (USD, EUR, GBP) — not a currency symbol
curl check: curl -s URL | grep -c "application/ld+json" returns non-zero on product pages
curl check: OAI-SearchBot and PerplexityBot user agents both receive HTTP 200 on product pages
curl check: JSON-LD output contains no HTML entities (&amp;, &quot;, &#39;) inside string values
Search Console: Shopping enhancements report shows no "Invalid value for field 'availability'" or "Missing field 'price'" errors
Search Console: URL Inspection on top 10 products shows "Page is eligible for rich results" status

Frequently asked questions

Does Shopify's default Dawn theme have correct structured data for AI shopping agents?

Dawn outputs the basic Product + Offer JSON-LD block with name, description, image, price, priceCurrency, and availability. However, it uses bare availability strings (InStock / OutOfStock) rather than schema.org URIs, it doesn't include GTIN or MPN fields even when metafields are populated, and it doesn't handle the case where a product has no active variants (which causes an empty Offer block). For basic Google rich results eligibility, Dawn is sufficient. For AI shopping agent optimization — where GTIN, MPN, brand, and condition signals all affect recommendation ranking — Dawn's output needs to be extended.

How often should I run a structured data audit?

Run the full 3-tool audit whenever you: upgrade your Shopify theme, install or update a JSON-LD app, modify your product.liquid template, or change your pricing structure (especially if adding multi-currency or Shopify Markets). For ongoing monitoring, Google Search Console's structured data report will flag new errors automatically as Googlebot crawls your catalog. For AI shopping agent-specific monitoring, a monthly CatalogScan run catches signal degradation that Search Console doesn't track (like GTIN coverage declining as new products are added without GTINs).

If the Rich Results Test passes, does that mean AI shopping agents can read my structured data?

Not necessarily. The Rich Results Test validates against Google's rich-result eligibility rules, which are a subset of the full schema.org specification. AI shopping agents like ChatGPT Shopping and Perplexity Commerce implement broader portions of the schema.org spec — they use GTIN, MPN, brand, material, condition, and other properties that aren't required for Google's Product rich results. A product that passes the Rich Results Test but is missing GTIN and brand will be eligible for Google shopping rich results but may rank lower (or be excluded) in AI agent recommendation sets for brand-specific or GTIN-matched queries.

What's the difference between an error and a warning in the Rich Results Test?

Errors in the Rich Results Test indicate that a required property is missing or has an invalid value that prevents the page from qualifying for any rich result type. Warnings indicate that a recommended property is missing — the page can still qualify for rich results, but the quality and ranking may be lower. For AI shopping agents, the error/warning distinction is less meaningful: both represent incomplete signals that reduce the agent's ability to match your product to relevant queries. Treat all Rich Results Test warnings as issues to fix for AI shopping optimization, not just errors.

Can I test structured data without a live public URL?

Yes — both the Schema.org Validator and Rich Results Test support "Direct Input" mode where you can paste in raw HTML or JSON-LD text. This is useful for testing structured data in a development or staging environment before deploying to production. For the Rich Results Test, use the "Code" tab (the small icon next to the URL field) to switch to paste mode. Note that Direct Input mode doesn't execute JavaScript, so if your theme injects structured data via JavaScript, paste in the JSON-LD from the post-JavaScript DOM (copy from the browser's Inspect panel, not View Source).

See exactly which structured data signals are failing across your catalog

CatalogScan checks all 18 AI-agent-critical signals across your full product catalog — not just one URL at a time. 90 seconds, no login required.
