How to audit Shopify structured data: the exact 3-tool workflow to verify your JSON-LD is working for AI shopping agents
Most Shopify merchants who've added JSON-LD to their theme believe their structured data is working. Most of the time, it isn't — not in the form AI shopping agents can actually parse and trust. Here's the exact audit workflow that surfaces what's broken.
In this guide
- The gap between "I added JSON-LD" and "AI agents are reading it"
- Tool 1: Rich Results Test — per-product verification
- Tool 2: Schema.org Validator — JSON-LD syntax and property validation
- Tool 3: Google Search Console — catalog-wide coverage errors
- Bonus: manual curl verification for crawlability
- 5 silent failure patterns and how to fix them
- How CatalogScan's automated audit fits into the workflow
- 10-step structured data audit checklist
- FAQ
The gap between "I added JSON-LD" and "AI agents are reading it"
When Shopify's default theme outputs structured data, it includes a <script type="application/ld+json"> block on product pages with a basic Product schema: name, description, image, offers. Many merchants stop there and assume the work is done.
But AI shopping agents — ChatGPT Shopping, Perplexity Commerce, Google AI Mode, Shopify's own Global Catalog feed — don't give partial credit. If a required property is missing, incorrectly typed, or rendered in a form the parser doesn't recognize, the entire product may be excluded from the agent's consideration set for that query signal. The failure is invisible: your page still renders, your products still sell through human-initiated search, but you're absent from AI-mediated recommendations.
The audit gap has three common sources:
- Rendering failures: Liquid template variables that produce empty strings, null values, or raw Liquid syntax in the JSON-LD output when the product is missing a metafield or variant attribute
- Type mismatches: Price as a formatted string ("$29.99") rather than a bare number (29.99), or availability as a custom string ("available") rather than a schema.org URI ("https://schema.org/InStock")
- Scope problems: Structured data present on the homepage or collection pages but absent or malformed on the individual product page URLs that AI agents actually crawl
None of these trigger a 500 error. None produce a broken page. They only show up when you test the JSON-LD output directly — which is what the three-tool audit is for.
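The rendering failures described above can be caught with a small script once you have the JSON-LD parsed. The following is a minimal sketch, not part of any tool named in this guide: `find_render_leaks` and the sample product are hypothetical, and the check only looks for empty strings, nulls, and leftover Liquid delimiters.

```python
import re

# Matches leftover Liquid delimiters such as {{ ... }} or {% ... %}.
LIQUID_PATTERN = re.compile(r"\{\{|\}\}|\{%|%\}")


def find_render_leaks(node, path="$"):
    """Walk parsed JSON-LD and return (path, problem) pairs for suspicious values."""
    problems = []
    if isinstance(node, dict):
        for key, value in node.items():
            problems += find_render_leaks(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            problems += find_render_leaks(value, f"{path}[{index}]")
    elif node is None:
        problems.append((path, "null value"))
    elif isinstance(node, str):
        if not node.strip():
            problems.append((path, "empty string"))
        elif LIQUID_PATTERN.search(node):
            problems.append((path, "raw Liquid syntax"))
    return problems


# Hypothetical product with one instance of each rendering failure.
sample = {
    "@type": "Product",
    "name": "Trail Shoe",
    "description": "",
    "offers": {"@type": "Offer", "price": "{{ variant.price }}", "availability": None},
}

for path, problem in find_render_leaks(sample):
    print(path, "->", problem)
```

Running this against the "worst case" product (the one with no metafields set) is usually the fastest way to see which template variables have no fallback.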
Tool 1: Rich Results Test — per-product verification
Google's Rich Results Test renders your page the way Googlebot would — executing JavaScript, resolving Liquid-rendered output — and then parses every structured data block it finds. For Shopify, this means it sees the actual JSON-LD that Googlebot sees, not the Liquid template source.
What to test:
- Your highest-traffic product page (the one most likely to have complete data)
- A product with no metafields set (the "worst case" — likely to expose Liquid rendering gaps)
- A product that has multiple variants at different price points (tests price range rendering)
- A product that is currently out of stock (tests availability enum rendering)
What to look for in the results:
- Detected items: Should show "Product". If it shows nothing, your JSON-LD is either absent, malformed to the point of being unparseable, or placed after the closing </body> tag
- Errors vs. Warnings: Errors block rich result eligibility. Warnings are advisory. For AI shopping agent purposes, treat both as blocking, since agents are stricter than Google's rich-result eligibility rules
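- Missing required fields: Treat "name", "image", and "description" as required. Depending on the rich result type, Google may report a missing one as an error or only as a warning; fix it either way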
- Offer block: Check that "price", "priceCurrency", and "availability" are all present and showing real values — not empty strings or Liquid syntax leaking through
Reading a Rich Results Test output for a Shopify product
When you paste a product URL and click "Test URL," the tool shows you a parsed tree of every JSON-LD block it found. For Shopify's default theme output, you'll typically see a Product item with nested Offer items. The critical things to check in each Offer:
| Property | Expected value | Common broken state |
|---|---|---|
| price | 29.99 | Empty string, "$29.99" (currency symbol included), or "29.99" (a string rather than a number; technically allowed in JSON-LD but flags a warning) |
| priceCurrency | USD | Empty string (metafield not set), "$" (symbol instead of ISO code), missing entirely |
| availability | https://schema.org/InStock | "InStock" (bare string), "available" (custom string), empty string when Shopify returns nil for out-of-stock variants |
| url | https://store.com/products/handle | Relative URL (/products/handle), missing entirely, or variant-scoped URL on a product with no variants |
| image | https://cdn.shopify.com/... | Empty array [] (product has no images), CDN URL without scheme (//cdn.shopify.com/...) |
Also test a variant-scoped URL such as /products/handle?variant=12345. Some Shopify themes output different JSON-LD depending on which variant is selected, and the default (no variant) URL may render an empty price if the theme uses JavaScript to populate it dynamically.
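The Offer checks in the table above are mechanical enough to script. Here is a rough sketch, assuming you already have each Offer as a parsed dictionary; `check_offer` is a hypothetical helper, the availability set covers only the values this guide mentions, and the currency test checks the shape of the code rather than looking it up in the ISO 4217 list.

```python
import re
from urllib.parse import urlparse

# Subset of schema.org ItemAvailability values used in this guide; extend as needed.
VALID_AVAILABILITY = {
    "https://schema.org/InStock",
    "https://schema.org/OutOfStock",
    "https://schema.org/PreOrder",
}


def check_offer(offer):
    """Return a list of issues found in one Offer dict (hypothetical helper)."""
    issues = []

    # Accepts 29.99 and "29.99"; rejects "", "$29.99", None.
    if not re.fullmatch(r"\d+(\.\d+)?", str(offer.get("price"))):
        issues.append("price: expected a bare decimal such as 29.99")

    # Shape check only: three uppercase letters, not a symbol such as "$".
    if not re.fullmatch(r"[A-Z]{3}", str(offer.get("priceCurrency"))):
        issues.append("priceCurrency: expected a three-letter code such as USD")

    availability = offer.get("availability")
    if not (isinstance(availability, str) and availability in VALID_AVAILABILITY):
        issues.append("availability: expected a full schema.org URI")

    # Relative and protocol-relative URLs have no https scheme.
    url = offer.get("url")
    parts = urlparse(url) if isinstance(url, str) else None
    if not (parts and parts.scheme == "https" and parts.netloc):
        issues.append("url: expected an absolute https URL")

    return issues


broken = {"price": "$29.99", "priceCurrency": "$",
          "availability": "InStock", "url": "/products/handle"}
for issue in check_offer(broken):
    print(issue)
```

The checks are deliberately strict; loosen the availability set if your theme legitimately emits other schema.org values such as BackOrder.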
Tool 2: Schema.org Validator — JSON-LD syntax and property validation
While the Rich Results Test validates against Google's rich-result eligibility requirements, the Schema.org Validator checks compliance against the full schema.org specification. It catches property names that Google silently ignores (but AI agents may rely on), incorrect type nesting, and deprecated properties that were replaced in more recent schema.org releases.
What it catches that Rich Results Test misses:
- Properties on the wrong type (gtin13 on an Organization instead of a Product)
- Deprecated property names (e.g., offers.seller syntax changes between schema.org versions)
- Type mismatches where a property expects a URL type but receives a plain string
- Missing @context or incorrect context URL
- GTIN format validation: a gtin13 that isn't 13 digits will pass the Rich Results Test but fail the Schema.org Validator
The paste-in workflow for Shopify
The Schema.org Validator's URL mode doesn't execute JavaScript, so it sees the raw server-rendered HTML. For Shopify stores using JavaScript-injected structured data (rare but possible with some headless setups), use the paste-in mode instead:
- Open a product page in your browser and right-click → "View Page Source" to get the raw server-rendered HTML. If the JSON-LD is injected by JavaScript it will not appear there; copy it from the Elements panel in Inspect instead
- Search for application/ld+json and copy the entire content of the script tag (the JSON object inside)
- Paste into the Schema.org Validator's "Validate by Direct Input" tab
- Note every red error (blocking) and yellow warning (advisory)
For Shopify Online Store 2.0 themes, the JSON-LD is server-rendered by Liquid, so the URL mode works correctly. The paste-in approach is only necessary if your theme uses a custom storefront or injects structured data via a JavaScript app.
Comparing the two tools' error sets
It's worth running both tools on the same URL and comparing the error lists. They use different validation rule sets and will often catch non-overlapping issues. A clean Rich Results Test result does not mean clean Schema.org validation — and for AI shopping agents that implement the full schema.org spec rather than just Google's rich-result subset, the Schema.org Validator errors are the ones that matter.
Tool 3: Google Search Console — catalog-wide coverage errors
The Rich Results Test and Schema.org Validator check individual URLs. Google Search Console's structured data report shows you errors across your entire product catalog — and clusters them by error type so you can fix one template problem that's affecting 400 products simultaneously.
Where to find it: In Search Console, go to Shopping in the left nav (under Enhancements). This shows Product-type structured data errors across all crawled pages. If you don't see the Shopping section, your domain hasn't had any Product JSON-LD crawled yet (or the structured data is sufficiently broken that Googlebot couldn't classify it).
Reading the Search Console structured data report for Shopify
The report groups errors into three categories:
- Errors: Properties present but with invalid values, or required properties missing. Each error shows an "Affected URLs" count — click through to see which specific product pages are affected.
- Warnings: Recommended properties missing. These don't block rich result eligibility but do reduce the quality score that AI shopping agents use when ranking product matches.
- Valid with warnings: Pages where the structured data is technically parseable but has advisory issues.
The most common Search Console errors seen in Shopify catalogs:
| Error message | What it means for Shopify | Typical cause |
|---|---|---|
| "Missing field 'price'" | Offer block is present but price is empty or missing | Product has no active variants; Liquid variant.price returns nil |
| "Invalid value for field 'availability'" | Availability string not a recognized schema.org URI | Theme outputs bare string (InStock) instead of full URI |
| "Missing field 'priceCurrency'" | Offer block has price but no currency | Theme hardcodes USD or uses shop.currency which returns empty in some market configurations |
| "Invalid value for field 'image'" | Image URL is protocol-relative or returns 404 | CDN URL missing https: scheme, or deleted product image still referenced in Liquid |
| "Missing field 'description'" | Product description empty or stripped to empty string | Product has no description set in Shopify admin; Liquid outputs empty string with no fallback |
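The same clustering idea works on your own audit notes when Search Console has not crawled enough pages yet. A minimal sketch, assuming you have collected (URL, error message) pairs from manual tests; the function name and the sample findings are hypothetical.

```python
from collections import defaultdict


def cluster_errors(findings):
    """Group (url, message) pairs by message and sort by affected URL count."""
    clusters = defaultdict(set)
    for url, message in findings:
        clusters[message].add(url)
    return sorted(
        ((message, len(urls)) for message, urls in clusters.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical findings gathered from per-product tests.
findings = [
    ("/products/a", "Missing field 'price'"),
    ("/products/b", "Missing field 'price'"),
    ("/products/a", "Invalid value for field 'availability'"),
    ("/products/c", "Missing field 'price'"),
]

for message, count in cluster_errors(findings):
    print(count, message)
```

The largest cluster usually points at a single template line, so it is the best place to start.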
Bonus: manual curl verification for crawlability
Before any structured data can be read, the crawler has to be able to reach your pages. A quick curl check tells you whether AI shopping agent crawlers are being blocked, redirected, or served different content than what your browser sees.
Check that JSON-LD is present in the server-rendered HTML
curl -s https://your-store.com/products/your-product-handle | grep -c "application/ld+json"
Should return a non-zero number. Zero means no JSON-LD is server-rendered at all — the structured data is either absent or injected client-side (invisible to crawlers).
Check that AI crawlers aren't blocked by Cloudflare or a WAF
# Simulate OAI-SearchBot (ChatGPT Shopping crawler)
curl -s -o /dev/null -w "%{http_code}" \
-H "User-Agent: OAI-SearchBot/1.0 (+https://openai.com/searchbot)" \
https://your-store.com/products/your-product-handle
# Simulate PerplexityBot
curl -s -o /dev/null -w "%{http_code}" \
-H "User-Agent: PerplexityBot/1.0 (+https://docs.perplexity.ai/bots)" \
https://your-store.com/products/your-product-handle
Both should return 200. A 403 or 429 means your CDN or WAF is blocking AI crawlers by user agent — a common misconfiguration documented in detail in our Cloudflare AI crawler guide. A 301 or 302 redirect chain on product URLs can also cause crawlers to abandon indexing if the redirect target is slow or returns a different content type.
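To run the same check across several user agents or product URLs, the curl commands can be wrapped in a short script. This is a sketch using only the Python standard library; `fetch_status` and `interpret_status` are hypothetical helpers, redirects are deliberately not followed so that 301 and 302 stay visible, and the interpretation simply mirrors the paragraph above.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 3xx codes are reported as-is."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code a given user agent receives for a URL."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def interpret_status(code):
    """Map a status code to the audit outcome described in this guide."""
    if code == 200:
        return "ok"
    if code in (403, 429):
        return "blocked or rate-limited"
    if code in (301, 302):
        return "redirect"
    return "unexpected"


# Example call (performs a network request, so it is left commented out):
# code = fetch_status("https://your-store.com/products/your-product-handle",
#                     "PerplexityBot/1.0 (+https://docs.perplexity.ai/bots)")
print(interpret_status(403))
```

Some WAF rules key on IP ranges as well as user agents, so a 200 from your own machine is a good sign but not proof that the real crawler gets through.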
Extract the raw JSON-LD for inspection
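# tr joins lines so that multi-line JSON-LD blocks can match; each block prints on its own line
curl -s https://your-store.com/products/your-product-handle \
| tr '\n' ' ' \
| grep -oE '<script[^>]*type="application/ld\+json"[^>]*>[^<]*</script>' \
| sed 's/<[^>]*>//g'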
This prints the raw JSON to your terminal. Pipe it to a JSON formatter (| node -e "process.stdin.resume();let d='';process.stdin.on('data',c=>d+=c);process.stdin.on('end',()=>console.log(JSON.stringify(JSON.parse(d),null,2)))") to check for Liquid syntax leaking through, empty strings in critical fields, or malformed JSON that would cause a parse error.
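Regex-based extraction is fragile when a page contains several JSON-LD blocks or script tags with extra attributes. As a hedged alternative, here is a sketch that uses Python's built-in HTML parser to collect every block and report which ones fail to parse; the class name, function name, and sample page are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return parsed JSON-LD blocks; unparseable blocks become parse_error dicts."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    results = []
    for raw in parser.blocks:
        try:
            results.append(json.loads(raw))
        except json.JSONDecodeError as error:
            results.append({"parse_error": str(error)})
    return results


# Hypothetical page: one valid block, one with an empty Liquid output.
page = """<html><head>
<script type="application/ld+json">
{"@type": "Product", "name": "Trail Shoe",
 "offers": {"@type": "Offer", "price": 29.99}}
</script>
<script type="application/ld+json">{"@type": "Product", "name": }</script>
</head></html>"""

blocks = extract_jsonld(page)
print(json.dumps(blocks[0], indent=2))
print("second block failed to parse:", "parse_error" in blocks[1])
```

Feed it the HTML saved from the curl command to inspect exactly what a non-JavaScript crawler receives.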
5 silent failure patterns and how to fix them
These patterns are "silent" because they produce no visible page error — your store works fine for human visitors. But they reliably cause AI shopping agents to either skip your product or index it with degraded signals.
Pattern 1: empty strings in required properties. When a product is missing a description or has no priced variant, the Liquid template can render "price": "" or "description": "". This fails schema validation even though the JSON is syntactically valid. The product will either be excluded from AI recommendation sets or demoted due to incomplete signals.
Fix: wrap the property in a conditional, for example {%- if product.price > 0 -%}"price": {{ product.price | money_without_currency | remove: "," }}{%- endif -%}. For description, use a fallback chain: product description, then product type, then a static default that describes the product category.
Pattern 2: bare availability strings. Availability is often stored or displayed as a human-readable label ("in stock", "out of stock", "preorder"). Many Shopify themes output this directly into the JSON-LD availability field. The schema.org spec requires a full URI like https://schema.org/InStock. AI shopping agents that implement the full spec will reject the bare string and treat the product as having no availability signal, which often means they won't recommend it for availability-sensitive queries ("in stock now", "ships today").
Fix: map availability to the full URI in Liquid: {%- if product.available -%}https://schema.org/InStock{%- else -%}https://schema.org/OutOfStock{%- endif -%}
For pre-order products, add a check against a custom metafield you define: if product.metafields.availability.is_preorder == true, output https://schema.org/PreOrder instead.
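Pattern 3: HTML entities inside JSON strings. Liquid's escape filter HTML-encodes special characters, so a title with an apostrophe is rendered as "Women&#39;s Running Shoes" inside the JSON string. JSON-LD parsers don't decode HTML entities; they treat &#39; as literal text, so your product name becomes "Women&#39;s Running Shoes" in the AI agent's index. This corrupts the name, breaks keyword matching, and makes your product appear in AI recommendations with garbled metadata.
Detect: search the raw JSON-LD output for &amp;, &quot;, or &lt; inside string values.
Fix: use product.title | strip_html (not | escape or | xml_escape) when embedding values into JSON-LD. The escape filter is for HTML attribute contexts. For JSON strings, use | strip_html | replace: '"', '\"' to handle embedded quotation marks without HTML-encoding the ampersands. Shopify's | json filter is an alternative: it emits a quoted, JSON-escaped string, so drop the surrounding quotation marks in the template when you use it.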
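Entity leaks are easy to miss by eye in a long JSON-LD block, so a scripted scan helps. The sketch below assumes the JSON-LD has already been parsed into Python objects; `find_html_entities` and the sample data are hypothetical, and the pattern only matches well-formed entities that end in a semicolon.

```python
import re

# Named (&amp;), decimal (&#39;), and hexadecimal (&#x27;) entities.
ENTITY_PATTERN = re.compile(r"&(?:#\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def find_html_entities(node, path="$"):
    """Return (path, entity) pairs for every HTML entity found in string values."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits += find_html_entities(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits += find_html_entities(value, f"{path}[{index}]")
    elif isinstance(node, str):
        hits += [(path, match.group(0)) for match in ENTITY_PATTERN.finditer(node)]
    return hits


# Hypothetical product: two leaked entities and one legitimate ampersand.
product = {
    "name": "Women&#39;s Running Shoes",
    "description": "Light &amp; fast",
    "brand": {"name": "R&D Labs"},
}

for path, entity in find_html_entities(product):
    print(path, "contains", entity)
```

A plain ampersand such as the one in "R&D Labs" is valid in JSON and is not flagged; only encoded entities are.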
Pattern 4: formatted prices. Shopify's money filter outputs a human-formatted price string like "$29.99" or "USD 29.99". When this is embedded directly into the JSON-LD price property, the value is a string containing a currency symbol, not a number. The schema.org spec defines price as a Number or a string representation of a number without currency symbols. Some AI shopping agents accept formatted strings; many reject the currency symbol prefix and either exclude the price signal or fail to parse the offer entirely.
Fix: use the money_without_currency filter and strip the thousands comma: {{ variant.price | money_without_currency | remove: "," }}. This outputs 29.99 (a bare decimal number as a string), which schema.org parsers accept as a valid price value. If your store's money format uses a comma as the decimal separator, removing commas corrupts the value; in that case, and assuming a two-decimal currency, divide the integer price instead: {{ variant.price | divided_by: 100.0 }}.
Pattern 5: JSON-LD missing from the product template. Check your product.liquid template (in Shopify Online Store 2.0, sections/main-product.liquid) for a script type="application/ld+json" block. If it's missing, add a Product schema block to the template. Do not rely on app-injected structured data in the page footer; place the JSON-LD block in the theme itself, ideally in the <head> on product pages, so that it is always present in the server-rendered HTML that crawlers fetch.
How CatalogScan's automated audit fits into the workflow
The three-tool manual workflow above is thorough but time-consuming — especially for catalogs with hundreds or thousands of products where the Rich Results Test and Schema.org Validator require individual URL checks. CatalogScan's automated scan addresses the catalog-scale problem: it crawls your entire product feed the way AI shopping agents do, extracts the JSON-LD from every product page, and scores each of the 18 AI-agent-critical signals across your full catalog.
The output maps directly to what the three-tool workflow would find if you ran it on every product:
| Manual tool | What it checks | CatalogScan equivalent |
|---|---|---|
| Rich Results Test | Per-product Google parse validity + required field presence | JSON-LD parse score across all products + signal-level pass/fail breakdown |
| Schema.org Validator | Full spec compliance, type correctness, deprecated properties | Property-level validation including availability URI format, price number type, GTIN digit count |
| Search Console | Catalog-wide error clustering, affected URL counts | "Top 5 fixes" report — errors ranked by how many products are affected and estimated AI visibility impact |
| curl + grep | Crawler accessibility, server-rendered JSON-LD presence | Crawlability check included in scan; flags bot-block patterns by user agent |
The manual workflow is still valuable for two reasons: it lets you verify CatalogScan's findings independently before making theme changes, and it gives you the context to explain specific errors to a developer in terms of the exact field, tool, and error message they'll see when they test the fix.
10-step structured data audit checklist
- Rich Results Test: test highest-traffic product URL — "Product" detected, no errors in Offer block
- Rich Results Test: test a product with no metafields set — no empty-string values in required properties
- Rich Results Test: test an out-of-stock product; availability shows full schema.org URI, not bare string
- Schema.org Validator: paste raw JSON-LD from a product page; no red errors, especially around gtin, availability, or price type
- Schema.org Validator: confirm priceCurrency is ISO 4217 code (USD, EUR, GBP), not a currency symbol
- curl check: curl -s URL | grep -c "application/ld+json" returns non-zero on product pages
- curl check: OAI-SearchBot and PerplexityBot user agents both receive HTTP 200 on product pages
- curl check: JSON-LD output contains no HTML entities (&amp;, &quot;) inside string values
- Search Console: Shopping enhancements report shows no "Invalid value for field 'availability'" or "Missing field 'price'" errors
- Search Console: URL Inspection on top 10 products shows "Page is eligible for rich results" status
Frequently asked questions
Does Shopify's default Dawn theme have correct structured data for AI shopping agents?
Dawn outputs the basic Product + Offer JSON-LD block with name, description, image, price, priceCurrency, and availability. However, it uses bare availability strings (InStock / OutOfStock) rather than schema.org URIs, it doesn't include GTIN or MPN fields even when metafields are populated, and it doesn't handle the case where a product has no active variants (which causes an empty Offer block). Dawn's structured data output has changed between releases, so confirm each of these points against your installed version with the Rich Results Test. For basic Google rich results eligibility, Dawn is sufficient. For AI shopping agent optimization, where GTIN, MPN, brand, and condition signals all affect recommendation ranking, Dawn's output needs to be extended.
How often should I run a structured data audit?
Run the full 3-tool audit whenever you: upgrade your Shopify theme, install or update a JSON-LD app, modify your product.liquid template, or change your pricing structure (especially if adding multi-currency or Shopify Markets). For ongoing monitoring, Google Search Console's structured data report will flag new errors automatically as Googlebot crawls your catalog. For AI shopping agent-specific monitoring, a monthly CatalogScan run catches signal degradation that Search Console doesn't track (like GTIN coverage declining as new products are added without GTINs).
If the Rich Results Test passes, does that mean AI shopping agents can read my structured data?
Not necessarily. The Rich Results Test validates against Google's rich-result eligibility rules, which are a subset of the full schema.org specification. AI shopping agents like ChatGPT Shopping and Perplexity Commerce implement broader portions of the schema.org spec — they use GTIN, MPN, brand, material, condition, and other properties that aren't required for Google's Product rich results. A product that passes the Rich Results Test but is missing GTIN and brand will be eligible for Google shopping rich results but may rank lower (or be excluded) in AI agent recommendation sets for brand-specific or GTIN-matched queries.
What's the difference between an error and a warning in the Rich Results Test?
Errors in the Rich Results Test indicate that a required property is missing or has an invalid value that prevents the page from qualifying for any rich result type. Warnings indicate that a recommended property is missing — the page can still qualify for rich results, but the quality and ranking may be lower. For AI shopping agents, the error/warning distinction is less meaningful: both represent incomplete signals that reduce the agent's ability to match your product to relevant queries. Treat all Rich Results Test warnings as issues to fix for AI shopping optimization, not just errors.
Can I test structured data without a live public URL?
Yes — both the Schema.org Validator and Rich Results Test support "Direct Input" mode where you can paste in raw HTML or JSON-LD text. This is useful for testing structured data in a development or staging environment before deploying to production. For the Rich Results Test, use the "Code" tab (the small icon next to the URL field) to switch to paste mode. Note that Direct Input mode doesn't execute JavaScript, so if your theme injects structured data via JavaScript, paste in the JSON-LD from the post-JavaScript DOM (copy from the browser's Inspect panel, not View Source).
See exactly which structured data signals are failing across your catalog
CatalogScan checks all 18 AI-agent-critical signals across your full product catalog — not just one URL at a time. 90 seconds, no login required.