Shopify structured data validity
One stray " inside one JSON-LD block can invalidate the entire structured-data graph from a strict parser's perspective, and AI shopping agents run strict parsers. This is the silent-failure signal: the page renders fine, the HTML looks fine, the visible content reads fine, but an agent that fetches the page hits a parse error and gives up on the structured information your theme worked hard to emit. Worse, many parsers fail fast: once one block fails to parse, every block after it is discarded too, even if those blocks would have parsed cleanly. So one bad block written by a review widget can take out an otherwise-perfect Product schema, an Organization graph, and a BreadcrumbList in one hit. This signal catches that.
We extract every <script type="application/ld+json"> block from a sampled PDP and run each one through a strict JSON parser. Full credit (4 pts) if every block parses without error. Half credit (2 pts) if at least the Product block parses but other blocks (Organization, FAQPage, BreadcrumbList) fail. Zero if the Product block itself is invalid. We log the exact byte offset of the first parse error so you can find the bad character without scrolling through 800 lines of JSON.
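The scoring rule above can be sketched in a few lines. This is a minimal illustration, not the scanner's actual code: it uses Python's standard json module as the strict parser, the names score_blocks, _is_product, and _contains_product are hypothetical, and it assumes that when some block fails and no parsed block contains a Product node, the failing block was the Product block.

```python
import json

def _is_product(node):
    """True if node is a JSON object whose @type is (or includes) Product."""
    if not isinstance(node, dict):
        return False
    types = node.get("@type")
    return "Product" in (types if isinstance(types, list) else [types])

def _contains_product(data):
    """Look for a Product node at the top level, in a top-level list, or in @graph."""
    for node in (data if isinstance(data, list) else [data]):
        if _is_product(node):
            return True
        graph = node.get("@graph") if isinstance(node, dict) else None
        if isinstance(graph, list) and any(_is_product(n) for n in graph):
            return True
    return False

def score_blocks(blocks):
    """Return (points, failures) for a list of raw JSON-LD strings."""
    product_ok = False
    failures = []  # (block index, UTF-8 byte offset of the first parse error)
    for index, raw in enumerate(blocks):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as err:
            # err.pos counts characters; re-encode the prefix to get bytes.
            failures.append((index, len(raw[:err.pos].encode("utf-8"))))
            continue
        product_ok = product_ok or _contains_product(data)
    if not failures:
        return 4, failures   # every block parsed
    if product_ok:
        return 2, failures   # Product parsed, something else did not
    return 0, failures       # assumption: the failing block was the Product block
```

One caveat on the stand-in parser: Python's json module rejects unescaped quotes and trailing commas, but by default it accepts NaN and Infinity, which stricter JSON parsers refuse.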
What it is
JSON-LD (JSON for Linked Data) is the format every modern structured-data block on your storefront emits. A Shopify PDP typically ships three to seven of them in the <head>: the Product schema (from your theme), an Organization schema (from your homepage layout), a BreadcrumbList (from breadcrumb apps), a FAQPage block (from FAQ widgets), and one to three review-app injections (Judge.me, Loox, Yotpo, Stamped). Each block is independent JSON. If any one of them is invalid JSON, the parser stops at the first byte that broke, and many parsers, including the ones AI agents run at retrieval time, give up on the entire <head> rather than try to recover.
Properly escaped
{
"@type": "Product",
"name": "She said \"yes\" Wool Runner",
"description": "Soft, lightweight..."
}
What manual concat ships
{
"@type": "Product",
"name": "She said "yes" Wool Runner",
"description": "Soft, lightweight..."
}
Marketing copy with curly quotes
{
"@type": "Product",
"name": "She said “yes” Wool Runner",
"description": "Don’t even ask"
}
Loop that left a tail
{
"@type": "Product",
"offers": [
{ "price": "129.00" },
{ "price": "139.00" },
]
}
The third shape is sneaky: smart quotes (also called curly quotes: “, ”, ’) render almost like plain quotes in a browser and look normal in your theme code. Inside a string delimited by ASCII " they are ordinary characters and the block still parses; they become a syntax error when one lands where the parser expects the ASCII " itself, for example when hand-written JSON-LD has been through a word processor. The fourth shape, the trailing comma, is the most common bug from a Liquid loop that ends every item with a comma and never strips the last separator.
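A quick way to see how a strict parser treats these shapes is to feed them to one. The sketch below uses Python's standard json module as a stand-in (an assumption; it is not the parser any particular agent runs) and reports where parsing stops. The SAMPLES names and the first_error helper are made up for illustration, and the curly-quote sample puts the curly quotes in the delimiter position, which is where they are a syntax error.

```python
import json

# Condensed versions of the shapes discussed above.
SAMPLES = {
    "escaped quote": r'{"name": "She said \"yes\" Wool Runner"}',
    "unescaped quote": '{"name": "She said "yes" Wool Runner"}',
    "curly-quote delimiters": '{"name": \u201cWool Runner\u201d}',
    "trailing comma": '{"offers": [{"price": "129.00"}, {"price": "139.00"},]}',
}

def first_error(raw):
    """Return None if raw parses, else (message, character offset of the error)."""
    try:
        json.loads(raw)
    except json.JSONDecodeError as err:
        return (err.msg, err.pos)
    return None

for label, raw in SAMPLES.items():
    print(label, "->", first_error(raw) or "parses")
```

The exact error wording and offset vary by parser and version, so treat the message as a pointer to the neighborhood of the bad character rather than a stable identifier.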
The 5 most common parse errors and what triggers them
| Error message | Trigger | Fix |
|---|---|---|
| Unexpected token "yes" | Unescaped quote in product name or description | Emit the value with the json filter (escape only HTML-encodes the quote, leaving &quot; in the value) |
| Unexpected character “ (U+201C) | Smart quotes pasted in from Word/Notion where an ASCII " delimiter belongs | Restore the ASCII quotes; emit values with the json filter |
| Unexpected token ] | Trailing comma at end of array | {% if forloop.last == false %},{% endif %} |
| Unexpected end of input | Liquid block missing closing brace | Re-balance braces in template |
| Bad escape sequence | Backslashes in copy not double-escaped | The json filter handles this |
Notice the pattern: almost every error is fixed by piping the user-controlled string through Liquid's json filter. The filter takes any value and emits its JSON-encoded form, including the surrounding quotes, so you write "name": {{ product.title | json }} instead of "name": "{{ product.title }}". The first pattern survives anything; the second breaks the moment a customer-facing product name contains a double quote, a backslash, or a line break.
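The difference between the two patterns is easy to reproduce outside Liquid. The sketch below uses Python's json.dumps as a rough analogue of the json filter; that is an assumption for illustration only, and Shopify's filter may differ in details such as how it escapes non-ASCII or HTML-sensitive characters. The function names and sample titles are hypothetical.

```python
import json

def concat_block(title):
    # Manual pattern: wrap the raw value in quotes and hope it is clean.
    return '{"@type": "Product", "name": "' + title + '"}'

def encoded_block(title):
    # Encoder pattern: json.dumps emits the quotes and escapes the contents.
    return '{"@type": "Product", "name": ' + json.dumps(title) + '}'

def parses(raw):
    """True if raw is accepted by Python's json parser."""
    try:
        json.loads(raw)
        return True
    except json.JSONDecodeError:
        return False

titles = [
    'Wool Runner',                 # clean title
    'She said "yes" Wool Runner',  # embedded double quote
    'Size 9\\10',                  # embedded backslash
    'Line one\nline two',          # embedded line break
]

for title in titles:
    print(repr(title), "concat:", parses(concat_block(title)),
          "encoded:", parses(encoded_block(title)))
```

Only the clean title survives manual concatenation; the encoded version parses for all four and round-trips the original text.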
Why AI shopping agents care
- Strict parsers, fail-fast. Every major AI shopping agent runs strict JSON-LD parsing at retrieval time — they cannot afford to permit garbage in their RAG/retrieval pipeline. A single parse error and the parser drops the block. The catalog signals you carefully emitted (price, availability, brand, GTIN, reviews) silently disappear from the agent's view of your product.
- Cascade discarding. Many parsers fail-stop at the first invalid block, discarding every block after it in the document order. So a malformed FAQ block from a third-party FAQ widget can take out the Organization schema in your footer that ships fine on its own. Validity is not block-local; it's document-wide.
- Silent failure mode. Unlike most signals where the agent gets garbage and shrugs, an invalid block plus cascade discard means the agent thinks your store has no structured data at all. You get treated like a non-AI-aware store — far down the candidate list, often excluded from the candidate set entirely.
- The bug ships into production unnoticed. Page renders fine. Visual QA passes. The bug only shows up in tools designed to look for it. AI agents are exactly that kind of tool, and they punish you for it.
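Cascade discarding is easier to reason about with two toy consumers side by side: one that stops at the first bad block and one that isolates failures per block. Both are hypothetical and are not taken from any real agent; they only show why block order matters under the fail-stop behavior described above.

```python
import json

def parse_fail_stop(blocks):
    """Hypothetical consumer: keep blocks until the first parse error, then stop."""
    kept = []
    for raw in blocks:
        try:
            kept.append(json.loads(raw))
        except json.JSONDecodeError:
            break  # everything after the bad block is never looked at
    return kept

def parse_isolated(blocks):
    """Hypothetical consumer: skip bad blocks, keep every block that parses."""
    kept = []
    for raw in blocks:
        try:
            kept.append(json.loads(raw))
        except json.JSONDecodeError:
            continue
    return kept

# Blocks in document order; the middle one has a trailing comma.
page = [
    '{"@type": "Product", "name": "Wool Runner"}',
    '{"@type": "FAQPage", "mainEntity": [],}',
    '{"@type": "Organization", "name": "Example Co"}',
]

print([b["@type"] for b in parse_fail_stop(page)])
print([b["@type"] for b in parse_isolated(page)])
```

Under fail-stop the Organization block is lost even though it is valid on its own. If the broken FAQ block came first in the document, the Product block would be lost as well.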
How to test it on your store
Three escalating levels:
Level 1: Google's Rich Results Test
Paste a PDP URL into the Rich Results Test. Look for any red X or yellow triangle. Google's parser is more lenient than most AI agents, so anything that fails here fails for everyone — and even "warning" yellow indicators frequently mean stricter parsers reject the same input.
Level 2: schema.org's validator
The schema.org validator is stricter than Google's. It will catch type mismatches, missing required properties on a graph node, and out-of-vocabulary enum values — bugs Google ignores but AI retrieval pipelines won't.
Level 3: real strict-parser run
Manually extract every JSON-LD block and run them through a strict parser. From any terminal:
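# Requires bash, GNU grep (for -z and -P), and jq.
curl -s 'https://yourstore.com/products/foo' \
| grep -zoP '(?s)<script[^>]*type="application/ld\+json"[^>]*>\K.*?(?=</script>)' \
| while IFS= read -r -d '' block; do
# -z and (?s) keep multi-line blocks intact; each block arrives NUL-terminated.
if printf '%s' "$block" | jq empty; then echo "OK"
else printf 'INVALID: %.300s\n' "$block"; fi
done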
Any line that prints INVALID: is a block that breaks every strict AI agent parse. The CatalogScan free scan does this automatically and reports the byte offset and surrounding context of the first error so you don't have to grep through Liquid templates.
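If grep and jq are not available, the same check can be sketched with Python's standard library. This is a minimal version under stated assumptions: JsonLdExtractor and check_page are hypothetical names, the page HTML is passed in as a string (fetching it is left out), and Python's json module stands in for a strict parser.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = None  # list of text chunks while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buf is not None:
            self.blocks.append("".join(self._buf))
            self._buf = None

def check_page(html):
    """Return one (ok, detail) tuple per JSON-LD block found in html."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    results = []
    for raw in extractor.blocks:
        try:
            json.loads(raw)
            results.append((True, ""))
        except json.JSONDecodeError as err:
            # err.pos counts characters; re-encode the prefix to get a byte offset.
            byte_offset = len(raw[:err.pos].encode("utf-8"))
            context = raw[max(0, err.pos - 20): err.pos + 20]
            results.append((False, f"{err.msg} at byte {byte_offset}: {context!r}"))
    return results
```

Save the page first, for example with the curl command above, then pass the file contents to check_page and look for any tuple whose first element is False.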
How to fix it
| json (10 min, free)
Find your theme's product JSON-LD snippet. Anywhere a Liquid variable is being interpolated into a string value, replace the manual quotes pattern with the | json filter. Before: "name": "{{ product.title }}". After: "name": {{ product.title | json }} (no quotes around the Liquid; the filter emits them). The filter handles smart quotes, embedded backslashes, control characters, every edge case. This single rule prevents about 90% of all real-world JSON-LD breakage on Shopify.
| strip_html | json (5 min, free)
If your description field is HTML with embedded markup (body_html in the Admin API, product.description in Liquid), you need both filters in order: {{ product.description | strip_html | json }}. strip_html removes tags; json escapes the resulting plain text. Skipping strip_html leaves angle brackets and HTML entities in the JSON, which technically validates, but AI agents downrank stores whose structured descriptions are full of <p> tags.
For arrays of variants, offers, reviews, anything looped: emit the comma as a separator, not a terminator. The Liquid pattern:
"offers": [
{% for variant in product.variants %}
{
"@type": "Offer",
"price": {{ variant.price | money_without_currency | json }},
"sku": {{ variant.sku | json }}
}{% unless forloop.last %},{% endunless %}
{% endfor %}
]
Use {% unless forloop.last %},{% endunless %} instead of an unconditional comma. This is cleaner than the {% if forloop.first == false %} "leading comma" pattern; both work, but trailing-unless reads more naturally.
Most JSON-LD breakage in production comes from review apps, FAQ apps, and product-description apps that build their JSON-LD by string concatenation in a Liquid snippet they injected. Find them: in your theme files (often under snippets/), search for application/ld+json and verify each block uses | json on every variable. If any of the third-party app blocks does manual concatenation, escalate to the app vendor — most have an option to disable the JSON-LD output, and it's usually safer to disable the broken one and let your theme's core block carry the load than to leave a corrupting block in the head.
Once it passes, keep it passing. Wire a CI check that runs against a sample PDP after every deploy. Easiest path: a simple GitHub Action that curls the page, extracts JSON-LD blocks with the grep pattern above, pipes through jq empty, and fails the build on any parse error. Catches the next regression before it ships.
5 patterns we keep finding broken
1. Smart quotes from copy-pasted product titles
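Marketing pastes a product title from Notion or Google Docs into Shopify admin, and auto-correct silently turns " into “. The theme emits the title as "name": "{{ product.title }}" with no json filter. A curly quote inside an ASCII-quoted string is still legal JSON, but the same paste routinely carries straight quotes, backslashes, or line breaks that are not, and a curly quote that lands in a delimiter position (for example in hand-edited JSON-LD) is a hard parse error. The visible page is fine either way; a strict parser dies on the first bad character. Fix: universal | json on every string interpolation.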
2. Review widgets writing unescaped review bodies into the graph
Many review apps inject a Review graph node with the actual review text concatenated as a JSON string. Reviews routinely contain quotes, apostrophes, line breaks, and emoji. The widget concatenates by hand and ships broken JSON in production. Static check: open one PDP from a product with reviews, view source, find the review widget's JSON-LD block, validate it independently. If it breaks, disable the widget's JSON-LD output (most have a toggle) and lean on your theme's core Product schema instead.
3. Trailing comma in a variant array
Theme upgrades that change variant looping logic occasionally leave a trailing comma at the end of an offers or variants array. Permissive parsers accept it; strict parsers don't. Always emit comma-as-separator with {% unless forloop.last %},{% endunless %} rather than comma-as-terminator.
4. HTML in description fields, unescaped
Themes that interpolate {{ product.description | strip_html }} get the HTML stripped (good) but don't escape the result for JSON (bad). If the stripped text contains a double quote, a backslash, or a line break, the JSON breaks. Always pipe through | strip_html | json, in that order.
5. Multiple competing Product schemas on the same page
Theme emits one Product block. Page-builder app emits another. They claim different prices, different brands, different aggregateRating values. Each is individually valid; together they create ambiguity that strict graph-merging parsers reject. Pick one source of truth — usually the theme — and disable the page-builder's structured data output. Two Product schemas on one PDP is never the right answer.
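Competing Product blocks pass every parse check, so they need a separate test. The sketch below is one hedged way to spot them: it collects every Product node across the page's blocks and reports the fields whose values disagree. The function names and the default field list are assumptions for illustration; real graph-merging logic in any given agent is not documented here.

```python
import json

def iter_products(blocks):
    """Yield each Product node found in a list of raw JSON-LD strings."""
    for raw in blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # validity is a separate check; skip unparseable blocks
        for node in (data if isinstance(data, list) else [data]):
            if not isinstance(node, dict):
                continue
            graph = node.get("@graph")
            candidates = [node] + (graph if isinstance(graph, list) else [])
            for cand in candidates:
                if not isinstance(cand, dict):
                    continue
                types = cand.get("@type")
                if "Product" in (types if isinstance(types, list) else [types]):
                    yield cand

def find_conflicts(blocks, fields=("brand", "offers", "aggregateRating")):
    """Return {field: distinct serialized values} for fields that disagree."""
    products = list(iter_products(blocks))
    conflicts = {}
    for field in fields:
        # Serialize with sorted keys so key order alone is not a difference.
        seen = {json.dumps(p[field], sort_keys=True) for p in products if field in p}
        if len(seen) > 1:
            conflicts[field] = sorted(seen)
    return conflicts
```

More than one node from iter_products is already worth a look; a non-empty result from find_conflicts means the blocks actively contradict each other and one source should be disabled.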
See also
- The 15 signals — full reference
- Product JSON-LD on PDPs (the parent block this signal validates)
- AggregateRating (the review-widget surface that's the most common source of validity bugs)
- Offers availability (the second JSON-LD field where strict-parse rules matter most)
- The full 18-signal Agentic Storefronts checklist
- Leaderboard: 100 DTC stores scored on JSON-LD validity and 14 other signals
Does every JSON-LD block on your PDP actually parse?
Free 2-minute scan. We extract every <script type="application/ld+json"> block, run them through a strict parser, and report the byte offset of any error so you can find the bad character without grepping templates.