Technical Implementation
Shopify Product Descriptions for AI Shopping Agents
A technical breakdown of how ChatGPT Shopping, Perplexity, Google AI Mode, and Meta AI read your Shopify product descriptions — and exactly what to fix to get quoted.
AI shopping agents reach your product descriptions through three pathways (direct HTML crawl, the /products.json API, and Google Merchant Center feeds), each stripping markup differently. Descriptions under 150 words are rarely cited. The JSON-LD description field must match your visible body_html or agents apply a trust penalty. Use Liquid's strip_html | strip_newlines | truncatewords: 500 to generate a clean JSON-LD description automatically.
How AI Agents Read Shopify Product Descriptions: Three Pathways
Before optimizing your descriptions, you need to understand what each AI shopping agent actually reads. There are three distinct pathways, and content visible in one may be invisible in another.
Pathway 1 — Direct HTML crawl
Agents like Perplexity's PerplexityBot and OpenAI's OAI-SearchBot crawl the rendered HTML of your product pages and extract text from the DOM. This pathway reads content that is present in the server-rendered HTML at page load. Content injected by JavaScript after DOM ready — including most Shopify app review widgets, upsell content, and dynamically loaded metafield blocks — is not reliably captured unless the crawler executes JavaScript, which most do not for product-level crawls.
Shopify's native body_html field is always in the server-rendered output. Product description apps that rewrite the DOM client-side are invisible to most crawlers.
Pathway 2 — /products.json API
Shopify exposes every store's product catalog at https://yourstore.com/products.json (paginated with ?page=N&limit=250). This endpoint returns the body_html field with all HTML tags intact. AI systems that consume this feed then strip tags themselves using their own HTML parser. The result: any content not stored in Shopify's native body_html column — content from third-party apps, metafield rendering, or theme injection — does not appear in this pathway at all.
HTML entities such as &amp;amp;, &amp;mdash;, and &amp;trade; survive this stripping process as their Unicode characters (&, —, ™). Decorative whitespace tags like <br> and <hr> collapse to spaces. Nested <div> wrappers add zero content value.
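As a rough illustration of what a feed consumer might do with this endpoint, the sketch below parses one /products.json page and reduces each body_html to plain text using only Python's standard library. The function names, the list of inline tags, and the sample payload shape beyond `products`, `handle`, and `body_html` are assumptions for illustration; real agents use their own parsers and may behave differently.

```python
import json
from html.parser import HTMLParser

# Inline tags that should not introduce a word break when removed (assumed list).
INLINE_TAGS = {"a", "b", "em", "i", "span", "strong", "sub", "sup", "u"}


class _TextExtractor(HTMLParser):
    """Collects text nodes; every non-inline tag becomes a single space."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; and &trade;
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in INLINE_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag not in INLINE_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def strip_body_html(body_html):
    """Return visible text: tags removed, entities decoded, whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(body_html or "")  # body_html can be null in the feed
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


def descriptions_from_payload(payload):
    """Map product handle -> stripped description for one /products.json page."""
    products = json.loads(payload).get("products", [])
    return {p["handle"]: strip_body_html(p.get("body_html")) for p in products}
```

Running this against your own feed gives a quick view of the text an agent on this pathway would plausibly see, with app-injected content absent by construction.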
Pathway 3 — Google Merchant Center feed
Google AI Mode sources product data primarily through the Merchant Center feed, where the description attribute corresponds to your Shopify product description but with a hard 5,000-character cap. Shopify's Google & YouTube channel exports this field automatically, but it uses the raw body_html with minimal processing. HTML tags that survive into the feed description are flagged as feed errors. If you rely on heavily HTML-tagged descriptions (tables, divs, custom classes), the exported feed description may be significantly shorter than the actual visible text because tag overhead consumes the character budget.
Word-Count Thresholds That Determine AI Citation Rate
AI shopping agents calibrate their confidence in a product description based on its length. Below are the five operative tiers observed across ChatGPT Shopping, Perplexity, and Google AI Mode citation patterns.
| Word count (stripped) | AI agent treatment | Citation likelihood |
|---|---|---|
| Under 50 words | Treated as stub content; description usually omitted from agent response | Very low |
| 50–149 words | Borderline; agents may paraphrase but rarely quote directly | Low |
| 150–299 words | Baseline threshold; agents extract one to two key claims | Moderate |
| 300–500 words | Optimal range; agents extract multiple signals, frequently quoted | High |
| 500+ words | Maximum coverage; agents may truncate but description is fully indexed | High |
These counts refer to visible text after HTML stripping, not raw body_html character length. A description with 600 characters of HTML tags and 80 words of actual text falls in the "50–149 words" tier, not the "500+" tier.
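To make the distinction concrete, here is a minimal sketch that counts stripped words and maps the count onto the tiers in the table above. The regex-based tag removal is a rough approximation rather than a full HTML parser, the function names are hypothetical, and treating exactly 500 words as part of the 300–500 tier is an assumption, since the table leaves that boundary ambiguous.

```python
import html
import re

# (minimum stripped words, tier label) from the table above, highest first.
TIERS = [
    (501, "500+ words"),
    (300, "300–500 words"),
    (150, "150–299 words"),
    (50, "50–149 words"),
    (0, "Under 50 words"),
]


def stripped_word_count(body_html):
    """Count words in the visible text, ignoring tags and decoding entities."""
    text = re.sub(r"<[^>]+>", " ", body_html)  # approximate tag removal
    text = html.unescape(text)
    return len(text.split())


def word_count_tier(body_html):
    """Return the tier label for a raw body_html string."""
    count = stripped_word_count(body_html)
    for minimum, label in TIERS:
        if count >= minimum:
            return label
```

An 80-word description lands in the "50–149 words" tier no matter how much markup wraps it, which is the point of measuring after stripping.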
Description Source vs. AI Agent: What Each Platform Reads
| AI Agent | Primary source | Secondary source | HTML stripping |
|---|---|---|---|
| ChatGPT Shopping | Product JSON-LD description | Bing Shopping index, /products.json | Full strip before indexing |
| Perplexity Shopping | Rendered HTML body (direct crawl) | Product JSON-LD description | DOM text extraction |
| Google AI Mode | Merchant Center feed description | Product JSON-LD on product page | Feed validator strips tags |
| Meta AI (Instagram/FB Shopping) | Meta Commerce catalog feed | Open Graph og:description, body HTML | Tag strip + truncate to 5,000 chars |
The body_html Technical Requirements
Shopify's body_html field is the canonical source for all three pathways. Technical issues here compound across every downstream system.
JavaScript-injected content is invisible
Any content rendered by a Shopify app after page load — including dynamically inserted feature lists, fit guides, or ingredient panels — does not exist in body_html. It will not appear in /products.json and will be missed by crawlers that do not execute JavaScript. Move critical content into the native Shopify description field in the admin.
HTML tag overhead inflates character count without adding text value
A common pattern is wrapping each sentence in a <div class="desc-section"> block. This adds approximately 30 characters of tag overhead per sentence, consuming your Merchant Center feed's 5,000-character budget without contributing readable text. Use plain <p> tags, <ul>/<li> for feature lists, and <strong> only for genuinely critical terms.
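The overhead is easy to measure. The sketch below compares raw body_html length with visible-text length and checks the raw string against the 5,000-character feed cap cited above. It assumes the exported feed description is the raw body_html, as described earlier; the helper names are hypothetical and the regex tag removal is approximate.

```python
import html
import re

FEED_DESCRIPTION_LIMIT = 5000  # Merchant Center description cap cited above


def tag_overhead(body_html):
    """Return (raw_chars, visible_chars, overhead_chars) for a description."""
    visible = html.unescape(re.sub(r"<[^>]+>", "", body_html))
    raw_chars = len(body_html)
    visible_chars = len(visible)
    return raw_chars, visible_chars, raw_chars - visible_chars


def fits_feed_budget(body_html, limit=FEED_DESCRIPTION_LIMIT):
    """True if the raw description stays within the feed character budget."""
    return len(body_html) <= limit


# The same sentence wrapped two ways: 32 characters of overhead versus 7.
print(tag_overhead('<div class="desc-section">Soft cotton.</div>'))
print(tag_overhead("<p>Soft cotton.</p>"))
```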
Special characters survive /products.json stripping
HTML entities such as &amp;trade; (™), &amp;reg; (®), and &amp;mdash; (—) are decoded to their Unicode equivalents in the JSON output. This is expected behavior. However, malformed entities such as &amp;trade (missing the closing semicolon) and unescaped bare ampersands can cause parse errors in some downstream consumers. Always use well-formed HTML entities or their Unicode characters directly.
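A simple pre-publish check can flag these cases. The sketch below scans body_html for ampersands that do not start a well-formed character reference, using the named-entity table that ships with Python. The function name is hypothetical, and this is a heuristic screen rather than a full HTML validator.

```python
import re
from html.entities import html5  # keys look like "trade;" and "amp;"

# An ampersand, an optional numeric or named reference, an optional semicolon.
_AMP = re.compile(r"&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*)?(;?)")


def find_suspect_ampersands(body_html):
    """Return (offset, matched_text) for bare ampersands and malformed entities."""
    suspects = []
    for match in _AMP.finditer(body_html):
        ref, semicolon = match.group(1), match.group(2)
        if ref is None or not semicolon:
            # Bare "&" or a reference missing its closing semicolon.
            suspects.append((match.start(), match.group(0)))
        elif not ref.startswith("#") and ref + ";" not in html5:
            # Looks like a named entity, but the name is not defined in HTML5.
            suspects.append((match.start(), match.group(0)))
    return suspects
```

An empty result means every ampersand in the description begins a recognized entity.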
JSON-LD Description Field: Liquid Implementation
The description property in your Product JSON-LD is the highest-confidence signal for ChatGPT Shopping and is cross-referenced against visible body text by Perplexity. It must be a plain-text string — no HTML tags, no newline characters, no unescaped quotation marks.
The following Liquid snippet produces a clean, safely JSON-encoded description from body_html. Place this inside your Product JSON-LD <script> block in your Shopify theme's product.liquid or product-template.liquid section (main-product.liquid in Online Store 2.0 themes such as Dawn).
```liquid
{%- comment -%} Plain-text description: tags removed, single line, trimmed, capped at 500 words {%- endcomment -%}
{%- assign desc_clean = product.description
  | strip_html
  | strip_newlines
  | strip
  | truncatewords: 500 -%}
<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "Product",
  "name": {{ product.title | json }},
  "description": {{ desc_clean | json }},
  "sku": {{ product.selected_or_first_available_variant.sku | json }},
  "brand": {
    "@type": "Brand",
    "name": {{ product.vendor | json }}
  },
  "offers": {
    "@type": "Offer",
    "url": {{ canonical_url | json }},
    "priceCurrency": {{ cart.currency.iso_code | json }},
    "price": {{ product.selected_or_first_available_variant.price | divided_by: 100.0 | json }},
    "availability": {% if product.available %}"https://schema.org/InStock"{% else %}"https://schema.org/OutOfStock"{% endif %},
    "priceValidUntil": "{{ 'now' | date: '%Y' | plus: 1 }}-12-31"
  }
}
</script>
```
Key decisions in this snippet: strip_html removes all tags; strip_newlines collapses the output to a single line, avoiding JSON syntax errors from literal newline characters; strip removes leading and trailing whitespace; truncatewords: 500 keeps the field under approximately 3,500 characters, safely below the practical JSON-LD description limit. The | json filter at the end handles all necessary quote escaping.
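One way to confirm the rendered result is to fetch a product page and inspect the JSON-LD it actually ships. The sketch below, which works on an HTML string you have already downloaded, finds the Product block and reports problems with its description. The function name and the wording of the issue messages are hypothetical, and the regex-based script extraction is a simplification that assumes conventional markup.

```python
import json
import re

_LD_JSON = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def product_description_issues(page_html):
    """Return a list of problems with the Product JSON-LD description."""
    saw_invalid_block = False
    for block in _LD_JSON.findall(page_html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            saw_invalid_block = True  # e.g. a literal newline inside a string
            continue
        if isinstance(data, dict) and data.get("@type") == "Product":
            description = data.get("description")
            if not isinstance(description, str) or not description.strip():
                return ["description missing or empty"]
            issues = []
            if re.search(r"<[a-zA-Z/][^>]*>", description):
                issues.append("contains HTML tags")
            if "\n" in description or "\r" in description:
                issues.append("contains newline characters")
            return issues
    if saw_invalid_block:
        return ["a JSON-LD block failed to parse"]
    return ["no Product JSON-LD found"]
```

An empty list means the description parsed as a plain-text, single-line string.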
Six Technical Signals AI Agents Extract from Descriptions
AI shopping agents do not treat descriptions as opaque text blocks. They extract structured signals using pattern matching. Including these signals explicitly — with consistent formatting — dramatically increases the probability that an agent uses your description in a product recommendation.
| Signal | Pattern to include | Example |
|---|---|---|
| Material composition | Percentage + material name | "Made from 95% organic cotton, 5% elastane" |
| Dimensions / weight | Number + unit (metric or imperial) | "32 cm x 22 cm x 8 cm; 480 g" |
| Compatibility / fit | Works with / fits / compatible with + named entity | "Compatible with iPhone 15 Pro and 15 Pro Max" |
| Use-case context | Verb phrase describing the primary action | "Designed for trail running in wet conditions" |
| Certification / standard | Named certification or standard | "CE certified, RoHS compliant, UL listed" |
| Warranty / guarantee | Duration + coverage statement | "Backed by a 2-year manufacturer warranty" |
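To audit a catalog for these signals, a set of regular expressions is a reasonable starting point. The patterns below are hypothetical, one per row of the table, and are meant only as a sketch; agents do not document their extraction rules, and these patterns will produce both false positives and misses on real copy.

```python
import re

# One illustrative pattern per signal in the table above (assumed, not official).
SIGNAL_PATTERNS = {
    "material": re.compile(r"\b\d{1,3}\s?%\s+[a-z]", re.I),
    "dimensions": re.compile(
        r"\b\d+(?:\.\d+)?\s?(?:mm|cm|m|in|inch|inches|ft|g|kg|oz|lb|lbs)\b", re.I
    ),
    "compatibility": re.compile(r"\b(?:compatible with|works with|fits)\s+\w", re.I),
    "use_case": re.compile(r"\b(?:designed|built|made|ideal|intended)\s+for\b", re.I),
    "certification": re.compile(r"\b(?:certified|compliant|listed|approved)\b", re.I),
    "warranty": re.compile(
        r"\b\d+[- ](?:day|month|year)s?\b[^.]*\b(?:warranty|guarantee)\b", re.I
    ),
}


def detect_signals(text):
    """Return the set of signal names whose pattern appears in plain text."""
    return {name for name, pattern in SIGNAL_PATTERNS.items() if pattern.search(text)}
```

Feeding each product's stripped description through `detect_signals` gives a per-product count you can compare against the checklist target of at least three signals.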
Per-Agent Implementation Priority
| Agent | Highest-impact description action | Secondary action |
|---|---|---|
| ChatGPT Shopping | Add plain-text description to Product JSON-LD | Ensure Bing Webmaster Tools verifies your site |
| Perplexity Shopping | Expand body_html to 300+ stripped words | Move all app-injected content into native body_html |
| Google AI Mode | Clean up feed description (remove HTML tags from export) | Keep Merchant Center feed description under 4,500 chars |
| Meta AI | Set og:description to a 200–300 char plain-text summary | Ensure body_html text appears above fold (no JS rendering) |
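For the og:description action, one possible approach is to derive the summary from the stripped description and cut it at a sentence boundary inside the 200–300 character range. The sketch below assumes plain-text input; the function name and the fallback to a word boundary are illustrative choices, not a documented Meta requirement.

```python
def og_summary(text, min_chars=200, max_chars=300):
    """Trim plain text to at most max_chars, preferring a sentence boundary."""
    text = " ".join(text.split())  # collapse whitespace
    if len(text) <= max_chars:
        return text
    window = text[:max_chars]
    sentence_end = window.rfind(". ")
    if sentence_end + 1 >= min_chars:
        # A full sentence ends inside the target range: cut right after it.
        return window[: sentence_end + 1]
    # Otherwise fall back to the last complete word.
    word_end = window.rfind(" ")
    return window[:word_end] if word_end > 0 else window
```

A hand-written summary will usually read better; this is only a fallback for catalogs too large to edit manually.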
Technical Implementation Checklist
| # | Check | Priority |
|---|---|---|
| 1 | All top-20 products have 150+ stripped words in body_html | Critical |
| 2 | Product JSON-LD includes a description field on every product page | Critical |
| 3 | JSON-LD description is generated with strip_html and strip_newlines (no raw HTML) | Critical |
| 4 | No product description content is JavaScript-injected (all in native body_html) | Critical |
| 5 | HTML tags in body_html are limited to p, ul, li, strong, em, h3, h4 | High |
| 6 | No bare ampersands or malformed HTML entities in body_html | High |
| 7 | Merchant Center feed description field is under 4,500 characters | High |
| 8 | At least three of the six technical signals (materials, dimensions, compatibility, use-case, certification, warranty) are present | High |
| 9 | og:description is a 200–300 character plain-text summary (not truncated body_html) | Medium |
| 10 | Priority products (300–500 word range) are identified and cross-linked within product collections | Medium |
Further Reading
- Shopify Product Descriptions for AI Agents: Strategy Guide — the companion blog post covering content strategy and copywriting approach.
- Shopify XML Product Feed Format for AI Agents — how feed format affects downstream AI Shopping ingestion.
- AI Product Description Generator for Shopify — tools and prompts for scaling description quality across large catalogs.
- AI Shopping Agent Product Ranking Factors — the full ranking signal matrix across ChatGPT, Perplexity, and Google AI Mode.
Frequently Asked Questions
What word count does a Shopify product description need for AI agents to quote it?
AI shopping agents rarely quote descriptions under 50 words, treating them as stub content. The baseline threshold for consistent AI citation is 150 words. Descriptions in the 300–500 word range are quoted most frequently because they provide enough context for agents to extract multiple signals (materials, use case, compatibility, dimensions) without exceeding the context window budget agents assign to a single product.
Does the JSON-LD description field need to match the visible body_html?
Yes. When a JSON-LD description contradicts the visible body_html, AI agents apply a trust penalty and may discard both signals. The description field in your Product JSON-LD should be a clean, plain-text version of body_html — strip all HTML tags, collapse whitespace, and keep it under 5,000 characters. In Liquid, use: {{ product.description | strip_html | strip_newlines | truncatewords: 500 | json }}.
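A consistency check along these lines can be sketched in a few lines of Python. It treats the JSON-LD description as matching when it is a prefix of the visible text, ignoring whitespace differences (strip_newlines removes line breaks without inserting spaces) and the trailing "..." that truncatewords adds when it cuts. The function names are hypothetical, and how agents actually compare the two fields is not documented.

```python
import html
import re


def _squash(text):
    """Decode entities and drop all whitespace so spacing cannot cause a mismatch."""
    return "".join(html.unescape(text).split())


def jsonld_matches_body(jsonld_description, body_html):
    """True if the JSON-LD description is a prefix of the visible body text."""
    visible = re.sub(r"<[^>]+>", "", body_html)  # approximate tag removal
    candidate = jsonld_description.strip()
    if candidate.endswith("..."):
        candidate = candidate[:-3]  # ellipsis appended by truncatewords
    candidate = _squash(candidate)
    return bool(candidate) and _squash(visible).startswith(candidate)
```

A `False` result is worth investigating by hand, since a hard-coded or stale JSON-LD description is the usual cause.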
How does Shopify's /products.json endpoint affect AI agent crawling?
Shopify's /products.json endpoint exposes the body_html field with all HTML tags intact, and AI crawlers that consume this API strip the tags themselves. Because the endpoint returns only what is stored in body_html, JavaScript-injected content, content rendered by apps after page load, and content inside iframes never appear in it. Any content that must be discoverable by AI agents needs to be in Shopify's native body_html, not injected client-side.
Which AI shopping agent benefits most from description optimization?
Perplexity Shopping currently shows the strongest correlation between description quality and citation rate because it directly crawls product pages and renders text-layer content. ChatGPT Shopping weighs structured data completeness (JSON-LD) alongside description text. Google AI Mode primarily relies on the Google Merchant Center feed description field, which has a hard 5,000-character limit. Meta AI reads Open Graph tags and the HTML body, making both og:description and body_html relevant.