
Shopify Product Descriptions for AI Shopping Agents

A technical breakdown of how ChatGPT Shopping, Perplexity, Google AI Mode, and Meta AI read your Shopify product descriptions — and exactly what to fix to get quoted.

TL;DR: AI shopping agents consume your product descriptions through three distinct pathways (direct HTML crawl, the /products.json API, and Google Merchant Center feeds), and each strips markup differently. Descriptions under 150 words are rarely cited. The JSON-LD description field must match your visible body_html or agents apply a trust penalty. Use Liquid's strip_html | strip_newlines | truncatewords: 500 to generate a clean JSON-LD description automatically.

How AI Agents Read Shopify Product Descriptions: Three Pathways

Before optimizing your descriptions, you need to understand what each AI shopping agent actually reads. There are three distinct pathways, and content visible in one may be invisible in another.

Pathway 1 — Direct HTML crawl

Agents like Perplexity's PerplexityBot and OpenAI's OAI-SearchBot crawl the rendered HTML of your product pages and extract text from the DOM. This pathway reads content that is present in the server-rendered HTML at page load. Content injected by JavaScript after DOM ready — including most Shopify app review widgets, upsell content, and dynamically loaded metafield blocks — is not reliably captured unless the crawler executes JavaScript, which most do not for product-level crawls.

Shopify's native body_html field is always in the server-rendered output. Content that product description apps write into the DOM client-side is invisible to most crawlers.
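
One way to approximate what a non-JavaScript crawler sees is to extract text from the raw HTML while ignoring script and style content. The following is a minimal sketch using only Python's standard library; the class name, function name, and sample page are hypothetical, and real crawlers may extract text differently.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping <script>, <style>, and <noscript> content."""

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def server_rendered_text(html):
    """Return the text a crawler could read without executing JavaScript."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the review text only exists after the script runs.
sample_page = """
<html><body>
<div class="product-description"><p>Waterproof trail shoe.</p></div>
<div id="reviews-widget"></div>
<script>document.getElementById("reviews-widget").innerHTML = "Rated 4.8 stars";</script>
</body></html>
"""

print(server_rendered_text(sample_page))
```

In this sample the description paragraph is extracted, while the review text that the script would inject never appears, which is the gap described above.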

Pathway 2 — /products.json API

Shopify exposes every store's product catalog at https://yourstore.com/products.json (paginated with ?page=N&limit=250). This endpoint returns the body_html field with all HTML tags intact. AI systems that consume this feed then strip tags themselves using their own HTML parser. The result: any content not stored in Shopify's native body_html column — content from third-party apps, metafield rendering, or theme injection — does not appear in this pathway at all.

Special HTML entities such as &amp;, &mdash;, and &trade; survive this stripping process as their Unicode characters. Decorative whitespace tags like <br> and <hr> collapse to spaces. Nested <div> wrappers add zero content value.
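
The stripping behavior described above can be sketched with Python's standard HTML parser. This is an illustration under stated assumptions, not the parser any particular agent uses: the set of tags treated as word boundaries, the function names, and the sample payload (a hand-written stand-in shaped like a /products.json response) are all hypothetical.

```python
import re
from html.parser import HTMLParser

# Tags treated as word boundaries when stripping (an illustrative choice).
BOUNDARY_TAGS = {
    "p", "div", "br", "hr", "li", "ul", "ol", "table", "tr", "td", "th",
    "h1", "h2", "h3", "h4", "h5", "h6",
}


class BodyHtmlStripper(HTMLParser):
    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; and &trade;.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BOUNDARY_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BOUNDARY_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def strip_body_html(body_html):
    """Strip tags, decode entities, and collapse runs of whitespace."""
    stripper = BodyHtmlStripper()
    stripper.feed(body_html)
    stripper.close()
    return re.sub(r"\s+", " ", "".join(stripper.parts)).strip()


# Hand-written stand-in for one page of a /products.json response.
sample_payload = {
    "products": [
        {
            "handle": "acme-pack",
            "body_html": (
                "<div><p>Acme&trade; Pack &mdash; light &amp; "
                "<strong>tough</strong>.</p><br>"
                "<ul><li>480 g</li><li>20 L</li></ul></div>"
            ),
        }
    ]
}

stripped = {
    product["handle"]: strip_body_html(product["body_html"])
    for product in sample_payload["products"]
}
print(stripped["acme-pack"])
```

Treating only block-level tags as boundaries keeps inline markup such as <strong> from introducing stray spaces before punctuation.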

Pathway 3 — Google Merchant Center feed

Google AI Mode sources product data primarily through the Merchant Center feed, where the description attribute corresponds to your Shopify product description but with a hard 5,000-character cap. Shopify's Google & YouTube channel exports this field automatically, but it uses the raw body_html with minimal processing. HTML tags that survive into the feed description are flagged as feed errors. If you rely on heavily HTML-tagged descriptions (tables, divs, custom classes), the exported feed description may be significantly shorter than the actual visible text because tag overhead consumes the character budget.

Word-Count Thresholds That Determine AI Citation Rate

AI shopping agents calibrate their confidence in a product description based on its length. Below are the five operative tiers observed across ChatGPT Shopping, Perplexity, and Google AI Mode citation patterns.

| Word count (stripped) | AI agent treatment | Citation likelihood |
| --- | --- | --- |
| Under 50 words | Treated as stub content; description usually omitted from agent response | Very low |
| 50–149 words | Borderline; agents may paraphrase but rarely quote directly | Low |
| 150–299 words | Baseline threshold; agents extract one to two key claims | Moderate |
| 300–500 words | Optimal range; agents extract multiple signals, frequently quoted | High |
| 500+ words | Maximum coverage; agents may truncate but description is fully indexed | High |

These counts refer to visible text after HTML stripping, not raw body_html character length. A description with 600 characters of HTML tags and 80 words of actual text falls in the "50–149 words" tier, however long its raw body_html looks.
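
The tiers can be expressed as a small lookup over the stripped word count. This is a minimal sketch: the function name is hypothetical, words are counted by splitting on whitespace, and because the table lists both "300–500" and "500+", the sketch assumes exactly 500 words belongs to the 300–500 tier.

```python
def citation_tier(stripped_text):
    """Return (word_count, tier_label) for text that has already had HTML removed."""
    words = len(stripped_text.split())
    if words < 50:
        return words, "Under 50 words"
    if words < 150:
        return words, "50–149 words"
    if words < 300:
        return words, "150–299 words"
    if words <= 500:  # assumption: exactly 500 words counts as the optimal range
        return words, "300–500 words"
    return words, "500+ words"


# An 80-word description lands in the borderline tier regardless of markup size.
eighty_words = " ".join(["word"] * 80)
print(citation_tier(eighty_words))
```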

Description Source vs. AI Agent: What Each Platform Reads

| AI Agent | Primary source | Secondary source | HTML stripping |
| --- | --- | --- | --- |
| ChatGPT Shopping | Product JSON-LD description | Bing Shopping index, /products.json | Full strip before indexing |
| Perplexity Shopping | Rendered HTML body (direct crawl) | Product JSON-LD description | DOM text extraction |
| Google AI Mode | Merchant Center feed description | Product JSON-LD on product page | Feed validator strips tags |
| Meta AI (Instagram/FB Shopping) | Meta Commerce catalog feed | Open Graph og:description, body HTML | Tag strip + truncate to 5,000 chars |

The body_html Technical Requirements

Shopify's body_html field is the canonical source for all three pathways. Technical issues here compound across every downstream system.

JavaScript-injected content is invisible

Any content rendered by a Shopify app after page load — including dynamically inserted feature lists, fit guides, or ingredient panels — does not exist in body_html. It will not appear in /products.json and will be missed by crawlers that do not execute JavaScript. Move critical content into the native Shopify description field in the admin.

HTML tag overhead inflates character count without adding text value

A common pattern is wrapping each sentence in a <div class="desc-section"> block. This adds approximately 30 characters of tag overhead per sentence, consuming your Merchant Center feed's 5,000-character budget without contributing readable text. Use plain <p> tags, <ul>/<li> for feature lists, and <strong> only for genuinely critical terms.
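
To see how much of the character budget goes to markup, you can compare the raw length of a description with the length of its tags. The sketch below uses a simple regular expression rather than a full HTML parser, so it is only an approximation; the function name and sample sentence are hypothetical.

```python
import re

TAG_PATTERN = re.compile(r"<[^>]+>")
FEED_LIMIT = 5000  # Merchant Center description cap cited in this article


def tag_overhead(body_html):
    """Split a description's raw length into tag characters and text characters."""
    raw_chars = len(body_html)
    tag_chars = sum(len(tag) for tag in TAG_PATTERN.findall(body_html))
    return {
        "raw_chars": raw_chars,
        "tag_chars": tag_chars,
        "text_chars": raw_chars - tag_chars,
        "raw_fits_feed": raw_chars <= FEED_LIMIT,
    }


sentence = "Fits most 15-inch laptops."
wrapped_in_div = '<div class="desc-section">' + sentence + "</div>"
wrapped_in_p = "<p>" + sentence + "</p>"

print(tag_overhead(wrapped_in_div)["tag_chars"])  # overhead of the div wrapper
print(tag_overhead(wrapped_in_p)["tag_chars"])    # overhead of a plain paragraph
```

For this one sentence the div wrapper costs 32 characters of markup against 7 for a plain <p>, which is where the "approximately 30 characters" figure comes from.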

Special characters survive /products.json stripping

HTML entities such as &trade; (™), &reg; (®), and &mdash; (—) decode to their Unicode equivalents once a consumer strips the markup. This is expected behavior. However, double-encoded entities like &amp;trade; decode to the literal text "&trade;" instead of the symbol, and unescaped bare ampersands are invalid in XML-based feeds and can trip strict parsers in some downstream consumers. Always use well-formed HTML entities or their Unicode characters directly.
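
A quick lint for these two problems can be written with regular expressions. This is a heuristic sketch, not a validator: the pattern and function names are hypothetical, it does not check entity names against the official list, and it will flag the rare description that intentionally shows a literal entity.

```python
import re

# Shape of a well-formed entity body: name;  #123;  or  #x1F;
ENTITY_BODY = r"(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);"

# An ampersand that does not start a well-formed entity.
BARE_AMPERSAND = re.compile(r"&(?!" + ENTITY_BODY + ")")

# &amp; immediately followed by what looks like another entity body.
DOUBLE_ENCODED = re.compile(r"&amp;(?=" + ENTITY_BODY + ")")


def entity_problems(body_html):
    """Count bare ampersands and double-encoded entities in a description."""
    return {
        "bare_ampersands": len(BARE_AMPERSAND.findall(body_html)),
        "double_encoded": len(DOUBLE_ENCODED.findall(body_html)),
    }


print(entity_problems("Salt & pepper mill"))
print(entity_problems("Acme&amp;trade; mill"))
print(entity_problems("Acme&trade; mill &mdash; R&amp;D tested"))
```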

JSON-LD Description Field: Liquid Implementation

The description property in your Product JSON-LD is the highest-confidence signal for ChatGPT Shopping and is cross-referenced against visible body text by Perplexity. It must be a plain-text string — no HTML tags, no newline characters, no unescaped quotation marks.

The following Liquid snippet produces a clean, safely JSON-encoded description from body_html. Place this inside your Product JSON-LD <script> block in your Shopify theme's product.liquid or product-template.liquid section (main-product.liquid in Online Store 2.0 themes such as Dawn).

{%- assign desc_clean = product.description
      | strip_html
      | strip_newlines
      | strip
      | truncatewords: 500 -%}

<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "Product",
  "name": {{ product.title | json }},
  "description": {{ desc_clean | json }},
  "sku": {{ product.selected_or_first_available_variant.sku | json }},
  "brand": {
    "@type": "Brand",
    "name": {{ product.vendor | json }}
  },
  "offers": {
    "@type": "Offer",
    "url": {{ canonical_url | json }},
    "priceCurrency": {{ cart.currency.iso_code | json }},
    "price": {{ product.selected_or_first_available_variant.price | divided_by: 100.0 | json }},
    "availability": {% if product.available %}"https://schema.org/InStock"{% else %}"https://schema.org/OutOfStock"{% endif %},
    "priceValidUntil": "{{ 'now' | date: '%Y' | plus: 1 }}-12-31"
  }
}
</script>

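Key decisions in this snippet: strip_html removes all tags; strip_newlines removes newline characters so the description is a single line; strip removes leading and trailing whitespace; truncatewords: 500 caps the field at 500 words (roughly 3,500 characters, with an ellipsis appended when Liquid truncates), below the 5,000-character ceiling used elsewhere in this guide. The | json filter at the end adds the surrounding double quotes and escapes quotation marks, backslashes, and control characters, which is why the placeholders in the script block are not wrapped in quotes. Because strip_html and strip_newlines delete characters rather than replacing them with spaces, adjacent paragraphs with no whitespace between them can fuse into one word, so check the output on a real product page.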
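
After deploying the snippet, it is worth confirming that the rendered page actually contains a plain-text description. The sketch below pulls JSON-LD blocks out of page HTML with a regular expression and checks the Product description against the requirements listed above. The function name and sample pages are hypothetical, and the sketch assumes a single top-level Product object with a string @type (it does not handle @graph arrays or list-valued types).

```python
import json
import re

JSON_LD_BLOCK = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def product_description_issues(page_html):
    """Return a list of problems found in Product JSON-LD description fields."""
    issues = []
    for raw_block in JSON_LD_BLOCK.findall(page_html):
        try:
            data = json.loads(raw_block)
        except json.JSONDecodeError:
            issues.append("JSON-LD block does not parse")
            continue
        if not isinstance(data, dict) or data.get("@type") != "Product":
            continue
        description = data.get("description")
        if not description:
            issues.append("missing description")
            continue
        if re.search(r"<[^>]+>", description):
            issues.append("description contains HTML tags")
        if "\n" in description or "\r" in description:
            issues.append("description contains newlines")
        if len(description) > 5000:
            issues.append("description exceeds 5,000 characters")
    return issues


def page_with(product_data):
    """Build a hypothetical page containing one JSON-LD block."""
    return '<script type="application/ld+json">' + json.dumps(product_data) + "</script>"


clean_page = page_with({"@type": "Product", "description": "Waterproof trail shoe with a 4 mm lug sole."})
dirty_page = page_with({"@type": "Product", "description": "<p>Waterproof</p>\nshoe"})

print(product_description_issues(clean_page))
print(product_description_issues(dirty_page))
```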

Six Technical Signals AI Agents Extract from Descriptions

AI shopping agents do not treat descriptions as opaque text blocks. They extract structured signals using pattern matching. Including these signals explicitly, with consistent formatting, makes it more likely that an agent uses your description in a product recommendation.

| Signal | Pattern to include | Example |
| --- | --- | --- |
| Material composition | Percentage + material name | "Made from 95% organic cotton, 5% elastane" |
| Dimensions / weight | Number + unit (metric or imperial) | "32 cm x 22 cm x 8 cm; 480 g" |
| Compatibility / fit | Works with / fits / compatible with + named entity | "Compatible with iPhone 15 Pro and 15 Pro Max" |
| Use-case context | Verb phrase describing the primary action | "Designed for trail running in wet conditions" |
| Certification / standard | Named certification or standard | "CE certified, RoHS compliant, UL listed" |
| Warranty / guarantee | Duration + coverage statement | "Backed by a 2-year manufacturer warranty" |
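
The six signals can be checked with a handful of regular expressions. The patterns below are rough, hypothetical approximations written for this illustration; they are not the patterns any agent is known to use, and they will produce false positives (for example, "15 in stock" reads as a dimension).

```python
import re

SIGNAL_PATTERNS = {
    "material": re.compile(r"\b\d{1,3}\s?%\s+[A-Za-z]"),
    "dimensions": re.compile(r"\b\d+(?:\.\d+)?\s?(?:mm|cm|m|in|ft|g|kg|oz|lb)\b", re.IGNORECASE),
    "compatibility": re.compile(r"\b(?:compatible with|works with|fits)\b", re.IGNORECASE),
    "use_case": re.compile(r"\b(?:designed for|built for|ideal for|made for)\b", re.IGNORECASE),
    "certification": re.compile(r"\b(?:certified|compliant|listed)\b", re.IGNORECASE),
    "warranty": re.compile(
        r"\b\d+[- ](?:year|month|day)\b.*?\b(?:warranty|guarantee)\b", re.IGNORECASE
    ),
}


def detect_signals(description):
    """Return the set of signal names whose pattern appears in the text."""
    return {name for name, pattern in SIGNAL_PATTERNS.items() if pattern.search(description)}


description = (
    "Made from 95% organic cotton, 5% elastane. "
    "Designed for trail running in wet conditions. "
    "Backed by a 2-year manufacturer warranty."
)
found = detect_signals(description)
print(sorted(found))
print(len(found) >= 3)  # the checklist later in this article asks for at least three
```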

Per-Agent Implementation Priority

| Agent | Highest-impact description action | Secondary action |
| --- | --- | --- |
| ChatGPT Shopping | Add plain-text description to Product JSON-LD | Ensure Bing Webmaster Tools verifies your site |
| Perplexity Shopping | Expand body_html to 300+ stripped words | Move all app-injected content into native body_html |
| Google AI Mode | Clean up feed description (remove HTML tags from export) | Keep Merchant Center feed description under 4,500 chars |
| Meta AI | Set og:description to a 200–300 char plain-text summary | Ensure body_html text appears above fold (no JS rendering) |

Technical Implementation Checklist

| # | Check | Priority |
| --- | --- | --- |
| 1 | All top-20 products have 150+ stripped words in body_html | Critical |
| 2 | Product JSON-LD includes a description field on every product page | Critical |
| 3 | JSON-LD description is generated with strip_html \| strip_newlines (no raw HTML) | Critical |
| 4 | No product description content is JavaScript-injected (all in native body_html) | Critical |
| 5 | HTML tags in body_html are limited to p, ul, li, strong, em, h3, h4 | High |
| 6 | No bare ampersands or malformed HTML entities in body_html | High |
| 7 | Merchant Center feed description field is under 4,500 characters | High |
| 8 | At least three of the six technical signals (materials, dimensions, compatibility, use-case, certification, warranty) are present | High |
| 9 | og:description is a 200–300 character plain-text summary (not truncated body_html) | Medium |
| 10 | Priority products (300–500 word range) are identified and cross-linked within product collections | Medium |

Frequently Asked Questions

What word count does a Shopify product description need for AI agents to quote it?

AI shopping agents rarely quote descriptions under 50 words, treating them as stub content. The baseline threshold for consistent AI citation is 150 words. Descriptions in the 300–500 word range are quoted most frequently because they provide enough context for agents to extract multiple signals (materials, use case, compatibility, dimensions) without exceeding the context window budget agents assign to a single product.

Does the JSON-LD description field need to match the visible body_html?

Yes. When a JSON-LD description contradicts the visible body_html, AI agents apply a trust penalty and may discard both signals. The description field in your Product JSON-LD should be a clean, plain-text version of body_html — strip all HTML tags, collapse whitespace, and keep it under 5,000 characters. In Liquid, use: {{ product.description | strip_html | strip_newlines | truncatewords: 500 | json }}.
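
A simple consistency check is to confirm that the JSON-LD description is a prefix of the visible description text. The sketch below is one hypothetical way to do that: it ignores case and all whitespace (so newline handling differences do not matter) and drops a trailing "..." such as the one Liquid's truncatewords appends. The function names are illustrative, and this does not reflect how any agent actually scores a mismatch.

```python
import re


def squash(text):
    """Lowercase the text and remove all whitespace."""
    return re.sub(r"\s+", "", text).lower()


def description_matches_body(json_ld_description, visible_text):
    """True if the JSON-LD description is a (possibly truncated) prefix of the body text."""
    candidate = squash(json_ld_description)
    if candidate.endswith("..."):
        candidate = candidate[:-3]  # drop the truncation ellipsis
    return bool(candidate) and squash(visible_text).startswith(candidate)


body_text = "Waterproof trail shoe.\nBuilt for wet conditions."

print(description_matches_body("Waterproof trail shoe. Built for...", body_text))
print(description_matches_body("Lightweight road shoe.", body_text))
```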

How does Shopify's /products.json endpoint affect AI agent crawling?

Shopify's /products.json endpoint exposes the body_html field with all HTML tags intact, and AI crawlers that consume this API strip the tags themselves. Because the endpoint returns only what is stored in body_html, JavaScript-injected content, content rendered by apps after page load, and content inside iframes never appears in it. Any content that must be discoverable by AI agents needs to be in Shopify's native body_html, not injected client-side.

Which AI shopping agent benefits most from description optimization?

Perplexity Shopping currently shows the strongest correlation between description quality and citation rate because it crawls product pages directly and extracts text from the rendered HTML. ChatGPT Shopping weighs structured data completeness (JSON-LD) alongside description text. Google AI Mode primarily relies on the Google Merchant Center feed description field, which has a hard 5,000-character limit. Meta AI reads Open Graph tags and the HTML body, making both og:description and body_html relevant.
