SEO Guide · 2026
Shopify Product Feed JSON Format: What AI Agents Actually Read
Every Shopify store exposes its full product catalog at /products.json — no authentication required. AI shopping agents, comparison tools, and catalog scanners read this endpoint directly. Understanding its structure tells you exactly what machine-readable data your store is (and isn't) publishing.
/products.json contains title, vendor, body_html, tags, variants (price, barcode, sku), and images. It does not contain google_product_category, condition, MPN, or material — those require metafields read via the Storefront or Admin API. GTIN is present only if the merchant entered a barcode value per variant.
The /products.json endpoint
Shopify stores publish their product catalog at https://yourstore.com/products.json. This is a public, unauthenticated endpoint: no API keys or tokens are required. It returns a JSON object whose products key holds an array of product objects, 30 per page by default and up to 250 per page with the limit parameter:
https://yourstore.com/products.json                     # first 30 products (default)
https://yourstore.com/products.json?limit=250           # up to 250 per page
https://yourstore.com/products.json?limit=250&page=2    # second page
The endpoint is not disallowed in robots.txt by default, so crawlers and AI agents can access it freely. CatalogScan reads this endpoint during the free scan to assess your catalog's machine-readable quality before touching any Storefront API.
Top-level product object fields
Each product in the JSON array contains these fields. The "AI agent value" column explains how AI shopping agents use each one:
| Field | Type | AI agent value | Signal |
|---|---|---|---|
| handle | string | Constructs the canonical URL: /products/{handle} | High |
| title | string | Primary product name used in query matching | High |
| body_html | string | Description: agents strip HTML tags and parse plain text for attribute signals (material, dimensions, use case, compatibility) | High |
| vendor | string | Brand name, used for entity matching; AI agents try to resolve this string against known brand entities | High |
| product_type | string | Secondary category signal: agents use this when google_product_category is absent | Medium |
| tags | array | Attribute signals for agent faceting: size, color, and material tags are parsed | Medium |
| published_at | datetime | Freshness signal: recently published products receive a brief ranking boost in AI agent results | Low |
| images | array | Image URLs + alt text: alt text is read as a signal-rich caption; missing alt text is a moderate quality penalty | Medium |
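To make the handle and body_html rows concrete, here is a minimal sketch of how a reader of this endpoint might build the canonical URL and reduce a description to plain text. The function names are hypothetical, and real agents may process these fields differently; this only illustrates the two steps described in the table.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops the tags around them."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def plain_text(body_html):
    """Strip HTML tags from body_html and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(body_html or "")  # body_html may be null for some products
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def canonical_url(store_domain, handle):
    """Build the /products/{handle} URL on the given store domain."""
    return f"https://{store_domain}/products/{handle}"
```

Joining text nodes with a space is a simplification: it keeps words in adjacent tags apart, at the cost of splitting a word that is only partly wrapped in a tag.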
Variant-level fields (the most important object)
Each product has a variants array. Each variant is a purchasable SKU. AI shopping agents evaluate products at the variant level for price, availability, and identifier data:
| Variant field | AI agent value | Signal |
|---|---|---|
| price | String (!), e.g. "29.99". Agents parse this as a decimal for price comparison. Must be a real number, not an empty string | High |
| compare_at_price | Original/sale indicator: agents use this for sale framing in AI responses | Low |
| barcode | GTIN/UPC/EAN: the most critical identifier for cross-catalog deduplication and Shopping Graph inclusion. Null if the merchant hasn't entered it | High |
| sku | Merchant-assigned SKU: used as a fallback identifier when barcode is null; not globally unique | Medium |
| available | Boolean: out-of-stock variants (false) are filtered from AI shopping results | High |
| option1, option2, option3 | Variant attributes (Color, Size, Material): agents use these for faceted queries ("blue", "XL", "cotton") | Medium |
| weight | Shipping weight: agents with fulfillment awareness use this for shipping time/cost estimates | Low |
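The variant rules in the table (price arrives as a string, barcode falls back to sku, unavailable variants are dropped) can be sketched as follows. The helper names are hypothetical and the field shapes are assumed to match the table above; this is an illustration, not how any particular agent is implemented.

```python
from decimal import Decimal, InvalidOperation


def parse_price(variant):
    """Return the variant price as a Decimal, or None if missing or malformed."""
    raw = variant.get("price")
    if not raw:
        return None  # covers both null and the empty string
    try:
        return Decimal(str(raw))
    except InvalidOperation:
        return None


def best_identifier(variant):
    """Prefer barcode (GTIN/UPC/EAN); fall back to sku; otherwise None."""
    barcode = (variant.get("barcode") or "").strip()
    if barcode:
        return ("barcode", barcode)
    sku = (variant.get("sku") or "").strip()
    if sku:
        return ("sku", sku)
    return None


def purchasable_variants(product):
    """Keep variants that are in stock and carry a parseable price."""
    return [
        v for v in product.get("variants", [])
        if v.get("available") is True and parse_price(v) is not None
    ]
```

Decimal is used instead of float so that a price such as "29.99" compares exactly across stores.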
What's missing from the default products.json
The public /products.json endpoint does not include several fields that AI agents need for high-confidence product matching. These fields require the Storefront API or Admin API to retrieve:
| Missing field | Why it matters | How to add it |
|---|---|---|
| google_product_category | Without a Google taxonomy ID, AI agents use product_type as a loose substitute, reducing match precision for category-based queries | Add metafield google.google_product_category via the Google & YouTube Sales Channel app or Admin API bulk write |
| condition | New/used/refurbished distinction: critical for Google Shopping inclusion in "buy [item] used" queries | Metafield google.condition with value "new", "used", or "refurbished" |
| mpn (Manufacturer Part Number) | Secondary identifier when GTIN is unavailable: used for B2B and parts/accessories cross-referencing | Custom metafield product.mpn; render in Product JSON-LD as "mpn" |
| age_group / gender | Required for apparel in Google Shopping: without these, apparel products receive lower relevance scores for gendered queries | Metafields google.age_group, google.gender via Google & YouTube app |
| AggregateRating | Review count + average rating — not in products.json at all; only available via JSON-LD emitted by review apps on the product page | Install a review app that emits AggregateRating JSON-LD (Okendo, Judge.me Awesome tier, Loox, Fera) |
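As an illustration of the "render in Product JSON-LD" suggestion in the mpn row, the sketch below assembles a schema.org Product object from a product, one of its variants, and a dictionary of metafield values. The function name, the extra dictionary, and the sample MPN are all hypothetical; how metafield values actually reach your theme depends on your setup.

```python
import json


def product_json_ld(product, variant, extra):
    """Build a schema.org Product dict; `extra` holds hypothetical metafield values."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product["title"],
        "brand": {"@type": "Brand", "name": product["vendor"]},
        "sku": variant.get("sku"),
        "gtin": variant.get("barcode"),
        "mpn": extra.get("mpn"),
    }
    # Drop keys with no value so the markup never carries empty identifiers
    return {key: value for key, value in data.items() if value}


if __name__ == "__main__":
    sample = product_json_ld(
        {"title": "Merino Wool Base Layer", "vendor": "Icebreaker"},
        {"sku": "ICE-ML-S-BLU", "barcode": "9420025582541"},
        {"mpn": "HYPOTHETICAL-123"},  # placeholder value, not a real part number
    )
    print(json.dumps(sample, indent=2))
```

The printed object would go inside a script tag of type application/ld+json on the product page.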
The GTIN coverage problem
CatalogScan's data shows that across scanned Shopify stores, average GTIN coverage (the percentage of variants with a non-null barcode) is 41%. In other words, more than half of all variants are invisible to AI shopping agents that rely on GTINs for deduplication and Shopping Graph inclusion. Stores that raise their GTIN coverage to 90% or more see an average gain of 14 points on the CatalogScan AI readiness score.
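GTIN coverage as defined here is simple to compute for your own catalog. The following is a minimal sketch, not CatalogScan's actual scoring code: the function name is hypothetical, and treating an empty or whitespace-only barcode as missing is an assumption on top of the "non-null" definition.

```python
def gtin_coverage(products):
    """Percentage of variants with a non-empty barcode, or None for an empty catalog."""
    total = 0
    with_barcode = 0
    for product in products:
        for variant in product.get("variants", []):
            total += 1
            # Assumption: "" and whitespace count as missing, same as null
            if (variant.get("barcode") or "").strip():
                with_barcode += 1
    if total == 0:
        return None
    return 100.0 * with_barcode / total
```

Pass it the combined products list from every page of /products.json so the percentage reflects the whole catalog rather than the first page.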
Reading products.json: pagination and structure
The full catalog requires pagination. A Shopify store with 1,200 products requires at minimum 5 requests at 250 per page:
// Paginate through a full catalog:
// Page 1: /products.json?limit=250&page=1
// Page 2: /products.json?limit=250&page=2
// ...continue until you receive an empty products array
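// Optional: check the total count first if you have access to a count endpoint.
// The authenticated Admin REST API exposes products/count.json → {"count": 1247};
// a public /products/count.json path may not be available on every storefront.
// Divide by 250 and round up to get the required page count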
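The pagination steps above can be sketched as a short loop. This assumes the limit and page parameters behave as described in this guide; the function names and the max_pages safety cap are hypothetical. Looping until an empty page comes back means the total count is not needed up front.

```python
import json
import math
from urllib.request import urlopen

PAGE_SIZE = 250  # maximum page size stated in this guide


def pages_needed(product_count, page_size=PAGE_SIZE):
    """Round up: 1,247 products at 250 per page needs 5 requests."""
    return math.ceil(product_count / page_size)


def fetch_page(base_url, page, page_size=PAGE_SIZE):
    """Fetch one page of /products.json and return its products list."""
    url = f"{base_url}/products.json?limit={page_size}&page={page}"
    with urlopen(url, timeout=30) as response:
        return json.load(response).get("products", [])


def iter_products(base_url, fetch=fetch_page, max_pages=1000):
    """Yield products page by page until an empty page is returned."""
    for page in range(1, max_pages + 1):
        products = fetch(base_url, page)
        if not products:
            return
        yield from products
```

A call such as list(iter_products("https://yourstore.com")) would collect the full catalog. The fetch argument exists so the loop can be exercised with a stub instead of live requests.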
The top-level structure is a single key products containing the array:
{
  "products": [
    {
      "id": 7234567890123,
      "title": "Merino Wool Base Layer",
      "handle": "merino-wool-base-layer",
      "body_html": "<p>400gsm merino wool...</p>",
      "vendor": "Icebreaker",
      "product_type": "Base Layer",
      "tags": ["merino", "wool", "base-layer", "150gsm"],
      "variants": [
        {
          "id": 4123456789012,
          "title": "S / Blue",
          "price": "89.95",
          "sku": "ICE-ML-S-BLU",
          "barcode": "9420025582541",
          "available": true,
          "option1": "S",
          "option2": "Blue"
        }
      ],
      "images": [
        {
          "src": "https://cdn.shopify.com/.../merino-blue-s.jpg",
          "alt": "Icebreaker Merino Wool Base Layer in Blue, size S"
        }
      ]
    }
  ]
}
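Given a payload shaped like the sample above, a quick audit for missing image alt text might look like the sketch below. It assumes each image object carries src and alt keys as in the sample; the function name is hypothetical.

```python
def images_missing_alt(payload):
    """Return (handle, image src) pairs for images with no usable alt text."""
    missing = []
    for product in payload.get("products", []):
        for image in product.get("images", []):
            # Treat null, absent, and whitespace-only alt values as missing
            if not (image.get("alt") or "").strip():
                missing.append((product.get("handle"), image.get("src")))
    return missing
```

Each returned pair names a product handle and the image that needs alt text.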
FAQ
What is the URL of the Shopify product feed JSON?
The Shopify product feed JSON is at /products.json on any Shopify store, for example https://yourstore.com/products.json. The default is 30 products per page if no limit is specified. Paginate with ?page=2&limit=250 to retrieve up to 250 products per page, which is the maximum page size.
Does the Shopify products.json include GTIN / barcode data?
Yes, but only if the merchant has entered a barcode value for each variant in the Shopify admin. The barcode field appears inside each variant object. If a merchant has not entered a barcode, the field is null. GTINs are not auto-populated — they require manual entry or bulk import via CSV or the Admin API.
Why is google_product_category missing from products.json?
google_product_category is not part of Shopify's default product data model. It must be stored as a product metafield (namespace: google, key: google_product_category) and is surfaced via the Storefront API or Admin API — not via the public /products.json endpoint. The Google & YouTube Sales Channel app writes this metafield when you assign categories in Google Merchant Center.
Can AI shopping agents read the Shopify products.json feed directly?
Yes. /products.json is publicly accessible with no authentication and is crawlable by any bot not blocked in robots.txt. CatalogScan reads this endpoint directly for the free scan tier. Most AI shopping agents (ChatGPT Shopping via Bing, Perplexity, Google AI Mode) index product pages via HTML and JSON-LD rather than /products.json, but the endpoint is the fastest way to programmatically assess a store's full catalog quality.
Check your product feed quality
CatalogScan reads your /products.json and scores GTIN coverage, description richness, image alt text, and 15 more signals in 2 minutes.