CatalogScan

SEO Guide · 2026

Shopify Product Feed JSON Format: What AI Agents Actually Read

Every Shopify store exposes its full product catalog at /products.json — no authentication required. AI shopping agents, comparison tools, and catalog scanners read this endpoint directly. Understanding its structure tells you exactly what machine-readable data your store is (and isn't) publishing.

TL;DR: /products.json contains title, vendor, body_html, tags, variants (price, barcode, sku), and images. It does not contain google_product_category, condition, MPN, or material; those require metafields read via the Storefront or Admin API. GTIN is present only if the merchant entered a barcode value per variant.

The /products.json endpoint

Shopify stores publish their product catalog at https://yourstore.com/products.json. This is a public, unauthenticated endpoint, so no API keys or tokens are required. It returns a JSON object whose products key holds an array of product objects: 30 per page by default, and up to 250 per page with the limit parameter:

https://yourstore.com/products.json                    # first 30 products (default)
https://yourstore.com/products.json?limit=250          # up to 250 per page
https://yourstore.com/products.json?limit=250&page=2   # second page

By default, robots.txt does not disallow this endpoint, so crawlers and AI agents can access it freely. CatalogScan reads this endpoint during the free scan to assess your catalog's machine-readable quality, before touching any Storefront API.
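To confirm whether a particular crawler is allowed to fetch the endpoint, you can evaluate the store's robots.txt rules against the feed URL. The following is a minimal sketch using Python's standard urllib.robotparser module. The sample rules, the bot name ExampleBot, and the helper name can_agent_fetch are illustrative assumptions, not a real store's configuration.

```python
from urllib.robotparser import RobotFileParser


def can_agent_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content that does not mention /products.json.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /cart
Disallow: /checkout
"""

FEED_URL = "https://yourstore.com/products.json"

if __name__ == "__main__":
    print(can_agent_fetch(SAMPLE_ROBOTS, "ExampleBot", FEED_URL))
```

In practice you would download https://yourstore.com/robots.txt and pass its text to the helper; a store that adds a Disallow rule for /products.json would make the check return False.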

Top-level product object fields

Each product in the JSON array contains these fields. The "AI agent value" column explains how AI shopping agents use each one:

| Field | Type | AI agent value | Signal |
| --- | --- | --- | --- |
| handle | string | Constructs the canonical URL: /products/{handle} | High |
| title | string | Primary product name used in query matching | High |
| body_html | string | Description; agents strip HTML tags and parse plain text for attribute signals (material, dimensions, use case, compatibility) | High |
| vendor | string | Brand name, used for entity matching; AI agents try to resolve this string against known brand entities | High |
| product_type | string | Secondary category signal; agents use this when google_product_category is absent | Medium |
| tags | array | Attribute signals for agent faceting; size, color, and material tags are parsed | Medium |
| published_at | datetime | Freshness signal; recently published products receive a brief ranking boost in AI agent results | Low |
| images | array | Image URLs plus alt text; alt text is read as a signal-rich caption, and missing alt text is a moderate quality penalty | Medium |
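Two of the fields above involve a small transformation before they are useful: handle becomes a URL, and body_html has to be reduced to plain text. The sketch below shows one way a scanner might do both with the Python standard library. The function names plain_text and canonical_url are hypothetical, and real agents may normalize text differently.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def plain_text(body_html: str) -> str:
    """Strip tags from body_html and collapse runs of whitespace."""
    extractor = _TextExtractor()
    extractor.feed(body_html or "")  # body_html can be null for some products
    extractor.close()  # flush any buffered text
    # Joining with a space keeps adjacent block elements from running together.
    return " ".join(" ".join(extractor.chunks).split())


def canonical_url(store_origin: str, handle: str) -> str:
    """Build the product page URL from the store origin and the handle."""
    return f"{store_origin.rstrip('/')}/products/{handle}"
```

For example, plain_text applied to a description made of a paragraph followed by a bullet list yields one space-separated string that can be searched for attribute words such as a material or a dimension.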

Variant-level fields (the most important object)

Each product has a variants array. Each variant is a purchasable SKU. AI shopping agents evaluate products at the variant level for price, availability, and identifier data:

| Variant field | AI agent value | Signal |
| --- | --- | --- |
| price | A string, not a number: "29.99". Agents parse it as a decimal for price comparison, so it must be a real number, not an empty string | High |
| compare_at_price | Original/sale indicator; agents use this for sale framing in AI responses | Low |
| barcode | GTIN/UPC/EAN; the most critical identifier for cross-catalog deduplication and Shopping Graph inclusion. Null if the merchant hasn't entered it | High |
| sku | Merchant-assigned SKU; used as a fallback identifier when barcode is null. Not globally unique | Medium |
| available | Boolean; out-of-stock variants (false) are filtered from AI shopping results | High |
| option1, option2, option3 | Variant attributes (Color, Size, Material); agents use these for faceted queries ("blue", "XL", "cotton") | Medium |
| weight | Shipping weight; agents with fulfillment awareness use this for shipping time/cost estimates | Low |
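Because price arrives as a string and barcode may be null, a consumer of the feed has to normalize variants before comparing them. The following sketch illustrates that handling under simple assumptions: the helper names are hypothetical, and an empty or non-numeric price is treated as unusable rather than as zero.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_price(raw: Optional[str]) -> Optional[Decimal]:
    """Parse the variant price string; return None if it is empty or not a number."""
    if not raw:
        return None
    try:
        value = Decimal(raw)
    except InvalidOperation:
        return None
    # Reject special values such as "NaN" or "Infinity".
    return value if value.is_finite() else None


def variant_identifier(variant: dict) -> Optional[tuple[str, str]]:
    """Prefer the barcode (GTIN); fall back to the merchant SKU; else None."""
    barcode = (variant.get("barcode") or "").strip()
    if barcode:
        return ("barcode", barcode)
    sku = (variant.get("sku") or "").strip()
    if sku:
        return ("sku", sku)
    return None


def purchasable_variants(product: dict) -> list[dict]:
    """Keep variants that are in stock and carry a parseable price."""
    return [
        v
        for v in product.get("variants", [])
        if v.get("available") and parse_price(v.get("price")) is not None
    ]
```

Using Decimal rather than float avoids rounding surprises when prices are compared or summed.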

What's missing from the default products.json

The public /products.json endpoint does not include several fields that AI agents need for high-confidence product matching. These fields require the Storefront API or Admin API to retrieve:

| Missing field | Why it matters | How to add it |
| --- | --- | --- |
| google_product_category | Without a Google taxonomy ID, AI agents use product_type as a loose substitute, reducing match precision for category-based queries | Add metafield google.google_product_category via the Google & YouTube Sales Channel app or Admin API bulk write |
| condition | New/used/refurbished distinction; critical for Google Shopping inclusion in "buy [item] used" queries | Metafield google.condition with value "new", "used", or "refurbished" |
| mpn (Manufacturer Part Number) | Secondary identifier when GTIN is unavailable; used for B2B and parts/accessories cross-referencing | Custom metafield product.mpn; render in Product JSON-LD as "mpn" |
| age_group / gender | Required for apparel in Google Shopping; without these, apparel products receive lower relevance scores for gendered queries | Metafields google.age_group, google.gender via Google & YouTube app |
| AggregateRating | Review count + average rating; not in products.json at all, only available via JSON-LD emitted by review apps on the product page | Install a review app that emits AggregateRating JSON-LD (Okendo, Judge.me Awesome tier, Loox, Fera) |
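As an illustration of how the missing identifiers could be surfaced on the product page, the sketch below assembles a schema.org Product JSON-LD object from a products.json record plus extra values. It assumes the extras dict (mpn, condition) has already been read from metafields by some other means, and that the currency is supplied by the caller since it is not shown in the fields above. The function name product_json_ld is hypothetical, and this is not how any particular theme or app generates its markup.

```python
import json

# schema.org item condition URLs keyed by the metafield values listed above.
CONDITION_URLS = {
    "new": "https://schema.org/NewCondition",
    "used": "https://schema.org/UsedCondition",
    "refurbished": "https://schema.org/RefurbishedCondition",
}


def product_json_ld(product: dict, variant: dict, extras: dict, currency: str) -> dict:
    """Build a minimal Product JSON-LD dict for one variant."""
    in_stock = bool(variant.get("available"))
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product["title"],
        "brand": {"@type": "Brand", "name": product["vendor"]},
        "offers": {
            "@type": "Offer",
            "price": variant["price"],
            "priceCurrency": currency,  # assumption: provided by the caller
            "availability": "https://schema.org/InStock"
            if in_stock
            else "https://schema.org/OutOfStock",
        },
    }
    if variant.get("barcode"):
        data["gtin"] = variant["barcode"]
    if extras.get("mpn"):
        data["mpn"] = extras["mpn"]
    condition = CONDITION_URLS.get(extras.get("condition", ""))
    if condition:
        data["offers"]["itemCondition"] = condition
    return data


def to_script_body(data: dict) -> str:
    """Serialize for embedding in a script tag of type application/ld+json."""
    return json.dumps(data, indent=2)
```

Fields that are absent (no barcode, no mpn, unknown condition) are simply omitted rather than emitted as empty strings.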

The GTIN coverage problem

CatalogScan's data shows that across scanned Shopify stores, average GTIN coverage (the percentage of variants with a non-null barcode) is 41%. In a typical scanned store, then, more than half of the variants are invisible to AI shopping agents that rely on GTINs for deduplication and Shopping Graph inclusion. Stores that clean up their barcode data to reach 90%+ coverage gain an average of 14 points on the CatalogScan AI readiness score.
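The coverage figure defined above can be computed directly from the products array. The following is a minimal sketch, not CatalogScan's actual scoring code. It assumes that an empty or whitespace-only barcode should count as missing alongside null, and the function name gtin_coverage is hypothetical.

```python
def gtin_coverage(products: list[dict]) -> float:
    """Return the percentage of variants that have a non-empty barcode."""
    variants = [v for p in products for v in p.get("variants", [])]
    if not variants:
        return 0.0  # no variants at all: report zero rather than divide by zero
    with_barcode = sum(1 for v in variants if (v.get("barcode") or "").strip())
    return 100.0 * with_barcode / len(variants)
```

Note that the percentage is taken over variants, not products, so a single product with many barcode-less size and color variants can pull coverage down sharply.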

Reading products.json: pagination and structure

Retrieving the full catalog requires pagination. A Shopify store with 1,200 products needs at least 5 requests at 250 per page (1,200 / 250 = 4.8, rounded up):

// Paginate through a full catalog:
// Page 1: /products.json?limit=250&page=1
// Page 2: /products.json?limit=250&page=2
// ...continue until you receive an empty products array

// If you already know the total product count (for example 1,247,
// from the Shopify admin or the authenticated Admin API count endpoint),
// divide by 250 and round up to get the required page count: 5.
// Without a count, the empty-array check above is the stopping condition.
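The pagination loop described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a production crawler: the names pages_needed, fetch_json, and iter_products are hypothetical, the fetch function is injectable so the loop can be exercised without network access, and max_pages is an arbitrary safety cap. Real-world use would also need rate limiting and error handling.

```python
import json
import math
from typing import Callable, Iterator
from urllib.request import urlopen


def pages_needed(total_products: int, page_size: int = 250) -> int:
    """Divide the product count by the page size and round up."""
    return math.ceil(total_products / page_size)


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body (performs a real HTTP request)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_products(
    store_origin: str,
    fetch: Callable[[str], dict] = fetch_json,
    page_size: int = 250,
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield every product, requesting pages until an empty array comes back."""
    base = store_origin.rstrip("/")
    for page in range(1, max_pages + 1):
        url = f"{base}/products.json?limit={page_size}&page={page}"
        products = fetch(url).get("products", [])
        if not products:
            return  # empty products array: the catalog is exhausted
        yield from products


# Usage (makes live requests):
#   all_products = list(iter_products("https://yourstore.com"))
```

Stopping on the first empty page means the loop issues one request more than pages_needed would predict, which is the price of not needing the count up front.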

The response is a JSON object with a single top-level key, products, which holds the array:

{
  "products": [
    {
      "id": 7234567890123,
      "title": "Merino Wool Base Layer",
      "handle": "merino-wool-base-layer",
      "body_html": "<p>150gsm merino wool...</p>",
      "vendor": "Icebreaker",
      "product_type": "Base Layer",
      "tags": ["merino", "wool", "base-layer", "150gsm"],
      "variants": [
        {
          "id": 4123456789012,
          "title": "S / Blue",
          "price": "89.95",
          "sku": "ICE-ML-S-BLU",
          "barcode": "9420025582541",
          "available": true,
          "option1": "S",
          "option2": "Blue"
        }
      ],
      "images": [
        {
          "src": "https://cdn.shopify.com/.../merino-blue-s.jpg",
          "alt": "Icebreaker Merino Wool Base Layer in Blue, size S"
        }
      ]
    }
  ]
}

FAQ

What is the URL of the Shopify product feed JSON?

The Shopify product feed JSON is at /products.json on any Shopify store, for example https://yourstore.com/products.json. The default page size is 30 products. Add ?limit=250 to raise it to the maximum of 250 per page, and add &page=2 (and so on) to step through the rest of the catalog.

Does the Shopify products.json include GTIN / barcode data?

Yes, but only if the merchant has entered a barcode value for each variant in the Shopify admin. The barcode field appears inside each variant object. If a merchant has not entered a barcode, the field is null. GTINs are not auto-populated — they require manual entry or bulk import via CSV or the Admin API.

Why is google_product_category missing from products.json?

google_product_category is not part of Shopify's default product data model. It must be stored as a product metafield (namespace: google, key: google_product_category) and is surfaced via the Storefront API or Admin API — not via the public /products.json endpoint. The Google & YouTube Sales Channel app writes this metafield when you assign categories in Google Merchant Center.

Can AI shopping agents read the Shopify products.json feed directly?

Yes. /products.json is publicly accessible with no authentication and is crawlable by any bot not blocked in robots.txt. CatalogScan reads this endpoint directly for the free scan tier. Most AI shopping agents (ChatGPT Shopping via Bing, Perplexity, Google AI Mode) index product pages via HTML and JSON-LD rather than /products.json, but the endpoint is the fastest way to programmatically assess a store's full catalog quality.

Check your product feed quality

CatalogScan reads your /products.json and scores GTIN coverage, description richness, image alt text, and 15 more signals in 2 minutes.
