
Cloudflare for Shopify: the three settings that silently block AI shopping agents

Your robots.txt says Allow: / for GPTBot. Your Shopify theme hasn’t been touched in months. And yet CatalogScan shows your robots-open signal failing at 0 points. The culprit is almost certainly Cloudflare — and it’s almost certainly one of three specific settings. This guide shows you which one, how to find it in the Cloudflare dashboard, and how to fix it without making your store an open target for real scrapers.

Published 2026-05-30 · ~14 min read · By the CatalogScan team

TL;DR: Three Cloudflare settings break AI shopping crawler access: (1) Bot Fight Mode — enabled by default on many free zones, blocks every AI crawler via JS challenge; (2) AI Scrapers and Crawlers managed rule — Cloudflare’s own “block AI bots” toggle under Security → WAF; (3) custom WAF rules using cf.client.bot — blocks all bots Cloudflare recognises, including GPTBot and OAI-SearchBot. Run the curl test below first to confirm Cloudflare is the problem, then work through the three settings in order.
~30% of failing robots-open scans trace to a Cloudflare block, not the robots.txt text

403 or a JS challenge page: what the AI crawler sees instead of your robots.txt

3 Cloudflare settings to check, in order

In this guide

  1. Step 0 — confirm Cloudflare is the problem (the curl test)
  2. Why Cloudflare can pass your robots.txt text and still fail your store
  3. Setting 1 — Bot Fight Mode
  4. Setting 2 — AI Scrapers and Crawlers managed rule
  5. Setting 3 — custom WAF rules with cf.client.bot
  6. The safe Cloudflare config: what to keep, what to remove
  7. 5 mistakes that make this harder than it needs to be
  8. Verification playbook

Step 0 — confirm Cloudflare is the problem

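Before touching any Cloudflare setting, run this curl command from your terminal. It sends GPTBot’s user agent string (GPTBot is OpenAI’s combined training-and-shopping crawler) and fetches your robots.txt. It can’t reproduce GPTBot’s source IP addresses, so treat it as a close approximation rather than an exact replay: rules that match on the user agent react to it exactly as they would to GPTBot, but anything keyed on verified-bot status or source IP may treat your machine differently from the real crawler.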

curl -si \
  -A "GPTBot/1.0 (+https://openai.com/gptbot)" \
  https://yourstore.com/robots.txt | head -30

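Replace yourstore.com with your actual domain. Look at the first 30 lines of the response. A status line of HTTP/2 200 followed, after the headers, by User-agent: lines means the request reached your robots.txt. A 403, or a 200 whose body starts with <!DOCTYPE html> or contains “Just a moment”, means Cloudflare intercepted it. If the response headers fill all 30 lines, drop the | head -30 so you can see the body.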
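If you would rather not read raw headers by eye, the minimal POSIX sh sketch below classifies a response saved from that curl command as open, challenged, or blocked. It is a heuristic, not Cloudflare’s own logic: the markers it checks (a cf-mitigated: challenge header, the “Just a moment” interstitial text, HTML where robots.txt text should be) are assumptions based on typical challenge pages, and status_of and classify_fetch are hypothetical names.

```shell
#!/bin/sh
# Sketch (heuristic, not Cloudflare's logic): classify a response saved with
#   curl -si -A "<crawler UA>" https://yourstore.com/robots.txt > file
# The challenge markers below are assumptions based on typical challenge pages.

# Print the numeric status from the first line, e.g. "HTTP/2 200" -> 200.
status_of() {
  head -n 1 "$1" | tr -d '\r' | awk '{ print $2 }'
}

# Print "open", "challenged", or "blocked" for the response saved in file $1.
classify_fetch() {
  code=$(status_of "$1")
  if grep -qi '^cf-mitigated: *challenge' "$1" || grep -qi 'Just a moment' "$1"; then
    echo challenged            # explicit challenge header or interstitial text
  elif [ "$code" = "200" ] && grep -qi '^user-agent:' "$1"; then
    echo open                  # 200 with real robots.txt directives
  elif [ "$code" = "200" ] && grep -qi '<!DOCTYPE html' "$1"; then
    echo challenged            # 200, but HTML where robots.txt should be
  else
    echo blocked               # 403 or any other non-robots response
  fi
}
```

Usage: save a response with curl -si -A "GPTBot/1.0 (+https://openai.com/gptbot)" https://yourstore.com/robots.txt > gptbot.txt, load the functions with . ./classify.sh (a hypothetical file name), then run classify_fetch gptbot.txt.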

Run the same test with the other major AI shopping crawlers to understand the full scope of the problem:

# OAI-SearchBot: the ChatGPT shopping retrieval crawler (separate from GPTBot)
curl -si -A "OAI-SearchBot/1.0 (+https://openai.com/searchbot)" \
  https://yourstore.com/robots.txt | head -5

# PerplexityBot: Perplexity Shopping
curl -si -A "PerplexityBot/1.0 (+https://docs.perplexity.ai/guides/bots)" \
  https://yourstore.com/robots.txt | head -5

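# Google-Extended: Google AI Mode and AI Overviews. Google-Extended is a
# robots.txt token, not a user agent: Google fetches with Googlebot, so test
# that UA. (Cloudflare uses the real connecting IP, so a spoofed
# X-Forwarded-For header would not make this request look like Google.)
curl -si -A "Googlebot/2.1 (+http://www.google.com/bot.html)" \
  https://yourstore.com/robots.txt | head -5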

# ClaudeBot: Anthropic AI training and Claude features
curl -si -A "ClaudeBot/0.5 (+https://www.anthropic.com/claude-web-crawler)" \
  https://yourstore.com/robots.txt | head -5

It’s common for one crawler to get through while another is blocked. GPTBot and OAI-SearchBot come from different IP ranges and are classified differently by Cloudflare’s bot intelligence, so a rule that lets GPTBot through can still block OAI-SearchBot. Test each one.
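To test each one in a single pass, here is a minimal POSIX sh sketch that loops over the same user agents and prints one line per crawler. STORE, fetch_robots, crawler_list, and run_matrix are hypothetical names. PASS here means HTTP 200 plus at least one User-agent: line somewhere in the body, which is what you are checking by eye in the commands above.

```shell
#!/bin/sh
# Sketch: fetch /robots.txt once per crawler UA and print "name status verdict".
# STORE is a placeholder; override it, e.g. STORE=https://example.com
STORE=${STORE:-https://yourstore.com}

# Save the body for UA $1 into file $2 and print the HTTP status code.
# Kept as its own function so it can be replaced by an offline stub.
fetch_robots() {
  curl -s -A "$1" -o "$2" -w '%{http_code}' "$STORE/robots.txt"
}

# One "name|user-agent" pair per line, so user agents can contain spaces.
crawler_list() {
  cat <<'EOF'
GPTBot|GPTBot/1.0 (+https://openai.com/gptbot)
OAI-SearchBot|OAI-SearchBot/1.0 (+https://openai.com/searchbot)
PerplexityBot|PerplexityBot/1.0 (+https://docs.perplexity.ai/guides/bots)
Googlebot|Googlebot/2.1 (+http://www.google.com/bot.html)
ClaudeBot|ClaudeBot/0.5 (+https://www.anthropic.com/claude-web-crawler)
EOF
}

# PASS = HTTP 200 and at least one "User-agent:" line in the body.
run_matrix() {
  body=$(mktemp)
  crawler_list | while IFS='|' read -r name ua; do
    code=$(fetch_robots "$ua" "$body")
    if [ "$code" = "200" ] && grep -qi '^user-agent:' "$body"; then
      verdict=PASS
    else
      verdict=FAIL
    fi
    printf '%s %s %s\n' "$name" "$code" "$verdict"
  done
  rm -f "$body"
}
```

Usage: STORE=https://yourstore.com, then run_matrix. Any FAIL line names the crawler to investigate first.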

Why Cloudflare can pass your robots.txt text and still fail your store

Cloudflare operates at the edge, between the internet and your Shopify store. It sees every request before Shopify does. If Cloudflare decides a request looks bot-like, it can intercept it and return a challenge page (HTTP 200 with JavaScript) or a hard block (HTTP 403) — without ever asking your Shopify origin what to do.

This creates a gap that trips up most Shopify operators who check their robots.txt: they read the file in their browser, it looks fine, they assume all crawlers can see it. But AI crawlers are fetching via curl-like HTTP clients, not a browser, and Cloudflare’s bot detection looks at the user agent, TLS fingerprint, and connection behaviour to classify the request. A GPTBot fetch looks very different from a Chrome browser fetch, and Cloudflare classifies it accordingly.

The stealth-failure pattern: Cloudflare returns HTTP 200 with a JavaScript challenge page. Your robots.txt text is fine. The CatalogScan signal check — which fetches /robots.txt with each major AI-agent UA and reads the body — sees the challenge HTML, not a robots.txt rule set, and fails the signal. The store thinks it’s open because the robots.txt text says so. It’s actually closed at the CDN layer.
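Because a 200 status proves nothing on its own, any automated check has to read the body. A minimal sketch of such a body check, assuming a simple heuristic (every non-blank, non-comment line must look like a Field: value directive, and at least one User-agent line must be present). This is not CatalogScan’s actual parser, and looks_like_robots_txt is a hypothetical name.

```shell
#!/bin/sh
# Sketch (assumed heuristic, not CatalogScan's parser): succeed only if file $1
# looks like a robots.txt rule set rather than, say, challenge HTML.
looks_like_robots_txt() {
  tr -d '\r' < "$1" | awk '
    /^[ \t]*$/ { next }                           # blank line
    /^[ \t]*#/ { next }                           # comment line
    tolower($0) ~ /^user-agent[ \t]*:/ { ua = 1; next }
    /^[A-Za-z-]+[ \t]*:/ { next }                 # other Field: value directives
    { bad = 1 }                                   # anything else is not robots.txt
    END { if (ua && !bad) exit 0; exit 1 }
  '
}
```

A challenge page fails on its first line (<!DOCTYPE html> is not a directive), while a normal robots.txt that opens with # comments still passes.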

The four AI crawlers that matter for shopping surfaces right now, and what they do with your catalog:

GPTBot: ChatGPT training + ChatGPT Shopping product indexing. User agent: GPTBot/1.0 (+https://openai.com/gptbot)

OAI-SearchBot: ChatGPT real-time product retrieval (Bing-sourced). User agent: OAI-SearchBot/1.0 (+https://openai.com/searchbot)

PerplexityBot: Perplexity Shopping product cards. User agent: PerplexityBot/1.0 (+https://docs.perplexity.ai/guides/bots)

Google-Extended: Google AI Mode, AI Overviews, Google Shopping AI features. Google-Extended is a robots.txt token, not a user agent, so the request itself arrives as Googlebot/2.1 (+http://www.google.com/bot.html)

If the request for any of these returns anything other than your actual robots.txt content, that crawler can’t proceed to index your catalog, read your JSON-LD, or include your products in AI-generated shopping results. The robots-open signal is the floor signal: at 0 points, every other signal becomes irrelevant because the crawler never reaches your product pages.

Setting 1 — Bot Fight Mode

Risk level: HIGH

Dashboard path: Cloudflare → select your zone → Security → Bots

Bot Fight Mode is Cloudflare’s first-generation bot mitigation. When enabled, it serves a JavaScript challenge (or sometimes a CAPTCHA) to any request whose user agent and TLS fingerprint match Cloudflare’s “bot” classification. Every AI shopping crawler trips it: they send a clear user agent string (GPTBot/1.0), they don’t run JavaScript, and they don’t solve CAPTCHAs. The challenge response (HTTP 200, HTML body) looks like a successful request to a naive observer but contains no robots.txt content.

Bot Fight Mode is available on every Cloudflare plan including Free. It defaults to on for many legacy free zones provisioned before 2024, and defaults to off for newer zones — but it’s a single toggle and many operators enable it during a security scare and forget it’s on.

How to diagnose it

The curl test is your fastest check: if you see <!DOCTYPE html> or “Just a moment” in the response to a GPTBot user agent curl, Bot Fight Mode is the culprit. Alternatively, in the Cloudflare dashboard, navigate to Security → Bots. You’ll see a toggle labelled “Bot Fight Mode.” If it’s on and you don’t have a Skip rule for AI crawlers, every AI shopping crawler is being challenged.

Fix option A: disable Bot Fight Mode entirely

If your store sells to end consumers (not a B2B API with sensitive data), disabling Bot Fight Mode is safe and simple. The toggle is a blunt instrument that catches plenty of legitimate traffic: Googlebot on misconfigured zones, price-comparison crawlers that drive real sales, and AI shopping crawlers that represent growing retail revenue. Turn it off, run the curl test again, and verify you get robots.txt content back.

Fix option B: create a Skip rule for AI crawlers (recommended if you must keep Bot Fight Mode)

If your store has a real DDoS problem or you sell high-value inventory that attracts scrapers, you may want to keep bot mitigation on for genuine bad actors. Check your plan first: Cloudflare’s documentation states that the original Bot Fight Mode (the Free-plan toggle) cannot be bypassed with a WAF Skip rule, so on Free, option A is the reliable fix. Super Bot Fight Mode (Pro and above) can be skipped. Navigate to Security → WAF → Custom Rules and create a rule with:

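Action: Skip
Skip: All Super Bot Fight Mode Rules
When: (http.user_agent contains "GPTBot") or
      (http.user_agent contains "OAI-SearchBot") or
      (http.user_agent contains "PerplexityBot") or
      (http.user_agent contains "ClaudeBot") or
      (http.user_agent contains "Googlebot") or
      (http.user_agent contains "Applebot")

Google-Extended and Applebot-Extended are robots.txt tokens that never appear in a user agent string, so the rule matches Googlebot and Applebot, the agents those companies actually fetch with.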

Place this rule above any block rules in the Custom Rules list. Order matters: Cloudflare evaluates custom rules top to bottom, and a matching Block or Challenge rule ends evaluation before anything below it runs. After saving, run the curl test again: you should now get the robots.txt content back for each of these agents.
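The same list of crawlers appears in this Skip rule and again in the allowlist rule under Setting 3, so it can help to generate both expressions from one list and keep them from drifting apart. A minimal sketch; UA_TOKENS, skip_expression, and allowlist_expression are hypothetical names, and the list uses Googlebot and Applebot because Google-Extended and Applebot-Extended are robots.txt tokens that never appear in a user agent string.

```shell
#!/bin/sh
# Sketch: generate both Cloudflare expressions from one list of UA substrings.
# The list is an example; edit it to match the crawlers you want through.
UA_TOKENS="GPTBot OAI-SearchBot PerplexityBot ClaudeBot Googlebot Applebot"

# For a Skip rule: (a) or (b) or ...
skip_expression() {
  out=""
  for t in $UA_TOKENS; do
    clause="(http.user_agent contains \"$t\")"
    if [ -z "$out" ]; then out=$clause; else out="$out or $clause"; fi
  done
  printf '%s\n' "$out"
}

# For a cf.client.bot block rule: (cf.client.bot and not (a) and not (b) ...)
allowlist_expression() {
  out="(cf.client.bot"
  for t in $UA_TOKENS; do
    out="$out and not (http.user_agent contains \"$t\")"
  done
  printf '%s)\n' "$out"
}
```

Paste the printed expression into the rule’s expression editor. When a new crawler launches, you add one token and regenerate both rules.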

Super Bot Fight Mode (Pro plan and above)

Pro, Business, and Enterprise zones have a more granular version called Super Bot Fight Mode. It classifies bots into “Definitely automated,” “Likely automated,” and “Verified bots.” AI shopping crawlers (GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot) are generally classified as “Verified bots” because Cloudflare verifies them against their operators’ known IP ranges and other signals, not because of the documentation URL in their user agent string. If your Super Bot Fight Mode action for “Verified bots” is “Allow,” you’re fine. If it’s set to anything else, that’s the issue: change it to “Allow.”
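Verified-bot status depends on where a request comes from, not on what its user agent claims, which is also why a spoofed-UA curl from your laptop is never treated as a verified bot. If you want to check a crawler IP from your own logs yourself, Google documents a reverse-then-forward DNS check for Googlebot. A sketch of that check, assuming the host utility is installed and covering IPv4 only; the function names are hypothetical, and other crawler operators document their own verification methods on their bot pages.

```shell
#!/bin/sh
# Sketch of Google's documented reverse-then-forward DNS check for Googlebot.
# Assumes the "host" utility (bind-utils / dnsutils) is installed; IPv4 only.

# Succeed if hostname $1 (trailing dot allowed) is under googlebot.com or google.com.
is_google_crawler_host() {
  case "${1%.}" in
    *.googlebot.com|*.google.com) return 0 ;;
    *) return 1 ;;
  esac
}

# 1) PTR lookup of IP $1, 2) domain check, 3) forward lookup must return $1.
verify_googlebot_ip() {
  name=$(host "$1" | awk '/domain name pointer/ { print $NF; exit }')
  [ -n "$name" ] || return 1
  is_google_crawler_host "$name" || return 1
  host "${name%.}" | awk -v ip="$1" '/has address/ && $NF == ip { found = 1 } END { exit !found }'
}
```

The forward lookup matters: anyone can publish a PTR record claiming a googlebot.com name, but only Google controls what that name resolves to.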

Setting 2 — AI Scrapers and Crawlers managed rule

Risk level: MEDIUM

Dashboard path: Cloudflare → select your zone → Security → WAF → Managed Rules

In 2024, Cloudflare added a managed rule specifically labelled “AI Scrapers and Crawlers” to its WAF managed ruleset. When enabled, it blocks a curated list of known AI crawlers by user agent — including GPTBot, ClaudeBot, CCBot, Bytedance-UA, and anthropic-ai. The intent was to give operators an easy way to opt out of AI training data collection. The unintended side effect is that it also blocks the same crawlers when they’re fetching for AI shopping retrieval, not training.

This rule defaults to off for new Cloudflare zones but may have been toggled on deliberately (there was a wave of “how to block AI scrapers” tutorials in 2023–2024) or it may be on in older zone migrations. Unlike Bot Fight Mode, this rule is specifically targeting AI bots by name, so it’s more surgical — it won’t block Googlebot — but it will block every AI shopping crawler you care about.

How to diagnose it

Navigate to Security → WAF → Managed Rules in your Cloudflare dashboard. Look for a rule with “AI Scrapers” or “AI Crawlers” in the name. If its action is “Block” or “Challenge,” it’s the problem. You can also check Security → Events (formerly Firewall Events) and filter by the rule ID — if you see GPTBot requests being blocked with a managed rule match, that’s it.

Fix: disable the managed rule

Click the rule and set its action to “Disabled” or delete it. If you want to keep blocking AI training crawlers such as CCBot (the Common Crawl crawler used by many LLM training pipelines) without blocking AI shopping crawlers, Cloudflare doesn’t currently offer a single toggle that makes that distinction. Disable the managed rule first, then block the training crawlers you care about individually: either with a robots.txt Disallow group for CCBot, or with a custom WAF rule that matches its user agent.

What doesn’t work is keeping the managed rule on and trying to fine-tune it from robots.txt. The managed rule acts at the Cloudflare edge before robots.txt is ever served, and it doesn’t distinguish between training crawlers and shopping retrieval crawlers.
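If you take the robots.txt route for CCBot, confirm the file does what you intend: a full Disallow for CCBot and none for the shopping crawlers. A simplified sketch, assuming exact, case-insensitive agent matching. It ignores the * group and Allow/Disallow precedence, so treat it as a quick check rather than a full robots.txt parser; agent_fully_disallowed is a hypothetical name.

```shell
#!/bin/sh
# Simplified sketch: succeed if robots.txt file $2 has a group naming agent $1
# (case-insensitive, exact) that contains "Disallow: /". It ignores the "*"
# group and Allow/Disallow precedence, so it is a quick check, not a parser.
agent_fully_disallowed() {
  tr -d '\r' < "$2" | awk -v agent="$1" '
    {
      line = $0
      sub(/#.*/, "", line)                          # strip comments
      gsub(/^[ \t]+|[ \t]+$/, "", line)             # trim whitespace
      if (line == "") next
      key = tolower(substr(line, 1, index(line, ":") - 1))
      gsub(/[ \t]+$/, "", key)
      val = substr(line, index(line, ":") + 1)
      gsub(/^[ \t]+/, "", val)
      if (key == "user-agent") {
        if (!in_ua) mine = 0                        # new group starts here
        in_ua = 1
        if (tolower(val) == tolower(agent)) mine = 1
      } else {
        in_ua = 0
        if (mine && key == "disallow" && val == "/") hit = 1
      }
    }
    END { if (hit) exit 0; exit 1 }
  '
}
```

Run it once for CCBot (expect success) and once for each shopping crawler (expect failure) after every robots.txt.liquid edit.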

The “training vs. shopping” decision is now yours to make deliberately

The managed rule was created when the only use case for GPTBot was training ChatGPT. That’s no longer true. GPTBot now also drives ChatGPT Shopping product indexing. Blocking GPTBot opts you out of both. Most Shopify operators who enabled this rule made a deliberate decision about LLM training but an accidental decision about AI shopping revenue. Review it now with both use cases in mind.

Setting 3 — custom WAF rules using cf.client.bot

Risk level: HIGH

Dashboard path: Cloudflare → select your zone → Security → WAF → Custom Rules

The Cloudflare field cf.client.bot is a boolean that evaluates to true for any request Cloudflare recognises as coming from a known bot — including AI shopping crawlers. A very common custom WAF rule pattern copied from security guides is (cf.client.bot) with action “Block” or “Challenge.” This rule blocks all bots Cloudflare can identify, including Googlebot, Bingbot, GPTBot, OAI-SearchBot, and every AI shopping crawler. It’s the widest possible net and the most common source of hard-to-diagnose robot blocks.

How to diagnose it

Navigate to Security → WAF → Custom Rules. Read through your active rules. Any rule with cf.client.bot in the expression and a “Block,” “Challenge,” or “Managed Challenge” action will block AI shopping crawlers. Also check for rules using cf.bot_management.score with a threshold you don’t control (the bot score runs from 1 to 99, with lower meaning more bot-like; a threshold of 30 blocks everything AI-crawler-shaped).

Also look for rules that are “allow except Googlebot” shaped — a common pattern from 2022–2023 security tutorials:

# This rule blocks all bots except Googlebot — including every AI shopping crawler
(cf.client.bot and not http.user_agent contains "Googlebot")

If you find a rule like this, you’re allowing Google’s traditional web crawler (which was the SEO gold standard in 2022) but blocking every AI shopping crawler that didn’t exist when the rule was written.

Fix: rewrite the rule to allow AI shopping crawlers explicitly

Two options depending on whether you want to keep a bot-blocking rule at all:

Option A: delete the cf.client.bot rule entirely and rely on Cloudflare’s own bot mitigation (Bot Fight Mode or Super Bot Fight Mode, configured as described in Setting 1). This is cleaner and lets Cloudflare’s bot intelligence do the work instead of a blanket boolean.

Option B: add AI shopping crawlers to the allowlist in your existing rule:

# Before (blocks all bots Cloudflare identifies, including GPTBot):
(cf.client.bot)

# After (still blocks other bots Cloudflare identifies, lets known AI shopping crawlers through).
# Google-Extended and Applebot-Extended are robots.txt tokens, never user agents,
# so the rule matches Googlebot and Applebot instead:
(cf.client.bot and
  not (http.user_agent contains "GPTBot") and
  not (http.user_agent contains "OAI-SearchBot") and
  not (http.user_agent contains "PerplexityBot") and
  not (http.user_agent contains "ClaudeBot") and
  not (http.user_agent contains "Googlebot") and
  not (http.user_agent contains "Applebot"))
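To see why the Googlebot-only exception fails and what the rewritten rule changes, here is an offline sketch that evaluates three rule shapes for a given user agent. It assumes cf.client.bot is already true for the request and only mimics the expression logic; it is not Cloudflare’s evaluator. The allowlist uses Googlebot and Applebot rather than the -Extended tokens, and all function names are hypothetical.

```shell
#!/bin/sh
# Offline sketch of three rule shapes, assuming cf.client.bot is already true
# for the request. It mimics the expression logic only; it is not Cloudflare.
contains() { case "$1" in *"$2"*) return 0 ;; *) return 1 ;; esac; }

# (cf.client.bot): every recognised bot matches.
rule_before() { return 0; }

# (cf.client.bot and not http.user_agent contains "Googlebot")
rule_googlebot_only() { ! contains "$1" "Googlebot"; }

# (cf.client.bot and not ... for each allowlisted crawler)
rule_after() {
  for t in GPTBot OAI-SearchBot PerplexityBot ClaudeBot Googlebot Applebot; do
    if contains "$1" "$t"; then return 1; fi
  done
  return 0
}

# Print "blocked" if rule $1 matches user agent $2, else "allowed".
verdict() {
  if "rule_$1" "$2"; then echo blocked; else echo allowed; fi
}
```

For example, verdict googlebot_only "GPTBot/1.0 (+https://openai.com/gptbot)" prints blocked, while verdict after with the same user agent prints allowed.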

Be aware that cf.client.bot checks a Cloudflare-maintained list of known bots. Newly launched AI shopping crawlers (and new bot types from existing providers) won’t be in the list immediately, so a rule built on cf.client.bot is inherently brittle as the AI shopping crawler landscape evolves. Adding explicit user-agent allowlists for the current set of crawlers is a better long-term approach than relying on Cloudflare’s internal classification alone.

The cf.bot_management.score variant

If you have Cloudflare Bot Management (an Enterprise add-on), you may have rules using the cf.bot_management.score field. This is a score from 1 to 99 where lower scores indicate more bot-like traffic. AI shopping crawlers typically score between 1 and 20. A rule like (cf.bot_management.score lt 30 and not cf.bot_management.verified_bot) should pass known verified bots (GPTBot, Googlebot, ClaudeBot are all in Cloudflare’s verified bot list) but block unverified low-score traffic. That’s a reasonable setup. A rule like (cf.bot_management.score lt 50) with no verified-bot exception will block most AI crawlers along with the bad actors you actually want to stop.
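The two score rules differ by a single clause. A sketch of both as shell predicates, with hypothetical function names; the argument order (score, verified flag, threshold) is this example’s own convention, and the scores you feed it are illustrative rather than real Cloudflare data.

```shell
#!/bin/sh
# Sketch of the two score-rule shapes. Arguments: $1 = bot score (1-99, lower
# is more bot-like), $2 = 1 if Cloudflare marks the request a verified bot,
# $3 = threshold. Succeeds when the rule matches (request blocked).

# (cf.bot_management.score lt $3 and not cf.bot_management.verified_bot)
score_rule_with_exception() {
  [ "$1" -lt "$3" ] && [ "$2" -ne 1 ]
}

# (cf.bot_management.score lt $3)
score_rule_no_exception() {
  [ "$1" -lt "$3" ]
}
```

With a hypothetical score of 5 for a verified crawler, score_rule_with_exception 5 1 30 fails (the crawler passes), while score_rule_no_exception 5 1 50 succeeds (the crawler is blocked).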

The safe Cloudflare config for a Shopify store that wants AI shopping visibility

Here’s the minimum Cloudflare setup that keeps your store protected against real threats while remaining fully open to AI shopping crawlers:

Bot Fight Mode
Recommended: Off, or On with a Skip rule for AI shopping UAs.
Why: When On, it blocks all AI crawlers. If you keep it On, you need the Skip rule or verified bots won’t get through.

AI Scrapers and Crawlers managed rule
Recommended: Disabled.
Why: This rule explicitly targets GPTBot, ClaudeBot, and shopping crawlers by name. Disabling it is a deliberate opt-in to AI shopping visibility.

Custom WAF rules using cf.client.bot
Recommended: Rewrite to exclude AI shopping UAs, or delete if redundant.
Why: cf.client.bot is true for GPTBot and OAI-SearchBot, so a blanket block on this field kills shopping crawlers.

DDoS protection (L7)
Recommended: Keep on, no changes needed.
Why: L7 DDoS mitigation is rate-based and threshold-based, not UA-based. It doesn’t block AI crawlers at normal crawl rates.

IP Access Rules
Recommended: Review any block rules for known bot IP ranges.
Why: Some IP block lists include Cloudflare’s own bot IP ranges, which overlap with AI crawler egress IPs. IP blocks bypass UA-level allowlists.

Rate limiting rules
Recommended: Set thresholds above 100 req/min for the /robots.txt path.
Why: AI crawlers typically hit /robots.txt once at the start of a crawl job, but a rate limit as low as 5 req/min can block repeated crawl jobs. Set path-specific higher limits for known-open paths.
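To check whether a rate limit sits in front of /robots.txt, you can send a short burst of requests with a crawler UA and note where the first 429 appears. A sketch under two assumptions: your rate limiting rule uses a Block action with Cloudflare’s default 429 response (a Managed Challenge action won’t show up as 429), and you run it only against your own store. STORE, first_rate_limited, and probe_rate_limit are hypothetical names.

```shell
#!/bin/sh
# Sketch: send a short burst of robots.txt requests with a crawler UA and report
# where the first 429 appears. Assumes a Block-action rate limit with the
# default 429 response; STORE is a placeholder for your own store.
STORE=${STORE:-https://yourstore.com}

# Print the 1-based position of the first "429" in a space-separated list, or 0.
first_rate_limited() {
  i=0
  for code in $1; do
    i=$((i + 1))
    if [ "$code" = "429" ]; then echo "$i"; return 0; fi
  done
  echo 0
}

# Send $1 requests (default 20) and report the first rate-limited one.
probe_rate_limit() {
  n=${1:-20}
  codes=""
  j=0
  while [ "$j" -lt "$n" ]; do
    j=$((j + 1))
    code=$(curl -s -o /dev/null -w '%{http_code}' \
      -A 'GPTBot/1.0 (+https://openai.com/gptbot)' "$STORE/robots.txt")
    codes="$codes $code"
  done
  first_rate_limited "$codes"
}
```

An output of 0 means no request in the burst was rate-limited; any other number is the request count at which your limit started returning 429.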

There’s one more layer that’s easy to miss: IP reputation challenges. Cloudflare can issue Managed Challenges (previously “non-interactive CAPTCHAs”) for requests from IP addresses with poor reputation scores. AI crawlers that come from data-center IP space (all of them do) sometimes have elevated risk scores simply because data-center IPs are overrepresented in bot traffic. If you’re seeing Managed Challenge responses for AI crawler user agents rather than hard 403s, this is the likely cause. Navigate to Security → WAF → Tools → IP Access Rules to review active challenge rules for data-center IP ranges.

5 mistakes that make this harder than it needs to be

1. Checking robots.txt in a browser and concluding it’s fine

A browser request has a full Chrome/Firefox user agent, sends cookies, runs JavaScript. Bot Fight Mode and WAF rules see it as a human request and let it through. The robots.txt renders fine in your browser. The same URL, fetched with a GPTBot/1.0 user agent, gets a 403. Always use curl with the actual AI crawler user agent to test, not a browser.

2. Fixing robots.txt content while the Cloudflare block is still active

The most common support pattern: operator updates robots.txt.liquid to add Allow: / blocks for GPTBot, re-runs the CatalogScan check, still fails. The robots.txt update was correct and necessary — but it’s irrelevant while Cloudflare is returning a 403 before the request reaches Shopify. Fix Cloudflare first. Then fix robots.txt. Then re-scan.

3. Using “Allow Googlebot, block everything else” as the bot policy

Circa 2022 this was sensible. In 2026 it’s actively harmful for AI shopping revenue. ChatGPT Shopping, Perplexity Shopping, and Apple Intelligence each fetch with a crawler other than Googlebot (Google’s own AI features are the exception, since Google fetches with Googlebot). A policy of “only Googlebot gets through” sends all of that non-Google AI shopping traffic to a 403. Revisit any Cloudflare policy set during the traditional-SEO era and add the 2025–2026 AI shopping crawler set explicitly.

4. Enabling Bot Fight Mode “just in case” during a traffic spike

When a store gets hit by a scraper or a DDoS, Bot Fight Mode is the first thing support agents suggest. It’s easy to turn on and hard to remember to turn off. A store that enabled it during a traffic spike three months ago and never revisited will have been invisible to AI shopping crawlers for three months. Run the curl test right now if you’re not certain it’s off.

5. Not testing after every Cloudflare rule change

Cloudflare rule evaluation is ordered and cumulative. Adding a new Skip rule for GPTBot doesn’t guarantee it fires first — if there’s a higher-priority IP block rule, a rate limit, or a managed challenge in front of it, the Skip rule may never be reached. After every Cloudflare change, re-run the curl test with each AI crawler UA from the commands in Step 0. The only ground truth is the actual HTTP response the crawler sees.

Verification playbook

Work through these steps in order after making changes. Each step takes <2 minutes. Don’t skip ahead — a failure at Step 1 makes all subsequent steps meaningless.

Step 1. curl GPTBot
Command: curl -si -A "GPTBot/1.0 (+https://openai.com/gptbot)" https://yourstore.com/robots.txt | grep -iE '^HTTP/|^user-agent:' | head -5
Pass: first line is HTTP/2 200, followed by at least one User-agent: line. (A plain | head -5 shows only response headers, and robots.txt files, including Shopify’s default, can open with a # comment rather than User-agent:.)

Step 2. curl OAI-SearchBot
Command: curl -si -A "OAI-SearchBot/1.0 (+https://openai.com/searchbot)" https://yourstore.com/robots.txt | grep -iE '^HTTP/|^user-agent:' | head -5
Pass: same as Step 1

Step 3. curl PerplexityBot
Command: curl -si -A "PerplexityBot/1.0 (+https://docs.perplexity.ai/guides/bots)" https://yourstore.com/robots.txt | grep -iE '^HTTP/|^user-agent:' | head -5
Pass: same as Step 1

Step 4. curl ClaudeBot
Command: curl -si -A "ClaudeBot/0.5 (+https://www.anthropic.com/claude-web-crawler)" https://yourstore.com/robots.txt | grep -iE '^HTTP/|^user-agent:' | head -5
Pass: same as Step 1

Step 5. Check WAF Events
Action: Cloudflare dashboard → Security → Events → filter last 15 min
Pass: no block or challenge events with a GPTBot, OAI-SearchBot, or PerplexityBot UA

Step 6. Re-scan with CatalogScan
Action: run a free scan, then scroll to the robots-open chip
Pass: robots-open chip shows green, 15/15

Step 7. Re-test after next Cloudflare publish
Action: make any Cloudflare change, then re-run Step 1
Pass: Step 1 still passes. Cloudflare rule changes sometimes reorder the rule stack, so verify each time.

If you pass Steps 1–4 (all four curl commands return the robots.txt body) but CatalogScan’s robots-open signal is still failing, the issue moved from Cloudflare to the robots.txt content itself. See the robots-open signal guide for content-level diagnosis: blanket Disallow rules, targeted AI-bot blocks added during a “no AI training” stance, and headless storefronts that never emit a robots.txt at the canonical path.

One edge case worth knowing: CatalogScan’s fetch comes from a data-center IP range. If you have a Cloudflare IP Access Rule that challenges or blocks data-center IPs (a common anti-scraper measure), the scan fetch will be challenged even when real AI crawlers get through. In that case, the curl test will pass but the CatalogScan scan will show the signal failing. Both are real conditions, and both need to be fixed: the real AI crawlers come from data-center IPs too.


Is Cloudflare blocking your AI shopping crawlers?

Free 2-minute scan. We fetch /robots.txt with each major AI-agent UA from a data-center IP, parse the response body, and flag any WAF or Bot Fight Mode intercept — so you know in 90 seconds whether your Cloudflare setup is the problem.
