Cloudflare for Shopify: the three settings that silently block AI shopping agents
Your robots.txt says Allow: / for GPTBot. Your Shopify theme hasn’t been touched in months. And yet CatalogScan shows your robots-open signal failing at 0 points. The culprit is almost certainly Cloudflare — and it’s almost certainly one of three specific settings. This guide shows you which one, how to find it in the Cloudflare dashboard, and how to fix it without making your store an open target for real scrapers.
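In short: the three settings are Bot Fight Mode, the AI Scrapers and Crawlers managed rule, and custom WAF rules built on cf.client.bot. That last one blocks all bots Cloudflare recognises, including GPTBot and OAI-SearchBot. Run the curl test below first to confirm Cloudflare is the problem, then work through the three settings in order.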
In this guide
- Step 0 — confirm Cloudflare is the problem (the curl test)
- Why Cloudflare can pass your robots.txt text and still fail your store
- Setting 1 — Bot Fight Mode
- Setting 2 — AI Scrapers and Crawlers managed rule
- Setting 3 — custom WAF rules with cf.client.bot
- The safe Cloudflare config: what to keep, what to remove
- 5 mistakes that make this harder than it needs to be
- Verification playbook
Step 0 — confirm Cloudflare is the problem
Before touching any Cloudflare setting, run this curl command from your terminal. It impersonates GPTBot, OpenAI’s combined training-and-shopping crawler, and fetches your robots.txt:
curl -si \
-A "GPTBot/1.0 (+https://openai.com/gptbot)" \
https://yourstore.com/robots.txt | head -30
Replace yourstore.com with your actual domain. Look at the first 30 lines of the response:
- If you see User-agent: and Disallow: rules, robots.txt is returning normally. Cloudflare is not blocking the crawler. The issue is in the robots.txt content itself. See the robots-open signal guide for content-level fixes.
- If you see HTTP/2 403 or a redirect to a challenge URL, Cloudflare is blocking GPTBot before it reaches your server. You’re in the right place; continue with the three settings below.
- If you see HTML with “Just a moment” or “Checking your browser”, Bot Fight Mode is serving a JavaScript challenge. The crawler can’t pass it; it sees your store as closed. This is Setting 1.
- If the command times out or returns no output, it could be a rate limit, a firewall rule that drops connections silently, or your origin being down. Add -v and try a second time.
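The four outcomes above can be told apart mechanically once the curl output is saved to a file. The function below is a minimal sketch under that assumption; the name classify_response and the labels it prints are illustrative and not part of curl or Cloudflare.

```shell
# classify_response FILE
# FILE holds the output of `curl -si` (status line, headers, blank line, body).
# Prints one of: empty, challenge, blocked, redirect, robots, unknown.
# The function name and labels are hypothetical, chosen for this sketch.
classify_response() {
  if [ ! -s "$1" ]; then
    # No output at all: timeout, silently dropped connection, or origin down.
    echo "empty"
  elif grep -qiE 'just a moment|checking your browser' "$1"; then
    # Challenge markers in the body win regardless of the status code.
    echo "challenge"
  else
    # The status line looks like "HTTP/2 403" or "HTTP/1.1 200 OK".
    status=$(head -n 1 "$1" | tr -d '\r' | awk '{print $2}')
    case "$status" in
      403) echo "blocked" ;;
      3??) echo "redirect" ;;   # inspect the Location header by hand
      *)
        if grep -qiE '^(user-agent|disallow):' "$1"; then
          echo "robots"
        else
          echo "unknown"
        fi
        ;;
    esac
  fi
}
```

One way to use it: save a response with curl -si -A "GPTBot/1.0 (+https://openai.com/gptbot)" https://yourstore.com/robots.txt > gptbot.txt, then run classify_response gptbot.txt. A "redirect" result is not necessarily a block, since stores also redirect between domains for ordinary reasons.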
Run the same test with the other major AI shopping crawlers to understand the full scope of the problem:
# OAI-SearchBot: the ChatGPT shopping retrieval crawler (separate from GPTBot)
curl -si -A "OAI-SearchBot/1.0 (+https://openai.com/searchbot)" \
https://yourstore.com/robots.txt | head -5
# PerplexityBot: Perplexity Shopping
curl -si -A "PerplexityBot/1.0 (+https://docs.perplexity.ai/guides/bots)" \
https://yourstore.com/robots.txt | head -5
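# Google-Extended: Google AI Mode and AI Overviews. Google-Extended is a robots.txt
# token, not a separate user agent; requests arrive with Google's existing crawler UAs.
# A spoofed X-Forwarded-For header does not change the IP Cloudflare sees, and Cloudflare
# may treat a Googlebot UA from a non-Google IP as fake, so a block here is not conclusive.
curl -si -A "Googlebot/2.1 (+http://www.google.com/bot.html)" \
https://yourstore.com/robots.txt | head -5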
# ClaudeBot: Anthropic AI training and Claude features
curl -si -A "ClaudeBot/0.5 (+https://www.anthropic.com/claude-web-crawler)" \
https://yourstore.com/robots.txt | head -5
It’s common for one crawler to get through while another is blocked. GPTBot and OAI-SearchBot come from different IP ranges and are classified differently by Cloudflare’s bot intelligence, so a rule that lets GPTBot through can still block OAI-SearchBot. Test each one.
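Repeating the command by hand for each crawler gets tedious, so the checks can be wrapped in a loop. This is a minimal sketch: crawler_uas and check_crawlers are hypothetical helper names, the user agent strings are the ones used in this guide, and the 15-second timeout is an arbitrary choice.

```shell
# crawler_uas: one "name|user agent" pair per line, using the strings from this guide.
crawler_uas() {
  cat <<'EOF'
GPTBot|GPTBot/1.0 (+https://openai.com/gptbot)
OAI-SearchBot|OAI-SearchBot/1.0 (+https://openai.com/searchbot)
PerplexityBot|PerplexityBot/1.0 (+https://docs.perplexity.ai/guides/bots)
ClaudeBot|ClaudeBot/0.5 (+https://www.anthropic.com/claude-web-crawler)
EOF
}

# check_crawlers DOMAIN: fetch /robots.txt once per user agent and print
# "<name>: <HTTP status code>". curl reports 000 when it got no response.
check_crawlers() {
  crawler_uas | while IFS='|' read -r name ua; do
    code=$(curl -s -o /dev/null -m 15 -A "$ua" -w '%{http_code}' "https://$1/robots.txt")
    echo "$name: $code"
  done
}
```

Run it as check_crawlers yourstore.com. A 200 only says that a response came back; a challenge page can also arrive with a 200, so any crawler that looks fine here still deserves a look at the response body.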
Why Cloudflare can pass your robots.txt text and still fail your store
Cloudflare operates at the edge, between the internet and your Shopify store. It sees every request before Shopify does. If Cloudflare decides a request looks bot-like, it can intercept it and return a challenge page (HTTP 200 with JavaScript) or a hard block (HTTP 403) — without ever asking your Shopify origin what to do.
This creates a gap that trips up most Shopify operators who check their robots.txt: they read the file in their browser, it looks fine, they assume all crawlers can see it. But AI crawlers are fetching via curl-like HTTP clients, not a browser, and Cloudflare’s bot detection looks at the user agent, TLS fingerprint, and connection behaviour to classify the request. A GPTBot fetch looks very different from a Chrome browser fetch, and Cloudflare classifies it accordingly.
CatalogScan’s robots-open check, which fetches /robots.txt with each major AI-agent UA and reads the body, sees the challenge HTML, not a robots.txt rule set, and fails the signal. The store thinks it’s open because the robots.txt text says so. It’s actually closed at the CDN layer.
The four AI crawlers that matter for shopping surfaces right now, and what they do with your catalog:
| Crawler | What it does with your catalog | User agent |
|---|---|---|
| GPTBot | ChatGPT training + ChatGPT Shopping product indexing | GPTBot/1.0 (+https://openai.com/gptbot) |
| OAI-SearchBot | ChatGPT real-time product retrieval (Bing-sourced) | OAI-SearchBot/1.0 (+https://openai.com/searchbot) |
| PerplexityBot | Perplexity Shopping product cards | PerplexityBot/1.0 (+https://docs.perplexity.ai/guides/bots) |
| Google-Extended | Google AI Mode, AI Overviews, Google Shopping AI features | Googlebot/2.1 (+http://www.google.com/bot.html) |
If any of these return anything other than your actual robots.txt content, they can’t proceed to index your catalog, read your JSON-LD, or include your products in AI-generated shopping results. The robots-open signal is the floor signal: at 0 points, every other signal becomes irrelevant because the crawler never reaches your product pages.
Setting 1 — Bot Fight Mode
Dashboard path: Cloudflare → select your zone → Security → Bots
Bot Fight Mode is Cloudflare’s first-generation bot mitigation. When enabled, it serves a JavaScript challenge (or sometimes a CAPTCHA) to any request whose user agent and TLS fingerprint match Cloudflare’s “bot” classification. Every AI shopping crawler matches that classification and then fails the challenge: they send a clear user agent string (GPTBot/1.0), they don’t run JavaScript, and they don’t solve CAPTCHAs. The challenge response (HTTP 200, HTML body) looks like a successful request to a naive observer but contains no robots.txt content.
Bot Fight Mode is available on every Cloudflare plan including Free. It defaults to on for many legacy free zones provisioned before 2024, and defaults to off for newer zones — but it’s a single toggle and many operators enable it during a security scare and forget it’s on.
How to diagnose it
The curl test is your fastest check: if you see <!DOCTYPE html> or “Just a moment” in the response to a GPTBot user agent curl, Bot Fight Mode is the culprit. Alternatively, in the Cloudflare dashboard, navigate to Security → Bots. You’ll see a toggle labelled “Bot Fight Mode.” If it’s on and you don’t have a Skip rule for AI crawlers, every AI shopping crawler is being challenged.
Fix option A: disable Bot Fight Mode entirely
If your store sells to end consumers (not a B2B API with sensitive data), disabling Bot Fight Mode is safe and simple. The toggle is a blunt instrument for many legitimate use cases — it blocks Googlebot on misconfigured zones, price-comparison crawlers that drive legitimate traffic, and AI shopping crawlers that represent growing retail revenue. Turn it off, run the curl test again, and verify you get robots.txt content back.
Fix option B: create a Skip rule for AI crawlers (recommended if you must keep Bot Fight Mode)
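If your store has a real DDoS problem or you sell high-value inventory that attracts scrapers, you may want to keep Bot Fight Mode for genuine bad actors. Cloudflare lets you create “Skip” WAF rules that bypass Bot Fight Mode for specific conditions. Cloudflare’s documentation describes the Skip action as applying to Super Bot Fight Mode rules and not to the free-plan Bot Fight Mode toggle, so on a Free zone confirm with the curl test that the rule takes effect, and fall back to option A if it does not. Navigate to Security → WAF → Custom Rules and create a rule with: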
Action: Skip
Skip: All Bot Fight Mode features
When: (http.user_agent contains "GPTBot") or
(http.user_agent contains "OAI-SearchBot") or
(http.user_agent contains "PerplexityBot") or
(http.user_agent contains "ClaudeBot") or
(http.user_agent contains "Google-Extended") or
(http.user_agent contains "Applebot-Extended")
Place this rule above any block rules in the Custom Rules list. Order matters: Cloudflare evaluates rules top-to-bottom and stops at the first match. After saving, run the curl test — you should now get the robots.txt content back for each of these agents.
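Hand-edited expressions drift as crawlers are added, so it can help to keep the tokens in one list and generate the expression from it. The helper below is a sketch for that; ua_match_expression is a hypothetical name, and its output is plain text to paste into the rule’s expression editor, not an API call.

```shell
# ua_match_expression TOKEN...
# Joins user-agent tokens into a Cloudflare rule expression of the form
# (http.user_agent contains "A") or (http.user_agent contains "B") ...
ua_match_expression() {
  expr_out=""
  for token in "$@"; do
    clause="(http.user_agent contains \"$token\")"
    if [ -z "$expr_out" ]; then
      expr_out="$clause"
    else
      expr_out="$expr_out or $clause"
    fi
  done
  printf '%s\n' "$expr_out"
}
```

For example, ua_match_expression GPTBot OAI-SearchBot PerplexityBot ClaudeBot prints the first four clauses of the rule above joined with or.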
Super Bot Fight Mode (Pro plan and above)
Pro, Business, and Enterprise zones have a more granular version called Super Bot Fight Mode. It classifies bots into “Definitely automated,” “Likely automated,” and “Verified bots.” AI shopping crawlers (GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot) are generally classified as “Verified bots” by Cloudflare. Verification rests on Cloudflare validating the request’s source IP against what the bot operator publishes, not on the documentation URL in the user agent string, so a curl request from your own IP that only copies the user agent is not treated as verified. If your Super Bot Fight Mode action for “Verified bots” is “Allow,” you’re fine. If it’s “Challenge” or “Block,” that’s the issue: change it to “Allow.”
Setting 2 — AI Scrapers and Crawlers managed rule
Dashboard path: Cloudflare → select your zone → Security → WAF → Managed Rules
In 2024, Cloudflare added a managed rule specifically labelled “AI Scrapers and Crawlers” to its WAF managed ruleset. When enabled, it blocks a curated list of known AI crawlers by user agent — including GPTBot, ClaudeBot, CCBot, Bytedance-UA, and anthropic-ai. The intent was to give operators an easy way to opt out of AI training data collection. The unintended side effect is that it also blocks the same crawlers when they’re fetching for AI shopping retrieval, not training.
This rule defaults to off for new Cloudflare zones but may have been toggled on deliberately (there was a wave of “how to block AI scrapers” tutorials in 2023–2024) or it may be on in older zone migrations. Unlike Bot Fight Mode, this rule is specifically targeting AI bots by name, so it’s more surgical — it won’t block Googlebot — but it will block every AI shopping crawler you care about.
How to diagnose it
Navigate to Security → WAF → Managed Rules in your Cloudflare dashboard. Look for a rule with “AI Scrapers” or “AI Crawlers” in the name. If its action is “Block” or “Challenge,” it’s the problem. You can also check Security → Events (formerly Firewall Events) and filter by the rule ID — if you see GPTBot requests being blocked with a managed rule match, that’s it.
Fix: disable the managed rule
Click the rule and set its action to “Disabled” or delete it. If you want to keep blocking AI training crawlers (CCBot, the Common Crawl crawler used by many LLM training pipelines) without blocking AI shopping crawlers, Cloudflare doesn’t currently offer a single toggle that makes that distinction. Your options are:
- Disable the managed rule entirely and accept that all AI crawlers can fetch your store. This is the correct setting for a Shopify store that wants AI shopping visibility.
- Disable the managed rule and add a custom block rule targeting only CCBot specifically: http.user_agent contains "CCBot". CCBot is the Common Crawl crawler; blocking it opts you out of Common Crawl-based LLM training without affecting shopping crawlers.
You cannot use a robots.txt Disallow rule to block CCBot while keeping the managed rule on. The managed rule acts at the network layer before robots.txt is ever served, and it doesn’t distinguish between training crawlers and shopping retrieval crawlers.
The “training vs. shopping” decision is now yours to make deliberately
The managed rule was created when the only use case for GPTBot was training ChatGPT. That’s no longer true. GPTBot now also drives ChatGPT Shopping product indexing. Blocking GPTBot opts you out of both. Most Shopify operators who enabled this rule in 2024 made a deliberate decision about LLM training but an accidental decision about AI shopping revenue. Review it now with both use cases in mind.
Setting 3 — custom WAF rules using cf.client.bot
HIGH RISK
Dashboard path: Cloudflare → select your zone → Security → WAF → Custom Rules
The Cloudflare field cf.client.bot is a boolean that evaluates to true for any request Cloudflare recognises as coming from a known bot — including AI shopping crawlers. A very common custom WAF rule pattern copied from security guides is (cf.client.bot) with action “Block” or “Challenge.” This rule blocks all bots Cloudflare can identify, including Googlebot, Bingbot, GPTBot, OAI-SearchBot, and every AI shopping crawler. It’s the widest possible net and the most common source of hard-to-diagnose robot blocks.
How to diagnose it
Navigate to Security → WAF → Custom Rules. Read through your active rules. Any rule with cf.client.bot in the expression and a “Block,” “Challenge,” or “Managed Challenge” action will block AI shopping crawlers. Also check for rules using cf.bot_management.score with a threshold you haven’t reviewed (the bot score runs from 1 to 99, with lower values more bot-like; a rule that blocks scores below 30 catches everything AI-crawler-shaped).
Also look for rules that are “allow except Googlebot” shaped — a common pattern from 2022–2023 security tutorials:
# This rule blocks all bots except Googlebot — including every AI shopping crawler
(cf.client.bot and not http.user_agent contains "Googlebot")
If you find a rule like this, you’re allowing Google’s traditional web crawler (which was the SEO gold standard in 2022) but blocking every AI shopping crawler that didn’t exist when the rule was written.
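On a zone with many custom rules, reading each expression by eye is error-prone. The sketch below flags the risky shapes described above. It assumes you have copied your rules into a text file with one rule per line in the form action, tab, expression; that file format and the name flag_risky_rules are assumptions for this example, not a Cloudflare export format.

```shell
# flag_risky_rules FILE
# FILE format (an assumption for this sketch): one rule per line,
# "<action><TAB><expression>", e.g. copied by hand from the dashboard.
# Flags blocking or challenging rules that reference cf.client.bot
# or cf.bot_management.score so they can be reviewed.
flag_risky_rules() {
  awk -F '\t' '
    tolower($1) ~ /^(block|challenge|managed_challenge|js_challenge)$/ &&
    $2 ~ /cf\.client\.bot|cf\.bot_management\.score/ {
      print "REVIEW line " NR ": " $2
    }
  ' "$1"
}
```

A flagged rule is not automatically wrong. A score rule that exempts verified bots can be fine, so treat the output as a reading list and not as a verdict.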
Fix: rewrite the rule to allow AI shopping crawlers explicitly
Two options depending on whether you want to keep a bot-blocking rule at all:
Option A — delete the cf.client.bot rule entirely and rely on Bot Fight Mode (with a Skip rule for verified bots) for bot mitigation. This is cleaner and lets Cloudflare’s own bot intelligence do the work instead of a blanket boolean.
Option B — add AI shopping crawlers to the allowlist in your existing rule:
# Before (blocks all bots Cloudflare identifies, including GPTBot):
(cf.client.bot)
# After (still blocks other known bots, lets the listed AI shopping crawlers through):
(cf.client.bot and
not (http.user_agent contains "GPTBot") and
not (http.user_agent contains "OAI-SearchBot") and
not (http.user_agent contains "PerplexityBot") and
not (http.user_agent contains "ClaudeBot") and
not (http.user_agent contains "Google-Extended") and
not (http.user_agent contains "Applebot-Extended"))
Be aware that cf.client.bot checks a Cloudflare-maintained list of known bots. Newly launched AI shopping crawlers (and new bot types from existing providers) won’t be in the list immediately, so a rule built on cf.client.bot is inherently brittle as the AI shopping crawler landscape evolves. Adding explicit user-agent allowlists for the current set of crawlers is a better long-term approach than relying on Cloudflare’s internal classification alone.
The cf.bot_management.score variant
If you have Cloudflare Bot Management (an Enterprise add-on), you may have rules using the cf.bot_management.score field. This is a score from 1 to 99 where lower scores indicate more bot-like traffic. AI shopping crawlers typically score between 1 and 20. A rule like (cf.bot_management.score lt 30 and not cf.bot_management.verified_bot) should pass known verified bots (GPTBot, Googlebot, ClaudeBot are all in Cloudflare’s verified bot list) but block unverified low-score traffic. That’s a reasonable setup. A rule like (cf.bot_management.score lt 50) with no verified-bot exception will block most AI crawlers along with the bad actors you actually want to stop.
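To see why the verified-bot exception matters, the two rule shapes can be modelled as a small truth check. This sketch only mirrors the boolean logic of the expressions above; rule_blocks is a hypothetical helper, and the scores passed to it are illustrative inputs, not values read from Cloudflare.

```shell
# rule_blocks SCORE VERIFIED THRESHOLD EXEMPT_VERIFIED
# Models (cf.bot_management.score lt THRESHOLD), combined with
# "and not cf.bot_management.verified_bot" when EXEMPT_VERIFIED is "yes".
# VERIFIED is "yes" or "no". Prints "block" or "pass".
rule_blocks() {
  if [ "$1" -ge "$3" ]; then
    echo "pass"    # score is not below the threshold
  elif [ "$4" = "yes" ] && [ "$2" = "yes" ]; then
    echo "pass"    # verified bot, and the rule exempts verified bots
  else
    echo "block"
  fi
}
```

With an assumed score of 10 for a verified crawler, rule_blocks 10 yes 30 yes prints pass, while rule_blocks 10 yes 50 no prints block. An unverified request with the same score is blocked by both rules.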
The safe Cloudflare config for a Shopify store that wants AI shopping visibility
Here’s the minimum Cloudflare setup that keeps your store protected against real threats while remaining fully open to AI shopping crawlers:
| Setting | Recommended state | Why |
|---|---|---|
| Bot Fight Mode | Off — or On with a Skip rule for AI shopping UAs | Default On blocks all AI crawlers. If you keep it On, you need the Skip rule or verified bots won’t get through. |
| AI Scrapers and Crawlers managed rule | Disabled | This rule explicitly targets GPTBot, ClaudeBot, and shopping crawlers by name. Disabling is a deliberate opt-in to AI shopping visibility. |
| Custom WAF rules using cf.client.bot | Rewrite to exclude AI shopping UAs, or delete if redundant | cf.client.bot = true for GPTBot and OAI-SearchBot. A blanket block on this field kills shopping crawlers. |
| DDoS protection (L7) | Keep on, no changes needed | L7 DDoS mitigation is rate-based and threshold-based, not UA-based. It doesn’t block AI crawlers at normal crawl rates. |
| IP Access Rules | Review any block rules for known bot IP ranges | Some IP block lists include Cloudflare’s own bot IP ranges, which overlap with AI crawler egress IPs. IP blocks bypass UA-level allowlists. |
| Rate limiting rules | Set thresholds >100 req/min for /robots.txt path | AI crawlers typically hit /robots.txt once at the start of a crawl job. A rate limit as low as 5 req/min can block repeated crawl jobs. Set path-specific higher limits for known-open paths. |
There’s one more layer that’s easy to miss: IP reputation challenges. Cloudflare can issue Managed Challenges (previously “non-interactive CAPTCHAs”) for requests from IP addresses with poor reputation scores. AI crawlers that come from data-center IP space (all of them do) sometimes have elevated risk scores simply because data-center IPs are overrepresented in bot traffic. If you’re seeing Managed Challenge responses for AI crawler user agents rather than hard 403s, this is the likely cause. Navigate to Security → WAF → Tools → IP Access Rules to review active challenge rules for data-center IP ranges.
5 mistakes that make this harder than it needs to be
1. Checking robots.txt in a browser and concluding it’s fine
A browser request has a full Chrome/Firefox user agent, sends cookies, runs JavaScript. Bot Fight Mode and WAF rules see it as a human request and let it through. The robots.txt renders fine in your browser. The same URL, fetched with a GPTBot/1.0 user agent, gets a 403. Always use curl with the actual AI crawler user agent to test, not a browser.
2. Fixing robots.txt content while the Cloudflare block is still active
The most common support pattern: operator updates robots.txt.liquid to add Allow: / blocks for GPTBot, re-runs the CatalogScan check, still fails. The robots.txt update was correct and necessary — but it’s irrelevant while Cloudflare is returning a 403 before the request reaches Shopify. Fix Cloudflare first. Then fix robots.txt. Then re-scan.
3. Using “Allow Googlebot, block everything else” as the bot policy
Circa 2022 this was sensible. In 2026 it’s actively harmful for AI shopping revenue. ChatGPT Shopping, Perplexity Shopping, Google AI Mode, and Apple Intelligence each use a different crawler from traditional Googlebot. A policy of “only Googlebot gets through” routes 100% of your AI shopping traffic to a 403. Revisit any Cloudflare policy set during the traditional-SEO era and add the 2025–2026 AI shopping crawler set explicitly.
4. Enabling Bot Fight Mode “just in case” during a traffic spike
When a store gets hit by a scraper or a DDoS, Bot Fight Mode is the first thing support agents suggest. It’s easy to turn on and hard to remember to turn off. A store that enabled it during a traffic spike three months ago and never revisited will have been invisible to AI shopping crawlers for three months. Run the curl test right now if you’re not certain it’s off.
5. Not testing after every Cloudflare rule change
Cloudflare rule evaluation is ordered and cumulative. Adding a new Skip rule for GPTBot doesn’t guarantee it fires first — if there’s a higher-priority IP block rule, a rate limit, or a managed challenge in front of it, the Skip rule may never be reached. After every Cloudflare change, re-run the curl test with each AI crawler UA from the commands in Step 0. The only ground truth is the actual HTTP response the crawler sees.
Verification playbook
Work through these steps in order after making changes. Each step takes <2 minutes. Don’t skip ahead — a failure at Step 1 makes all subsequent steps meaningless.
| Step | Command / Action | Pass condition |
|---|---|---|
| 1. curl GPTBot | curl -si -A "GPTBot/1.0 (+https://openai.com/gptbot)" https://yourstore.com/robots.txt \| head -5 | First line is HTTP/2 200, body starts with User-agent: |
| 2. curl OAI-SearchBot | curl -si -A "OAI-SearchBot/1.0 (+https://openai.com/searchbot)" https://yourstore.com/robots.txt \| head -5 | Same as Step 1 |
| 3. curl PerplexityBot | curl -si -A "PerplexityBot/1.0 (+https://docs.perplexity.ai/guides/bots)" https://yourstore.com/robots.txt \| head -5 | Same as Step 1 |
| 4. curl ClaudeBot | curl -si -A "ClaudeBot/0.5 (+https://www.anthropic.com/claude-web-crawler)" https://yourstore.com/robots.txt \| head -5 | Same as Step 1 |
| 5. Check WAF Events | Cloudflare dashboard → Security → Events → filter last 15 min | No block or challenge events with GPTBot/OAI-SearchBot/PerplexityBot UA |
| 6. Re-scan with CatalogScan | Run a free scan → scroll to the robots-open chip | robots-open chip shows green, 15/15 |
| 7. Re-test after next Cloudflare publish | Make any Cloudflare change, re-run Step 1 | Step 1 still passes. Cloudflare rule changes sometimes reorder the rule stack, so verify each time |
If you pass Steps 1–4 (all four curl commands return the robots.txt body) but CatalogScan’s robots-open signal is still failing, the issue moved from Cloudflare to the robots.txt content itself. See the robots-open signal guide for content-level diagnosis: blanket Disallow rules, targeted AI-bot blocks added during a “no AI training” stance, and headless storefronts that never emit a robots.txt at the canonical path.
One edge case worth knowing: CatalogScan’s fetch comes from a data-center IP range. If you have a Cloudflare IP Access Rule that challenges or blocks data-center IPs (a common anti-scraper measure), the scan fetch will be challenged even when real AI crawlers get through. In that case, the curl test will pass but the CatalogScan scan will show the signal failing. Both are real conditions, and both need to be fixed: the real AI crawlers come from data-center IPs too.
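Steps 1 to 4 of the playbook can be scripted. Because curl -i prints the response headers first, the first five lines of output are usually all headers, so this sketch captures the status code and the body separately instead of relying on head -5. The name robots_pass is hypothetical, and skipping leading comment lines is a judgment call made here because robots.txt files often open with comments before the first User-agent: line.

```shell
# robots_pass URL USER_AGENT
# Succeeds (exit 0) when the response is HTTP 200 and the first body line
# that is neither blank nor a comment starts with "User-agent:".
robots_pass() {
  body_file=$(mktemp)
  # -o writes the body to a file; -w prints only the status code to stdout.
  code=$(curl -s -m 15 -A "$2" -o "$body_file" -w '%{http_code}' "$1")
  first=$(tr -d '\r' < "$body_file" | grep -v -E '^[[:space:]]*(#|$)' | head -n 1)
  rm -f "$body_file"
  [ "$code" = "200" ] && printf '%s\n' "$first" | grep -qi '^user-agent:'
}
```

A typical call is robots_pass https://yourstore.com/robots.txt "GPTBot/1.0 (+https://openai.com/gptbot)" && echo pass || echo FAIL, repeated for each user agent. A challenge page served with a 200 fails this check because its first body line is HTML.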
See also
- robots-open: the 15-point floor signal and how to fix it at the Shopify level
- sitemap: the discovery surface AI agents read before any PDP
- /products.json: the AI bulk-ingest feed that Cloudflare can also block
- AggregateRating on Shopify: per-app anatomy and fix recipes
- Shopify GTIN requirements for AI shopping agents
- Shopify metafields for AI shopping agents
- ProductGroup JSON-LD on Shopify
- The full 18-signal Agentic Storefronts checklist
- 100-store leaderboard — who scores above 70
- All 15 signals — full reference
Is Cloudflare blocking your AI shopping crawlers?
Free 2-minute scan. We fetch /robots.txt with each major AI-agent UA from a data-center IP, parse the response body, and flag any WAF or Bot Fight Mode intercept — so you know in 90 seconds whether your Cloudflare setup is the problem.