Is Your robots.txt Blocking AI Shopping Bots?

June 13, 2026 | 8 min read

Most of the work of getting a store ready for AI shopping is additive. You add structured data, you add ratings, you add a clear product description. But there is one place where a single line of text can undo all of it, and where the problem is subtraction rather than addition: your robots.txt file. One overly broad rule can quietly tell ChatGPT, Perplexity, and Google's AI features to skip your store entirely, and most merchants never know it is there.

When we audited 575 online stores, about 1 in 9 of the stores with a readable robots.txt were blocking at least one major AI shopping crawler. Very few of those looked deliberate. This post explains what robots.txt does, which crawlers actually matter for AI shopping, how stores end up blocking them by accident, and how to check and fix yours in a few minutes.

What robots.txt Actually Does

robots.txt is a plain text file at the root of your domain, at yourstore.com/robots.txt. It is the first thing a well-behaved crawler reads before it fetches anything else. The file is a set of rules that say, in effect, "this crawler may read these paths, and may not read those." It is the oldest and most universal mechanism on the web for controlling automated access.

Two things are worth understanding up front. First, robots.txt is voluntary. It works because reputable crawlers choose to honor it, not because anything forces them to. The AI shopping crawlers from OpenAI, Perplexity, and Google all publicly commit to obeying it, which is exactly why a block in your file actually removes you from their results. Second, the rules are matched by user-agent. Each crawler announces a name, and robots.txt can set different rules for different names, or one rule for all of them using a wildcard.

A minimal file looks like this:

User-agent: *
Disallow: /admin
Allow: /

That says: for every crawler, do not crawl any path that starts with /admin, but everything else is fair game. The danger is in how easily that Disallow line can be widened, on purpose or by accident, until it catches the crawlers you actually want.
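If you want to see those rules applied rather than take them on faith, Python's standard library ships a robots.txt parser. The sketch below feeds it the minimal file and asks whether a crawler may fetch a few paths. The paths and the can_crawl helper are made up for illustration, and Python's parser is only one implementation of the rules, so treat the result as a sanity check rather than a guarantee of how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The minimal file from above.
MINIMAL_ROBOTS = """\
User-agent: *
Disallow: /admin
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if `user_agent` may fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical paths, used only to exercise the rules.
    for path in ("/products/blue-mug", "/admin/orders", "/administrator"):
        print(path, can_crawl(MINIMAL_ROBOTS, "OAI-SearchBot", path))
```

Because the file has no group naming OAI-SearchBot, the wildcard group applies to it. The product path is allowed, and both /admin/orders and /administrator are refused, since Disallow matches on path prefix.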

The AI Shopping Crawlers That Matter

AI shopping visibility comes down to a handful of named crawlers. If your robots.txt blocks these, your products will not be considered for the corresponding engine's recommendations.

OAI-SearchBot. OpenAI's crawler for surfacing sites in ChatGPT search results. This is the single most important one for ChatGPT Shopping visibility, because it is what builds the index ChatGPT draws on when it answers a shopping question. It is not used for model training.
ChatGPT-User. Fetches a page in real time when a user action triggers a live lookup, for example when ChatGPT goes out to verify a product detail mid-conversation. Blocking it can break those live retrievals.
GPTBot. OpenAI's general crawler, primarily associated with training. Many merchants block this one specifically to opt out of model training, which is a legitimate choice. The important point is that GPTBot and OAI-SearchBot are different crawlers with different purposes.
PerplexityBot. Perplexity's indexing crawler. It builds the index behind Perplexity's answers and its shopping features. Perplexity-User handles live, user-triggered fetches, much as ChatGPT-User does. Perplexity's documentation notes that these user-initiated fetches generally ignore robots.txt, so the PerplexityBot rule is the one that decides whether you are indexed.
Google-Extended. Not a crawler in the usual sense but a permission token. It controls whether content Google has already crawled can be used to train Gemini models and to ground answers in Gemini Apps and Vertex AI. Blocking it does not affect Googlebot or your search rankings, but it can limit how those AI products use your store. We cover this distinction in detail in the Google AI Overviews checklist.

There is a longer list of AI crawlers in the wild (ClaudeBot, CCBot, Applebot-Extended, Amazonbot, and others), but the five above are the ones that govern visibility in the AI shopping experiences shoppers are using right now.

What We Found in the Wild

Across the 213 stores in our study that returned a readable robots.txt, 23 of them, or 10.8%, blocked at least one of these crawlers. The per-crawler breakdown:

GPTBot blocked: 10.3% of stores
Google-Extended blocked: 9.9%
PerplexityBot blocked: 8.5%
OAI-SearchBot blocked: 7.5%

The detail that stood out was not the headline rate. It was the pattern. Very few of these blocks looked like a considered decision to stay out of AI shopping. Most looked like collateral damage: blanket rules that caught these crawlers along with everything else, or copied-in AI opt-out snippets meant to address training that also swept up the shopping-relevant bots. A store that blocks OAI-SearchBot has, almost certainly without intending to, removed itself from ChatGPT search results.
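The headline figures are easy to re-derive from the three counts quoted in this post. This is only arithmetic on those published numbers, a quick check that "10.8%" and "about 1 in 9" describe the same thing. The share helper is named for illustration.

```python
# Counts quoted in this post.
AUDITED = 575    # stores audited
READABLE = 213   # stores that returned a readable robots.txt
BLOCKING = 23    # stores blocking at least one of the listed crawlers


def share(part: int, whole: int) -> float:
    """Return part/whole as a fraction."""
    return part / whole


if __name__ == "__main__":
    print(f"blocking share: {share(BLOCKING, READABLE):.1%}")      # 10.8%
    print(f"one store in every {READABLE / BLOCKING:.1f}")         # 9.3
    print(f"readable robots.txt: {share(READABLE, AUDITED):.1%}")  # 37.0%
```

Note that the denominator is the 213 stores with a readable file, not all 575, so the rate says nothing about stores whose robots.txt could not be read.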

How Stores Block AI Crawlers by Accident

Nobody sits down and decides to be invisible to AI shopping. The blocks get there in a few predictable ways.

1. The leftover staging block

The most severe and most common accident is a blanket disallow that was never removed after launch:

User-agent: *
Disallow: /

That single block tells every crawler, including Googlebot and every AI shopping bot, to stay out of the entire site. It is standard on staging and development environments and is supposed to be removed when a site goes live. When it survives launch, it is catastrophic for both search and AI visibility. If you find this on a production store, it is the most urgent thing on this page.
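A leftover staging block is the kind of thing a small pre-launch check can catch. The sketch below is one possible guard, not a standard tool: it parses a robots.txt string with Python's built-in parser and reports whether a given crawler is shut out of every sampled path. The sample paths are placeholders, so swap in real URLs from your own store.

```python
from urllib.robotparser import RobotFileParser

STAGING_ROBOTS = "User-agent: *\nDisallow: /\n"

# Placeholder paths; replace with pages that exist on your store.
SAMPLE_PATHS = ("/", "/products/example", "/collections/all")


def blocks_entire_site(robots_txt: str, user_agent: str) -> bool:
    """Return True if `user_agent` cannot fetch any of the sampled paths."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not any(parser.can_fetch(user_agent, path) for path in SAMPLE_PATHS)


if __name__ == "__main__":
    for bot in ("Googlebot", "OAI-SearchBot", "PerplexityBot"):
        print(bot, "fully blocked:", blocks_entire_site(STAGING_ROBOTS, bot))
```

Against the staging file, every crawler comes back as fully blocked. Running the same check in a deploy script against the production file is one way to make sure the block never survives a launch.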

2. The "block AI" toggle

Several platforms and SEO plugins added one-click "block AI bots" or "opt out of AI training" settings over the past two years. They are well-intentioned, but they tend to be blunt. A toggle labeled as a training opt-out will often disallow OAI-SearchBot and PerplexityBot too, conflating training with search visibility. If you flipped one of these on, it is worth checking exactly what it wrote into your file.

3. The copied snippet

When the AI crawling conversation peaked, a lot of "paste this into your robots.txt to keep AI out" snippets circulated. They were written for publishers worried about training, not for stores that want to be found by shoppers. A typical one looks like this:

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

For a content publisher protecting articles from training, blocking GPTBot is a reasonable stance. For a store that wants to sell products, the OAI-SearchBot and PerplexityBot lines in that snippet are quietly turning away buyers.
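To see exactly what a pasted snippet does, you can run it against the crawler names from this post. The sketch below uses Python's standard robots.txt parser. The role labels are shorthand for the descriptions earlier in this post, and the blocked_crawlers helper is hypothetical.

```python
from urllib.robotparser import RobotFileParser

COPIED_SNIPPET = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

# Shorthand roles, summarizing the crawler descriptions in this post.
CRAWLER_ROLES = {
    "OAI-SearchBot": "ChatGPT search index",
    "ChatGPT-User": "ChatGPT live fetch",
    "GPTBot": "OpenAI training",
    "PerplexityBot": "Perplexity index",
    "Perplexity-User": "Perplexity live fetch",
    "Google-Extended": "Google generative AI permission",
}


def blocked_crawlers(robots_txt: str, path: str = "/") -> list[str]:
    """Return the crawler names from CRAWLER_ROLES that may not fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [name for name in CRAWLER_ROLES if not parser.can_fetch(name, path)]


if __name__ == "__main__":
    for name in blocked_crawlers(COPIED_SNIPPET):
        print(f"blocked: {name} ({CRAWLER_ROLES[name]})")
```

The snippet blocks four of the six names. Only one of them, GPTBot, is about training. The two user-triggered fetchers are not mentioned in the snippet, so they stay allowed by default.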

4. The security plugin

Some security and bot-management tools maintain aggressive deny lists and add AI crawler user-agents to them automatically. The intent is to reduce scraping load, but the side effect is the same: the engines that would recommend your products cannot read your pages.

How to Check Yours in Two Minutes

You do not need any tools to do a first pass. Open a browser and go to yourstore.com/robots.txt. Read it top to bottom and look for two things.

First, find any line that reads Disallow: / (a slash with nothing after it, which means the entire site) and check which User-agent block it sits under. If it is under User-agent: *, that is a site-wide block affecting everyone. If it is under User-agent: OAI-SearchBot, PerplexityBot, ChatGPT-User, Perplexity-User, or Google-Extended, that is an AI shopping block. If it is only under User-agent: GPTBot, that is a training opt-out, which may well be deliberate.

Second, remember that rules apply to the user-agent block they appear under, and a crawler obeys the most specific block that names it. So a permissive User-agent: * section does not save you if there is a separate, more specific User-agent: OAI-SearchBot section with a Disallow: / in it. Read each named block on its own terms.
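The "most specific block wins" rule is easier to trust once you see it spelled out. The sketch below is a deliberately small, hand-written group selector, not a full robots.txt parser: it returns the rules from the group that names a crawler and falls back to the wildcard group only when no group names it. The rules_for function and the example file are assumptions for illustration.

```python
def rules_for(robots_txt: str, token: str) -> list[tuple[str, str]]:
    """Return (directive, path) rules from the group(s) that apply to `token`.

    Groups that name the token win; the `*` group is only a fallback.
    """
    groups = []  # list of (set of agent names, list of rules)
    agents, rules = set(), []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a rule was already seen, so a new group starts here
                groups.append((agents, rules))
                agents, rules = set(), []
            agents.add(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))

    wanted = token.lower()
    if any(wanted in names for names, _ in groups):
        return [rule for names, group in groups if wanted in names for rule in group]
    return [rule for names, group in groups if "*" in names for rule in group]


# A permissive wildcard group does not rescue a crawler that is named elsewhere.
EXAMPLE = """\
User-agent: *
Allow: /

User-agent: OAI-SearchBot
Disallow: /
"""

if __name__ == "__main__":
    print(rules_for(EXAMPLE, "OAI-SearchBot"))  # [('disallow', '/')]
    print(rules_for(EXAMPLE, "PerplexityBot"))  # [('allow', '/')]
```

OAI-SearchBot gets only its own Disallow: / rule, while PerplexityBot, which is not named anywhere, falls through to the permissive wildcard group.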

The Fix

The fix for an accidental block is almost always to remove the offending Disallow: / lines, not to add new ones. By default, anything a robots.txt does not disallow is allowed, so a clean file simply does not mention the shopping crawlers at all, and they are free to read your store.

If you want to be explicit, or if you want to keep a training opt-out while preserving shopping visibility, this is a sensible configuration:

# Let AI shopping crawlers read product pages
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

# Optional: opt out of model training while staying
# visible in shopping results. This is your call.
User-agent: GPTBot
Disallow: /

That configuration keeps you fully visible in ChatGPT search and Perplexity while declining to contribute to OpenAI's model training. If you have no concerns about training, you can drop the final block and allow GPTBot as well. The point is that the decision should be deliberate, with each crawler considered on its own, rather than handled by a single broad rule that you copied from somewhere.
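One side effect is worth checking before you publish a file like this. A crawler that has its own named group stops reading the wildcard group, so it does not inherit rules such as Disallow: /admin. The sketch below assumes the file also carries a wildcard group like the minimal example earlier, and uses Python's built-in parser to compare expectations with results. ExampleBot, the paths, and the check_policy helper are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The configuration above, plus an assumed wildcard group that protects /admin.
ROBOTS = """\
User-agent: *
Disallow: /admin

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: GPTBot
Disallow: /
"""


def check_policy(robots_txt: str, expectations: dict) -> list:
    """Return (agent, path, expected, actual) for every expectation that fails."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    mismatches = []
    for (agent, path), expected in expectations.items():
        actual = parser.can_fetch(agent, path)
        if actual != expected:
            mismatches.append((agent, path, expected, actual))
    return mismatches


EXPECTED = {
    ("OAI-SearchBot", "/products/blue-mug"): True,   # visible for shopping
    ("GPTBot", "/products/blue-mug"): False,         # training opt-out
    ("ExampleBot", "/admin/orders"): False,          # wildcard group applies
    ("OAI-SearchBot", "/admin/orders"): True,        # named group: /admin is NOT inherited
}

if __name__ == "__main__":
    print(check_policy(ROBOTS, EXPECTED))  # [] means every expectation held
```

If you want /admin to stay off limits for the named crawlers too, repeat the Disallow: /admin line inside each of their groups, placed above Allow: / so that parsers that stop at the first matching rule also read it correctly.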

A note on Google-Extended: whether to allow it is a genuine judgment call rather than an obvious fix. Allowing it lets Google's AI features use and cite your content. Blocking it keeps your content out of those features without touching your search rankings. Neither is wrong. Just make sure that if it is blocked, it is blocked on purpose.

After You Change It

Two things to do once you have edited the file. First, reload yourstore.com/robots.txt in your browser and confirm the change is actually live. On platforms with caching, an edit in an admin panel does not always publish instantly. Second, give the crawlers time. robots.txt changes are picked up the next time a crawler revisits, which can take days to a few weeks depending on how often your site is crawled. Removing a block does not produce overnight visibility, but it removes the wall that was preventing it.

The Bottom Line

robots.txt is the cheapest possible AI shopping fix, because it costs nothing and takes minutes, and it is the most expensive possible mistake, because a single wrong line can cancel out every other improvement you make. Before you invest in structured data, ratings markup, or content, spend two minutes confirming that the engines you are optimizing for are actually allowed to read your store. It is the one check where doing nothing can quietly cost you everything.

If you would rather not parse the file by hand, Krytho checks your robots.txt for every major AI shopping crawler as part of a free scan, and flags any that are blocked along with the rest of your AI readiness signals.

Frequently asked questions

Will allowing AI shopping crawlers slow down my site or run up bandwidth?

In practice, no. The major AI crawlers (OAI-SearchBot, PerplexityBot, GPTBot) generally crawl politely and fetch a small number of pages at a time. Their traffic is typically a small fraction of what a normal storefront serves to shoppers and to Googlebot. Do not count on a Crawl-delay line to throttle them: it is a non-standard directive that some crawlers, Googlebot among them, ignore. If you are concerned about a specific aggressive crawler, you can rate-limit it at your server or CDN rather than block it outright.

Does blocking GPTBot remove me from ChatGPT Shopping?

Not necessarily, because OpenAI uses more than one crawler. GPTBot is primarily the training crawler. OAI-SearchBot is the one that surfaces sites in ChatGPT search results, and ChatGPT-User fetches pages when a user action triggers a live lookup. If you want to stay visible in ChatGPT Shopping while limiting training use, the cleaner move is to allow OAI-SearchBot and ChatGPT-User and decide about GPTBot separately. Blocking all three removes you from ChatGPT's web-sourced results.

What is the difference between Google-Extended and Googlebot?

Googlebot crawls your site for Google Search. Google-Extended is a separate permission token that controls whether your content can be used by Google's generative AI products, such as Gemini. Blocking Google-Extended does not affect Googlebot and does not change your normal search rankings, but it can limit how Google's AI features use and cite your store. The two are independent, so a block on one says nothing about the other.

I am on Shopify. Can I even edit my robots.txt?

Yes. Shopify generates a default robots.txt automatically, but since 2021 you can customize it by editing the robots.txt.liquid template in your theme. That means you can both check what your store currently sends and adjust the rules for AI crawlers. Many other platforms (WooCommerce, BigCommerce, custom builds) let you edit the file directly or through an SEO plugin.

How do I confirm a crawler actually obeys my robots.txt?

robots.txt is a voluntary standard. The major AI shopping crawlers from OpenAI, Perplexity, and Google publicly commit to honoring it, and they publish IP ranges and user-agent strings you can verify against your server logs. To check for yourself, look for requests to disallowed paths that carry one of those user-agents and come from the matching published IP range. Disreputable scrapers ignore robots.txt entirely, but those are not the crawlers feeding AI shopping results. For the engines that matter for visibility, the rules are respected.
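A rough way to do that log check is sketched below. It assumes your server writes the common "combined" access log format, and it uses placeholder IP ranges from the reserved documentation blocks. Replace them with the ranges each operator actually publishes. The function name and sample log lines are invented for illustration.

```python
import ipaddress
import re

# Combined Log Format (assumed):
# ip - - [time] "METHOD path HTTP/x" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

AI_TOKENS = ("OAI-SearchBot", "ChatGPT-User", "GPTBot", "PerplexityBot", "Perplexity-User")

# PLACEHOLDER range (reserved for documentation), not a real crawler range.
PLACEHOLDER_RANGES = ["192.0.2.0/24"]

SAMPLE_LOG = [
    '192.0.2.10 - - [13/Jun/2026:10:00:00 +0000] "GET /products/blue-mug HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '203.0.113.7 - - [13/Jun/2026:10:00:05 +0000] "GET /products/blue-mug HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '198.51.100.3 - - [13/Jun/2026:10:00:09 +0000] "GET / HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]


def ai_crawler_hits(log_lines, published_ranges):
    """Return (token, path, ip_verified) for each request claiming an AI crawler name."""
    networks = [ipaddress.ip_network(cidr) for cidr in published_ranges]
    hits = []
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        user_agent = match["ua"].lower()
        token = next((t for t in AI_TOKENS if t.lower() in user_agent), None)
        if token is None:
            continue
        try:
            ip = ipaddress.ip_address(match["ip"])
        except ValueError:  # the log recorded a hostname instead of an IP
            continue
        hits.append((token, match["path"], any(ip in net for net in networks)))
    return hits


if __name__ == "__main__":
    for hit in ai_crawler_hits(SAMPLE_LOG, PLACEHOLDER_RANGES):
        print(hit)
```

A hit with a verified IP on a path you disallowed would be worth raising with the operator. A hit that claims a crawler name from an unverified IP is more likely an impersonator, which is a firewall problem rather than a robots.txt one.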