We Audited 575 Online Stores for AI Shopping Readiness. Here Is What We Found.
Why We Ran This
AI shopping engines such as ChatGPT Shopping, Perplexity Shopping, and Google AI Overviews read product pages much the way a crawler does. They fetch your HTML, parse whatever structured data they find, and decide whether your product is clear enough to surface in a recommendation. The stores that are easy to parse get cited. The stores that are not get skipped.
A practical question follows: how prepared are online stores actually? Plenty of blog posts tell merchants what they should do, but there was very little recent, primary data on what stores have actually done. We wanted real numbers. So we built a scanner and ran it.
What We Checked and How
We audited 575 store domains in June 2026 using Krytho's scanner. For each domain we checked three things: the robots.txt file for AI crawler rules, the product page for structured data signals, and the root directory for an llms.txt file. Crawling was polite throughout, with a custom User-Agent identifying the study, a 10-second timeout per request, a maximum concurrency of 3, a 300-millisecond delay between requests per worker, and no more than 5 requests per domain.
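The politeness limits described above can be written down as a small configuration object. The following is a minimal sketch, not the scanner's actual code: the names `CrawlPolicy` and `DomainBudget` and the User-Agent string are hypothetical, and only the numeric limits come from the study description.

```python
from collections import Counter
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass(frozen=True)
class CrawlPolicy:
    # Numeric limits as described in the text; the User-Agent is a placeholder.
    user_agent: str = "example-study-bot/0.1 (hypothetical)"
    timeout_s: float = 10.0           # per-request timeout
    max_concurrency: int = 3          # parallel workers
    delay_s: float = 0.3              # pause between requests per worker
    max_requests_per_domain: int = 5  # hard cap per domain


@dataclass
class DomainBudget:
    """Tracks how many requests each domain has received so far."""
    policy: CrawlPolicy
    used: Counter = field(default_factory=Counter)

    def allow(self, url: str) -> bool:
        """Return True and count the request if the domain is under its cap."""
        host = urlsplit(url).hostname or ""
        if self.used[host] >= self.policy.max_requests_per_domain:
            return False
        self.used[host] += 1
        return True
```

A worker would call `allow(url)` before each fetch and sleep for `delay_s` afterwards, so the per-domain cap holds even if a store redirects or a check needs a retry.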
Product page analysis required identifying a valid product URL. We used Shopify's standard /products.json endpoint to detect Shopify stores and pull a representative product page. Stores on other platforms, or stores where platform detection failed, were classified as unknown and excluded from page-level counts. That left 82 confirmed Shopify stores, and we successfully fetched and fully analyzed a product page for 51 of them. robots.txt was readable on 213 of the 575 domains. llms.txt was checked across all 575.
Every number in this post comes from that run, apart from the pilot figures we cite for comparison. Nothing is estimated or extrapolated.
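As an illustration of the detection step, here is a minimal sketch that turns a /products.json response body into one product page URL. The function name `pick_product_url` is hypothetical, and taking the first listed product as the "representative" page is an assumption; the study does not say how the page was chosen.

```python
import json
from typing import Optional


def pick_product_url(domain: str, products_json_body: str) -> Optional[str]:
    """Return one product page URL from a /products.json body, or None."""
    try:
        payload = json.loads(products_json_body)
    except json.JSONDecodeError:
        return None  # not JSON, so treat the platform as unknown
    if not isinstance(payload, dict):
        return None
    products = payload.get("products")
    if not isinstance(products, list):
        return None
    for product in products:
        handle = product.get("handle") if isinstance(product, dict) else None
        if isinstance(handle, str) and handle:
            # Shopify product pages live at /products/<handle>.
            return f"https://{domain}/products/{handle}"
    return None
```

A `None` result corresponds to the "unknown" bucket described above: the domain is still checked for robots.txt and llms.txt but contributes nothing to page-level counts.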
Finding 1: More Than 1 in 3 Shopify Product Pages Are Invisible at the Schema Layer
Of the 51 Shopify product pages we fully analyzed, 20 had no Product JSON-LD at all. That is 39.2%. More than one in three pages gave AI shopping crawlers nothing machine-readable to work with on the most critical signal for product identification.
The reason this number is worth taking seriously is that it replicates. Before running the 575-domain scan, we ran a smaller pilot on 55 well-known DTC brands, auditing 30 product pages. That pilot found 11 pages, 36.7%, with no Product JSON-LD. The two samples used different store lists, were collected at different times, and landed at 36.7% and 39.2%. The directional finding is consistent: "more than 1 in 3" is unlikely to be a one-off artifact of sample composition.
This matters because Product JSON-LD is not an advanced optimization. It is the baseline. Without it, an AI engine must infer your product name, price, and availability from unstructured page text, and that inference is unreliable. These 39% of stores are not behind on fine-tuning. They are missing the foundation.
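To make "no Product JSON-LD" concrete, here is a minimal sketch of how a page's raw HTML could be checked using only the Python standard library. It is an assumption about how such a check might work, not the scanner's implementation; the names `find_products` and `_LdJsonCollector` are hypothetical.

```python
import json
from html.parser import HTMLParser


class _LdJsonCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def _walk(node):
    """Yield every dict nested inside parsed JSON-LD (covers @graph and lists)."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)


def find_products(html: str) -> list[dict]:
    """Return all JSON-LD objects whose @type is, or includes, "Product"."""
    collector = _LdJsonCollector()
    collector.feed(html)
    collector.close()
    found = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block: skip it rather than fail the page
        for obj in _walk(data):
            types = obj.get("@type")
            types = types if isinstance(types, list) else [types]
            if "Product" in types:
                found.append(obj)
    return found
```

An empty list from `find_products` is what this post counts as "no Product JSON-LD". Because the check reads server-rendered HTML only, schema injected later by JavaScript would not be seen.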
Finding 2: Even Stores With Basic Schema Rarely Have the Deep Fields AI Engines Want
Of the 51 stores analyzed, 31 (60.8%) had Product JSON-LD present. That sounds reasonable until you look at what those schemas actually contained.
Deep field adoption across all 51 stores
The list below shows the percentage of all 51 audited stores, not just those with Product JSON-LD, that included each field.
- aggregateRating: 13.7% (7 stores). Ratings are one of the highest-weight signals AI recommendation engines use. A product with a 4.7-star rating from 800 reviews is a much easier recommendation than an unrated product. Only 7 of 51 stores exposed this in machine-readable form.
- shippingDetails: 15.7% (8 stores). Shipping time and carrier information lets an AI answer "can I get this by Thursday?" Stores that include it become viable answers to time-sensitive queries. Stores that do not are filtered out of those answers.
- hasMerchantReturnPolicy: 15.7% (8 stores). Return policy is a primary decision factor for many shoppers and a signal some AI engines use to assess merchant trustworthiness. Roughly 1 in 6 stores marked it up.
- priceValidUntil: 23.5% (12 stores). This field tells the crawler that your price data is actively maintained and current. Without it, a crawler has no way to know whether the price it found is still accurate. About 1 in 4 stores included it.
- FAQ schema: 9.8% (5 stores). Product FAQ markup answers the specific questions a shopper might ask an AI before buying. Fewer than 1 in 10 stores used it.
- JS shell suspects: 2% (1 store). One store had no Product JSON-LD and fewer than 500 characters of visible text in the server-rendered HTML, suggesting the product content is loaded entirely by JavaScript and not present in the raw HTML a crawler receives.
The pattern is consistent across every field: even among stores that have crossed the Product JSON-LD threshold, the richer signals that AI engines use for comparison shopping are present on a small minority of pages. The gap between what stores have and what AI engines can use is not one field. It is almost all of them.
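The field checks above can be sketched as a small report over one parsed Product JSON-LD object. This is an illustrative assumption about where to look, not the scanner's code: in schema.org, aggregateRating sits on the Product, while shippingDetails, hasMerchantReturnPolicy, and priceValidUntil normally sit on an Offer. The name `deep_field_report` is hypothetical, and FAQ markup and nested AggregateOffer structures are not covered.

```python
OFFER_FIELDS = ("shippingDetails", "hasMerchantReturnPolicy", "priceValidUntil")


def _as_list(value):
    """Normalize a JSON-LD value that may be missing, a single object, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def deep_field_report(product: dict) -> dict[str, bool]:
    """Report which comparison-shopping fields a Product JSON-LD object exposes."""
    offers = [o for o in _as_list(product.get("offers")) if isinstance(o, dict)]
    report = {"aggregateRating": bool(product.get("aggregateRating"))}
    for name in OFFER_FIELDS:
        # Accept the field on any offer, or directly on the product as a fallback.
        report[name] = any(o.get(name) for o in offers) or bool(product.get(name))
    return report
```

For a product that carries a rating and a priceValidUntil date but nothing else, the report marks those two fields True and the shipping and return fields False, which is the typical shape of the gap described in this section.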
Finding 3: About 1 in 9 Stores Block at Least One AI Shopping Crawler
Of the 213 stores where we could read a robots.txt file, 23 (10.8%) block at least one major AI shopping crawler. Per-bot breakdown:
- GPTBot blocked: 22 stores (10.3%)
- Google-Extended blocked: 21 stores (9.9%)
- PerplexityBot blocked: 18 stores (8.5%)
- OAI-SearchBot blocked: 16 stores (7.5%)
A note on Google-Extended specifically: according to Google's crawler documentation, it is a standalone robots.txt product token rather than a separate crawler. It controls whether content Google has already crawled may be used to train Gemini models and for grounding in Gemini products. Blocking it does not affect Googlebot, your inclusion in Google Search, or your rankings. AI Overviews are a Search feature, so they follow Googlebot and the usual Search preview controls rather than Google-Extended. Some merchants may block the token deliberately. Many are probably unaware the distinction exists.
The more important observation is that most of these blocks do not look like deliberate policy decisions. They look like blanket bot rules, such as "Disallow: /" under a wildcard User-agent, or like copied-in AI-training opt-outs that catch shopping crawlers as collateral damage. A merchant who intentionally blocked GPTBot for training purposes likely did not intend to opt out of ChatGPT Shopping recommendations. OpenAI documents GPTBot as its training crawler and OAI-SearchBot as the crawler behind its search features, so the two can be allowed or blocked independently, but blanket rules tend to catch both.
Checking your robots.txt takes 30 seconds. It is worth doing.
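If you would rather script that check, here is a minimal sketch using Python's standard `urllib.robotparser`. The function name `blocked_bots` and the sample product URL are hypothetical. One caveat: Python's parser applies the first matching rule and matches agent names loosely, which can differ from how a given crawler interprets the file, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser

# The four user-agent tokens checked in this study.
AI_BOTS = ("GPTBot", "OAI-SearchBot", "PerplexityBot", "Google-Extended")


def blocked_bots(
    robots_txt: str,
    url: str = "https://shop.example/products/sample",
) -> list[str]:
    """Return the AI bot tokens that may not fetch `url` under this robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


sample = """User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /cart
"""
print(blocked_bots(sample))
```

With the sample file above, only GPTBot is reported as blocked. A wildcard `Disallow: /` with no bot-specific groups would report all four.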
Finding 4: The llms.txt Gradient Shows Platform Auto-Adoption vs. Merchant Awareness
llms.txt is a simple text file at the root of a domain that tells AI crawlers what the site is, what it sells, and, optionally, how the AI should use that information. It is an emerging convention rather than a formal standard, but it is gaining traction among stores that want to be legible to AI systems.
Across all 575 domains, 114 (19.8%) had a readable llms.txt file. In the pilot, which used 55 well-known DTC brands, 58% had one. The gap is large and the explanation is straightforward: Shopify appears to auto-generate llms.txt files for stores on its platform. We verified real llms.txt files at Gymshark and Allbirds, both Shopify stores. The pilot was concentrated in famous brands, most of which are on Shopify, so the 58% rate reflects Shopify's automatic generation more than active merchant decisions. The 19.8% rate in the broader sample reflects a population where many stores are not on Shopify or are on Shopify configurations that do not auto-generate the file.
The practical implication: if you are on Shopify, you may already have llms.txt without knowing it. If you are not, the file is simple to create and signals to AI crawlers that you have considered your AI visibility posture. Adoption is concentrated at the top of the market. For most merchants, it remains an easy, low-effort differentiator.
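What counts as a "readable" llms.txt needs a working definition, because many stores answer any unknown path with an HTML page. The sketch below is one plausible heuristic, stated as an assumption rather than the rule used in this study; the function name `looks_like_llms_txt` is hypothetical.

```python
def looks_like_llms_txt(status: int, content_type: str, body: str) -> bool:
    """Heuristic: a 200 response with non-empty, non-HTML text counts as readable."""
    if status != 200:
        return False
    text = body.strip()
    if not text:
        return False
    if "html" in content_type.lower():
        return False  # likely a themed "not found" page served with status 200
    head = text[:200].lower()
    return not (head.startswith("<!doctype") or head.startswith("<html"))
```

You would pass in the status code, Content-Type header, and body returned for yourdomain.com/llms.txt. The HTML guard matters because a storefront that returns its homepage for every path would otherwise be counted as having the file.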
What This Means If You Run a Store
The gap in these numbers is not between stores that know about AI shopping readiness and stores that do not. It is a gap in execution depth. Plenty of stores have heard that structured data matters. Far fewer have actually implemented the full set of signals that AI engines use for comparison-ready recommendations.
That creates an opportunity. Adding shippingDetails, hasMerchantReturnPolicy, aggregateRating, and priceValidUntil to your Product JSON-LD is not technically difficult. It is a few hours of work on most platforms, or an app or theme configuration on Shopify. The stores that do it now join the 14% to 24% that have each of these fields, not the 76% to 86% that do not.
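For reference, here is a sketch of what a Product JSON-LD object with all four fields could look like, built as a Python dictionary and serialized into a script tag. Every value (product name, prices, dates, shipping times, return window) is a placeholder invented for illustration; the property and type names are from schema.org.

```python
import json

product_jsonld = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Ceramic Mug",  # placeholder product
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.7",
        "reviewCount": "800",
    },
    "offers": {
        "@type": "Offer",
        "price": "24.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
        "priceValidUntil": "2026-12-31",
        "shippingDetails": {
            "@type": "OfferShippingDetails",
            "shippingRate": {"@type": "MonetaryAmount", "value": "5.00", "currency": "USD"},
            "shippingDestination": {"@type": "DefinedRegion", "addressCountry": "US"},
            "deliveryTime": {
                "@type": "ShippingDeliveryTime",
                "handlingTime": {"@type": "QuantitativeValue", "minValue": 0, "maxValue": 1, "unitCode": "DAY"},
                "transitTime": {"@type": "QuantitativeValue", "minValue": 2, "maxValue": 4, "unitCode": "DAY"},
            },
        },
        "hasMerchantReturnPolicy": {
            "@type": "MerchantReturnPolicy",
            "applicableCountry": "US",
            "returnPolicyCategory": "https://schema.org/MerchantReturnFiniteReturnWindow",
            "merchantReturnDays": 30,
        },
    },
}

# Serialize into the tag a product page template would emit in its HTML.
script_tag = (
    '<script type="application/ld+json">'
    + json.dumps(product_jsonld, indent=2)
    + "</script>"
)
```

The markup should only state what the page itself shows: a rating that is not visible to shoppers, or a return window that differs from your policy page, is worse than leaving the field out.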
AI shopping is not a distant future scenario. ChatGPT Shopping, Perplexity Shopping, and Google AI Overviews are live and in active use. Stores that are structurally ready are already getting cited. Being ready is currently cheap because most competitors are not.
Check Your Own Store
Here is a short checklist based on what we found to be the most common gaps:
- Product JSON-LD present? Use "View Page Source" on a product page and search for application/ld+json. If it is not there, that is your first fix.
- robots.txt clean? Visit yourdomain.com/robots.txt and check that GPTBot, OAI-SearchBot, PerplexityBot, and Google-Extended are not blocked, or at minimum that you understand the tradeoffs of any blocks that are in place.
- aggregateRating in schema? If you display ratings on your product pages, mark them up. This is the single field most likely to influence whether an AI recommends you over an unrated competitor.
- shippingDetails and hasMerchantReturnPolicy included? These two fields together make your listing useful for time-sensitive and return-sensitive queries, which cover a significant share of real shopper questions.
- priceValidUntil set? If your pricing is static or updated on a schedule, include this field to signal that your price data is current.
- llms.txt at your root? Check yourdomain.com/llms.txt. If you are on Shopify, you may already have one. If not, consider adding a simple file describing your store and products.
You can also run a free scan with Krytho at krytho.com/scan. Paste any product URL and get a full breakdown of what AI shopping engines see, including every field checked in this study.
Limitations
These findings are directional, not definitive. The key constraints to keep in mind:
- Page-level stats are Shopify-only, N=51. Product page analysis required a platform detection step that worked reliably only for Shopify. The 39.2% missing schema figure applies to this sample, not to all e-commerce stores.
- One product page per store. Schema completeness can vary significantly across a catalog, particularly between older and newer product listings. A store could have excellent schema on recent products and none on legacy pages.
- Server-rendered HTML only. Stores that inject structured data via client-side JavaScript appear to be missing schema in this audit even if they serve it to browsers. The JS shell detection flagged 1 of 51 pages (2%) as a probable case, but the true count may be higher.
- Brand sophistication skew. The seed list was weighted toward established DTC brands. A random sample of smaller Shopify merchants would likely show lower structured data adoption, meaning the real market-wide gap is probably larger than what we measured here.
- Directional, not definitive. 575 domains and 51 product pages are enough to establish a credible directional finding, not enough for narrow confidence intervals. Treat these numbers as a working estimate, not a census.
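To put a rough size on "not enough for narrow confidence intervals", the sketch below computes a Wilson score interval for the two missing-schema proportions. The Wilson interval is our choice for this illustration, not a method used in the study, and it assumes each page is an independent draw, which a brand-weighted seed list does not guarantee.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


for label, k, n in [("main scan", 20, 51), ("pilot", 11, 30)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.1%}, 95% interval {low:.1%} to {high:.1%}")
```

For the main scan's 20 of 51 pages, the interval spans roughly 27% to 53%. That is consistent with "more than 1 in 3" as a working estimate, and wide enough to show why the exact percentage should not be over-read.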
Frequently asked questions
How were the 575 stores chosen?
The seed list was drawn from well-known direct-to-consumer brands, predominantly Shopify-based. The sample skews toward established, often well-funded stores. Smaller or less prominent merchants are underrepresented, which means the structured data adoption rates we found are likely higher than what a random cross-section of all online stores would show.
Does missing Product JSON-LD really mean invisible to AI shopping engines?
It means structurally harder to read. AI crawlers can sometimes infer product details from unstructured page text, but the inference is noisy and unreliable compared to explicit schema markup. Stores with valid Product JSON-LD give the engine a clear, machine-readable signal. Stores without it force the engine to guess, and products that are hard to parse accurately are less likely to be surfaced in recommendations.
Why were product page stats limited to Shopify stores?
Product page analysis required fetching and parsing a representative product URL. Shopify exposes a standard products.json endpoint that makes it straightforward to identify a valid product page to audit. Stores on other platforms, or stores where platform detection failed, were excluded from page-level counts to avoid comparing apples to oranges. The robots.txt and llms.txt checks covered all 575 domains regardless of platform.
Will you rerun this study with a larger or broader sample?
Yes. The plan is to expand the dataset to include smaller merchants, non-Shopify platforms (WooCommerce, BigCommerce, custom storefronts), and a wider range of product categories. A broader sample will produce numbers with stronger statistical grounding and make cross-platform comparisons possible.
How can I check my own store?
Run a free scan at krytho.com/scan. Paste any product page URL and Krytho checks for Product JSON-LD presence and completeness, AI crawler rules in your robots.txt, llms.txt presence, and several other signals that determine how visible your store is to AI shopping engines.