Robots.txt For Shopify Stores: AEO Playbook For Today's LLM Crawl Bots
In the assistant era, robots.txt does more than keep staging folders private. It also signals which AI crawlers may use your content for training or instant answers. This guide focuses on Shopify stores and shows which bots to name, what to allow or deny, and how to keep Answer Engine Optimization (AEO) in balance with brand and policy.
How robots.txt works on Shopify
- Shopify generates a default file, but you can customize it via templates/robots.txt.liquid (see the sketch after this list).
- Rules live at https://yourdomain.com/robots.txt and are read by compliant crawlers on every visit.
- Use it to allow the public parts of your catalog and to set explicit rules for AI crawlers while keeping checkout, cart, and account paths disallowed.
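
As a hedged sketch of what that customization can look like, templates/robots.txt.liquid can render Shopify's documented default groups first and then append AI-bot rules after them. The GPTBot group, its paths, and the review date below are placeholders for your own policy, not a recommendation:

{%- comment -%} Render Shopify's default rules unchanged. {%- endcomment -%}
{%- for group in robots.default_groups %}
  {{- group.user_agent }}
  {%- for rule in group.rules %}
    {{ rule }}
  {%- endfor %}
  {%- if group.sitemap != blank %}
    {{ group.sitemap }}
  {%- endif %}
{%- endfor %}

{%- comment -%} Appended AI crawler policy (placeholder rules and date). {%- endcomment -%}
# AI crawler policy -- reviewed 2025-06-01
User-agent: GPTBot
Allow: /products/
Disallow: /cart
Disallow: /checkout
Disallow: /account

Keep in mind that a compliant crawler obeys only the most specific group that names it, so any bot you single out by name needs its own Disallow lines for cart, checkout, and account rather than inheriting them from the wildcard group.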
LLM and AI-related crawlers you should know
These are the high-impact agents most Shopify stores reference directly:
- GPTBot (OpenAI's web crawler, used to gather content for training and improving its models)
- ChatGPT-User (OpenAI on-demand retrieval when users browse via ChatGPT)
- ClaudeBot, Claude-User, Claude-SearchBot (Anthropic crawlers for training, user retrieval, and search)
- Google-Extended (Google's product token, not a separate crawler, for allowing or denying use of your content in Gemini, formerly Bard, and Vertex AI generative features)
- PerplexityBot (Perplexity AI search crawler)
- CCBot (Common Crawl, widely reused by AI systems)
- Applebot-Extended (Apple's control for use of content in Apple Intelligence models)
"Allow vs. block" is a business choice. Many brands allow user-initiated fetchers (like ChatGPT-User) but restrict model-training bots, or vice-versa. Keep policy consistent with your partnership strategy.
Short, safe starting point for Shopify robots.txt
Keep it compact. Allow your public catalog, block private flows, then set clear AI bot rules. Example:
# Shopify robots.txt (AEO-aware baseline)
User-agent: *
Allow: /products/
Allow: /collections/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Sitemap: https://www.example.com/sitemap.xml

# OpenAI
User-agent: GPTBot
Allow: /products/
Disallow: /cart

User-agent: ChatGPT-User
Allow: /

# Anthropic
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /collections/
Allow: /products/

# Google AI usage control
User-agent: Google-Extended
Disallow: /

# Perplexity
User-agent: PerplexityBot
Allow: /products/
Disallow: /cart

# Common Crawl
User-agent: CCBot
Allow: /products/
Disallow: /cart

# Apple AI usage control (does not crawl itself)
User-agent: Applebot-Extended
Disallow: /
Adjust per policy: if you prefer broader inclusion, switch specific bots from Disallow to Allow on your public pages only. Always keep sensitive paths blocked.
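
For instance, a store that wants broader inclusion for a training crawler, but only across its public catalog, could give that bot a group like the following sketch (GPTBot is used purely as an illustration; substitute whichever agent your policy covers):

User-agent: GPTBot
Allow: /products/
Allow: /collections/
Allow: /pages/
Disallow: /

Under the Robots Exclusion Protocol's longest-match rule (RFC 9309), the more specific Allow lines take precedence over Disallow: /, so only the listed public paths remain crawlable for that agent.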
AEO perspective: what to allow and why
- Allow user-initiated agents (ChatGPT-User, Claude-User) on public product detail pages (PDPs), policy pages, and guides so assistants can quote the same answers shoppers see.
- Decide on training crawlers (GPTBot, ClaudeBot, CCBot): permitting them may increase reuse over time; blocking preserves tighter control.
- Use Google-Extended and Applebot-Extended to control whether your content is used for model training while keeping normal search crawling intact.
- Never expose carts, checkouts, customer accounts, app proxy internals, or staged content.
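
Because Google-Extended and Applebot-Extended are product tokens rather than crawlers in their own right, disallowing them only withdraws permission for AI use; Googlebot and Applebot continue crawling for ordinary search under your regular rules. A minimal sketch of that opt-out:

# Opt out of AI model use; normal search crawling is unaffected
User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /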
Operational tips for Shopify teams
- Create templates/robots.txt.liquid and keep all AI directives in one labeled block with a dated comment.
- Mirror policies across locales and subdomains; crawlers and assistants read robots.txt per host.
- Log AI user-agents separately and watch for user-agent strings that do not match the vendors' published IP ranges, as well as stealth crawls that ignore robots.txt. Escalate through your CDN/WAF if needed.
- Coordinate with Product JSON-LD: if you want assistants to cite PDPs, ensure identifiers (GTIN/MPN), availability, and price parity are correct.
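
To make that last point concrete, here is a hedged Liquid sketch of Product JSON-LD for a PDP template. It assumes your GTINs live in the variant barcode field, your MPNs in the SKU field, and that the store uses a two-decimal currency; adjust the mappings to your catalog before relying on it:

<script type="application/ld+json">
  {%- comment -%} Illustrative only: the field mappings are assumptions about your catalog. {%- endcomment -%}
  {%- assign variant = product.selected_or_first_available_variant -%}
  {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": {{ product.title | json }},
    "description": {{ product.description | strip_html | truncate: 300 | json }},
    "gtin13": {{ variant.barcode | json }},
    "mpn": {{ variant.sku | json }},
    "offers": {
      "@type": "Offer",
      "url": {{ canonical_url | json }},
      "priceCurrency": {{ cart.currency.iso_code | json }},
      "price": {{ variant.price | divided_by: 100.0 | json }},
      "availability": {% if variant.available %}"https://schema.org/InStock"{% else %}"https://schema.org/OutOfStock"{% endif %}
    }
  }
</script>

The goal is parity: the structured data should state the same price and availability the rendered PDP shows, so assistants that fetch the page and those that read the markup arrive at the same answer.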