Robots.txt For Shopify Stores: AEO Playbook For Today's LLM Crawl Bots

In the assistant era, robots.txt does more than keep staging folders private. It also signals which AI crawlers may use your content for training or instant answers. This guide focuses on Shopify stores and shows which bots to name, what to allow or deny, and how to keep Answer Engine Optimization (AEO) in balance with brand and policy.


How robots.txt works on Shopify

  • Shopify generates a default file, but you can customize it via templates/robots.txt.liquid; a minimal template sketch follows this list.
  • Rules live at https://yourdomain.com/robots.txt; compliant crawlers fetch the file before crawling and typically cache it rather than re-reading it on every request.
  • Use it to allow the public parts of your catalog and to set explicit rules for AI crawlers while keeping checkout, cart, and account paths disallowed.
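
Because the file is generated from templates/robots.txt.liquid, the usual customization pattern is to keep Shopify's default groups and append your own after them. The sketch below follows that pattern: the loop over robots.default_groups mirrors the shape of Shopify's documented default template, and the GPTBot group appended at the end is purely illustrative; substitute whatever user-agents and paths match your policy.

{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}

  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}

{% comment %} Custom group appended after Shopify's defaults: illustrative AI-crawler policy {% endcomment %}
User-agent: GPTBot
Allow: /products/
Disallow: /cart
Disallow: /checkout
Disallow: /account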

LLM and AI-related crawlers you should know

These are the high-impact agents most Shopify stores reference directly:

  • GPTBot (OpenAI web crawler for model training and answers)
  • ChatGPT-User (OpenAI on-demand retrieval when users browse via ChatGPT)
  • ClaudeBot, Claude-User, Claude-SearchBot (Anthropic crawlers for training, user retrieval, and search)
  • Google-Extended (Google's product token to allow or deny use of your content for Gemini and Vertex AI; it does not affect regular Search crawling)
  • PerplexityBot (Perplexity AI search crawler)
  • CCBot (Common Crawl, widely reused by AI systems)
  • Applebot-Extended (Apple's control for use of content in Apple Intelligence models)

"Allow vs. block" is a business choice. Many brands allow user-initiated fetchers (like ChatGPT-User) but restrict model-training bots, or vice versa. Keep the policy consistent with your partnership strategy.


Short, safe starting point for Shopify robots.txt

Keep it compact. Allow your public catalog, block private flows, then set clear AI bot rules. Example:

# Shopify robots.txt (AEO-aware baseline)
User-agent: *
Allow: /products/
Allow: /collections/
Allow: /pages/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Sitemap: https://www.example.com/sitemap.xml

# OpenAI
User-agent: GPTBot
Allow: /products/
Disallow: /cart
Disallow: /checkout
Disallow: /account

User-agent: ChatGPT-User
Allow: /
Disallow: /cart
Disallow: /checkout
Disallow: /account

# Anthropic
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Allow: /
Disallow: /cart
Disallow: /checkout
Disallow: /account

User-agent: Claude-SearchBot
Allow: /collections/
Allow: /products/
Disallow: /cart
Disallow: /checkout
Disallow: /account

# Google AI usage control
User-agent: Google-Extended
Disallow: /

# Perplexity
User-agent: PerplexityBot
Allow: /products/
Disallow: /cart
Disallow: /checkout
Disallow: /account

# Common Crawl
User-agent: CCBot
Allow: /products/
Disallow: /cart
Disallow: /checkout
Disallow: /account

# Apple AI usage control (does not crawl itself)
User-agent: Applebot-Extended
Disallow: /

Adjust this to your policy: if you prefer broader inclusion, switch specific bots from Disallow to Allow on public paths only. Note that a crawler matching a named group ignores the generic * group entirely, which is why each allowing group above repeats the sensitive-path Disallows. Always keep cart, checkout, and account paths blocked.


AEO perspective: what to allow and why

  • Allow user-initiated agents (ChatGPT-User, Claude-User) on public product detail pages (PDPs), policy pages, and guides so assistants can quote the same answers shoppers see.
  • Decide on training crawlers (GPTBot, ClaudeBot, CCBot): permitting them may increase reuse over time; blocking preserves tighter control.
  • Use Google-Extended and Applebot-Extended to control whether your content is used for model training while keeping normal search crawling intact.
  • Never expose carts, checkouts, customer accounts, app proxy internals, or staged content; a fragment covering the app proxy paths follows this list.
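
The fragment below is a sketch of that app proxy blocking, assuming your apps use Shopify's standard proxy subpath prefixes (/apps, /a, /community, /tools). Merge the lines into the existing User-agent: * group, and add an Allow for any proxy path that intentionally serves public content.

# Add to the existing "User-agent: *" group
Disallow: /apps/
Disallow: /a/
Disallow: /community/
Disallow: /tools/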

Operational tips for Shopify teams

  • Create templates/robots.txt.liquid and keep all AI directives in one labeled block with a dated comment.
  • Mirror policies across locales and subdomains; robots.txt is fetched per host, so every hostname needs its own rules.
  • Log AI user-agents separately and watch for mismatches or stealth crawls. Escalate through your CDN/WAF if needed.
  • Coordinate with Product JSON-LD: if you want assistants to cite PDPs, ensure the identifiers (GTIN/MPN), availability, and price in the markup match what the page displays; a sketch follows this list.
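
One way to keep that markup honest is to render Product JSON-LD from the same variant data the template displays. The Liquid sketch below is illustrative rather than drop-in theme code: it assumes the variant barcode is a 13-digit GTIN (hence gtin13), uses the cart's presentment currency, and emits a single offer, so adapt it to your theme's variant and market handling.

{% comment %} Product JSON-LD sketch: price and availability come from the same variant the PDP renders {% endcomment %}
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": {{ product.title | json }},
  "brand": { "@type": "Brand", "name": {{ product.vendor | json }} },
  "sku": {{ product.selected_or_first_available_variant.sku | json }},
  "gtin13": {{ product.selected_or_first_available_variant.barcode | json }},
  "offers": {
    "@type": "Offer",
    "url": {{ request.origin | append: product.url | json }},
    "priceCurrency": {{ cart.currency.iso_code | json }},
    "price": {{ product.selected_or_first_available_variant.price | divided_by: 100.0 | json }},
    "availability": "{% if product.selected_or_first_available_variant.available %}https://schema.org/InStock{% else %}https://schema.org/OutOfStock{% endif %}"
  }
}
</script>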

FAQ: Robots.txt For Shopify Stores (AEO)

Which AI bots should a Shopify store name explicitly?
A practical set is: GPTBot, ChatGPT-User, ClaudeBot, Claude-User, Claude-SearchBot, Google-Extended, PerplexityBot, CCBot, and Applebot-Extended. This covers OpenAI, Anthropic, Google's AI usage control, Perplexity, Common Crawl, and Apple's training control.

Should I block training crawlers but allow user-initiated retrieval?
Many brands do exactly that. It preserves on-demand quoting of your live PDPs and policies while limiting bulk reuse for model training. Review legal and partner strategy before choosing.

Does Google-Extended replace Googlebot?
No. Google-Extended is a separate control token for using your content in generative AI (Gemini/Vertex). You can disallow Google-Extended while still allowing normal Googlebot crawling for search.

What about Applebot-Extended?
Applebot-Extended controls use of your content for Apple's AI training. It does not crawl itself. Blocking it opts your site out of training while Applebot can still crawl for search features.

Can I rely on robots.txt alone?
Robots.txt is a norm, not an enforcement layer. Most reputable bots honor it. For non-compliant traffic, use your CDN/WAF to verify user-agent + reverse DNS and rate-limit or block as needed.