seo-robots-ai

Installation

$ npx skills add lionkiii/claude-seo-skills --skill seo-robots-ai

Summary

Fetch and analyze a site's robots.txt file specifically for AI crawler access policies, checking against a registry of AI bots (GPTBot, ClaudeBot, PerplexityBot, etc.) and generating an access matrix with an AI openness score. The agent can execute this workflow end-to-end, then recommend robots.txt changes aligned with the user's content protection or visibility goals.

SKILL.MD

AI Crawler Robots.txt Audit

Analyzes a site's robots.txt specifically for AI crawler access policies. Complements /seo-technical (which does a broad robots.txt check) with deep AI-specific analysis.

@skills/seo/references/ai-crawlers-guide.md

AI Crawler Registry

| Bot Name | Owner | Purpose |
|---|---|---|
| GPTBot | OpenAI | Training data collection |
| OAI-SearchBot | OpenAI | ChatGPT search only (not training) |
| ChatGPT-User | OpenAI | ChatGPT browsing (real-time) |
| ClaudeBot | Anthropic | Training data collection |
| anthropic-ai | Anthropic | Anthropic web crawler |
| PerplexityBot | Perplexity | AI search engine |
| Google-Extended | Google | Gemini / AI training (not Search) |
| Bytespider | ByteDance | TikTok / AI training |
| CCBot | Common Crawl | Open dataset used by many AI models |
| Applebot-Extended | Apple | Apple Intelligence training |
| cohere-ai | Cohere | AI model training |
| FacebookBot | Meta | Meta AI training |
| Meta-ExternalAgent | Meta | Meta AI browsing agent |
| Amazonbot | Amazon | Alexa / AI training |
| Diffbot | Diffbot | AI knowledge graph |
| ImagesiftBot | Imagesift | AI image training |
| Omgili | Webz.io | AI data feeds |

Inputs

  • url: The website URL to audit (will fetch /robots.txt from site root)
    • Normalize to domain root: example.com/page → https://example.com/robots.txt
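
The normalization above can be sketched in Python with the standard library. The function name `robots_url` and the choice to default to `https` when no scheme is given are assumptions made for illustration, not part of the skill itself.

```python
from urllib.parse import urlsplit


def robots_url(raw: str) -> str:
    """Return the robots.txt URL at the root of the site that `raw` points to."""
    raw = raw.strip()
    # urlsplit only recognises the host when a scheme is present,
    # so assume https for bare inputs such as "example.com/page".
    if "://" not in raw:
        raw = "https://" + raw
    parts = urlsplit(raw)
    if not parts.netloc:
        raise ValueError(f"no host in URL: {raw!r}")
    # Path, query, and fragment are discarded; only scheme and host are kept.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


print(robots_url("example.com/page"))
```

Keeping the caller's scheme (rather than forcing `https`) means an explicit `http://` input is audited as given.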

Execution

  1. Fetch robots.txt: WebFetch <domain>/robots.txt

    • If 404 → report "No robots.txt found — all crawlers allowed by default"
    • If 200 → proceed to parse
  2. Parse User-agent blocks: Extract all User-agent directives and their associated Allow / Disallow rules.

  3. Check each AI crawler: For each bot in the registry, determine access:

    • Allowed: No specific block, or explicit Allow: /
    • Blocked: This User-agent has Disallow: /
    • Partial: Some paths blocked, others allowed (list specifics)
    • Inherited: Falls under User-agent: * rules (note this)
  4. Check wildcard rules: If User-agent: * has Disallow: /, note that every bot without its own User-agent group (including AI crawlers) is blocked. A bot that has its own group follows that group instead of the wildcard rules.

  5. Check for ai.txt: WebFetch <domain>/ai.txt, a proposed convention for AI-specific crawler policies that is not yet a formal standard. Report if found and summarize contents.

  6. Check for llms.txt: WebFetch <domain>/llms.txt — report if found (cross-reference with /seo llms-txt for full audit).

  7. Analyze crawl-delay: Note any Crawl-delay directives that affect AI bots specifically or via wildcard.

  8. Check sitemap declaration: Note if Sitemap: directive is present (helps AI crawlers discover content).
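
Steps 2 to 4 above can be sketched in Python as a small parser plus a classifier. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the names `parse_groups`, `classify`, `AI_BOTS`, and `SAMPLE` are hypothetical, only `Allow` and `Disallow` lines are read, and "Inherited" is reported as a rule source of `"wildcard"` next to the Allowed/Blocked/Partial status.

```python
from collections import defaultdict

# Subset of the registry, for illustration only.
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot"]


def parse_groups(text):
    """Map each lower-cased user-agent token to its (directive, path) rules."""
    groups = defaultdict(list)
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has at least one rule
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups[value.lower()]  # register the agent even if it has no rules
        elif field in ("allow", "disallow") and agents:
            in_rules = True
            for agent in agents:
                groups[agent].append((field, value))
    return dict(groups)


def classify(groups, bot):
    """Return (status, source) for one crawler."""
    key = bot.lower()
    if key in groups:            # a bot-specific group overrides the wildcard
        rules, source = groups[key], "explicit"
    elif "*" in groups:          # "Inherited" in the terminology above
        rules, source = groups["*"], "wildcard"
    else:
        return "Allowed", "none"
    disallows = [p for d, p in rules if d == "disallow" and p]  # empty Disallow blocks nothing
    allows = [p for d, p in rules if d == "allow" and p]
    if not disallows:
        return "Allowed", source
    # Simplification: any Allow rule alongside Disallow: / is reported as Partial.
    if "/" in disallows and not allows:
        return "Blocked", source
    return "Partial", source


SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /private/

User-agent: *
Allow: /
"""

groups = parse_groups(SAMPLE)
for bot in AI_BOTS:
    status, source = classify(groups, bot)
    print(f"{bot:16} {status:8} {source}")
```

Returning the rule source separately keeps the status column of the access matrix to three values while still recording which bots merely inherit the wildcard policy.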

Output Format

## AI Crawler Audit: [domain]

### Crawler Access Matrix

| Crawler | Owner | Status | Rule Source | Details |
|---|---|---|---|---|
| GPTBot | OpenAI | Allowed/Blocked/Partial | Line [#] | [specific rules] |
| ClaudeBot | Anthropic | Allowed/Blocked/Partial | Line [#] | [specific rules] |
| PerplexityBot | Perplexity | Allowed/Blocked/Partial | Line [#] | [specific rules] |
| Google-Extended | Google | Allowed/Blocked/Partial | Line [#] | [specific rules] |
| ... | ... | ... | ... | ... |

### AI Openness Score: X/10

Scoring:
- 10/10 = All AI crawlers allowed, ai.txt present, llms.txt present
- 7-9 = Most crawlers allowed, some minor gaps
- 4-6 = Mixed policy — some allowed, some blocked
- 1-3 = Most AI crawlers blocked
- 0/10 = All AI crawlers blocked (or blanket Disallow: /)

### Key Findings

- **AI crawlers explicitly blocked**: [count] of [total]
- **AI crawlers explicitly allowed**: [count]
- **Falling under wildcard rules**: [count]
- **ai.txt present**: Yes/No
- **llms.txt present**: Yes/No
- **Sitemap declared**: Yes/No

### Recommendations

Based on the site's apparent goals:

**If goal is maximum AI visibility:**
- [Specific recommendations to allow AI crawlers]
- [Suggest llms.txt creation if missing]

**If goal is AI protection:**
- [Note any crawlers not yet blocked]
- [Suggest ai.txt adoption]

**If goal is selective access:**
- [Recommend allowing search-focused bots: OAI-SearchBot, PerplexityBot]
- [Block training-only bots: CCBot, Bytespider]
- [Distinguish training vs search crawlers]

### Industry Context

Note how the site's policy compares to common patterns:
- Many major publishers block training bots but allow search bots
- Many SaaS companies allow all AI crawlers for visibility
- E-commerce sites typically allow all crawlers
- Media/news sites increasingly block training-only bots

### robots.txt Snippets

If the user wants to implement changes, provide ready-to-paste robots.txt
blocks for their chosen strategy:

**Allow all AI crawlers:**

    # AI Crawlers: Allowed
    User-agent: GPTBot
    Allow: /

    User-agent: ClaudeBot
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    User-agent: Google-Extended
    Allow: /


**Block training, allow search:**

    # AI Search: Allowed
    User-agent: OAI-SearchBot
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    # AI Training: Blocked
    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: Bytespider
    Disallow: /
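
Snippets like the ones above could also be generated from bot lists rather than written by hand. The following Python sketch assumes a hypothetical helper `robots_block` and two illustrative lists, `SEARCH_BOTS` and `TRAINING_BOTS`, taken from the "Block training, allow search" strategy.

```python
SEARCH_BOTS = ["OAI-SearchBot", "PerplexityBot"]
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended", "Bytespider"]


def robots_block(title, bots, allow):
    """Build one commented robots.txt section with a group per bot."""
    rule = "Allow: /" if allow else "Disallow: /"
    lines = [f"# {title}"]
    for bot in bots:
        # Each bot gets its own group, separated by a blank line.
        lines += [f"User-agent: {bot}", rule, ""]
    return "\n".join(lines).rstrip() + "\n"


snippet = (
    robots_block("AI Search: Allowed", SEARCH_BOTS, allow=True)
    + "\n"
    + robots_block("AI Training: Blocked", TRAINING_BOTS, allow=False)
)
print(snippet)
```

Generating the sections from lists keeps the snippet in step with whichever bots the audit flagged, at the cost of losing any hand-written comments in the file.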