Installation
$ npx skills add lionkiii/claude-seo-skills --skill seo-sitemap
Summary
The agent can validate existing XML sitemaps against format standards and actual Google indexing status, or generate new sitemaps from scratch with quality gates that prevent common penalties like thin location pages. Use when addressing sitemap issues, indexing problems, or building site architecture documentation.
SKILL.MD
Sitemap Analysis & Generation
Mode 1: Analyze Existing Sitemap
Validation Checks
- Valid XML format
- URL count ≤50,000 per file (protocol limit)
- All URLs return HTTP 200
- <lastmod> dates are accurate (not all identical)
- No deprecated tags: <priority> and <changefreq> are ignored by Google
- Sitemap referenced in robots.txt
- Compare crawled pages vs sitemap: flag missing pages
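A minimal sketch of the offline checks from the list above (XML validity, per-file URL count, identical lastmod values, ignored tags), using only the Python standard library. The function name `check_sitemap` and the wording of the findings are illustrative assumptions, not part of the skill. The HTTP status, robots.txt, and crawl-comparison checks need network access and are left out.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_URLS = 50_000  # per-file limit from the sitemap protocol


def check_sitemap(xml_text):
    """Return a list of findings for one sitemap file (hypothetical helper)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"Invalid XML: {exc}"]

    findings = []
    urls = root.findall(f"{NS}url")
    if len(urls) > MAX_URLS:
        findings.append(f"{len(urls)} URLs exceeds the {MAX_URLS} per-file limit")

    # Identical lastmod values on every URL suggest the dates are not real.
    dated = [d for d in (u.findtext(f"{NS}lastmod") for u in urls) if d]
    if len(dated) > 1 and len(set(dated)) == 1:
        findings.append("All lastmod values are identical")

    # These tags are valid in the protocol but ignored by Google.
    for tag in ("priority", "changefreq"):
        if root.find(f".//{NS}{tag}") is not None:
            findings.append(f"<{tag}> present (ignored by Google)")

    return findings
```

An empty list means none of these static checks fired; it does not mean the sitemap is healthy overall.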
Quality Signals
- Sitemap index file if >50k URLs
- Split by content type (pages, posts, images, videos)
- No non-canonical URLs in sitemap
- No noindexed URLs in sitemap
- No redirected URLs in sitemap
- HTTPS URLs only (no HTTP)
Common Issues
| Issue | Severity | Fix |
|---|---|---|
| >50k URLs in single file | Critical | Split with sitemap index |
| Non-200 URLs | High | Remove or fix broken URLs |
| Noindexed URLs included | High | Remove from sitemap |
| Redirected URLs included | Medium | Update to final URLs |
| All identical lastmod | Low | Use actual modification dates |
| Priority/changefreq used | Info | Can remove (ignored by Google) |
Mode 2: Generate New Sitemap
Process
- Ask for business type (or auto-detect from existing site)
- Load industry template from assets/ directory
- Interactive structure planning with user
- Apply quality gates:
  - ⚠️ WARNING at 30+ location pages (require 60%+ unique content)
  - 🛑 HARD STOP at 50+ location pages (require justification)
- Generate valid XML output
- Split at 50k URLs with sitemap index
- Generate STRUCTURE.md documentation
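The two location-page gates in the process above can be sketched as a small function. This is only an illustration: `GateResult`, `location_page_gate`, and the `unique_ratio` and `justification` parameters are hypothetical names, and the source does not say how unique content is measured or whether the 60% rule still applies above 50 pages.

```python
from dataclasses import dataclass

WARN_AT = 30             # WARNING threshold from the process list
STOP_AT = 50             # HARD STOP threshold from the process list
MIN_UNIQUE_RATIO = 0.60  # "60%+ unique content"


@dataclass
class GateResult:
    level: str        # "OK", "WARNING", or "HARD STOP"
    requirement: str  # what must be supplied before generation continues
    satisfied: bool   # whether that requirement has been met


def location_page_gate(count, unique_ratio=0.0, justification=""):
    """Classify a planned number of location pages (hypothetical helper)."""
    if count >= STOP_AT:
        return GateResult("HARD STOP", "justification", bool(justification.strip()))
    if count >= WARN_AT:
        return GateResult("WARNING", "60%+ unique content",
                          unique_ratio >= MIN_UNIQUE_RATIO)
    return GateResult("OK", "", True)
```

For example, `location_page_gate(35, unique_ratio=0.4)` reports a WARNING whose requirement is not yet satisfied, so generation would pause until the content plan changes.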
Safe Programmatic Pages (OK at scale)
- ✅ Integration pages (with real setup docs)
- ✅ Template/tool pages (with downloadable content)
- ✅ Glossary pages (200+ word definitions)
- ✅ Product pages (unique specs, reviews)
- ✅ User profile pages (user-generated content)
Penalty Risk (avoid at scale)
- ❌ Location pages with only city name swapped
- ❌ "Best [tool] for [industry]" without industry-specific value
- ❌ "[Competitor] alternative" without real comparison data
- ❌ AI-generated pages without human review and unique value
Sitemap Format
Standard Sitemap
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/page</loc>
<lastmod>2026-02-07</lastmod>
</url>
</urlset>
Sitemap Index (for >50k URLs)
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://example.com/sitemap-pages.xml</loc>
<lastmod>2026-02-07</lastmod>
</sitemap>
<sitemap>
<loc>https://example.com/sitemap-posts.xml</loc>
<lastmod>2026-02-07</lastmod>
</sitemap>
</sitemapindex>
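As a rough sketch of how the two formats above fit together, the following splits a URL list at the 50,000-URL limit and writes an index only when more than one file is needed. The function names and the numbered file names (`sitemap-1.xml`, ...) are assumptions for illustration; the skill itself splits by content type, as in the index example. Dates are assumed to be `YYYY-MM-DD` strings, so the latest one in each file can be found by plain string comparison.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit from the sitemap protocol


def _urlset(entries):
    """Serialize (loc, lastmod) pairs as one <urlset> document."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def build_sitemaps(entries, base_url, max_urls=MAX_URLS):
    """Return {filename: xml_bytes} for a list of (loc, lastmod) pairs."""
    if not entries:
        raise ValueError("a sitemap needs at least one URL")
    chunks = [entries[i:i + max_urls] for i in range(0, len(entries), max_urls)]
    if len(chunks) == 1:
        return {"sitemap.xml": _urlset(chunks[0])}

    files = {}
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n, chunk in enumerate(chunks, start=1):
        name = f"sitemap-{n}.xml"  # hypothetical naming scheme
        files[name] = _urlset(chunk)
        item = ET.SubElement(index, "sitemap")
        ET.SubElement(item, "loc").text = f"{base_url.rstrip('/')}/{name}"
        # Index lastmod: most recent date among the URLs in this file.
        ET.SubElement(item, "lastmod").text = max(lastmod for _, lastmod in chunk)
    files["sitemap.xml"] = ET.tostring(index, encoding="UTF-8", xml_declaration=True)
    return files
```

Passing a small `max_urls` (for example `max_urls=2`) is a convenient way to exercise the splitting path without generating 50,000 entries.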
Output
For Analysis
- VALIDATION-REPORT.md: analysis results
- Issues list with severity
- Recommendations
For Generation
- sitemap.xml (or split files with index)
- STRUCTURE.md: site architecture documentation
- URL count and organization summary
Live Data Insights (MCP Overlay)
@skills/seo/references/mcp-degradation.md
GSC Indexing Cross-Reference
If GSC available: Use ToolSearch with query "+google-search-console" to check availability.
- If GSC MCP tools are returned: fetch get_indexing_status or inspect_url for a sample of sitemap URLs (max 20 URLs to avoid rate-limit issues).
- Add a "### Sitemap vs Index Coverage (GSC)" section showing how many sitemap URLs are actually indexed, which are not indexed, and the specific non-indexing reasons reported by Google.
- This is the most valuable cross-reference for sitemap audits: it reveals whether Google is actually processing the sitemap correctly.
- If GSC MCP is not available: proceed with static sitemap validation only, noting that live indexing status is unavailable.
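One way to honor the 20-URL cap is to sample before calling any GSC tool. The sketch below is an assumption about how that could be done: the source only gives the cap, not the selection method, and `sample_urls` is a hypothetical helper that picks URLs at random after de-duplicating.

```python
import random

MAX_SAMPLE = 20  # cap from the cross-reference steps, to avoid rate limits


def sample_urls(urls, limit=MAX_SAMPLE, seed=None):
    """Pick at most `limit` distinct URLs to inspect (hypothetical helper)."""
    unique = list(dict.fromkeys(urls))  # drop duplicates, keep first-seen order
    if len(unique) <= limit:
        return unique
    # A seed makes the sample reproducible between audit runs.
    return random.Random(seed).sample(unique, limit)
```

A fixed `seed` keeps the sampled set stable, which makes before-and-after comparisons between audits easier to read.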
Sitemap vs Index Coverage (when GSC available)
| Metric | Count | Notes |
|---|---|---|
| Sitemap URLs sampled | [up to 20] | |
| Confirmed indexed | [count] | |
| Not indexed | [count] | |
| Coverage errors | [count] | See breakdown below |
Non-indexing reasons breakdown (from GSC inspect_url):
- Crawled - currently not indexed: [count]
- Discovered - currently not indexed: [count]
- Blocked by robots.txt: [count]
- Page with redirect: [count]
- Other reasons: [count]
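The counts in the table and breakdown above could be tallied along these lines. The input shape is an assumption: each result is a hypothetical `(url, indexed, reason)` tuple already extracted from whatever the GSC MCP tools return, since their actual response format is not described here. The "Coverage errors" row is omitted because the source does not define how it differs from "Not indexed".

```python
from collections import Counter

# Reason labels taken from the breakdown list; anything else is "Other reasons".
KNOWN_REASONS = (
    "Crawled - currently not indexed",
    "Discovered - currently not indexed",
    "Blocked by robots.txt",
    "Page with redirect",
)


def summarize_coverage(results):
    """Tally (url, indexed, reason) tuples into report counts (hypothetical helper)."""
    indexed = sum(1 for _, ok, _ in results if ok)
    reasons = Counter()
    for _, ok, reason in results:
        if not ok:
            reasons[reason if reason in KNOWN_REASONS else "Other reasons"] += 1
    return {
        "Sitemap URLs sampled": len(results),
        "Confirmed indexed": indexed,
        "Not indexed": len(results) - indexed,
        "reasons": dict(reasons),
    }
```

The returned keys mirror the table's metric names so the values can be dropped straight into the report template.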
Data Sources
| Source | Status | Data Provided |
|---|---|---|
| Static XML Analysis | Always available | Format validation, URL checks, structure review |
| GSC MCP (+google-search-console) | If connected | Live indexing status per URL, non-indexing reasons |