skill-md

Installation

$ npx skills add AndacGuven/technical-seo-skill --skill skill-md

Summary

This skill enables an agent to execute full-scope technical SEO audits on domains and subdomains, producing a structured Excel workbook with broken links, orphaned URLs, metadata issues, schema analysis, and image SEO findings. The agent can manage checkpointed crawl state, resume interrupted audits, and scale crawl parameters for large sites.

SKILL.MD

technical-seo-skill

This skill is the operating contract for production-grade technical SEO crawls.

Use it when the user asks to:

  • crawl a domain or website
  • generate a technical SEO audit file
  • inspect broken internal links
  • detect orphaned pages
  • audit metadata, canonicals, H1s, hreflang, headings, schema, and image SEO
  • produce an execution-ready workbook for SEO operations

Primary Entry Point

Run the skill through:

  • technical_seo_skill.py

The same script is also the underlying crawl engine; there is no separate wrapper:

  • technical_seo_skill.py

Hard Output Contract

These requirements are mandatory:

  1. The final deliverable must be an .xlsx workbook written under reports/.
  2. A markdown report is never an acceptable substitute for the workbook.
  3. The crawl must create a checkpoint directory under checkpoints/<domain>/.
  4. The checkpoint must contain the normal crawl state files used by this project.
  5. The workbook must follow the same structure used by reports/fal-ai-final-fresh.xlsx.
  6. Issues must be the first sheet.
  7. Schema must be a single combined sheet.
  8. Broken Internal and Orphaned URLs must be present.
  9. Do not generate a separate AI corrections workbook.

If these conditions are not met, the job is incomplete.
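
Items 1 and 5–8 of the contract can be spot-checked after a run without third-party libraries: an .xlsx file is a zip archive, and the sheet order lives in xl/workbook.xml. The following verifier is a sketch for that check, not part of the engine itself:

```python
import re
import zipfile

def sheet_names(xlsx_path: str) -> list[str]:
    """Read the ordered sheet list straight out of the workbook archive."""
    with zipfile.ZipFile(xlsx_path) as z:
        xml = z.read("xl/workbook.xml").decode("utf-8")
    return re.findall(r'<sheet[^>]*?name="([^"]+)"', xml)

def contract_violations(xlsx_path: str) -> list[str]:
    """Return contract violations; an empty list means these checks passed."""
    problems = []
    if not xlsx_path.endswith(".xlsx"):
        problems.append("deliverable is not an .xlsx workbook")
        return problems
    sheets = sheet_names(xlsx_path)
    if not sheets or sheets[0] != "Issues":
        problems.append("Issues is not the first sheet")
    for required in ("Schema", "Broken Internal", "Orphaned URLs"):
        if required not in sheets:
            problems.append(f"missing required sheet: {required}")
    return problems
```

This only validates structure; it does not prove the sheet contents are correct.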

Expected Crawl Behavior

This skill should behave like a real technical SEO specialist.

That means it should:

  • discover URLs across the main domain and relevant subdomains
  • normalize noisy parameter variants
  • keep auth wrappers from polluting SEO task URLs
  • validate internal targets before reporting broken internal links
  • calculate orphaned URLs from the internal crawl graph
  • inspect image inventory and image quality issues
  • inspect structured data and produce a schema summary plus schema samples
  • convert findings into action-ready tasks

Do not treat this skill like a single-page metadata checker.
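
Normalizing noisy parameter variants can be sketched as follows; the parameter blocklist here is illustrative, not the engine's real configuration:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters treated as noise. This set is an assumption for illustration;
# the real engine may use a different list.
NOISE_PARAMS = {"share", "utm_source", "utm_medium", "utm_campaign", "ref"}

def normalize_url(url: str) -> str:
    """Collapse noisy parameter variants onto one canonical crawl URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in NOISE_PARAMS]
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path,
        urlencode(kept),
        "",  # drop fragments: they never change the server response
    ))
```

With this in place, https://Example.com/page?share=tw&x=1#top and https://example.com/page?x=1 count as the same URL instead of two crawl targets.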

Core Audit Scope

The audit should check, when available:

  • robots.txt
  • llms.txt
  • sitemap discovery and parsing
  • page status codes
  • load time
  • title presence, length, and duplication
  • meta description presence, length, and duplication
  • H1 presence, count, and duplication
  • heading hierarchy issues
  • canonical presence and non-self canonicals
  • hreflang presence
  • structured data presence
  • broken internal links
  • orphaned URLs
  • crawl depth
  • low text-to-HTML ratio
  • thin content
  • image inventory
  • image file sizes and formats
  • missing alt text
  • weak alt text
  • broken internal images
  • duplicate patterns
  • remediation tasks
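
Of the checks above, orphaned-URL detection is the one that needs the whole crawl graph rather than a single page: it reduces to a set difference between everything discovered and everything that receives at least one internal link. A minimal sketch (names are illustrative):

```python
def orphaned_urls(crawled, internal_links, start_url):
    """URLs that were discovered (e.g. via sitemaps) but receive no
    internal links from any crawled page."""
    linked_to = {dst for _src, dst in internal_links}
    return sorted(set(crawled) - linked_to - {start_url})
```

The start URL is excluded because it is reachable by definition even though nothing links to it in the graph.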

Standard Workflow

  1. Confirm whether the user wants a fresh crawl or a resumed crawl.
  2. If the user wants a fresh crawl, delete the existing checkpoint directory for that domain first.
  3. Run technical_seo_skill.py. If the user does not specify --output, the skill should automatically create a domain-based Excel filename under reports/.
  4. For large sites, prefer explicit --max-urls, --workers, --batch-size, and --timeout values.
  5. Monitor checkpoints/<domain>/state.json during long runs.
  6. Do not claim the report is complete until discovery, link validation, image audit, and workbook phases are complete.
  7. After completion, verify that the workbook opens and the expected sheets exist.
  8. Summarize the results from the workbook, not from assumptions.

Command Patterns

Basic run:

python technical_seo_skill.py https://example.com

Recommended large-site run (the ^ characters are Windows cmd line continuations; use \ in a POSIX shell):

python technical_seo_skill.py https://example.com ^
  --max-urls 12000 ^
  --workers 12 ^
  --batch-size 50 ^
  --timeout 10

Resume an interrupted crawl:

python technical_seo_skill.py https://example.com ^
  --max-urls 12000 ^
  --workers 12 ^
  --batch-size 50 ^
  --timeout 10 ^
  --resume

If --output is omitted, the skill should default to:

reports/<domain>_audit.xlsx
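
Deriving that default from the target URL might look like this; stripping the port and lowercasing the host are assumptions about how the engine normalizes the domain:

```python
from pathlib import Path
from urllib.parse import urlsplit

def default_output_path(url: str) -> Path:
    """Fallback workbook path when --output is omitted."""
    domain = urlsplit(url).netloc.split(":")[0].lower()
    return Path("reports") / f"{domain}_audit.xlsx"
```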

Workbook Structure Contract

The workbook should match this structure:

  • Issues
  • Sayfa Bilgileri (Turkish for "Page Information"; the sheet name itself is part of the structure contract)
  • Blog Pages
  • Images
  • Large Images (100KB+)
  • Error Pages
  • Robots.txt & Sitemaps
  • N-gram Analysis
  • Duplicate Content Issues
  • Meta Tag Issues
  • Image SEO Issues
  • Heading Structure Issues
  • Internal Link Issues
  • Page Weight Issues
  • Content Quality Issues
  • URL Structure Issues
  • Structured Data Issues
  • Keyword Analysis
  • Tasks
  • Excluded Auth
  • Orphaned URLs
  • High Depth URLs
  • Missing Canonical
  • Slow URLs
  • Broken Internal
  • Site Checks
  • Schema
  • Duplicates
  • Links

Issues should combine:

  • overview metrics
  • raw crawl summary metrics
  • consolidated issue inventory

Schema should combine:

  • schema summary
  • schema samples

Checkpoint Expectations

During a valid crawl, the domain checkpoint should contain the normal state files for this project, including:

  • state.json
  • pages.csv
  • links_raw.csv
  • images_raw.csv
  • target_status.csv
  • image_meta.csv

If these files are not being created during the crawl, the skill is not operating correctly.
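
A minimal health check for a long-running crawl follows directly from the list above; the helper name is illustrative:

```python
from pathlib import Path

# State files listed in the Checkpoint Expectations section.
STATE_FILES = ("state.json", "pages.csv", "links_raw.csv",
               "images_raw.csv", "target_status.csv", "image_meta.csv")

def missing_state_files(domain: str, root: str = "checkpoints") -> list[str]:
    """Names of expected state files absent from checkpoints/<domain>/."""
    checkpoint_dir = Path(root) / domain
    return [name for name in STATE_FILES
            if not (checkpoint_dir / name).is_file()]
```

A non-empty result during an active crawl is the signal that the skill is not operating correctly.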

Guardrails

  • Keep all generated documentation and workbook-facing text in English unless the user explicitly asks otherwise.
  • Do not present ?share= variants as separate SEO task URLs.
  • Do not present login or auth wrapper URLs as final SEO task URLs.
  • Do not remove Broken Internal or Orphaned URLs from the workflow.
  • Do not say the crawl is complete if only discovery has finished.
  • Do not silently narrow scope to the apex domain if relevant subdomains are in scope.
  • Do not deliver prose or markdown when the user asked for an audit file.
  • Do not change the workbook structure casually.
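
The first three URL guardrails can be enforced with a single filter applied before a URL becomes a workbook task row. The path segments treated as auth wrappers here are assumptions, not a list taken from the engine:

```python
from urllib.parse import parse_qs, urlsplit

# Assumed auth-wrapper path segments; extend to match the real site.
AUTH_SEGMENTS = {"login", "signin", "sign-in", "auth", "logout", "register"}

def is_seo_task_url(url: str) -> bool:
    """Reject ?share= variants and login/auth wrapper URLs so they never
    appear as final SEO task URLs."""
    parts = urlsplit(url)
    if "share" in parse_qs(parts.query):
        return False
    segments = {s.lower() for s in parts.path.split("/") if s}
    return not (segments & AUTH_SEGMENTS)
```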

Review Order

After the workbook is generated, review in this order:

  1. Issues
  2. Tasks
  3. Broken Internal
  4. Orphaned URLs
  5. High Depth URLs
  6. Schema
  7. Images
  8. Sayfa Bilgileri

Alignment Rule

If the engine changes materially, update this file so it stays aligned with:

  • real sheet names
  • real crawl behavior
  • real checkpoint behavior
  • real output files

Do not document features that do not exist in the codebase.