skill-md

Installation

$ npx skills add AndacGuven/technical-seo-skill --skill skill-md

Summary

This skill enables an agent to execute full-scope technical SEO audits on domains and subdomains, producing a structured Excel workbook with broken links, orphaned URLs, metadata issues, schema analysis, and image SEO findings. The agent can manage checkpointed crawl state, resume interrupted audits, and scale crawl parameters for large sites.

SKILL.MD

technical-seo-skill

This skill is the operating contract for production-grade technical SEO crawls.

Use it when the user asks to:

  • crawl a domain or website
  • generate a technical SEO audit file
  • inspect broken internal links
  • detect orphaned pages
  • audit metadata, canonicals, H1s, hreflang, headings, schema, and image SEO
  • produce an execution-ready workbook for SEO operations

Primary Entry Point

Run the skill through:

  • technical_seo_skill.py

The same script is also the underlying crawl engine; there is no separate wrapper:

  • technical_seo_skill.py

Hard Output Contract

These requirements are mandatory:

  1. The final deliverable must be an .xlsx workbook written under reports/.
  2. A markdown report is never an acceptable substitute for the workbook.
  3. The crawl must create a checkpoint directory under checkpoints/<domain>/.
  4. The checkpoint must contain the normal crawl state files used by this project.
  5. The workbook must follow the same structure used by reports/fal-ai-final-fresh.xlsx.
  6. Issues must be the first sheet.
  7. Schema must be a single combined sheet.
  8. Broken Internal and Orphaned URLs must be present.
  9. Do not generate a separate AI corrections workbook.

If these conditions are not met, the job is incomplete.
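
Items 1 and 5–8 of the contract can be spot-checked after a run without third-party libraries: an .xlsx file is a zip archive, and the sheet order lives in xl/workbook.xml. The following verifier is a sketch for that check, not part of the engine itself:

```python
import re
import zipfile

def sheet_names(xlsx_path: str) -> list[str]:
    """Read the ordered sheet list straight out of the workbook archive."""
    with zipfile.ZipFile(xlsx_path) as z:
        xml = z.read("xl/workbook.xml").decode("utf-8")
    return re.findall(r'<sheet[^>]*?name="([^"]+)"', xml)

def contract_violations(xlsx_path: str) -> list[str]:
    """Return contract violations; an empty list means these checks passed."""
    problems = []
    if not xlsx_path.endswith(".xlsx"):
        problems.append("deliverable is not an .xlsx workbook")
        return problems
    sheets = sheet_names(xlsx_path)
    if not sheets or sheets[0] != "Issues":
        problems.append("Issues is not the first sheet")
    for required in ("Schema", "Broken Internal", "Orphaned URLs"):
        if required not in sheets:
            problems.append(f"missing required sheet: {required}")
    return problems
```

This only validates structure; it does not prove the sheet contents are correct.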

Expected Crawl Behavior

This skill should behave like a real technical SEO specialist.

That means it should:

  • discover URLs across the main domain and relevant subdomains
  • normalize noisy parameter variants
  • keep auth wrappers from polluting SEO task URLs
  • validate internal targets before reporting broken internal links
  • calculate orphaned URLs from the internal crawl graph
  • inspect image inventory and image quality issues
  • inspect structured data and produce a schema summary plus schema samples
  • convert findings into action-ready tasks

Do not treat this skill like a single-page metadata checker.
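
Normalizing noisy parameter variants can be sketched as follows; the parameter blocklist here is illustrative, not the engine's real configuration:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters treated as noise. This set is an assumption for illustration;
# the real engine may use a different list.
NOISE_PARAMS = {"share", "utm_source", "utm_medium", "utm_campaign", "ref"}

def normalize_url(url: str) -> str:
    """Collapse noisy parameter variants onto one canonical crawl URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in NOISE_PARAMS]
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path,
        urlencode(kept),
        "",  # drop fragments: they never change the server response
    ))
```

With this in place, https://Example.com/page?share=tw&x=1#top and https://example.com/page?x=1 count as the same URL instead of two crawl targets.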

Core Audit Scope

The audit should check, when available:

  • robots.txt
  • llms.txt
  • sitemap discovery and parsing
  • page status codes
  • load time
  • title presence, length, and duplication
  • meta description presence, length, and duplication
  • H1 presence, count, and duplication
  • heading hierarchy issues
  • canonical presence and non-self canonicals
  • hreflang presence
  • structured data presence
  • broken internal links
  • orphaned URLs
  • crawl depth
  • low text-to-HTML ratio
  • thin content
  • image inventory
  • image file sizes and formats
  • missing alt text
  • weak alt text
  • broken internal images
  • duplicate patterns
  • remediation tasks
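
Of the checks above, orphaned-URL detection is the one that needs the whole crawl graph rather than a single page: it reduces to a set difference between everything discovered and everything that receives at least one internal link. A minimal sketch (names are illustrative):

```python
def orphaned_urls(crawled, internal_links, start_url):
    """URLs that were discovered (e.g. via sitemaps) but receive no
    internal links from any crawled page."""
    linked_to = {dst for _src, dst in internal_links}
    return sorted(set(crawled) - linked_to - {start_url})
```

The start URL is excluded because it is reachable by definition even though nothing links to it in the graph.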

Standard Workflow

  1. Confirm whether the user wants a fresh crawl or a resumed crawl.
  2. If the user wants a fresh crawl, delete the existing checkpoint directory for that domain first.
  3. Run technical_seo_skill.py. If the user does not specify --output, the skill should automatically create a domain-based Excel filename under reports/.
  4. For large sites, prefer explicit --max-urls, --workers, --batch-size, and --timeout values.
  5. Monitor checkpoints/<domain>/state.json during long runs.
  6. Do not claim the report is complete until discovery, link validation, image audit, and workbook phases are complete.
  7. After completion, verify that the workbook opens and the expected sheets exist.
  8. Summarize the results from the workbook, not from assumptions.

Command Patterns

Basic run:

python technical_seo_skill.py https://example.com

Recommended large-site run (the ^ characters are Windows cmd line continuations; use \ in a POSIX shell):

python technical_seo_skill.py https://example.com ^
  --max-urls 12000 ^
  --workers 12 ^
  --batch-size 50 ^
  --timeout 10

Resume an interrupted crawl:

python technical_seo_skill.py https://example.com ^
  --max-urls 12000 ^
  --workers 12 ^
  --batch-size 50 ^
  --timeout 10 ^
  --resume

If --output is omitted, the skill should default to:

reports/<domain>_audit.xlsx
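
Deriving that default from the target URL might look like this; stripping the port and lowercasing the host are assumptions about how the engine normalizes the domain:

```python
from pathlib import Path
from urllib.parse import urlsplit

def default_output_path(url: str) -> Path:
    """Fallback workbook path when --output is omitted."""
    domain = urlsplit(url).netloc.split(":")[0].lower()
    return Path("reports") / f"{domain}_audit.xlsx"
```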

Workbook Structure Contract

The workbook should match this structure:

  • Issues
  • Sayfa Bilgileri (Turkish for "Page Information"; the sheet name itself is part of the structure contract)
  • Blog Pages
  • Images
  • Large Images (100KB+)
  • Error Pages
  • Robots.txt & Sitemaps
  • N-gram Analysis
  • Duplicate Content Issues
  • Meta Tag Issues
  • Image SEO Issues
  • Heading Structure Issues
  • Internal Link Issues
  • Page Weight Issues
  • Content Quality Issues
  • URL Structure Issues
  • Structured Data Issues
  • Keyword Analysis
  • Tasks
  • Excluded Auth
  • Orphaned URLs
  • High Depth URLs
  • Missing Canonical
  • Slow URLs
  • Broken Internal
  • Site Checks
  • Schema
  • Duplicates
  • Links

Issues should combine:

  • overview metrics
  • raw crawl summary metrics
  • consolidated issue inventory

Schema should combine:

  • schema summary
  • schema samples

Checkpoint Expectations

During a valid crawl, the domain checkpoint should contain the normal state files for this project, including:

  • state.json
  • pages.csv
  • links_raw.csv
  • images_raw.csv
  • target_status.csv
  • image_meta.csv

If these files are not being created during the crawl, the skill is not operating correctly.
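
A minimal health check for a long-running crawl follows directly from the list above; the helper name is illustrative:

```python
from pathlib import Path

# State files listed in the Checkpoint Expectations section.
STATE_FILES = ("state.json", "pages.csv", "links_raw.csv",
               "images_raw.csv", "target_status.csv", "image_meta.csv")

def missing_state_files(domain: str, root: str = "checkpoints") -> list[str]:
    """Names of expected state files absent from checkpoints/<domain>/."""
    checkpoint_dir = Path(root) / domain
    return [name for name in STATE_FILES
            if not (checkpoint_dir / name).is_file()]
```

A non-empty result during an active crawl is the signal that the skill is not operating correctly.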

Guardrails

  • Keep all generated documentation and workbook-facing text in English unless the user explicitly asks otherwise.
  • Do not present ?share= variants as separate SEO task URLs.
  • Do not present login or auth wrapper URLs as final SEO task URLs.
  • Do not remove Broken Internal or Orphaned URLs from the workflow.
  • Do not say the crawl is complete if only discovery has finished.
  • Do not silently narrow scope to the apex domain if relevant subdomains are in scope.
  • Do not deliver prose or markdown when the user asked for an audit file.
  • Do not change the workbook structure casually.
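
The first three URL guardrails can be enforced with a single filter applied before a URL becomes a workbook task row. The path segments treated as auth wrappers here are assumptions, not a list taken from the engine:

```python
from urllib.parse import parse_qs, urlsplit

# Assumed auth-wrapper path segments; extend to match the real site.
AUTH_SEGMENTS = {"login", "signin", "sign-in", "auth", "logout", "register"}

def is_seo_task_url(url: str) -> bool:
    """Reject ?share= variants and login/auth wrapper URLs so they never
    appear as final SEO task URLs."""
    parts = urlsplit(url)
    if "share" in parse_qs(parts.query):
        return False
    segments = {s.lower() for s in parts.path.split("/") if s}
    return not (segments & AUTH_SEGMENTS)
```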

Review Order

After the workbook is generated, review in this order:

  1. Issues
  2. Tasks
  3. Broken Internal
  4. Orphaned URLs
  5. High Depth URLs
  6. Schema
  7. Images
  8. Sayfa Bilgileri

Alignment Rule

If the engine changes materially, update this file so it stays aligned with:

  • real sheet names
  • real crawl behavior
  • real checkpoint behavior
  • real output files

Do not document features that do not exist in the codebase.