Installation
npx skills add ericosiu/ai-marketing-skills --skill autoresearch

Summary
This skill enables an agent to generate 50+ variants of conversion-focused content, score them against a 5-person expert panel, evolve winners through multiple rounds, and output a fully optimized version with complete experiment logs. Invoke when pre-launch content optimization is needed before real-traffic testing.
SKILL.MD
Autoresearch Skill
Karpathy-style optimization loops for any conversion-focused content. No traffic needed. Simulated expert panel. Minutes, not weeks.
When to use this: Pre-launch content optimization. Generate 50+ variants, score with 5 simulated experts, evolve winners, output the best version + full experiment log.
When NOT to use this: Post-launch real-traffic A/B testing — that requires real analytics, not simulated scoring.
The sequence: Run autoresearch FIRST to hit 85+ simulated score. Then deploy. Then validate with real traffic.
What You'll Produce
Every run outputs 3 files:
| File | Purpose |
|---|---|
| {name}-optimized.{ext} | The winning optimized content |
| data/{name}-experiments.json | Full experiment log — all variants + all scores |
| data/{name}-optimization-report.md | Human-readable summary with winner rationale |
Expert Panel (5 Personas)
Score every variant against all 5. Batch all variants into a single API call per round.
| # | Persona | Scoring Lens |
|---|---|---|
| 1 | CMO at a mid-market B2B company (50M+ revenue) | "Would this make me stop and engage?" |
| 2 | Skeptical founder | "Do I believe this? Would I trust this company?" |
| 3 | Conversion rate optimizer | "Is this clear, specific, and action-driving?" |
| 4 | Senior copywriter | "Is this compelling, differentiated, and well-crafted?" |
| 5 | Your CEO/founder | "Direct, ROI-obsessed, no BS. Would I put this on my site?" |
Customization: Replace persona #5 with your own CEO/founder voice. Define their priorities and communication style in a references/founder-voice.md file.
Each judge scores 0–100. Final score = average across all 5 judges.
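The final-score computation is simple averaging. A minimal sketch (the persona keys below mirror the panel table and are illustrative):

```python
def panel_average(scores: dict[str, int]) -> float:
    """Average a variant's 0-100 scores across all five judges."""
    if len(scores) != 5:
        raise ValueError("expected exactly 5 judge scores")
    return sum(scores.values()) / 5

# e.g. one variant scored by the five-persona panel above
scores = {"cmo": 72, "skeptical_founder": 68, "cro": 75,
          "copywriter": 70, "founder": 65}
print(panel_average(scores))  # 70.0
```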
Round Structure (Per Content Element)
Round 1:
→ Generate 10 variants of the element
→ Batch-score all 10 with the 5-expert panel (1 API call)
→ Rank by average score
→ Keep top 3
Round 2 (Evolution):
→ Analyze what the top 3 did right
→ Generate 10 new variants that push those winning patterns further
→ Batch-score all 10 (1 API call)
→ Keep top 3
Round 3 (If score < threshold):
→ Identify weakest scoring dimension
→ Generate 10 variants optimized for that dimension
→ Batch-score → keep top 1
Multi-element cross-breeding:
→ Take top 1 winner from each element
→ Generate 5 combinations that mix winning elements
→ Score holistically as complete units
→ Output the single best combination
Stop condition: Top variant hits minimum score threshold (default: 80) OR 3 rounds complete.
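The round loop and stop condition above can be sketched as follows. The `generate_variants` and `score_batch` callables stand in for the model calls (one batched API call per round); the toy stubs in the usage example exist only to make the sketch runnable:

```python
def optimize_element(element, generate_variants, score_batch,
                     min_score=80, max_rounds=3):
    """Generate 10 variants, batch-score once, keep top 3,
    stop when the threshold is hit or rounds are exhausted."""
    survivors = []
    for _ in range(max_rounds):
        variants = generate_variants(element, seeds=survivors, n=10)
        scored = score_batch(variants)  # ONE API call for the whole batch
        scored.sort(key=lambda v: v["avg_score"], reverse=True)
        survivors = scored[:3]
        if survivors[0]["avg_score"] >= min_score:
            break  # stop condition: threshold hit early
    return survivors[0]

# Toy stand-ins so the loop runs end to end; real versions are model calls.
def gen(element, seeds, n=10):
    start = seeds[0]["avg_score"] + 1 if seeds else 70
    return [{"text": f"{element} v{start + i}", "avg_score": start + i}
            for i in range(n)]

winner = optimize_element("hero_headline", gen, score_batch=lambda vs: vs)
print(winner["avg_score"])  # 89 in this toy run
```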
Content Types & Score Dimensions
Landing Pages
Elements to optimize: Hero headline, subheadline, CTA text, problem section, social proof
Score dimensions:
- first_impression — Does it grab immediately?
- clarity — Is the offer instantly understood?
- trust — Does it feel credible?
- urgency — Is there a reason to act now?
- would_convert — Would the judge actually click?
Email Sequences
Elements to optimize: Subject line, opening line, body copy, CTA, PS line
Score dimensions:
- would_open — Subject line pass rate
- would_read — Does the opening hook?
- would_click — Is the CTA compelling?
- would_reply — Does it feel personal enough to respond to?
- spam_risk — Does it feel spammy? (lower = better; invert for final score)
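Because spam_risk is a lower-is-better dimension, invert it before averaging so a low spam score raises the final number. A minimal sketch with illustrative values:

```python
def email_score(dims: dict[str, int]) -> float:
    """Average email dimensions, inverting spam_risk (lower = better)."""
    adjusted = {k: (100 - v if k == "spam_risk" else v)
                for k, v in dims.items()}
    return sum(adjusted.values()) / len(adjusted)

dims = {"would_open": 80, "would_read": 75, "would_click": 70,
        "would_reply": 65, "spam_risk": 20}  # 20 risk -> 80 after inversion
print(email_score(dims))  # 74.0
```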
Ad Copy
Elements to optimize: Headline, description, CTA
Score dimensions:
- scroll_stopping — Does it interrupt the scroll?
- clarity — Is the value prop clear in 3 seconds?
- click_worthiness — Does the judge want to click?
- relevance — Does it match likely audience intent?
- differentiation — Does it stand out from competitors?
Form Pages
Elements to optimize: Headline, subtext, value prop bullets, button text, field order, thank-you copy
Score dimensions:
- first_impression — Does it feel worth filling out?
- trust — Do they believe their info is safe and the offer is real?
- completion_likelihood — Would the judge start filling it out?
- lead_quality — Would this attract serious prospects (not tire-kickers)?
- would_fill_out — Final gut check: would they submit?
Step-by-Step Execution Protocol
Step 1: Intake & Parse
Read the source content. Identify content type automatically or confirm with user:
- HTML file → landing page or form page
- Markdown / plain text → email or ad copy
- If ambiguous, ask: "Is this a landing page, email sequence, ad copy, or form page?"
Extract all optimizable elements. List them back to user:
Found 5 elements to optimize:
1. Hero headline: "We help B2B companies grow"
2. Subheadline: "Full-service digital marketing..."
3. CTA: "Get Started"
4. Problem statement: [excerpt]
5. Social proof: [excerpt]
Optimizing: all | Variants per round: 10 | Min score: 80
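The Step 1 detection heuristic can be sketched as below. The extension mapping and the `<input>`-count cutoff are assumptions chosen for illustration, not part of the skill spec:

```python
from pathlib import Path

def detect_content_type(path: str, text: str):
    """Heuristic from Step 1: HTML -> landing/form page, md/txt -> email/ad.
    Returns None when ambiguous so the agent can ask the user."""
    ext = Path(path).suffix.lower()
    if ext in {".html", ".htm"}:
        # A form page is dominated by input fields; crude but illustrative.
        return "form_page" if text.lower().count("<input") >= 3 else "landing_page"
    if ext in {".md", ".txt"}:
        return "email" if "subject:" in text.lower() else "ad_copy"
    return None  # ambiguous -> ask the user

print(detect_content_type("hero.html", "<h1>Grow</h1>"))  # landing_page
```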
Step 2: Get API Key
Check for Anthropic API key: $ANTHROPIC_API_KEY environment variable.
export ANTHROPIC_API_KEY="your-api-key-here"
Step 3: Run Optimization Rounds
For each element, run the round structure above.
Critical API efficiency rule: ALWAYS batch all variants into a single prompt. Never call the API once per variant. A round with 10 variants = 1 API call.
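One way to honor the batching rule is to pack every variant into a single scoring prompt. The prompt wording and the requested JSON shape below are illustrative, not a fixed format:

```python
def build_batch_prompt(element, variants):
    """Pack ALL variants into one scoring prompt: one API call per round."""
    numbered = "\n".join(f"{i + 1}. {v}" for i, v in enumerate(variants))
    return (
        f"Score each {element} variant 0-100 from all 5 personas.\n"
        'Return JSON: [{"id": 1, "scores": {...}}, ...]\n\n'
        + numbered
    )

prompt = build_batch_prompt("hero_headline", ["Variant A", "Variant B"])
```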
Model preference (in order):
- claude-sonnet-4-5 (preferred — fast + smart)
- claude-opus-4 (if highest quality needed)
- Any claude-3.5+ model if the above aren't available
Step 4: Cross-Breed (Multi-Element)
After all elements have winners:
- Assemble the top winner from each element into a complete unit
- Generate 5 holistic variants that naturally combine the winning elements
- Score the complete units (not just individual parts)
- Pick the winner with the highest holistic score
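Assembling the cross-breed candidates is a combination step. This sketch mixes the top 2 variants per element (a simplifying assumption; the skill describes generating 5 combinations by any mixing strategy):

```python
from itertools import product

def cross_breed(winners, n=5):
    """winners maps element name -> ranked variant texts (best first).
    Returns up to n complete units to score holistically."""
    elements = list(winners)
    combos = product(*(winners[e][:2] for e in elements))  # top 2 per element
    return [dict(zip(elements, c)) for _, c in zip(range(n), combos)]

cands = cross_breed({"headline": ["H1", "H2"], "cta": ["C1", "C2"]})
print(len(cands))  # 4
```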
Step 5: Write Output Files
# Create output directory
mkdir -p data
# Write optimized content
# Write experiments JSON
# Write optimization report
Experiments JSON structure:
{
"run_id": "autoresearch-{name}-{timestamp}",
"content_type": "landing_page",
"source_file": "path/to/original",
"min_score_threshold": 80,
"rounds": [
{
"round": 1,
"element": "hero_headline",
"variants": [
{
"id": 1,
"text": "...",
"scores": {
"cmo": 72,
"skeptical_founder": 68,
"cro": 75,
"copywriter": 70,
"founder": 65
},
"avg_score": 70
}
],
"top_3": [1, 4, 7],
"winner_score": 82
}
],
"final_winner": {
"hero_headline": "...",
"subheadline": "...",
"cta": "...",
"holistic_score": 87
}
}
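Writing the three output files from Step 5 can be sketched as follows; the function name and arguments are illustrative, while the paths follow the "What You'll Produce" table:

```python
import json
from pathlib import Path

def write_outputs(name, ext, winner_text, experiments, report_md):
    """Write the optimized content, experiments JSON, and report."""
    Path("data").mkdir(exist_ok=True)
    paths = [
        Path(f"{name}-optimized.{ext}"),
        Path(f"data/{name}-experiments.json"),
        Path(f"data/{name}-optimization-report.md"),
    ]
    paths[0].write_text(winner_text)
    paths[1].write_text(json.dumps(experiments, indent=2))
    paths[2].write_text(report_md)
    return paths
```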
Step 6: Report Back
Summarize results to user:
- Final winning score
- Biggest score jump (which element improved most)
- Top 2 runner-up alternatives (in case winner doesn't feel right)
- Path to all 3 output files
- Clear next step
User Options
| Option | Default | Description |
|---|---|---|
| elements | all | Which elements to optimize |
| variants_per_round | 10 | How many variants to generate per round |
| min_score | 80 | Stop when this score is hit |
| rounds | 3 | Max rounds before stopping |
| auto_apply | false | Whether to overwrite the source file with winners |
| content_type | auto-detect | Force a content type if auto-detect is wrong |
Quality Gates
- < 70: Don't ship. Something fundamental is broken.
- 70-79: Marginal. One more round targeting the lowest-scoring dimension.
- 80-84: Good. Shippable. Validate with real traffic.
- 85-89: Strong. Ship with confidence.
- 90+: Rare. Ship immediately.
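The gates above map directly to a lookup; the return strings here are shorthand for the guidance, not fixed API values:

```python
def quality_gate(score):
    """Map a final score to the shipping guidance above."""
    if score < 70:
        return "don't ship"
    if score < 80:
        return "one more round"
    if score < 85:
        return "shippable: validate with real traffic"
    if score < 90:
        return "ship with confidence"
    return "ship immediately"

print(quality_gate(87))  # ship with confidence
```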
Anti-Patterns to Avoid
- Never call the API once per variant. Always batch. A 10-variant round = 1 call.
- Don't over-optimize for one dimension. If you're hitting 95 on clarity but 45 on trust, the overall score is misleading.
- Don't run more than 5 rounds. If you're not hitting 80 after 3 rounds, the problem is strategic (wrong positioning), not tactical (wrong words).
- Don't cross-breed until each element has its own winner. Premature cross-breeding creates incoherent combinations.