Installation
npx skills add ericosiu/ai-marketing-skills --skill autoresearch

Summary
This skill enables an agent to generate 50+ variants of conversion-focused content, score them against a 5-person expert panel, evolve winners through multiple rounds, and output a fully optimized version with complete experiment logs. Invoke when pre-launch content optimization is needed before real-traffic testing.
SKILL.MD
Autoresearch Skill
Karpathy-style optimization loops for any conversion-focused content. No traffic needed. Simulated expert panel. Minutes, not weeks.
When to use this: Pre-launch content optimization. Generate 50+ variants, score with 5 simulated experts, evolve winners, output the best version + full experiment log.
When NOT to use this: Post-launch real-traffic A/B testing — that requires real analytics, not simulated scoring.
The sequence: Run autoresearch FIRST to hit 85+ simulated score. Then deploy. Then validate with real traffic.
What You'll Produce
Every run outputs 3 files:
| File | Purpose |
|---|---|
| {name}-optimized.{ext} | The winning optimized content |
| data/{name}-experiments.json | Full experiment log — all variants + all scores |
| data/{name}-optimization-report.md | Human-readable summary with winner rationale |
Expert Panel (5 Personas)
Score every variant against all 5. Batch all variants into a single API call per round.
| # | Persona | Scoring Lens |
|---|---|---|
| 1 | CMO at a mid-market B2B company (50M+ revenue) | "Would this make me stop and engage?" |
| 2 | Skeptical founder | "Do I believe this? Would I trust this company?" |
| 3 | Conversion rate optimizer | "Is this clear, specific, and action-driving?" |
| 4 | Senior copywriter | "Is this compelling, differentiated, and well-crafted?" |
| 5 | Your CEO/founder | "Direct, ROI-obsessed, no BS. Would I put this on my site?" |
Customization: Replace persona #5 with your own CEO/founder voice. Define their priorities and communication style in a references/founder-voice.md file.
Each judge scores 0–100. Final score = average across all 5 judges.
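The final-score computation is simple averaging. A minimal sketch (the persona keys below mirror the panel table and are illustrative):

```python
def panel_average(scores: dict[str, int]) -> float:
    """Average a variant's 0-100 scores across all five judges."""
    if len(scores) != 5:
        raise ValueError("expected exactly 5 judge scores")
    return sum(scores.values()) / 5

# e.g. one variant scored by the five-persona panel above
scores = {"cmo": 72, "skeptical_founder": 68, "cro": 75,
          "copywriter": 70, "founder": 65}
print(panel_average(scores))  # 70.0
```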
Round Structure (Per Content Element)
Round 1:
→ Generate 10 variants of the element
→ Batch-score all 10 with the 5-expert panel (1 API call)
→ Rank by average score
→ Keep top 3
Round 2 (Evolution):
→ Analyze what the top 3 did right
→ Generate 10 new variants that push those winning patterns further
→ Batch-score all 10 (1 API call)
→ Keep top 3
Round 3 (If score < threshold):
→ Identify weakest scoring dimension
→ Generate 10 variants optimized for that dimension
→ Batch-score → keep top 1
Multi-element cross-breeding:
→ Take top 1 winner from each element
→ Generate 5 combinations that mix winning elements
→ Score holistically as complete units
→ Output the single best combination
Stop condition: Top variant hits minimum score threshold (default: 80) OR 3 rounds complete.
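The round loop and stop condition above can be sketched as follows. The `generate_variants` and `score_batch` callables stand in for the model calls (one batched API call per round); the toy stubs in the usage example exist only to make the sketch runnable:

```python
def optimize_element(element, generate_variants, score_batch,
                     min_score=80, max_rounds=3):
    """Generate 10 variants, batch-score once, keep top 3,
    stop when the threshold is hit or rounds are exhausted."""
    survivors = []
    for _ in range(max_rounds):
        variants = generate_variants(element, seeds=survivors, n=10)
        scored = score_batch(variants)  # ONE API call for the whole batch
        scored.sort(key=lambda v: v["avg_score"], reverse=True)
        survivors = scored[:3]
        if survivors[0]["avg_score"] >= min_score:
            break  # stop condition: threshold hit early
    return survivors[0]

# Toy stand-ins so the loop runs end to end; real versions are model calls.
def gen(element, seeds, n=10):
    start = seeds[0]["avg_score"] + 1 if seeds else 70
    return [{"text": f"{element} v{start + i}", "avg_score": start + i}
            for i in range(n)]

winner = optimize_element("hero_headline", gen, score_batch=lambda vs: vs)
print(winner["avg_score"])  # 89 in this toy run
```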
Content Types & Score Dimensions
Landing Pages
Elements to optimize: Hero headline, subheadline, CTA text, problem section, social proof
Score dimensions:
- first_impression — Does it grab immediately?
- clarity — Is the offer instantly understood?
- trust — Does it feel credible?
- urgency — Is there a reason to act now?
- would_convert — Would the judge actually click?
Email Sequences
Elements to optimize: Subject line, opening line, body copy, CTA, PS line
Score dimensions:
- would_open — Subject line pass rate
- would_read — Does the opening hook?
- would_click — Is the CTA compelling?
- would_reply — Does it feel personal enough to respond to?
- spam_risk — Does it feel spammy? (lower = better; invert for final score)
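Because spam_risk is a lower-is-better dimension, invert it before averaging so a low spam score raises the final number. A minimal sketch with illustrative values:

```python
def email_score(dims: dict[str, int]) -> float:
    """Average email dimensions, inverting spam_risk (lower = better)."""
    adjusted = {k: (100 - v if k == "spam_risk" else v)
                for k, v in dims.items()}
    return sum(adjusted.values()) / len(adjusted)

dims = {"would_open": 80, "would_read": 75, "would_click": 70,
        "would_reply": 65, "spam_risk": 20}  # 20 risk -> 80 after inversion
print(email_score(dims))  # 74.0
```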
Ad Copy
Elements to optimize: Headline, description, CTA
Score dimensions:
- scroll_stopping — Does it interrupt the scroll?
- clarity — Is the value prop clear in 3 seconds?
- click_worthiness — Does the judge want to click?
- relevance — Does it match likely audience intent?
- differentiation — Does it stand out from competitors?
Form Pages
Elements to optimize: Headline, subtext, value prop bullets, button text, field order, thank-you copy
Score dimensions:
- first_impression — Does it feel worth filling out?
- trust — Do they believe their info is safe and the offer is real?
- completion_likelihood — Would the judge start filling it out?
- lead_quality — Would this attract serious prospects (not tire-kickers)?
- would_fill_out — Final gut check: would they submit?
Step-by-Step Execution Protocol
Step 1: Intake & Parse
Read the source content. Identify content type automatically or confirm with user:
- HTML file → landing page or form page
- Markdown / plain text → email or ad copy
- If ambiguous, ask: "Is this a landing page, email sequence, ad copy, or form page?"
Extract all optimizable elements. List them back to user:
Found 5 elements to optimize:
1. Hero headline: "We help B2B companies grow"
2. Subheadline: "Full-service digital marketing..."
3. CTA: "Get Started"
4. Problem statement: [excerpt]
5. Social proof: [excerpt]
Optimizing: all | Variants per round: 10 | Min score: 80
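The Step 1 detection heuristic can be sketched as below. The extension mapping and the `<input>`-count cutoff are assumptions chosen for illustration, not part of the skill spec:

```python
from pathlib import Path

def detect_content_type(path: str, text: str):
    """Heuristic from Step 1: HTML -> landing/form page, md/txt -> email/ad.
    Returns None when ambiguous so the agent can ask the user."""
    ext = Path(path).suffix.lower()
    if ext in {".html", ".htm"}:
        # A form page is dominated by input fields; crude but illustrative.
        return "form_page" if text.lower().count("<input") >= 3 else "landing_page"
    if ext in {".md", ".txt"}:
        return "email" if "subject:" in text.lower() else "ad_copy"
    return None  # ambiguous -> ask the user

print(detect_content_type("hero.html", "<h1>Grow</h1>"))  # landing_page
```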
Step 2: Get API Key
Check for Anthropic API key: $ANTHROPIC_API_KEY environment variable.
export ANTHROPIC_API_KEY="your-api-key-here"
Step 3: Run Optimization Rounds
For each element, run the round structure above.
Critical API efficiency rule: ALWAYS batch all variants into a single prompt. Never call the API once per variant. A round with 10 variants = 1 API call.
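One way to honor the batching rule is to pack every variant into a single scoring prompt. The prompt wording and the requested JSON shape below are illustrative, not a fixed format:

```python
def build_batch_prompt(element, variants):
    """Pack ALL variants into one scoring prompt: one API call per round."""
    numbered = "\n".join(f"{i + 1}. {v}" for i, v in enumerate(variants))
    return (
        f"Score each {element} variant 0-100 from all 5 personas.\n"
        'Return JSON: [{"id": 1, "scores": {...}}, ...]\n\n'
        + numbered
    )

prompt = build_batch_prompt("hero_headline", ["Variant A", "Variant B"])
```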
Model preference (in order):
- claude-sonnet-4-5 (preferred — fast + smart)
- claude-opus-4 (if highest quality needed)
- Any claude-3.5+ model if the above aren't available
Step 4: Cross-Breed (Multi-Element)
After all elements have winners:
- Assemble the top winner from each element into a complete unit
- Generate 5 holistic variants that naturally combine the winning elements
- Score the complete units (not just individual parts)
- Pick the winner with the highest holistic score
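Assembling the cross-breed candidates is a combination step. This sketch mixes the top 2 variants per element (a simplifying assumption; the skill describes generating 5 combinations by any mixing strategy):

```python
from itertools import product

def cross_breed(winners, n=5):
    """winners maps element name -> ranked variant texts (best first).
    Returns up to n complete units to score holistically."""
    elements = list(winners)
    combos = product(*(winners[e][:2] for e in elements))  # top 2 per element
    return [dict(zip(elements, c)) for _, c in zip(range(n), combos)]

cands = cross_breed({"headline": ["H1", "H2"], "cta": ["C1", "C2"]})
print(len(cands))  # 4
```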
Step 5: Write Output Files
# Create output directory
mkdir -p data
# Write optimized content
# Write experiments JSON
# Write optimization report
Experiments JSON structure:
{
"run_id": "autoresearch-{name}-{timestamp}",
"content_type": "landing_page",
"source_file": "path/to/original",
"min_score_threshold": 80,
"rounds": [
{
"round": 1,
"element": "hero_headline",
"variants": [
{
"id": 1,
"text": "...",
"scores": {
"cmo": 72,
"skeptical_founder": 68,
"cro": 75,
"copywriter": 70,
"founder": 65
},
"avg_score": 70
}
],
"top_3": [1, 4, 7],
"winner_score": 82
}
],
"final_winner": {
"hero_headline": "...",
"subheadline": "...",
"cta": "...",
"holistic_score": 87
}
}
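Writing the three output files from Step 5 can be sketched as follows; the function name and arguments are illustrative, while the paths follow the "What You'll Produce" table:

```python
import json
from pathlib import Path

def write_outputs(name, ext, winner_text, experiments, report_md):
    """Write the optimized content, experiments JSON, and report."""
    Path("data").mkdir(exist_ok=True)
    paths = [
        Path(f"{name}-optimized.{ext}"),
        Path(f"data/{name}-experiments.json"),
        Path(f"data/{name}-optimization-report.md"),
    ]
    paths[0].write_text(winner_text)
    paths[1].write_text(json.dumps(experiments, indent=2))
    paths[2].write_text(report_md)
    return paths
```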
Step 6: Report Back
Summarize results to user:
- Final winning score
- Biggest score jump (which element improved most)
- Top 2 runner-up alternatives (in case winner doesn't feel right)
- Path to all 3 output files
- Clear next step
User Options
| Option | Default | Description |
|---|---|---|
| elements | all | Which elements to optimize |
| variants_per_round | 10 | How many variants to generate per round |
| min_score | 80 | Stop when this score is hit |
| rounds | 3 | Max rounds before stopping |
| auto_apply | false | Whether to overwrite the source file with winners |
| content_type | auto-detect | Force a content type if auto-detect is wrong |
Quality Gates
- < 70: Don't ship. Something fundamental is broken.
- 70-79: Marginal. One more round targeting the lowest-scoring dimension.
- 80-84: Good. Shippable. Validate with real traffic.
- 85-89: Strong. Ship with confidence.
- 90+: Rare. Ship immediately.
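The gates above map directly to a lookup; the return strings here are shorthand for the guidance, not fixed API values:

```python
def quality_gate(score):
    """Map a final score to the shipping guidance above."""
    if score < 70:
        return "don't ship"
    if score < 80:
        return "one more round"
    if score < 85:
        return "shippable: validate with real traffic"
    if score < 90:
        return "ship with confidence"
    return "ship immediately"

print(quality_gate(87))  # ship with confidence
```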
Anti-Patterns to Avoid
- Never call the API once per variant. Always batch. A 10-variant round = 1 call.
- Don't over-optimize for one dimension. If you're hitting 95 on clarity but 45 on trust, the overall score is misleading.
- Don't run more than 5 rounds. If you're not hitting 80 after 3 rounds, the problem is strategic (wrong positioning), not tactical (wrong words).
- Don't cross-breed until each element has its own winner. Premature cross-breeding creates incoherent combinations.