transcript

Installation

npx skills add coopersimson96/ai-content-system --skill transcript

Summary

Extract full transcripts, metadata, and captions from YouTube, TikTok, Instagram, X, Facebook, and direct video files; also supports batch processing, translation, YouTube search, and web page scraping. This agent can handle async jobs, parse multiple input formats, and save outputs to structured files.

SKILL.MD

Transcript Extractor — SupaData API

Extract transcripts from any video platform with one command. Paste a URL, get the transcript.


Config

  • API key file: ~/.config/supadata/.env (contains SUPADATA_API_KEY=...)
  • Base URL: https://api.supadata.ai/v1
  • Auth header: x-api-key: <key>

Before making API calls, read the key:

SUPADATA_API_KEY=$(grep SUPADATA_API_KEY ~/.config/supadata/.env | cut -d= -f2)

Use this variable in all curl commands below.
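
For callers that prefer not to depend on grep and cut, the same lookup can be sketched in Python. This is a minimal sketch, not part of the skill: `parse_api_key` and `read_api_key` are hypothetical helper names, and the handling of quotes, comments, and an `export` prefix assumes a dotenv-style file.

```python
from pathlib import Path
from typing import Optional


def parse_api_key(env_text: str, name: str = "SUPADATA_API_KEY") -> Optional[str]:
    """Return the value assigned to `name` in dotenv-style text, or None if absent."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        # Split on the first "=" only, so values containing "=" stay intact.
        key, sep, value = line.partition("=")
        if sep and key.strip() == name:
            return value.strip().strip("'\"")
    return None


def read_api_key(path: str = "~/.config/supadata/.env") -> Optional[str]:
    """Read the key from the config file named in the Config section."""
    return parse_api_key(Path(path).expanduser().read_text())
```

Splitting on the first `=` only is a deliberate choice here: it keeps a key intact even if its value happens to contain that character.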


Input Parsing

Extract from user input or $ARGUMENTS:

  1. URL — The video/content URL
  2. MODE — What the user wants:
    • transcript (default) — Get transcript + metadata for a single URL
    • batch — Get transcripts for a playlist, channel, or list of URLs
    • translate — Get transcript translated to a target language (YouTube only)
    • search — Search YouTube for videos, then optionally get transcripts
    • metadata-only — Just get metadata (title, stats, author) without transcript
  3. LANGUAGE — Target language if translating (ISO 639-1 code, e.g., "es", "fr", "de")
  4. FORMAT — text (plain text, default) or timestamped (chunks with timestamps)

If no URL provided, ask:

What video do you want the transcript for? Paste the URL.

Supported: YouTube, TikTok, Instagram, X/Twitter, Facebook, or any direct video/audio file URL.

If only a URL is provided, default to transcript + metadata, plain text format. Don't over-ask.


Platform Detection

Detect the platform from the URL to route correctly:

| URL Pattern | Platform |
| --- | --- |
| youtube.com, youtu.be, youtube.com/shorts/, youtube.com/live/ | YouTube |
| tiktok.com, vm.tiktok.com | TikTok |
| instagram.com/reel/, instagram.com/p/, instagram.com/tv/ | Instagram |
| twitter.com, x.com | X/Twitter |
| facebook.com, m.facebook.com | Facebook |
| .mp4, .webm, .mp3, .wav, .flac, .m4a, .ogg, .mpeg | Direct File |
| youtube.com/playlist, playlist ID | YouTube Playlist |
| youtube.com/@, youtube.com/channel/ | YouTube Channel |
| Everything else (non-video URL) | Web page → use /web/scrape fallback |
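
The routing in this table could be sketched roughly as follows. This is one possible reading, not the skill's own implementation: `detect_platform`, the platform labels, and the `ENDPOINTS` mapping are hypothetical names, and the API paths are the ones used in the sections that follow. Bare playlist IDs (as opposed to playlist URLs) are not handled here.

```python
from urllib.parse import urlparse

MEDIA_EXTENSIONS = (".mp4", ".webm", ".mp3", ".wav", ".flac", ".m4a", ".ogg", ".mpeg")

# Hypothetical platform label -> API path used elsewhere in this skill.
ENDPOINTS = {
    "youtube": "/youtube/transcript",
    "youtube-playlist": "/youtube/transcript/batch",
    "youtube-channel": "/youtube/transcript/batch",
    "tiktok": "/transcript",
    "instagram": "/transcript",
    "x": "/transcript",
    "facebook": "/transcript",
    "file": "/transcript",
    "web": "/web/scrape",
}


def _on_domain(host, domain):
    """True when host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def detect_platform(url):
    """Map a URL to one of the platform labels in ENDPOINTS."""
    parsed = urlparse(url if "://" in url else "https://" + url)
    host = (parsed.hostname or "").lower()
    path = parsed.path
    if _on_domain(host, "youtube.com"):
        # Check the specific playlist/channel patterns before the generic one.
        if path.startswith("/playlist"):
            return "youtube-playlist"
        if path.startswith(("/@", "/channel/")):
            return "youtube-channel"
        return "youtube"
    if _on_domain(host, "youtu.be"):
        return "youtube"
    if _on_domain(host, "tiktok.com"):
        return "tiktok"
    if _on_domain(host, "instagram.com") and path.startswith(("/reel/", "/p/", "/tv/")):
        return "instagram"
    if _on_domain(host, "twitter.com") or _on_domain(host, "x.com"):
        return "x"
    if _on_domain(host, "facebook.com"):
        return "facebook"
    if path.lower().endswith(MEDIA_EXTENSIONS):
        return "file"
    return "web"
```

Matching on the parsed hostname rather than a substring avoids treating an unrelated domain that merely ends in "x.com" as X/Twitter, and testing the playlist and channel paths first keeps them from being swallowed by the generic youtube.com rule.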

Single Transcript Extraction

This is the primary flow — one URL, one transcript.

Step 1: Fetch Metadata

SUPADATA_API_KEY=$(grep SUPADATA_API_KEY ~/.config/supadata/.env | cut -d= -f2)
URL="THE_VIDEO_URL"
# URL-encode via argv so quotes in the URL cannot break or inject into the Python snippet
ENCODED_URL=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$URL")
curl -s -H "x-api-key: $SUPADATA_API_KEY" \
  "https://api.supadata.ai/v1/metadata?url=$ENCODED_URL"

Display a header with the metadata:

**{title}**
{author} · {platform} · {duration} · {views} views · {likes} likes
Published: {date}

Step 2: Fetch Transcript

For YouTube URLs, use the dedicated YouTube endpoint (more features):

SUPADATA_API_KEY=$(grep SUPADATA_API_KEY ~/.config/supadata/.env | cut -d= -f2)
# URL-encode via argv so quotes in the URL cannot break or inject into the Python snippet
ENCODED_URL=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$URL")
curl -s -H "x-api-key: $SUPADATA_API_KEY" \
  "https://api.supadata.ai/v1/youtube/transcript?url=$ENCODED_URL&text=true"

For all other platforms (TikTok, Instagram, X, Facebook, files), use the universal endpoint:

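SUPADATA_API_KEY=$(grep SUPADATA_API_KEY ~/.config/supadata/.env | cut -d= -f2)
# URL-encode via argv so quotes in the URL cannot break or inject into the Python snippet
ENCODED_URL=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$URL")
curl -s -H "x-api-key: $SUPADATA_API_KEY" \
  "https://api.supadata.ai/v1/transcript?url=$ENCODED_URL&text=true"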

Step 3: Handle Async Jobs

If the API returns HTTP 202 with a jobId, the transcript is being generated asynchronously (common for non-YouTube platforms and long videos):

# Poll every 5 seconds until complete
curl -s -H "x-api-key: $SUPADATA_API_KEY" \
  "https://api.supadata.ai/v1/transcript/$JOB_ID"

Poll until status is completed or failed. Max 12 attempts (60 seconds).
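
The polling rule above (5-second interval, at most 12 attempts, stop on completed or failed) can be sketched as a small loop. This is a sketch under stated assumptions: `poll_job` and `fetch_status` are hypothetical names, and `fetch_status` stands in for whatever performs the GET on /transcript/{jobId} and returns the decoded JSON body with a `status` field.

```python
import time
from typing import Callable, Optional


def poll_job(
    fetch_status: Callable[[], dict],
    max_attempts: int = 12,
    interval_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch_status until the job reports completed or failed.

    Returns the final response body, or None if the job is still
    pending after max_attempts calls.
    """
    for attempt in range(max_attempts):
        body = fetch_status()
        if body.get("status") in ("completed", "failed"):
            return body
        # Only wait when another attempt will follow.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    return None
```

Passing `sleep` as a parameter keeps the loop testable without real delays; a `None` result signals that the caller should report a timeout rather than a failure.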

Step 4: Display and Save

Show the full transcript in the conversation.

Save to file:

# Filename: ~/ai-content-system/output/transcripts/{platform}-{sanitized-title}-transcript.md

The saved file should include:

# {Title}
**{Author}** · {Platform} · {Duration} · {Views} views
**URL:** {original URL}
**Extracted:** {date}

---

{full transcript text}
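
The filename pattern above leaves "sanitized" undefined. One possible interpretation is sketched below; the slug rule (lowercase ASCII letters and digits joined by hyphens, capped at 80 characters) and the helper names `sanitize_title` and `transcript_path` are assumptions, not something the skill specifies.

```python
import re
from pathlib import Path

OUTPUT_DIR = Path("~/ai-content-system/output/transcripts")


def sanitize_title(title: str, max_length: int = 80) -> str:
    """Lowercase the title and collapse every non-alphanumeric run into one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # Fall back to a fixed name when nothing usable is left (e.g. non-ASCII titles).
    return slug[:max_length].rstrip("-") or "untitled"


def transcript_path(platform: str, title: str) -> Path:
    """Build {platform}-{sanitized-title}-transcript.md under the output directory."""
    return OUTPUT_DIR.expanduser() / f"{platform}-{sanitize_title(title)}-transcript.md"
```

Note that this rule drops non-ASCII characters entirely, so a title written in another script falls back to "untitled"; a real implementation might prefer transliteration or the video ID instead.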

Timestamped Format

If the user asks for timestamps, use text=false in the API call. The response returns chunks (in the example, offset and duration read as milliseconds):

[{"text": "chunk text", "offset": 0, "duration": 5000, "lang": "en"}]

Format as:

[0:00] chunk text
[0:05] next chunk
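
A minimal sketch of that conversion, assuming `offset` is in milliseconds (which the example above implies: a 5000 duration followed by a chunk at 0:05). The function names are hypothetical.

```python
def format_timestamp(offset_ms):
    """Render a millisecond offset as M:SS, or H:MM:SS past one hour."""
    total_seconds = int(offset_ms) // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"


def format_chunks(chunks):
    """Join transcript chunks into one '[M:SS] text' line per chunk."""
    return "\n".join(f"[{format_timestamp(c['offset'])}] {c['text']}" for c in chunks)


chunks = [
    {"text": "chunk text", "offset": 0, "duration": 5000, "lang": "en"},
    {"text": "next chunk", "offset": 5000, "duration": 4000, "lang": "en"},
]
print(format_chunks(chunks))
```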

Translation (YouTube Only)

When the user asks to translate a transcript:

SUPADATA_API_KEY=$(grep SUPADATA_API_KEY ~/.config/supadata/.env | cut -d= -f2)
# URL-encode via argv so quotes in the URL cannot break or inject into the Python snippet
ENCODED_URL=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$URL")
curl -s -H "x-api-key: $SUPADATA_API_KEY" \
  "https://api.supadata.ai/v1/youtube/transcript/translate?url=$ENCODED_URL&lang=$LANG_CODE&text=true"

Note: Translation takes 20+ seconds and costs 30 credits/minute. Warn the user about the higher cost if the video is long.

Common language codes: es (Spanish), fr (French), de (German), pt (Portuguese), ja (Japanese), ko (Korean), zh (Chinese), ar (Arabic), hi (Hindi), it (Italian).


Batch Transcripts (YouTube Only)

For playlists, channels, or multiple videos:

source ~/.config/supadata/.env
# The JSON bodies are double-quoted so the shell expands the variables inside them;
# in single quotes, "$PLAYLIST_URL" would be sent to the API literally.
# For a playlist:
curl -s -X POST -H "x-api-key: $SUPADATA_API_KEY" -H "Content-Type: application/json" \
  -d "{\"playlistId\": \"$PLAYLIST_URL\", \"text\": true, \"limit\": 10}" \
  "https://api.supadata.ai/v1/youtube/transcript/batch"

# For a channel:
curl -s -X POST -H "x-api-key: $SUPADATA_API_KEY" -H "Content-Type: application/json" \
  -d "{\"channelId\": \"$CHANNEL_URL\", \"text\": true, \"limit\": 10}" \
  "https://api.supadata.ai/v1/youtube/transcript/batch"

# For multiple video IDs:
curl -s -X POST -H "x-api-key: $SUPADATA_API_KEY" -H "Content-Type: application/json" \
  -d '{"videoIds": ["id1", "id2", "id3"], "text": true}' \
  "https://api.supadata.ai/v1/youtube/transcript/batch"

Then poll for results:

curl -s -H "x-api-key: $SUPADATA_API_KEY" \
  "https://api.supadata.ai/v1/youtube/batch/$JOB_ID"

Save each transcript as a separate file in ~/ai-content-system/output/transcripts/batch-{topic}/.
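
Hand-written JSON in a shell string is easy to get wrong once a value contains quotes or backslashes. As an alternative, the three request bodies shown earlier in this section could be produced with a JSON serializer. This is a sketch only: `batch_payload` is a hypothetical helper, and the field names and the default limit of 10 are taken from the curl examples.

```python
import json


def batch_payload(*, playlist=None, channel=None, video_ids=None, limit=10):
    """Build the JSON body for a batch request from exactly one source."""
    sources = [s for s in (playlist, channel, video_ids) if s]
    if len(sources) != 1:
        raise ValueError("provide exactly one of playlist, channel, or video_ids")
    if video_ids:
        # The explicit-ID example carries no limit field.
        body = {"videoIds": list(video_ids), "text": True}
    elif playlist:
        body = {"playlistId": playlist, "text": True, "limit": limit}
    else:
        body = {"channelId": channel, "text": True, "limit": limit}
    return json.dumps(body)
```

The returned string can be passed to curl with `-d "$BODY"` or sent by any HTTP client, with escaping handled by the serializer.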


YouTube Search → Transcript

When the user wants to find and transcribe videos on a topic:

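SUPADATA_API_KEY=$(grep SUPADATA_API_KEY ~/.config/supadata/.env | cut -d= -f2)
# URL-encode via argv so apostrophes in the query (e.g. "what's new") cannot break the Python snippet
ENCODED_QUERY=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$QUERY")
curl -s -H "x-api-key: $SUPADATA_API_KEY" \
  "https://api.supadata.ai/v1/youtube/search?query=$ENCODED_QUERY"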

Show the results and ask which video(s) to transcribe. Then run the single transcript flow for each selected video.


Web Page Fallback

If the URL is not a video platform, fall back to web scraping:

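SUPADATA_API_KEY=$(grep SUPADATA_API_KEY ~/.config/supadata/.env | cut -d= -f2)
# URL-encode via argv so quotes in the URL cannot break or inject into the Python snippet
ENCODED_URL=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$URL")
curl -s -H "x-api-key: $SUPADATA_API_KEY" \
  "https://api.supadata.ai/v1/web/scrape?url=$ENCODED_URL"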

Returns the page content as markdown. Useful for blog posts, articles, docs.


Error Handling

| HTTP Code | Meaning | Action |
| --- | --- | --- |
| 200 | Success | Display transcript |
| 202 | Async job started | Poll with jobId until complete |
| 206 | No transcript available | Tell user: "No captions available for this video. The video may be too short, have no speech, or captions are disabled." |
| 400 | Bad request | Check URL format |
| 401 | Invalid API key | Tell user to check ~/.config/supadata/.env |
| 402 | Out of credits | Tell user to check their SupaData plan |
| 404 | Video not found | Tell user: "Video not found — it may be private, deleted, or the URL is wrong." |
| 429 | Rate limited | Wait 10 seconds, retry once |
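
The table above, including the single retry on 429, could be expressed as a small dispatcher. This is a sketch under assumptions: the action labels, `send_with_retry`, and the `send` callable (which performs the request and returns the HTTP status code) are all hypothetical, and codes outside the table are reported as "unexpected".

```python
import time
from typing import Callable

# Status code -> hypothetical action label, following the table above.
ACTIONS = {
    200: "display",
    202: "poll",
    206: "no-transcript",
    400: "check-url",
    401: "check-api-key",
    402: "check-plan",
    404: "not-found",
}


def send_with_retry(
    send: Callable[[], int],
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run send() and map its status to an action, retrying once on 429."""
    status = send()
    if status == 429:
        sleep(10)  # wait 10 seconds, then retry exactly once
        status = send()
    if status == 429:
        return "rate-limited"
    return ACTIONS.get(status, "unexpected")
```

Returning a label instead of printing a message keeps the user-facing wording in one place, so the agent can attach the exact messages from the table when it reports the outcome.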

Follow-Up Behavior

After showing a transcript, offer relevant next steps based on context:

  • "Summarize this" → Summarize the key points from the transcript
  • "Extract the main arguments" → Pull out structured takeaways
  • "Get another video" → Ready for the next URL
  • "Get the whole playlist" → Switch to batch mode
  • "Translate to [language]" → Use translation endpoint (YouTube only)

If the user is working on content (e.g., came from /content-master), proactively suggest how the transcript could inform their scripts.