Endpoints
- POST /api/v1/transcribe: Audio or video URL → word-level transcript
- POST /api/v1/captions/render: Video URL + transcript → video with burned-in captions
- GET /api/v1/resources/caption-styles: List all 63 caption presets
Quickstart — transcribe
curl -X POST https://api.reelsbuilder.ai/api/v1/transcribe \
-H "Authorization: Bearer $REELSBUILDER_API_KEY" \
-H "Idempotency-Key: $(uuidgen)" \
-H "Content-Type: application/json" \
-d '{
"media_url": "https://your-cdn.example.com/podcast.mp3",
"language_hint": "en",
"diarization": true,
"output_formats": ["json", "srt", "vtt"]
}'
Initial response
{
"success": true,
"data": {
"job_id": "tjob_01HKZ...",
"status": "queued",
"estimated_completion_sec": 35,
"source": {
"duration_sec": 1842,
"format": "audio/mp3"
}
},
"meta": { "request_id": "req_...", "credits_used": 7, "credits_remaining": 993 }
}
Completion webhook
POST https://your-app.example.com/webhooks/transcript
X-RB-Event-Type: transcript.completed
{
"event_id": "evt_...",
"event_type": "transcript.completed",
"data": {
"job_id": "tjob_01HKZ...",
"status": "completed",
"transcript": {
"text": "So the way we ship 10x faster is by writing the documentation first...",
"language": "en",
"confidence": 0.97,
"duration_sec": 1842,
"word_count": 4218,
"words": [
{ "text": "So", "start": 0.12, "end": 0.31, "confidence": 0.99, "speaker": "spk_1" },
{ "text": "the", "start": 0.31, "end": 0.42, "confidence": 0.99, "speaker": "spk_1" },
{ "text": "way", "start": 0.42, "end": 0.58, "confidence": 0.98, "speaker": "spk_1" }
],
"speakers": [
{ "id": "spk_1", "estimated_minutes_active": 22.4 },
{ "id": "spk_2", "estimated_minutes_active": 8.1 }
]
},
"output_files": {
"json": "https://cdn.reelsbuilder.ai/t/tjob_.../transcript.json",
"srt": "https://cdn.reelsbuilder.ai/t/tjob_.../transcript.srt",
"vtt": "https://cdn.reelsbuilder.ai/t/tjob_.../transcript.vtt"
}
}
}
Parameters
| Field | Type | Required | Description |
|---|---|---|---|
| media_url | URL | yes | Public HTTPS URL to audio (mp3, wav, m4a, flac, ogg) or video (mp4, mov, webm). Max 4 hours, 500 MB. |
| language_hint | string | no | ISO 639-1 code. If omitted, language is auto-detected. |
| diarization | boolean | no | Tag each word with a speaker ID. Default false. |
| output_formats | string[] | no | Subset of json, srt, vtt, ass, txt. Default ["json"]. |
| caption_style | string | no | If provided alongside ass in output_formats, generates an ASS file styled with one of the 63 caption presets. |
| filter_profanity | boolean | no | Replace profanity with asterisks. Default false. |
| webhook_url | URL | recommended | HTTPS URL for completion callback. |
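The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: the `TranscribeRequest` type and the `checkTranscribeRequest` helper are hypothetical names, and flagging `caption_style` without `ass` is an assumption (the API may simply ignore the field in that case).

```typescript
type OutputFormat = "json" | "srt" | "vtt" | "ass" | "txt";

// Mirrors the fields in the parameters table above.
interface TranscribeRequest {
  media_url: string;
  language_hint?: string;
  diarization?: boolean;
  output_formats?: OutputFormat[];
  caption_style?: string;
  filter_profanity?: boolean;
  webhook_url?: string;
}

const OUTPUT_FORMATS: readonly OutputFormat[] = ["json", "srt", "vtt", "ass", "txt"];

// True only for strings that parse as a URL with the https: scheme.
function isHttpsUrl(value: string): boolean {
  try {
    return new URL(value).protocol === "https:";
  } catch {
    return false;
  }
}

// Returns a list of problems; an empty list means the body is consistent with the table.
function checkTranscribeRequest(req: TranscribeRequest): string[] {
  const problems: string[] = [];
  if (!isHttpsUrl(req.media_url)) {
    problems.push("media_url must be an HTTPS URL");
  }
  if (req.webhook_url !== undefined && !isHttpsUrl(req.webhook_url)) {
    problems.push("webhook_url must be an HTTPS URL");
  }
  if (req.language_hint !== undefined && !/^[a-z]{2}$/.test(req.language_hint)) {
    problems.push("language_hint should be a two-letter ISO 639-1 code");
  }
  const formats = req.output_formats ?? ["json"]; // documented default
  for (const format of formats) {
    if (!OUTPUT_FORMATS.includes(format)) {
      problems.push(`unsupported output format: ${format}`);
    }
  }
  // Assumption: a caption_style without "ass" is worth flagging, since it has no documented effect.
  if (req.caption_style !== undefined && !formats.includes("ass")) {
    problems.push('caption_style only takes effect when output_formats includes "ass"');
  }
  return problems;
}
```

Running the check on the quickstart body returns an empty list; a body with `caption_style` set but no `ass` in `output_formats` returns one problem.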
Render captions into a video
Once you have a transcript, render burned-in karaoke captions onto the source video:
curl -X POST https://api.reelsbuilder.ai/api/v1/captions/render \
-H "Authorization: Bearer $REELSBUILDER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"video_url": "https://your-cdn.example.com/source.mp4",
"transcript_job_id": "tjob_01HKZ...",
"caption_style": "neon_outline_yellow",
"position": "lower_third",
"max_words_per_line": 4,
"highlight_active_word": true
}'
Caption style catalog
63 presets covering 5 categories. The full list is queryable at GET /api/v1/resources/caption-styles. Notable presets:
- Default styles: default_white, default_yellow, default_black_box
- Neon: neon_outline_pink, neon_outline_yellow, neon_glow_cyan
- Bold: impact_yellow, impact_white_outline, shadowed_red
- Karaoke-highlighted: karaoke_word_bounce, karaoke_word_color_swap, karaoke_progressive_fill
- Branded: Auto-derived from a brand_id; uses the brand's primary + accent colors and font.
Supported languages
Powered by ElevenLabs Scribe v2 — 99 languages with word-level timestamps. Highest-accuracy tiers: English, Spanish, French, German, Portuguese, Italian, Dutch, Polish, Russian, Japanese, Korean, Chinese (Mandarin), Arabic, Hindi, Turkish.
Full list at GET /api/v1/resources/transcribe-languages.
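Because `language_hint` takes an ISO 639-1 code, it can help to keep the highest-accuracy languages named above as a lookup. This is a small sketch: the codes are the standard ISO 639-1 codes for those languages, but the helper name `isHighAccuracyLanguage` is hypothetical, and that the API accepts exactly these codes (for example `zh` for Mandarin) is an assumption to confirm against the full language list.

```typescript
// ISO 639-1 codes for the 15 languages listed as highest-accuracy.
// Assumption: the API accepts these exact codes as language_hint values.
const HIGH_ACCURACY_LANGUAGES = new Map<string, string>([
  ["en", "English"],
  ["es", "Spanish"],
  ["fr", "French"],
  ["de", "German"],
  ["pt", "Portuguese"],
  ["it", "Italian"],
  ["nl", "Dutch"],
  ["pl", "Polish"],
  ["ru", "Russian"],
  ["ja", "Japanese"],
  ["ko", "Korean"],
  ["zh", "Chinese (Mandarin)"],
  ["ar", "Arabic"],
  ["hi", "Hindi"],
  ["tr", "Turkish"],
]);

// Hypothetical helper: true when the code is one of the languages listed above.
function isHighAccuracyLanguage(code: string): boolean {
  return HIGH_ACCURACY_LANGUAGES.has(code.toLowerCase());
}
```

A caller might log a notice when a hint falls outside this set, since transcripts in other languages may need closer review.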
Examples
TypeScript — transcribe + render captions
const auth = { Authorization: `Bearer ${process.env.REELSBUILDER_API_KEY}` };
// 1. Kick off transcription
const r1 = await fetch("https://api.reelsbuilder.ai/api/v1/transcribe", {
method: "POST",
headers: { ...auth, "Content-Type": "application/json", "Idempotency-Key": crypto.randomUUID() },
body: JSON.stringify({
media_url: "https://your-cdn.example.com/source.mp4",
language_hint: "en",
output_formats: ["json"],
}),
});
const transcribeJob = (await r1.json()).data;
// 2. Wait for it (or use webhook). Once complete...
// 3. Render captions
const r2 = await fetch("https://api.reelsbuilder.ai/api/v1/captions/render", {
method: "POST",
headers: { ...auth, "Content-Type": "application/json", "Idempotency-Key": crypto.randomUUID() },
body: JSON.stringify({
video_url: "https://your-cdn.example.com/source.mp4",
transcript_job_id: transcribeJob.job_id,
caption_style: "karaoke_word_bounce",
position: "lower_third",
}),
});
const renderJob = (await r2.json()).data;
console.log(`Render job: ${renderJob.job_id}`);
Pricing
- Transcribe: 1 credit per 4 minutes of audio (rounded up). 1-hour podcast = 15 credits.
- Diarization: +50% on transcribe cost. 1-hour podcast with diarization = 23 credits (15 × 1.5 = 22.5, rounded to 23).
- Caption rendering: 5 credits per minute of output video.
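The pricing rules above can be turned into a rough pre-flight estimate. This is a sketch under stated assumptions: the function names are hypothetical, the diarization surcharge is assumed to apply to the already-rounded base cost and then round up again, and fractional render credits are assumed to round up. The only documented reference points are the 15 and 23 credit figures for a 1-hour podcast; the `meta.credits_used` value in the API response is what is actually charged.

```typescript
// 1 credit per 4 minutes of audio, rounded up; diarization adds 50%.
// Assumption: the surcharge is applied to the rounded base and rounded up again.
function transcribeCredits(durationSec: number, diarization = false): number {
  const base = Math.ceil(durationSec / 240); // 240 s = 4 minutes
  return diarization ? Math.ceil(base * 1.5) : base;
}

// 5 credits per minute of output video.
// Assumption: fractional credits round up.
function renderCredits(outputDurationSec: number): number {
  return Math.ceil((outputDurationSec * 5) / 60);
}

console.log(transcribeCredits(3600));       // 15, matches the documented 1-hour example
console.log(transcribeCredits(3600, true)); // 23, matches the documented diarization example
console.log(renderCredits(60));             // 5
```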
Latency
- Transcribe p50: ~3 seconds per minute of source audio (so a 10-min clip transcribes in ~30s)
- Caption render p50: ~5 seconds per minute of source video
- Max source length: 4 hours
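These figures can be used to size a watchdog for jobs whose webhook never arrives. The sketch below is illustrative only: `estimateP50Seconds` and `watchdogTimeoutMs` are hypothetical helpers, and the default safety factor of 3 is an arbitrary assumption, since p50 is a median and roughly half of all jobs will take longer.

```typescript
const MAX_SOURCE_SEC = 4 * 60 * 60; // documented 4-hour limit

// Documented p50 processing time, in seconds per minute of source media.
const P50_SEC_PER_SOURCE_MINUTE = { transcribe: 3, render: 5 } as const;

type JobKind = keyof typeof P50_SEC_PER_SOURCE_MINUTE;

// Median expected processing time for a source of the given length.
function estimateP50Seconds(kind: JobKind, sourceDurationSec: number): number {
  if (!(sourceDurationSec > 0) || sourceDurationSec > MAX_SOURCE_SEC) {
    throw new RangeError("source duration must be greater than 0 and at most 4 hours");
  }
  return (sourceDurationSec * P50_SEC_PER_SOURCE_MINUTE[kind]) / 60;
}

// Hypothetical watchdog: pad the median before treating a job as stuck.
function watchdogTimeoutMs(kind: JobKind, sourceDurationSec: number, safetyFactor = 3): number {
  return Math.ceil(estimateP50Seconds(kind, sourceDurationSec) * safetyFactor * 1000);
}

console.log(estimateP50Seconds("transcribe", 600)); // 30, the documented 10-minute example
```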
See also
- YouTube Clipper — includes transcription + captions built-in
- Video Generation — all generated videos include captions by default
- Webhooks — receive transcription completion