AI Short Studio — AI Shorts from Long‑Form Video

11/4/2025

Overview

AI Short Studio turns long‑form videos into shorts suitable for Reels/TikTok/Shorts. Users paste a YouTube link or upload a file, pick preferences, and the system plans, renders, and delivers a set of vertical clips with captions.

AI Short Studio — Landing

Goals & Outcomes

  • Lower the friction of repurposing long‑form content into platform‑ready shorts
  • Provide a robust pipeline that survives edge cases (noisy audio, missing audio tracks, short slices)
  • Keep users informed in real time during multi‑step rendering

Architecture

  • Apps: landing/ (marketing), client/ (app), server/ (API + workers)
  • Queue: BullMQ + Redis for long‑running jobs
  • Billing: Stripe for credits and payments
  • Realtime: Server‑Sent Events (SSE) channel per project

[landing]  → Vite static site (SEO: robots + sitemap)
[client]   → React app (auth, projects, buy credits, project detail)
[server]   → Express API (auth, billing, projects, sse) + Worker (BullMQ queue)
              ├─ services/videoProcessing (ffmpeg, POI tracking, captions)
              ├─ services/gptService (clip planning prompt)
              ├─ services/openaiClient (OpenAI SDK wrapper)
              ├─ services/queue (BullMQ)
              └─ routes/{auth,billing,projects,sse}

Key Features

  • AI planning: selects compelling moments using OpenAI based on transcript, scenes, and visuals
  • Smart cropping: point‑of‑interest tracking to keep subject centered in 9:16
  • Rendering: ffmpeg‑based pipeline with per‑slice zoom and optional transitions
  • Captions: OpenAI Whisper or whisper.cpp; burn‑in with custom ASS styling and font
  • Robustness: fallbacks for missing audio and slice validation

Branding notes (public):

  • Identity: bold, focused logotype; high‑contrast layout
  • Color: energetic accent reserved for action surfaces, over a neutral base palette
  • Typeface: modern sans for UI; display font for hero and highlights

Dashboard & project flow

Implementation

  • Frontend: React + Vite + Tailwind (landing + client)
  • API: Node.js + Express + MongoDB (Mongoose)
  • Queue: BullMQ + ioredis; workers render video segments and assemble output
  • Payments: @stripe/react-stripe-js + server webhooks
  • Realtime progress: SSE endpoint per project with Redis pub/sub

client/: React app with Auth, Projects, New Project, Buy Credits, Account
landing/: Marketing site with sitemap/robots
server/: Express API, routes: admin, auth, billing, projects, sse

Payments & credits (Stripe)

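Credits are purchased via Stripe Payment Intents. The webhook handler stores every incoming event and increments the user's balance when a payment succeeds; the stored events are what allow repeated deliveries to be recognized, so credits are applied idempotently.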

// server/src/routes/billing.js (excerpt)
// express.raw keeps the body as a Buffer, which Stripe needs to verify the signature.
router.post('/stripe/webhook', express.raw({ type: 'application/json' }), async (req, res) => {
  const sig = req.headers['stripe-signature'];
  let event;
  try {
    event = stripe.webhooks.constructEvent(req.body, sig, process.env.STRIPE_WEBHOOK_SECRET || '');
  } catch (err) {
    // constructEvent throws when the signature or payload is invalid
    return res.status(400).send(`Webhook Error: ${err.message}`);
  }
  // Persist the event; stored events back the idempotency check (not shown in this excerpt)
  await WebhookEvent.create({ type: event.type, payload: event, signature: sig });
  if (event.type === 'payment_intent.succeeded') {
    const pi = event.data.object;
    const credits = Number(pi.metadata?.credits || 0);
    const userId = pi.metadata?.userId;
    if (userId && credits > 0) {
      await User.findByIdAndUpdate(userId, { $inc: { credits } });
      await CreditTransaction.create({ userId, amountCredits: credits, usd: pi.amount_received / 100, packageKey: pi.metadata?.packageKey, paymentIntentId: pi.id, status: 'succeeded' });
    }
  }
  res.json({ received: true });
});
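
The excerpt stores each event but does not show the duplicate check itself. As a minimal sketch of one way to make crediting idempotent, the snippet below keys on the Stripe event `id` and skips any event it has already seen. The names `createLedger` and `applyCreditEvent` are hypothetical, and the in-memory Set and Map are stand-ins for the MongoDB collections; this is not the project's actual implementation.

```javascript
// Hypothetical sketch: in-memory stand-ins for the WebhookEvent and User collections.
function createLedger() {
  return { seenEventIds: new Set(), balances: new Map() };
}

// Applies a payment_intent.succeeded event at most once per Stripe event id.
// Returns true if credits were added, false if the event was skipped.
function applyCreditEvent(ledger, event) {
  if (ledger.seenEventIds.has(event.id)) return false; // duplicate delivery
  ledger.seenEventIds.add(event.id);
  if (event.type !== 'payment_intent.succeeded') return false;

  const pi = event.data.object;
  const credits = Number(pi.metadata?.credits || 0);
  const userId = pi.metadata?.userId;
  if (!userId || !(credits > 0)) return false;

  ledger.balances.set(userId, (ledger.balances.get(userId) || 0) + credits);
  return true;
}
```

With a database, a unique index on the stored event id could play the role of the Set: a duplicate-key error on insert would signal a repeated delivery.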

Advanced technique — smart 9:16 crop with POI smoothing

The renderer samples frames, detects faces and other points of interest (face‑api, coco‑ssd), smooths the detected centers with an exponential moving average (EMA), then computes a per‑slice crop and zoom using time‑varying ffmpeg expressions.

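// server/src/services/videoProcessing.js (excerpt)
const alpha = 0.7; // EMA weight of the newest sample (higher = less smoothing)
let last = null;   // most recent smoothed center
for (let i = 0; i < centers.length; i++) {
  if (!centers[i]) { centers[i] = last; continue; } // no detection: hold the previous center
  if (!last) { last = centers[i]; continue; }       // first detection seeds the average
  last = { cx: alpha * centers[i].cx + (1 - alpha) * last.cx, cy: alpha * centers[i].cy + (1 - alpha) * last.cy };
  centers[i] = last;
}
// ffmpeg crop/scale expressions (per slice).
// E is an expression defined elsewhere in the file (not shown); it acts as the interpolation
// factor between the previous slice's value and this slice's target.
const zoomExpr = havePrev ? `${prevZoom}+(${zoomFactor}-${prevZoom})*${E}` : `${zoomFactor}`;
const cxExpr = havePrev ? `${prevCx}+(${avgCx}-${prevCx})*${E}` : `${avgCx}`;
const cyExpr = havePrev ? `${prevCy}+(${avgCy}-${prevCy})*${E}` : `${avgCy}`;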
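
To make the two steps easier to follow in isolation, here is a small self-contained sketch: the same EMA smoothing written as a pure function, plus a helper that turns a smoothed center into a 9:16 crop rectangle clamped to the frame. `smoothCenters` and `cropWindow` are hypothetical names, and the clamping and rounding rules are assumptions rather than the renderer's actual logic.

```javascript
// Exponential moving average over detected centers. Null entries (no detection)
// reuse the last smoothed value; alpha weights the newest sample.
function smoothCenters(centers, alpha = 0.7) {
  let last = null;
  return centers.map((c) => {
    if (!c) return last;
    last = last
      ? { cx: alpha * c.cx + (1 - alpha) * last.cx, cy: alpha * c.cy + (1 - alpha) * last.cy }
      : c;
    return last;
  });
}

// Hypothetical helper: largest 9:16 window (shrunk by zoom) centered on (cx, cy),
// clamped so the rectangle never leaves the source frame.
function cropWindow(srcW, srcH, cx, cy, zoom = 1) {
  const h = Math.floor(Math.min(srcH, (srcW * 16) / 9) / zoom);
  const w = Math.floor((h * 9) / 16);
  const x = Math.round(Math.min(Math.max(cx - w / 2, 0), srcW - w));
  const y = Math.round(Math.min(Math.max(cy - h / 2, 0), srcH - h));
  return { x, y, w, h };
}
```

For a 1920×1080 source at zoom 1, the window is 607×1080, and a subject at the far right edge yields `x = 1313`, which keeps the crop inside the frame.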

Advanced technique — word‑level captions via whisper.cpp

When enabled, the pipeline extracts audio from the final short, transcribes with whisper.cpp (or Whisper fallback), builds an ASS file with adjustable casing/size/position, and burns captions into the video.

// server/src/services/videoProcessing.js (excerpt)
const wordsRaw = await transcribeWordsWithWhisperCpp(shortAudio, options?.language || null);
await fsp.writeFile(assPath, buildAssFromTranscript(wordsRaw, caps), 'utf8');
await run('ffmpeg', ['-y', '-i', outPath, '-vf', subFilter, '-c:a', 'copy', '-c:v', 'libx264', '-preset', 'medium', '-crf', '18', '-movflags', '+faststart', captionedOut]);
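
The body of `buildAssFromTranscript` is not shown above. As a rough illustration of what building ASS events from word timings involves, the sketch below formats timestamps in the ASS `H:MM:SS.cc` form and groups words into short `Dialogue` lines. The word shape `{ text, start, end }`, the function names, the `maxWords` and `uppercase` options, and the `Default` style name are all assumptions; a real file also needs `[Script Info]` and `[V4+ Styles]` sections that define that style.

```javascript
// Formats seconds as an ASS timestamp: H:MM:SS.cc (centiseconds).
function formatAssTime(seconds) {
  const cs = Math.max(0, Math.round(seconds * 100));
  const h = Math.floor(cs / 360000);
  const m = Math.floor((cs % 360000) / 6000);
  const s = Math.floor((cs % 6000) / 100);
  const pad = (n) => String(n).padStart(2, '0');
  return `${h}:${pad(m)}:${pad(s)}.${pad(cs % 100)}`;
}

// Groups words into short caption lines and emits ASS Dialogue events.
// Assumed word shape: { text, start, end } with times in seconds.
function wordsToDialogue(words, { maxWords = 3, uppercase = false } = {}) {
  const lines = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const group = words.slice(i, i + maxWords);
    let text = group.map((w) => w.text.trim()).join(' ');
    if (uppercase) text = text.toUpperCase();
    const start = formatAssTime(group[0].start);
    const end = formatAssTime(group[group.length - 1].end);
    // Fields: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    lines.push(`Dialogue: 0,${start},${end},Default,,0,0,0,,${text}`);
  }
  return lines;
}
```

Casing is handled when the text is generated, while size and position would normally live in the style definition rather than in each event.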

Realtime progress updates (SSE)

Each project exposes an SSE stream. The server subscribes to Redis pub/sub and pushes status updates to connected clients.

// server/src/routes/sse.js (excerpt)
sseRouter.get('/projects/:id/events', authMiddleware, async (req, res) => {
  // ... set headers ...
  res.write(`data: ${JSON.stringify({ step: 'connected', percent: project.progress, message: 'connected' })}\n\n`);
});
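
The excerpt only shows the initial `connected` message. The sketch below illustrates how later updates might be forwarded from pub/sub to the open response. It assumes a subscriber with ioredis‑style `on`/`off('message', (channel, message) => …)` handlers that the caller has already subscribed to the channel; the `project:<id>` channel name and both function names are hypothetical.

```javascript
// Serializes one SSE message: a "data:" line terminated by a blank line.
function formatSseEvent(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Forwards messages for one project from a pub/sub subscriber to an SSE response.
// Returns a cleanup function to call when the client disconnects.
function forwardProjectEvents(subscriber, projectId, res) {
  const channel = `project:${projectId}`; // hypothetical channel naming
  const onMessage = (ch, message) => {
    if (ch !== channel) return; // ignore other projects' updates
    res.write(formatSseEvent(JSON.parse(message)));
  };
  subscriber.on('message', onMessage);
  return () => subscriber.off('message', onMessage);
}
```

In an Express route, the returned cleanup function would typically be called from `req.on('close', …)` so listeners do not accumulate as clients disconnect.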

Performance & Resilience

  • Validates every segment (duration, streams) and repairs audio if missing
  • Sets ffmpeg task timeouts in proportion to media length
  • Uses fallback blur‑background composition if POI detection fails
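
As a sketch of the first two points, the helpers below clamp planned slices to the source duration, enforce a minimum length, and derive a timeout that grows with media length. The slice shape `{ start, end }`, the function names, and all numeric defaults are illustrative assumptions, not values taken from the project.

```javascript
// Clamps planned slices to the source duration and enforces a minimum length.
// Assumed slice shape: { start, end } in seconds.
function normalizeSlices(slices, sourceDurationSec, minSec = 1.0) {
  const out = [];
  for (const { start, end } of slices) {
    const s = Math.max(0, Math.min(start, sourceDurationSec));
    let e = Math.max(s, Math.min(end, sourceDurationSec));
    if (e - s < minSec) e = Math.min(s + minSec, sourceDurationSec); // extend short slices
    if (e - s >= minSec) out.push({ start: s, end: e });             // drop if still too short
  }
  return out;
}

// Timeout that grows linearly with media length; base and factor are illustrative.
function ffmpegTimeoutMs(durationSec, { baseMs = 60000, perSecondMs = 4000 } = {}) {
  return baseMs + Math.ceil(Math.max(0, durationSec) * perSecondMs);
}
```

A slice that cannot reach the minimum length even after extension, for example one starting 0.2 s before the end of the source, is dropped rather than rendered.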

Further techniques:

  • Best‑audio‑stream probing (channels/bitrate) before extraction
  • Silence detection to delay first captions (avoid early non‑speech)
  • Per‑slice validation with re‑encode fallback; forced minimum slice durations
  • Transition combiners that pair xfade (video) with acrossfade (audio) to keep A/V in sync
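
Best‑audio‑stream probing can be sketched as a small ranking over ffprobe's JSON stream list (`-print_format json -show_streams`), where each stream carries `codec_type`, `channels`, and `bit_rate`. The ranking rule here (most channels first, then highest bitrate) and the function name are assumptions; the project may weigh these differently.

```javascript
// Picks the audio stream with the most channels, breaking ties by bitrate.
// Returns null when there is no audio stream, so the caller can apply a fallback.
function pickBestAudioStream(streams) {
  const audio = streams.filter((s) => s.codec_type === 'audio');
  if (audio.length === 0) return null;
  // ffprobe reports bit_rate as a string, so coerce before comparing
  const score = (s) => [Number(s.channels) || 0, Number(s.bit_rate) || 0];
  return audio.reduce((best, s) => {
    const [bestCh, bestBr] = score(best);
    const [ch, br] = score(s);
    return ch > bestCh || (ch === bestCh && br > bestBr) ? s : best;
  });
}
```

The selected stream's `index` can then be passed to ffmpeg's `-map 0:<index>` when extracting audio.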

SEO & Compliance

  • robots.txt and sitemap.xml shipped with both the landing site and the app
  • Legal pages (Privacy, Terms, Refunds) implemented in the client

SEO implementation:

  • Landing sitemap and robots.txt generated in the repo; canonical URLs and Open Graph/Twitter metadata on pages
  • App routes with descriptive titles and meta tags where applicable; consistent favicon and social image
  • Cookie consent banner updates analytics consent via gtag('consent','update', …) and dataLayer events
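
A consent update of this kind might look like the sketch below. `analytics_storage` with the values `granted` and `denied` is part of Google's consent mode; the function names and the `cookie_consent_update` event name are hypothetical, and `gtag` and `dataLayer` are passed in as parameters so the snippet runs outside a browser.

```javascript
// Builds the consent-mode payload for the analytics choice.
function buildConsentUpdate(accepted) {
  return { analytics_storage: accepted ? 'granted' : 'denied' };
}

// Applies the user's choice: updates consent and records a dataLayer event.
function applyConsent(accepted, gtag, dataLayer) {
  gtag('consent', 'update', buildConsentUpdate(accepted));
  // Hypothetical event name, useful for tag manager triggers
  dataLayer.push({ event: 'cookie_consent_update', analytics_consent: accepted });
}
```

In the browser this would be called from the banner's accept and decline handlers as `applyConsent(true, window.gtag, window.dataLayer)`.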

What made it work

  • Tight loop between AI planning and visual validation (scene cuts + transcript + vision)
  • Smooth, resilient rendering with slice validation and audio fixes
  • Clear UX: credits, progress, and results download in one place

Deliverables

  • Product architecture (landing + app + API + worker)
  • Credits & billing (Stripe)
  • Project workflow UI with realtime progress (SSE)
  • AI clip planning (OpenAI)
  • Video processing pipeline (FFmpeg, smart POI crop, transitions)
  • Auto captions (Whisper/whisper.cpp) with styled burn‑in

Links