AI Short Studio — AI Shorts from Long‑Form Video
11/4/2025

Overview
AI Short Studio turns long‑form videos into shorts suitable for Reels/TikTok/Shorts. Users paste a YouTube link or upload a file, pick preferences, and the system plans, renders, and delivers a set of vertical clips with captions.

Goals & Outcomes
- Lower the friction of repurposing long‑form content into platform‑ready shorts
- Provide a robust pipeline that survives edge cases (noisy audio, missing audio tracks, short slices)
- Keep users informed in real time during multi‑step rendering
Architecture
- Apps: landing/ (marketing), client/ (app), server/ (API + workers)
- Queue: BullMQ + Redis for long‑running jobs
- Billing: Stripe for credits and payments
- Realtime: Server‑Sent Events (SSE) channel per project
[landing] → Vite static site (SEO: robots + sitemap)
[client] → React app (auth, projects, buy credits, project detail)
[server] → Express API (auth, billing, projects, sse) + Worker (BullMQ queue)
├─ services/videoProcessing (ffmpeg, POI tracking, captions)
├─ services/gptService (clip planning prompt)
├─ services/openaiClient (OpenAI SDK wrapper)
├─ services/queue (BullMQ)
└─ routes/{auth,billing,projects,sse}
Key Features
- AI planning: selects compelling moments using OpenAI based on transcript, scenes, and visuals
- Smart cropping: point‑of‑interest tracking to keep subject centered in 9:16
- Rendering: ffmpeg‑based pipeline with per‑slice zoom and optional transitions
- Captions: OpenAI Whisper or whisper.cpp; burn‑in with custom ASS styling and font
- Robustness: fallbacks for missing audio and slice validation
Branding notes (public):
- Identity: bold, focused logotype; high‑contrast layout
- Color: energetic accent with a neutral base palette (action surfaces only)
- Typeface: modern sans for UI; display font for hero and highlights

Implementation
- Frontend: React + Vite + Tailwind (landing + client)
- API: Node.js + Express + MongoDB (Mongoose)
- Queue: BullMQ + ioredis; workers render video segments and assemble output
- Payments: @stripe/react-stripe-js + server webhooks
- Realtime progress: SSE endpoint per project with Redis pub/sub
client/: React app with Auth, Projects, New Project, Buy Credits, Account
landing/: Marketing site with sitemap/robots
server/: Express API, routes: admin, auth, billing, projects, sse
Payments & credits (Stripe)
Credits are purchased via Stripe Payment Intents; webhooks increment balances idempotently using stored events.
// server/src/routes/billing.js (excerpt)
router.post('/stripe/webhook', express.raw({ type: 'application/json' }), async (req, res) => {
  const sig = req.headers['stripe-signature'];
  let event;
  try {
    // constructEvent throws when the signature does not match the raw body
    event = stripe.webhooks.constructEvent(req.body, sig, process.env.STRIPE_WEBHOOK_SECRET || '');
  } catch (err) {
    return res.status(400).send(`Webhook Error: ${err.message}`);
  }
  // Persist the verified event before acting on it
  await WebhookEvent.create({ type: event.type, payload: event, signature: sig });
  if (event.type === 'payment_intent.succeeded') {
    const pi = event.data.object;
    const credits = Number(pi.metadata?.credits || 0);
    const userId = pi.metadata?.userId;
    if (userId && credits > 0) {
      await User.findByIdAndUpdate(userId, { $inc: { credits } });
      await CreditTransaction.create({ userId, amountCredits: credits, usd: pi.amount_received / 100, packageKey: pi.metadata?.packageKey, paymentIntentId: pi.id, status: 'succeeded' });
    }
  }
  res.json({ received: true });
});
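The excerpt stores each event but does not show how a redelivered event is skipped. A minimal sketch of one way to make crediting idempotent, assuming the Stripe event ID is used as the deduplication key. The in-memory `processedEventIds` and `creditBalances` stores and the `applyCreditEvent` helper are hypothetical stand-ins for the MongoDB collections, not the project's actual code.

```javascript
// Hypothetical in-memory stand-ins for the WebhookEvent and User collections.
const processedEventIds = new Set();
const creditBalances = new Map();

// Apply a Stripe-style event at most once, keyed by event.id.
// Returns true when credits were added, false for duplicates or ignored events.
function applyCreditEvent(event) {
  if (processedEventIds.has(event.id)) return false; // same event delivered again
  processedEventIds.add(event.id);

  if (event.type !== 'payment_intent.succeeded') return false;
  const pi = event.data.object;
  const credits = Number(pi.metadata?.credits || 0);
  const userId = pi.metadata?.userId;
  if (!userId || !(credits > 0)) return false;

  creditBalances.set(userId, (creditBalances.get(userId) || 0) + credits);
  return true;
}

// Illustrative event; the IDs and metadata values are made up.
const sampleEvent = {
  id: 'evt_test_1',
  type: 'payment_intent.succeeded',
  data: { object: { id: 'pi_test_1', metadata: { userId: 'u1', credits: '50' } } },
};
applyCreditEvent(sampleEvent); // credits applied
applyCreditEvent(sampleEvent); // redelivery: ignored
console.log(creditBalances.get('u1')); // 50
```

With a real database, a unique index on the stored event ID would likely play the role of the `Set`: the second insert fails and the handler can return early without touching the balance.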
Advanced technique — smart 9:16 crop with POI smoothing
The renderer samples frames, detects faces/POI (face‑api/coco‑ssd), smooths centers (EMA), then computes per‑slice crop + zoom with time‑varying ffmpeg expressions.
// server/src/services/videoProcessing.js (excerpt)
const alpha = 0.7; // smoothing weight given to the newest detection
for (let i = 0; i < centers.length; i++) {
  if (!centers[i]) { centers[i] = last; continue; } // no detection: reuse the last smoothed center
  if (!last) { last = centers[i]; continue; }       // first detection seeds the filter
  last = {
    cx: alpha * centers[i].cx + (1 - alpha) * last.cx,
    cy: alpha * centers[i].cy + (1 - alpha) * last.cy,
  };
  centers[i] = last;
}
// ffmpeg crop/scale expressions (per slice)
const zoomExpr = havePrev ? `${prevZoom}+(${zoomFactor}-${prevZoom})*${E}` : `${zoomFactor}`;
const cxExpr = havePrev ? `${prevCx}+(${avgCx}-${prevCx})*${E}` : `${avgCx}`;
const cyExpr = havePrev ? `${prevCy}+(${avgCy}-${prevCy})*${E}` : `${avgCy}`;
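To make the smoothing and crop steps concrete, here is a self-contained sketch of the same idea. It assumes centers are pixel coordinates and that a zoom of 1 means the crop uses the full source height. `smoothCenters` and `cropRect` are illustrative names, not functions from the project.

```javascript
// Exponential moving average over detected centers; null entries mean "no detection".
// alpha weights the newest detection (0.7 matches the excerpt above).
function smoothCenters(centers, alpha = 0.7) {
  const out = [];
  let last = null;
  for (const c of centers) {
    if (!c) { out.push(last); continue; }                      // hold the previous smoothed center
    if (!last) { last = { ...c }; out.push(last); continue; }  // first detection seeds the filter
    last = {
      cx: alpha * c.cx + (1 - alpha) * last.cx,
      cy: alpha * c.cy + (1 - alpha) * last.cy,
    };
    out.push(last);
  }
  return out;
}

// Compute a 9:16 crop window around (cx, cy), clamped to the source frame.
function cropRect(srcW, srcH, cx, cy, zoom = 1) {
  let h = srcH / zoom;
  let w = (h * 9) / 16;
  if (w > srcW) { w = srcW; h = (w * 16) / 9; } // source narrower than 9:16 at this zoom
  const clamp = (v, lo, hi) => Math.min(Math.max(v, lo), hi);
  return {
    x: clamp(cx - w / 2, 0, srcW - w),
    y: clamp(cy - h / 2, 0, srcH - h),
    w,
    h,
  };
}

const smoothed = smoothCenters([null, { cx: 100, cy: 100 }, null, { cx: 200, cy: 100 }]);
console.log(smoothed);
console.log(cropRect(1920, 1080, 960, 540, 1)); // { x: 656.25, y: 0, w: 607.5, h: 1080 }
```

A higher `alpha` follows the subject more tightly but passes more detector jitter into the crop; a lower value gives steadier framing at the cost of lag.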
Advanced technique — word‑level captions via whisper.cpp
When enabled, the pipeline extracts audio from the final short, transcribes with whisper.cpp (or Whisper fallback), builds an ASS file with adjustable casing/size/position, and burns captions into the video.
// server/src/services/videoProcessing.js (excerpt)
const wordsRaw = await transcribeWordsWithWhisperCpp(shortAudio, options?.language || null);
await fsp.writeFile(assPath, buildAssFromTranscript(wordsRaw, caps), 'utf8');
await run('ffmpeg', ['-y', '-i', outPath, '-vf', subFilter, '-c:a', 'copy', '-c:v', 'libx264', '-preset', 'medium', '-crf', '18', '-movflags', '+faststart', captionedOut]);
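The body of `buildAssFromTranscript` is not shown, so the following is only a sketch of how word timings might be turned into ASS dialogue lines. It assumes each word is an object with `text`, `start`, and `end` in seconds and that a style named `Default` is defined elsewhere in the file. `assTime` and `wordsToDialogue` are hypothetical helpers.

```javascript
// Format seconds as an ASS timestamp: H:MM:SS.cc (centiseconds).
function assTime(seconds) {
  const cs = Math.max(0, Math.round(seconds * 100)); // total centiseconds
  const h = Math.floor(cs / 360000);
  const m = Math.floor((cs % 360000) / 6000);
  const s = Math.floor((cs % 6000) / 100);
  const c = cs % 100;
  const pad = (n) => String(n).padStart(2, '0');
  return `${h}:${pad(m)}:${pad(s)}.${pad(c)}`;
}

// Build one Dialogue line per word, following the usual Events field order:
// Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
function wordsToDialogue(words, { uppercase = false, style = 'Default' } = {}) {
  return words
    .filter((w) => w.text && w.end > w.start)
    .map((w) => {
      // Curly braces start ASS override tags, so strip them from caption text.
      let text = w.text.replace(/[{}]/g, '').trim();
      if (uppercase) text = text.toUpperCase();
      return `Dialogue: 0,${assTime(w.start)},${assTime(w.end)},${style},,0,0,0,,${text}`;
    });
}

const lines = wordsToDialogue(
  [{ text: 'hello', start: 0.5, end: 0.9 }, { text: 'world', start: 0.9, end: 1.5 }],
  { uppercase: true },
);
console.log(lines.join('\n'));
// Dialogue: 0,0:00:00.50,0:00:00.90,Default,,0,0,0,,HELLO
// Dialogue: 0,0:00:00.90,0:00:01.50,Default,,0,0,0,,WORLD
```

A complete file would also need the `[Script Info]`, `[V4+ Styles]`, and `[Events]` headers, which is where font, size, and position settings would go.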
Realtime progress updates (SSE)
Each project exposes an SSE stream. The server subscribes to Redis pub/sub and pushes status updates to connected clients.
// server/src/routes/sse.js (excerpt)
sseRouter.get('/projects/:id/events', authMiddleware, async (req, res) => {
  // ... set headers ...
  res.write(`data: ${JSON.stringify({ step: 'connected', percent: project.progress, message: 'connected' })}\n\n`);
});
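The excerpt elides the header setup and the pub/sub wiring. A rough sketch of the framing side follows, assuming progress messages arrive from Redis as JSON strings. `SSE_HEADERS`, `formatSseEvent`, and `broadcast` are illustrative names, and the Redis subscription itself is left out.

```javascript
// Typical response headers for a Server-Sent Events stream.
const SSE_HEADERS = {
  'Content-Type': 'text/event-stream',
  'Cache-Control': 'no-cache',
  Connection: 'keep-alive',
};

// One SSE message: a "data:" line followed by a blank line.
// JSON.stringify never emits raw newlines, so the payload stays on one line.
function formatSseEvent(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Forward one pub/sub message (a JSON string) to every open response for a project.
// `clients` is any iterable of objects with a write(string) method, such as Express responses.
function broadcast(clients, rawMessage) {
  const frame = formatSseEvent(JSON.parse(rawMessage));
  for (const res of clients) res.write(frame);
  return frame;
}

// Usage with a fake response object standing in for an Express `res`.
const received = [];
const fakeRes = { write: (chunk) => received.push(chunk) };
broadcast([fakeRes], JSON.stringify({ step: 'render', percent: 40, message: 'rendering slice 2' }));
console.log(received[0]);
```

On the browser side, an `EventSource` pointed at the project's events URL would receive each frame as a `message` event whose `data` field can be parsed with `JSON.parse`.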
Performance & Resilience
- Validates every segment (duration, streams) and repairs audio if missing
- Times out long ffmpeg tasks proportionally to media length
- Uses fallback blur‑background composition if POI detection fails
Further techniques:
- Best‑audio‑stream probing (channels/bitrate) before extraction
- Silence detection to delay first captions (avoid early non‑speech)
- Per‑slice validation with re‑encode fallback; forced minimum slice durations
- Transition combiners that pair acrossfade (audio) with xfade (video) to keep A/V in sync
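Two of the safeguards above, duration-proportional timeouts and forced minimum slice durations, can be sketched as small pure functions. All constants (base timeout, per-second budget, cap, minimum length) are illustrative assumptions, and `ffmpegTimeoutMs` and `enforceMinDuration` are hypothetical names rather than the project's helpers.

```javascript
// Scale an ffmpeg timeout with media duration, bounded by a fixed cap.
function ffmpegTimeoutMs(durationSec, { baseMs = 60000, perSecondMs = 4000, maxMs = 3600000 } = {}) {
  const d = Number.isFinite(durationSec) && durationSec > 0 ? durationSec : 0; // unknown duration: base only
  return Math.min(maxMs, baseMs + Math.round(d * perSecondMs));
}

// Stretch a too-short slice to at least minSec, staying inside [0, totalSec].
function enforceMinDuration(slice, minSec, totalSec) {
  if (slice.end - slice.start >= minSec) return { ...slice };
  const end = Math.min(totalSec, slice.start + minSec); // extend forward first
  const start = Math.max(0, end - minSec);              // then pull the start back if needed
  return { start, end };
}

console.log(ffmpegTimeoutMs(30));                                 // 180000
console.log(enforceMinDuration({ start: 59.5, end: 60 }, 2, 60)); // { start: 58, end: 60 }
```

The timeout value would then be handed to whatever process runner launches ffmpeg, which kills the child process if it is exceeded.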
SEO & Compliance
- robots.txt and sitemap.xml shipped on both landing and app
- Legal pages (Privacy, Terms, Refunds) implemented in the client
SEO implementation:
- Landing sitemaps + robots generated in repo, canonical URLs and OpenGraph/Twitter metadata on pages
- App routes with descriptive titles and meta where applicable; consistent favicon, social image
- Cookie consent banner updates analytics consent via gtag('consent','update', …) and DataLayer events
What made it work
- Tight loop between AI planning and visual validation (scene cuts + transcript + vision)
- Smooth, resilient rendering with slice validation and audio fixes
- Clear UX: credits, progress, and results download in one place
Deliverables
- Product architecture (landing + app + API + worker)
- Credits & billing (Stripe)
- Project workflow UI with realtime progress (SSE)
- AI clip planning (OpenAI)
- Video processing pipeline (FFmpeg, smart POI crop, transitions)
- Auto captions (Whisper/whisper.cpp) with styled burn‑in