Design a video streaming platform like YouTube or Netflix

Question

Accepted Answer

A video streaming platform like YouTube or Netflix must ingest, transcode, store, and deliver video at planetary scale: petabytes of content, billions of daily views, and a user expectation of near-instant playback at the highest quality their network can sustain. The workload is overwhelmingly read-heavy (roughly 99% reads), and the two hardest engineering problems are the upload and transcoding pipeline at write time and the adaptive delivery system at view time.

Scale framing: YouTube alone sees over 500 hours of video uploaded every minute. A single 4K upload may produce a dozen output renditions. At view time, a trending video can receive millions of concurrent segment requests within seconds of publication. The system must handle both the bursty write path and the massive, sustained read path without coupling them.

Regional ingest endpoint: the uploader connects to the nearest ingest point of presence. The raw file is streamed directly to object storage (S3 or GCS) using multipart upload. A lightweight upload service records job metadata and emits an event to a queue (Kafka or SQS) to kick off transcoding asynchronously. This decouples the upload from the slow transcoding work.

Distributed transcoding pipeline: a fleet of transcoding workers consumes the queue. Each worker pulls the raw file, splits it into short segments (typically 2–10 seconds), and fans those segments out to a GPU cluster running FFmpeg or a managed service like AWS Elastic Transcoder. The pipeline produces every target rendition (240p, 360p, 480p, 720p, 1080p, 4K), encoded in H.264, VP9, and AV1 for different client capabilities. Each rendition is segmented into .ts or .m4s files, and a playlist manifest (HLS or DASH) is generated to describe them.

Storage tiers: finished segments are pushed to object storage, then tiered by access frequency: hot content lives on CDN edge nodes (SSD-backed), warm content on standard S3, and cold long-tail content on S3 Glacier or equivalent.
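
The access-frequency tiering described above can be sketched as a small classification function. This is only an illustration: the names `StorageTier` and `choose_tier` and the view-count thresholds are hypothetical, not taken from any real platform, which would tune such a policy from access logs and storage costs.

```python
from enum import Enum


class StorageTier(Enum):
    HOT = "cdn-edge"    # kept on CDN edge nodes
    WARM = "standard"   # standard object storage (e.g. S3 Standard)
    COLD = "archive"    # archival storage (e.g. S3 Glacier)


# Hypothetical thresholds, chosen only for illustration.
HOT_MIN_VIEWS_30D = 1_000
WARM_MIN_VIEWS_30D = 10


def choose_tier(views_last_30_days: int) -> StorageTier:
    """Map a video's trailing 30-day view count to a storage tier."""
    if views_last_30_days < 0:
        raise ValueError("view count cannot be negative")
    if views_last_30_days >= HOT_MIN_VIEWS_30D:
        return StorageTier.HOT
    if views_last_30_days >= WARM_MIN_VIEWS_30D:
        return StorageTier.WARM
    return StorageTier.COLD
```

A periodic job could run a rule like this over view counters and move segments between tiers. Because finished segments are immutable, a move is a copy followed by a delete.
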
Metadata (title, owner, duration, chapter markers, thumbnails, segment URLs) is stored in a relational database sharded by video_id.

Multi-tier CDN delivery: over 95% of bytes are served from CDN edge nodes rather than origin. Netflix runs its own purpose-built CDN called Open Connect, deployed inside ISP networks to reduce transit costs. General platforms use CloudFront, Akamai, or Fastly. Popular content is pre-warmed to edges at publication time; long-tail content is fetched on demand and cached on first request.

Adaptive Bitrate Streaming (ABR): when a viewer presses play, the client fetches the HLS or DASH manifest, then downloads segments one at a time. After each segment it measures throughput and adjusts the quality level for the next segment. A congested connection steps down to 480p; a fast connection steps up to 4K. This keeps rebuffering to a minimum while maximizing perceived quality.

Recommendations, search, and metadata services: a separate analytics stack ingests view events into a data lake (queried and processed with tools such as BigQuery and Spark), runs collaborative-filtering and deep-learning models offline, and serves recommendations via a low-latency feature store. Full-text search over titles and descriptions uses Elasticsearch. Comments, likes, and watch history live in their own microservices with NoSQL stores.

DRM and content protection: premium content is encrypted at segment level. Players fetch decryption keys from a license server (Widevine for Android/Chrome, FairPlay for iOS/Safari, PlayReady for Windows). The license server enforces entitlement checks before releasing keys. The manifest references encrypted segments, but keys are never bundled with the video file.

A viewer's complete playback journey: the browser requests the manifest URL from the CDN edge. The edge serves it from cache (or fetches it from origin). The client parses the manifest, selects an initial quality tier based on a bandwidth probe, and starts requesting 4-second segments sequentially.
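
The throughput-driven quality switch described above can be sketched as follows. This is a minimal, throughput-only heuristic under stated assumptions: the bitrate ladder values, the `SAFETY_FACTOR`, and the function names are hypothetical, and production players typically also smooth the estimate over several segments and take the buffer level into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int  # average encoded bitrate (illustrative values)


# Hypothetical bitrate ladder, sorted from lowest to highest bitrate.
LADDER = [
    Rendition("240p", 400),
    Rendition("360p", 800),
    Rendition("480p", 1_500),
    Rendition("720p", 3_000),
    Rendition("1080p", 6_000),
    Rendition("4K", 16_000),
]

# Assumption: budget only 80% of measured throughput to leave headroom.
SAFETY_FACTOR = 0.8


def measure_throughput_kbps(segment_bytes: int, download_seconds: float) -> float:
    """Estimate throughput from the size and download time of the last segment."""
    if download_seconds <= 0:
        raise ValueError("download time must be positive")
    return segment_bytes * 8 / 1000 / download_seconds


def pick_rendition(throughput_kbps: float, ladder=LADDER) -> Rendition:
    """Return the highest rendition whose bitrate fits the throughput budget.

    Falls back to the lowest rendition when nothing fits, so playback
    continues at reduced quality instead of stalling.
    """
    budget = throughput_kbps * SAFETY_FACTOR
    chosen = ladder[0]
    for rendition in ladder:
        if rendition.bitrate_kbps <= budget:
            chosen = rendition
    return chosen
```

For example, a 1,000,000-byte segment downloaded in 2 seconds gives an estimate of 4,000 kbps, a budget of 3,200 kbps, and therefore the 720p rendition for the next request.
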
The player buffers a few segments ahead and continuously recalculates bandwidth. The CDN edge serves each segment, most of them from cache; a cache miss triggers an origin fetch, and the segment is then cached for the next viewer. Playback typically begins within one to two seconds, even on constrained connections.

Key trade-off, pre-transcoding vs on-demand transcoding: pre-transcoding all resolutions at upload time consumes substantial compute and storage, but it adds no latency at view time and makes CDN caching trivial (immutable segment files). On-demand transcoding (just-in-time rendering at request time) reduces storage costs and compute waste for content that is never watched, but adds latency for the first viewer and dramatically complicates CDN caching. YouTube and Netflix both pre-transcode; niche platforms with millions of rarely viewed uploads must weigh storage cost against viewer experience.

Additional concerns worth addressing: abuse detection runs as a post-upload pipeline using Content ID fingerprinti