How Hotstar Scaled to 10.3 Million Concurrent Users – An Architectural Insight
Hotstar is the leading streaming media & video on demand service in India with a user base of approx. 200 million users.
It lets users stream popular shows in several different languages & genres over the web. But the most popular feature of the service is the streaming of live cricket matches. This is the real deal.
In the latest edition of the IPL T20 cricket tournament, the streaming platform received a whopping record traffic of 10.3 million concurrent users, smashing its previous record of 8.26 million concurrent users.
To my knowledge this is the most intense concurrent traffic load on any service, surpassing Fortnite, which clocked a load of 8.3 million concurrent players in the recent past.
1. Technical Insights
All the traffic is served by EC2 instances & the S3 object store is used as the data store.
The services use a mix of on-demand & spot instances to keep the costs optimized.
Running machine learning & data analytics algorithms on spot instances helps the business cut down costs considerably.
Tens of terabytes of data are generated on any regular day & processed by AWS EMR clusters.
AWS EMR is a managed Hadoop framework for processing massive amounts of data across the EC2 instances. Other popular frameworks like Apache Spark, Presto, HBase etc. can also be used with AWS EMR.
2. Infrastructure Setup for Load Testing the Platform for the Big Event
The load testing infrastructure consisted of 500+ AWS compute instances, either c4.4xlarge or c4.8xlarge, running at 75% utilization.
C4 instances are built for CPU-intensive workloads & offer a low price per unit of compute. They provide high networking performance & increased storage performance at no additional cost.
A c4.4xlarge instance has 30 GiB of RAM & a c4.8xlarge has 60 GiB.
The entire setup had 16 TB of RAM & 8000 CPU cores, & at peak the data transfer was approx. 32 Gbps.
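As a rough sanity check on these totals, here is a small Python sketch that adds up the vCPUs & RAM of a fleet. The vCPU counts (16 for c4.4xlarge, 36 for c4.8xlarge) come from the published AWS instance specs, not from Hotstar. The fleet mix is purely my assumption for illustration, since the exact split between the two instance types isn't stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    ram_gib: int


# Published AWS specs for the two instance types mentioned above.
C4_4XLARGE = InstanceType("c4.4xlarge", vcpus=16, ram_gib=30)
C4_8XLARGE = InstanceType("c4.8xlarge", vcpus=36, ram_gib=60)


def fleet_totals(fleet):
    """Return (total vCPUs, total RAM in GiB) for a {InstanceType: count} fleet."""
    total_vcpus = sum(itype.vcpus * count for itype, count in fleet.items())
    total_ram_gib = sum(itype.ram_gib * count for itype, count in fleet.items())
    return total_vcpus, total_ram_gib


# Hypothetical mix: 500 instances, all c4.4xlarge (an assumption, not a reported figure).
vcpus, ram_gib = fleet_totals({C4_4XLARGE: 500})
print(vcpus, ram_gib)  # 8000 15000
```

With this assumed mix the numbers land at 8000 vCPUs & about 15 TB of RAM, which is in the same ballpark as the reported figures & suggests the fleet was mostly made up of the smaller instance type.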
The engineering team makes intelligent use of spot & on-demand instances to keep the costs low.
3. Simulating Traffic
The traffic simulation had three major components:
the traffic model, the simulation script & the load generation infrastructure. I’ve already talked about the infrastructure part.
3.1 Preparing the Traffic Model
The engineering team started with a very basic traffic model, hitting the API endpoints with a certain ratio of requests, & performed some initial runs.
There are typically two types of user interactions with the system: users who have already signed up & are logging in, & users visiting the platform for the first time.
Keeping all the different user navigation scenarios in mind, the team prepared a traffic model of the system.
The next step was to introduce traffic spikes into the model over the entire testing phase.
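To make the idea concrete, here is a minimal Python sketch of such a traffic model: a weighted mix of API endpoints plus a load profile that ramps up & has a spike injected at a chosen minute. The endpoint names, the ratios & all the numbers are hypothetical. The actual model used by the team isn't public.

```python
import random

# Hypothetical request mix: share of requests going to each endpoint.
ENDPOINT_WEIGHTS = {
    "/login": 0.2,         # returning users logging in
    "/signup": 0.1,        # first-time visitors
    "/home": 0.3,
    "/stream/start": 0.4,
}


def pick_endpoint(rng):
    """Pick the next endpoint to hit according to the configured ratios."""
    endpoints = list(ENDPOINT_WEIGHTS)
    weights = list(ENDPOINT_WEIGHTS.values())
    return rng.choices(endpoints, weights=weights, k=1)[0]


def target_users(minute, base=1_000_000, ramp_per_minute=50_000, spikes=None):
    """Concurrent users to simulate at a given minute: linear ramp plus optional spikes."""
    spikes = spikes or {}
    return base + ramp_per_minute * minute + spikes.get(minute, 0)


# A five-minute profile with an extra 500k users arriving at minute 3.
profile = [target_users(m, spikes={3: 500_000}) for m in range(5)]
print(profile)  # [1000000, 1050000, 1100000, 1650000, 1200000]
```

Keeping the request mix & the load profile separate makes it easy to replay the same user behaviour under different spike patterns.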
3.2 Writing the Simulation Script
The tools used to write & run the simulation were Gatling & Flood.io.
Gatling is an open-source load & performance testing tool for web applications, built on Scala, Akka & Netty.
It was preferred over JMeter, which is a pretty popular testing tool, due to its capability of simulating more concurrent users at one point in time.
Gatling sessions were used to maintain the user state. Simulation scripts were designed for several workflows throughout the system, for instance, user login flow, streaming flow etc.
The simulation ran on an AWS distributed cluster & simulated 20 million users out of which 4 million were concurrent.
Another cloud-based testing tool Flood.io was used with Gatling for load testing.
Flood.io is a load testing platform that runs distributed performance tests with open source tools like JMeter, Gatling, Selenium etc.
All this test setup matured with the data obtained as the real cricket matches happened.
4. Strategies to Scale
4.1 Traffic & Ladder Based Scaling
There are primarily two triggers to scale: traffic-based & ladder-based.
Traffic-based scaling simply means adding new infrastructure to the pool as the number of requests being processed by the system increases.
This works with unpredictable workloads & for services that expose their stats in detail, such as the number of threads created, requests processed etc.
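A traffic-based trigger can be sketched in a few lines of Python. This is only my illustration of the idea, not Hotstar's implementation: the function name, the per-instance throughput & the 75% target utilization (borrowed from the load test figure above) are all assumptions.

```python
import math


def desired_instances(requests_per_second, rps_per_instance, target_utilization=0.75):
    """Instances needed so that each one runs at or below the target utilization."""
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_rps = rps_per_instance * target_utilization
    # Always keep at least one instance in the pool.
    return max(1, math.ceil(requests_per_second / usable_rps))


# Hypothetical numbers: 90k requests/sec, each instance handles 1000 requests/sec flat out.
print(desired_instances(90_000, 1_000))  # 120
```

The function needs the service to report its request rate, which is why this trigger only suits services that expose detailed stats.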
Ladder-based scaling is preferred for services which do not expose much detail about their processes. To tackle this, the team has pre-defined ladders per million concurrent users.
As the concurrency climbs from one rung to the next, new infrastructure is added to the pool. This technique works well for predictable load.
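Here is a minimal Python sketch of what a ladder could look like, assuming one rung per million concurrent users. The figure of 50 instances per rung is made up for illustration. The real ladder values aren't given.

```python
import math

# Hypothetical ladder: instances to run per million concurrent users.
INSTANCES_PER_MILLION = 50


def ladder_instances(concurrent_users, instances_per_million=INSTANCES_PER_MILLION):
    """Round concurrency up to the next million (the rung) and size the pool for it."""
    rungs = max(1, math.ceil(concurrent_users / 1_000_000))
    return rungs * instances_per_million


print(ladder_instances(3_200_000))  # 200 (sized for the 4 million rung)
```

Unlike the traffic-based trigger, the only input here is the concurrency number, so it works even when the service itself reports nothing useful.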
The team keeps a concurrency buffer of 2 million concurrent users in place, since adding new infrastructure to the pool takes around 90 seconds & starting the container and the application takes around 75 seconds.
To beat this delay, the team has a pre-provisioned buffer. This buffer approach is preferred in contrast to auto-scaling. It helps the team handle unexpected traffic spikes without adding unnecessary latency to the response.
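The arithmetic behind the buffer can be worked out in a short Python sketch. I am assuming here that the two delays add up one after the other, which isn't stated explicitly, & the function names are mine.

```python
PROVISION_SECONDS = 90      # time to add new infrastructure to the pool
STARTUP_SECONDS = 75        # time for the container & application to start
BUFFER_USERS = 2_000_000    # pre-provisioned concurrency buffer

# Assumption: the two delays are sequential, so they add up.
LEAD_TIME_SECONDS = PROVISION_SECONDS + STARTUP_SECONDS


def capacity_target(current_concurrency, buffer_users=BUFFER_USERS):
    """Concurrency the pool should be provisioned for right now."""
    return current_concurrency + buffer_users


def max_absorbable_growth(buffer_users=BUFFER_USERS, lead_time=LEAD_TIME_SECONDS):
    """Fastest sustained growth (users/sec) the buffer covers while new capacity boots."""
    return buffer_users / lead_time


print(LEAD_TIME_SECONDS)               # 165
print(capacity_target(8_000_000))      # 10000000
print(round(max_absorbable_growth()))  # 12121
```

Under these assumptions, the buffer absorbs a surge of roughly 12,000 new concurrent users per second for the 165 seconds it takes fresh capacity to come online.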
They have developed an internal app called the Infradashboard which helps them take these scaling decisions smoothly & quite ahead of time.
4.2 Intelligent Client
Protocols are in place which come into effect when the backend is overwhelmed with requests & the application client experiences increased response latency.
In these scenarios the client avoids burdening the backend servers even more by increasing the time between subsequent requests.
Caching & intelligent protocols are in place to enhance the user experience.
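One common way to implement this kind of client behaviour is exponential backoff with jitter. The Python sketch below illustrates that technique. It is not Hotstar's actual client protocol, & the latency threshold & delay values are assumptions.

```python
import random


def next_delay(current_delay, latency, latency_threshold=1.0,
               base_delay=5.0, max_delay=60.0, rng=random):
    """Seconds the client should wait before its next request.

    If the last response was slow, double the delay (capped at max_delay) and
    add jitter so that clients do not all retry at the same instant.
    If the backend looks healthy again, fall back to the base delay.
    """
    if latency > latency_threshold:
        doubled = min(max_delay, current_delay * 2)
        return rng.uniform(doubled / 2, doubled)
    return base_delay


# Usage: the backend answered in 2.0s, above the 1.0s threshold, so back off.
delay = next_delay(current_delay=5.0, latency=2.0, rng=random.Random(1))
print(5.0 <= delay <= 10.0)  # True
```

The jitter matters at this scale: with millions of clients, fixed delays would make them all come back at the same moment & overwhelm the backend again.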
4.3 Dealing with Nefarious Traffic
Popularity attracts unwanted attention. Using a combination of whitelisting & industry best practices, the application drives away nefarious traffic as soon as it is discovered, sparing the servers the extra burden.
4.4 Do Not Do Real-time If It’s Not Business Critical
Recently Hotstar introduced a play-along feature, where users can interact with their folks & play trivia on the same platform while watching cricket. Being real-time, this feature spiked the concurrent traffic load on the backend servers.
Doing stuff in real-time has costs associated with it. Real-time features spike the concurrent traffic by quite an extent, & managing that traffic needs powerful hardware, which can be avoided if the feature isn’t real-time.
All these experiences are pretty insightful & help us plan, design & execute in a better & informed way.
If you are intrigued & want to read more about Hotstar’s platform engineering, here you go.
Well, Guys!! This is pretty much it. If you liked the write-up do share it with your folks.
Don’t forget to let me know your thoughts in the comments.
I’ll see you in the next write-up.