I am pretty sure that, being in the software development universe, you’ve come across this word many times: scalability. What is it? Why is it so important? Why is everyone talking about it?

Is it important to scale systems? What are your plans or contingencies to scale when your app or platform experiences significant traffic growth?

This write-up is an in-depth guide on scalability. It answers questions such as: what does scalability mean in the context of web applications, distributed systems & cloud computing?

So, without any further ado.
Let’s get started.


1. What is Scalability?

Scalability in computing & software development means the ability of a system to handle an increased workload without sacrificing latency.

For instance, if your app takes x seconds to respond to a user request, it should take the same x seconds to respond to each of a million concurrent user requests.

The backend infrastructure of the app should not crumble under a load of a million concurrent requests. It should scale well when subjected to a heavy traffic load & should maintain the latency of the system.

What is Latency?

Latency is the amount of time a system takes to respond to a user request. Let’s say you send a request to an app to fetch an image & the system takes 2 seconds to respond to your request. The latency of the system is 2 seconds.

Low latency is what efficient software systems strive for. Ideally, no matter how much the traffic load on a system builds up, the latency should not go up. This is what scalability is.

If the latency remains the same, we can say yeah the application scaled well with the increased load & is highly scalable.

Let’s think of scalability in terms of Big-O notation. Ideally, the complexity of a system or an algorithm should be O(1), which is constant time, like a lookup in a key-value database.

A program with a complexity of O(n^2), where n is the size of the data set, is not scalable. As the size of the data set increases, the work grows quadratically & the system needs disproportionately more computational power to process the tasks.
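To make the difference in growth concrete, here is a minimal Python sketch. The function names are hypothetical & made up for illustration. It simply counts the basic steps performed by a constant-time lookup & by a quadratic all-pairs comparison.

```python
def lookup_steps(store: dict, key: str) -> int:
    """A hash-map lookup costs one step on average, whatever the size of the store."""
    _ = store.get(key)
    return 1


def all_pairs_steps(items: list) -> int:
    """Comparing every item with every other item costs n * n steps."""
    steps = 0
    for _a in items:
        for _b in items:
            steps += 1
    return steps


if __name__ == "__main__":
    for n in (10, 100, 500):
        data = list(range(n))
        store = {str(i): i for i in data}
        # The lookup column stays flat while the all-pairs column grows as n squared.
        print(n, lookup_steps(store, "5"), all_pairs_steps(data))
```

Growing the data set 10 times leaves the lookup cost unchanged but multiplies the all-pairs cost by 100.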


2. What are the Different Ways to Scale an Application or Different Types of Scalability?

To scale well, an application needs solid computing power. The servers should be powerful enough to handle increased traffic loads.

There are two ways to scale an application:

  1. Vertical Scaling
  2. Horizontal Scaling


2.1 What is Vertical Scaling?

Vertical scaling means adding more power to your server. Let’s say your app is hosted on a server with 16 GB of RAM. To handle the increased load, you increase the RAM to 32 GB. You have vertically scaled the server.

[Figure: Vertical scalability (8bitmen.com)]

Ideally, when the traffic starts to build up on your app, the first step should be to scale vertically. Vertical scaling is also called scaling up.

Increase the power of the hardware running the app. This is the simplest way to scale since it doesn’t require any code refactoring or complex configuration. I’ll talk later in the article about why code refactoring is required when we horizontally scale the app.

But there is only so much we can do when scaling vertically. There is a limit to the capacity we can augment for a single server.

A good analogy is a multi-story building: we can keep adding floors to it, but only up to a certain point. What if the number of people in need of a flat keeps rising? We can’t scale up the building to the moon, for obvious reasons.

Now is the time to build more buildings. This is where horizontal scalability comes in.

When the traffic is just too much to be handled by a single machine, we bring in more servers to work together.


2.2 What is Horizontal Scaling?

Horizontal scaling, also known as scaling out, means adding more hardware to the existing hardware resource pool & increasing the computational power of the system as a whole.

[Figure: Horizontal scalability (8bitmen.com)]

Now the increased traffic influx can be easily dealt with by the increased computational capacity & there is practically no limit to how much we can scale horizontally, assuming we have infinite resources. We can keep adding server after server, setting up data centre after data centre.

Horizontal scaling also gives us the ability to scale dynamically in real time as the traffic on our website increases & decreases over a period of time, as opposed to vertical scaling, which requires pre-planning & a stipulated time to be pulled off.

The biggest reason why cloud computing got so popular in the industry is the ability to scale up & down dynamically. The ability to use & pay only for the resources required by the website got quite popular.

If the site has a heavy traffic influx, more server nodes get added & when it doesn’t, the dynamically added nodes are removed. This approach saves businesses bags of money every single day.

The approach is also known as cloud elasticity: the infrastructure’s computational capacity stretches under load & returns to its original size afterwards.
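The decision behind adding & removing nodes can be sketched in a few lines of Python. This is only an illustration: the function name, the per-node capacity & the node limits are assumptions of mine, not how any particular cloud provider computes it.

```python
import math


def desired_node_count(requests_per_second: float,
                       capacity_per_node: float,
                       min_nodes: int = 1,
                       max_nodes: int = 20) -> int:
    """Return how many nodes are needed for the current load.

    capacity_per_node is an assumed figure: the requests per second that
    one node can serve while keeping latency acceptable.
    """
    needed = math.ceil(requests_per_second / capacity_per_node)
    # Never go below the floor (availability) or above the ceiling (budget).
    return max(min_nodes, min(max_nodes, needed))
```

With an assumed capacity of 100 requests per second per node, a load of 950 requests per second needs 10 nodes, while an idle site falls back to the minimum of one node.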

Having multiple server nodes on the backend also helps the website stay online all the time, even if a few server nodes crash & go down. This is known as high availability.


2.3 Pros & Cons of Vertical & Horizontal Scaling

This is the part where I talk about the pluses & minuses of both approaches.

Vertical scaling, for obvious reasons, is simpler since we do not have to touch the code or make any complex distributed system configurations. The administrative, monitoring & management efforts are much lower than when managing a distributed environment.

A major downside is availability risk. The servers are powerful but few in number, so there is always a risk of them going down & the entire website going offline.

What about the code? Why does the code need to be different when running on multiple machines?

If you need to run the code in a distributed environment, it needs to be stateless. There should be no state in the code. What do I mean by that?

No static instances in the class. Static instances hold application data & if a particular server goes down, all the static data/state is lost. The app is left in an inconsistent state.

Rather, use a persistent store such as a key-value store to hold the data & remove all the state/static variables from the class. This is one reason why functional programming got so popular with distributed systems. Well, this needs another dedicated write-up in itself.
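Here is a minimal sketch of the difference, in Python. All the class names are hypothetical & the in-memory store only stands in for a real external key-value store shared by every server node.

```python
class InMemoryKeyValueStore:
    """Stand-in for an external key-value store shared by all server nodes."""

    def __init__(self):
        self._data = {}

    def increment(self, key: str) -> int:
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


class StatefulServer:
    """Anti-pattern: the counter lives inside the server & dies with it."""

    def __init__(self):
        self.visits = 0

    def handle_request(self) -> int:
        self.visits += 1
        return self.visits


class StatelessServer:
    """The server keeps no data of its own; everything goes to the shared store."""

    def __init__(self, store: InMemoryKeyValueStore):
        self._store = store

    def handle_request(self) -> int:
        return self._store.increment("visits")


if __name__ == "__main__":
    store = InMemoryKeyValueStore()
    node_a, node_b = StatelessServer(store), StatelessServer(store)
    node_a.handle_request()
    node_a.handle_request()
    # node_b sees the visits handled by node_a because the state is shared.
    print(node_b.handle_request())
```

If a stateless node crashes, a replacement node picks up exactly where it left off. A replacement for the stateful server would start counting from zero again.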

Always have a ballpark estimate in mind when designing your app. How much traffic does it have to deal with?

Development teams today are adopting a distributed microservices architecture right from the start & the workloads are meant to be deployed on the cloud. So, inherently the workloads are horizontally scaled out on the fly.


[Figure: Microservices architecture (8bitmen.com)]


The upsides of horizontal scaling include no hard limit on augmenting the hardware capacity & the replication of data across different geographical regions as nodes & data centres are set up across the globe.


2.4 Which Scalability Approach Is Right for Your App?

If your app is a utility or a tool that is expected to receive bare minimum traffic & is not business critical, for instance an app internal to an organization, why bother hosting it in a distributed environment? A single server is enough to manage the traffic. Go ahead with vertical scaling in case the traffic load increases by a not so significant amount.

If your app is a public-facing social app like a social network, a fitness app or something similar, build it to be deployed on the cloud & always have horizontal scaling in mind. Period.


3. Primary Bottlenecks That Hurt the Scalability of Our Web Application

There are several points in a web application which can become a bottleneck & can hurt the scalability of our application. Let’s have a look at them.


[Figure: Layers of a web application (8bitmen.com)]



Our application appears to be architected well. Everything looks good. The workload runs on multiple nodes & has the ability to scale horizontally. But the database is a poor single monolith, just one server given the onus of handling the data requests from all the server nodes of the workload.

This is a bottleneck. The server nodes would work well & handle millions of requests at a point in time efficiently, but the response time, the latency of the application, would still be very high due to the single database. There is only so much it can handle.

Just like the workload, the database needs to scale well. Make wise use of database partitioning & sharding & use multiple database servers to make the module efficient.
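One simple way to spread data over several database servers is hash-based sharding. The Python sketch below is an illustration under my own assumptions: the function name & the key format are hypothetical & real databases often ship their own partitioning schemes.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a record key to one of shard_count database servers.

    A stable hash is used so that every application node, on every run,
    sends the same key to the same shard.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    for user_key in ("user:1", "user:2", "user:3", "user:4"):
        print(user_key, "-> shard", shard_for(user_key, 4))
```

Keep in mind that with plain modulo sharding, changing the number of shards moves most keys to a different server. Consistent hashing is a common way to reduce that reshuffling.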


Application Architecture

A poorly designed application’s architecture can become a major bottleneck as a whole. Some of the common architectural mistakes are:

Not Using Asynchronous Processes, Modules Wherever Required

Instead, all the processes are scheduled sequentially. For instance, if a user uploads a document on the portal, tasks such as sending a confirmation email to the user & sending a notification to all of the subscribers/listeners to the upload event should be done asynchronously.

These tasks should be forwarded to a messaging server as opposed to doing it all sequentially & making the user wait for everything.
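The upload example can be sketched with Python’s standard library, using an in-process queue & a worker thread as a stand-in for a real messaging server. The task names & the handle_upload function are hypothetical.

```python
import queue
import threading

task_queue = queue.Queue()  # stand-in for a messaging server
completed = []              # records what the background worker has processed


def worker() -> None:
    """Process queued tasks in the background until a None sentinel arrives."""
    while True:
        task = task_queue.get()
        if task is None:
            task_queue.task_done()
            break
        # A real worker would send the email or push the notification here.
        completed.append(f"{task['type']} for {task['document']}")
        task_queue.task_done()


def handle_upload(document: str) -> str:
    """Store the document, queue the slow follow-up work & respond right away."""
    task_queue.put({"type": "confirmation_email", "document": document})
    task_queue.put({"type": "subscriber_notification", "document": document})
    return f"{document} uploaded"


if __name__ == "__main__":
    background = threading.Thread(target=worker, daemon=True)
    background.start()
    print(handle_upload("report.pdf"))  # returns without waiting for the emails
    task_queue.join()                   # wait here only to show the result
    task_queue.put(None)
    background.join()
    print(completed)
```

The user gets the upload response as soon as the tasks are queued. The email & the notifications are processed later by the worker.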


Not Using Caching In the Application Wisely

Caching can be deployed at several layers of the application & it speeds up the response time considerably. It intercepts requests that would otherwise go to the database, reducing the overall load on it.

Use caching extensively throughout the application to speed things up significantly.
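Here is a minimal sketch of a cache sitting in front of the database on the read path. The CachedReader class is hypothetical & a plain dict stands in for a real cache server. Eviction & invalidation are left out to keep it short.

```python
class CachedReader:
    """Serve repeated reads from memory & hit the database only on a miss."""

    def __init__(self, load_from_database):
        self._load = load_from_database  # any callable that fetches a value by key
        self._cache = {}
        self.database_hits = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]      # cache hit: the database is not touched
        self.database_hits += 1          # cache miss: go to the database once
        value = self._load(key)
        self._cache[key] = value
        return value


if __name__ == "__main__":
    reader = CachedReader(lambda key: f"profile of {key}")
    for _ in range(1000):
        reader.get("user:1")
    print(reader.database_hits)
```

A thousand reads of the same key reach the database only once.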


Inefficient Configuration & Setup of Load Balancers

Load balancers are the gateway to our application. Using too many or too few of them impacts the latency of our application.


Adding Business Logic to the Database

No matter what justification the developers provide, I’ve never been a fan of adding business logic to the database.

The database is just not the place to put business logic. Not only does it make the whole application tightly coupled, it also puts unnecessary load on the database.

Imagine when migrating to a different database tech, how much code refactoring it would require.


Not Picking the Right Database Tech

Picking the right database technology is vital for businesses. Need transactions & strong consistency? Pick SQL. Can do without strong consistency & need horizontal scalability on the fly? Pick NoSQL.

Trying to pull things off with a not so suitable tech always has a profound negative impact on the latency of the entire application.


At the Code Level

This shouldn’t come as a surprise but inefficient & badly written code has the potential to take down the entire service in production.

Using unnecessary loops or nested loops, writing tightly coupled code, not paying attention to the Big-O complexity while writing the code: do any of these & be ready to do a lot of firefighting in production.
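As a small illustration of a hidden nested loop, here are two ways to find the items common to two lists in Python. The function names are hypothetical. The set-based version assumes the items are hashable.

```python
def common_items_quadratic(first: list, second: list) -> list:
    """Hidden nested loop: 'x in second' scans the whole list for every x."""
    return [x for x in first if x in second]


def common_items_linear(first: list, second: list) -> list:
    """Build a set once so that each membership check is constant time on average."""
    second_set = set(second)
    return [x for x in first if x in second_set]


if __name__ == "__main__":
    print(common_items_quadratic([1, 2, 3, 4], [3, 4, 5]))
    print(common_items_linear([1, 2, 3, 4], [3, 4, 5]))
```

Both return the same result, but the first does work proportional to len(first) * len(second), while the second does work proportional to len(first) + len(second).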


4. How to Improve the Scalability of a Web Application?

First thing, do thorough testing of the application. I’ll talk about that in the section down below.

Profile the hell out of it. Run an application profiler & a code profiler. See which processes are taking too long or eating too many resources. Find the bottlenecks. Get rid of them.

Cache wisely. Cache everywhere. Cache all the static content. Hit the database only when it is really required. Try to serve all the read requests from the cache. Use a write-through cache.
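A write-through cache updates the database & the cache together on every write, so reads served from the cache are never stale. Below is a minimal sketch in Python. The class is hypothetical & a plain dict stands in for the database.

```python
class WriteThroughCache:
    """Every write goes to the database & to the cache in the same operation."""

    def __init__(self, database: dict):
        self._database = database  # dict standing in for the real database
        self._cache = {}

    def write(self, key, value) -> None:
        self._database[key] = value  # persist first
        self._cache[key] = value     # then keep the cache in sync

    def read(self, key):
        if key not in self._cache:
            self._cache[key] = self._database[key]  # fall back to the database on a miss
        return self._cache[key]
```

The trade-off is that each write does a little more work, in exchange for reads that can safely be served from the cache.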

Use a CDN. Using a CDN further reduces the latency of the application due to the proximity of the data to the requesting user.

Compress data. Use apt compression algorithms to compress data. Store data in the compressed form. Compressed data consumes less bandwidth. The less bandwidth consumed, the faster the data downloads on the client.
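As a quick illustration, here is gzip compression with Python’s standard library. The helper names are hypothetical & the sample text is deliberately repetitive, so real-world savings will vary with the data.

```python
import gzip


def compress_text(text: str) -> bytes:
    """Compress text with gzip before storing it or sending it over the wire."""
    return gzip.compress(text.encode("utf-8"))


def decompress_text(payload: bytes) -> str:
    """Restore the original text on the receiving side."""
    return gzip.decompress(payload).decode("utf-8")


if __name__ == "__main__":
    original = "scalability " * 1000
    payload = compress_text(original)
    print(len(original.encode("utf-8")), "bytes before,", len(payload), "bytes after")
```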

Avoid unnecessary round trips between the client & server. Try to combine multiple requests into one.

These are a few of the things we should keep in mind in the context of the performance of the application.

Now you must be wondering why I am talking about performance when I should be talking about scalability.

Well, application performance is closely tied to scalability. If an application is not performant, it will definitely not scale well.

Once we are done with the basic performance testing of the application, it is time for capacity planning: provisioning the right amount of hardware & computing power.


5. How to Test for Scalability in Software Systems?

As per the anticipated traffic, appropriate hardware & computational power are provisioned to handle it smoothly, with some buffer.

Several load & stress tests should be run on the application. Use tools like JMeter to run concurrent user tests on it.

See how the provisioned hardware does when subjected to heavy traffic. There are a lot of cloud-based testing tools that help us simulate test scenarios with a few mouse clicks.
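The idea behind a concurrent users test can be sketched in plain Python. This is not a replacement for JMeter. The function names are hypothetical & fake_request_handler only simulates a request with a short sleep. In a real test it would call your application.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request_handler() -> None:
    """Stand-in for a real request; assumed to take roughly 10 ms."""
    time.sleep(0.01)


def timed_call(handler) -> float:
    """Run one request & return its latency in seconds."""
    start = time.perf_counter()
    handler()
    return time.perf_counter() - start


def run_load_test(handler, concurrent_users: int, requests_per_user: int) -> dict:
    """Fire requests from a pool of simulated users & summarise the latencies."""
    total = concurrent_users * requests_per_user
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        latencies = sorted(pool.map(timed_call, [handler] * total))
    return {
        "requests": total,
        "median_seconds": latencies[total // 2],
        "p95_seconds": latencies[min(total - 1, int(total * 0.95))],
        "max_seconds": latencies[-1],
    }


if __name__ == "__main__":
    for users in (1, 10, 50):
        print(users, run_load_test(fake_request_handler, users, 5))
```

If the median & 95th percentile latencies stay flat as the number of simulated users grows, the system under test is scaling well for that load.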

The story of how Hotstar, a video streaming service, scaled to over 10 million concurrent users is a good read.



Well, Guys!! This is pretty much it on the scalability of the software systems. If you liked the article, do share it with your folks.

I’ll see you in the next article.
Until then…