How Does PayPal Process Billions of Messages Per Day with Reactive Streams?
Chances are you have already come across the phrase “Data is the new oil”.
A business’s worth today is often gauged by the amount of data it holds about a particular niche or market segment. In this social media era, it is commonplace for online business platforms to generate petabytes of data every single day.
It’s vital for organizations to have intelligent systems in place that help them make sense of massive amounts of data, so that they can make better data-driven decisions, make their platforms more engaging & continually evolve their products.
PayPal’s product performance tracking platform processes approx. 19 billion messages per day. The team leverages an Akka-based reactive framework called Squbs to handle this massive volume of data & events concurrently, with minimal latency & resource consumption.
LinkedIn, too, uses the Play framework, which is built on the Akka actor model & a reactive event-driven architecture, to identify which of its users are online & to power its messaging platform.
The engineering team moved to Squbs from an existing custom-written Spring-based module.
Why Akka? Why a Reactive Stream Event-Driven Framework?
Why did the engineering team at PayPal migrate to a reactive streaming framework? What were the issues or limitations with the existing Spring-based module?
Reactive frameworks like Akka have an underlying architectural design quite different from the traditional servlet-based multi-threaded frameworks.
A typical use case of reactive frameworks is when we need persistent connections between the client & the server. We ideally use long polling or web sockets to establish persistent connections.
Akka is a non-blocking, reactive toolkit for building concurrent & distributed applications on the JVM. It is well suited to workloads such as handling a large number of events, streaming data, concurrent real-time data, asynchronous data exchange, data ingestion etc.
The framework is designed to keep CPU & other resource consumption low in highly distributed, scalable applications. With Akka, we can focus on our product features instead of worrying about managing low-level threading code.
The framework uses a small number of threads to process tasks, which keeps things simple. It provides reliable, high-performance & fault-tolerant behaviour.
Some issues are inherent to distributed systems, such as lost messages, crashing nodes, concurrency bugs etc.
Akka helps us implement multi-threaded behaviour without the use of low-level concurrency constructs such as locks or atomics. We do not have to worry about memory visibility issues.
It helps us write better network communication code between different distributed modules.
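To make the "no locks or atomics" point concrete, here is a minimal sketch of the idea behind an actor, written with plain JDK classes rather than the Akka API. The CounterActor class and its method names are hypothetical and exist only for this illustration: the state is touched by a single mailbox thread, so many senders can message it concurrently without any explicit synchronization.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical actor-like class; this is NOT the Akka API, just the idea.
final class CounterActor {
    // Mutable state, read and written only by the mailbox thread below.
    private long count = 0;

    // A single-threaded executor plays the role of the mailbox:
    // messages are queued and processed one at a time, in order.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    /** Fire-and-forget message: increment the counter. */
    void tellIncrement() {
        mailbox.execute(() -> count++);
    }

    /** Request-reply message: the answer arrives later as a future. */
    CompletableFuture<Long> askCount() {
        return CompletableFuture.supplyAsync(() -> count, mailbox);
    }

    void stop() {
        mailbox.shutdown();
    }

    /** Several threads send messages concurrently; no locks are involved. */
    static long demo(int senders, int messagesPerSender) {
        CounterActor actor = new CounterActor();
        Thread[] threads = new Thread[senders];
        for (int i = 0; i < senders; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < messagesPerSender; j++) {
                    actor.tellIncrement();
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Queued after every increment, so it observes the final value.
        long total = actor.askCount().join();
        actor.stop();
        return total;
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1000)); // prints 4000
    }
}
```

A real Akka actor adds supervision, location transparency & much more, but the core design choice is the same: confine state to one message-processing loop instead of guarding it with locks.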
What is Reactive Programming & Reactive Streams?
If you are hearing the word reactive for the first time, here is the gist of it. Reactive programming means reacting to events.
When an event occurs, do something: run a process, perform a task, send a notification & so on. And the system needs to keep doing this continually over a period of time.
The sequence of events occurring over a period of time is called an event stream or a reactive stream.
To react to a stream of events we need to keep listening to them or in other words continually monitor them.
A real-world use case of this is data ingestion, where massive amounts of data (big data) are continually ingested into the backend systems.
The backend logic keeps checking the incoming data against certain conditions &, when a condition is met, it reacts to the event: runs a task, sends a notification, triggers a webhook etc.
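As a rough illustration of this listen, check & react loop, here is a minimal sketch that uses the Reactive Streams interfaces shipped with the JDK (java.util.concurrent.Flow, Java 9+) rather than Akka or Squbs. The Metric type, the threshold & the alert text are made up for the example & are not taken from PayPal's system.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

final class LatencyAlertDemo {

    /** Hypothetical event: one latency measurement for a page. */
    static final class Metric {
        final String page;
        final long latencyMs;

        Metric(String page, long latencyMs) {
            this.page = page;
            this.latencyMs = latencyMs;
        }
    }

    /** Listens to the stream and reacts only when the condition is met. */
    static final class AlertSubscriber implements Flow.Subscriber<Metric> {
        private final long thresholdMs;
        private final List<String> alerts = new CopyOnWriteArrayList<>();
        private final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        AlertSubscriber(long thresholdMs) {
            this.thresholdMs = thresholdMs;
        }

        @Override
        public void onSubscribe(Flow.Subscription subscription) {
            this.subscription = subscription;
            subscription.request(1); // ask for one event at a time
        }

        @Override
        public void onNext(Metric metric) {
            if (metric.latencyMs > thresholdMs) {
                alerts.add("slow page: " + metric.page); // the reaction
            }
            subscription.request(1); // ready for the next event
        }

        @Override
        public void onError(Throwable error) {
            done.countDown();
        }

        @Override
        public void onComplete() {
            done.countDown();
        }
    }

    /** Publishes the metrics and returns the alerts the subscriber raised. */
    static List<String> run(List<Metric> metrics, long thresholdMs) {
        AlertSubscriber subscriber = new AlertSubscriber(thresholdMs);
        try (SubmissionPublisher<Metric> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            metrics.forEach(publisher::submit);
        } // close() signals onComplete once the submitted items are delivered
        try {
            subscriber.done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return List.copyOf(subscriber.alerts);
    }

    public static void main(String[] args) {
        List<Metric> metrics = List.of(
                new Metric("home", 120), new Metric("checkout", 900));
        System.out.println(run(metrics, 500)); // prints [slow page: checkout]
    }
}
```

The request(1) calls are how the subscriber tells the publisher how much it can handle, which is the backpressure mechanism that the Reactive Streams interfaces define.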
How Is This Different from A Regular Servlet Thread Based Model?
Traditional servlet-based, thread-per-request models are blocking in nature: each request occupies a thread until its response is ready.
The Akka actor model is a good fit for event-heavy workloads. It uses a minimal number of threads to process events. The thread that receives an event doesn’t wait for the response; rather, it hands the work off to other threads & keeps processing the incoming events.
As you see, there is a significant design difference between the traditional servlet model & the event-based reactive model.
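The hand-off idea can be sketched with the JDK's CompletableFuture. This is only an illustration under assumed names (HandoffDemo, slowEnrich & the 50 ms delay are all hypothetical), not how Akka or Squbs schedule work internally: the receiving loop passes each event to a small worker pool & moves straight on to the next event instead of waiting for the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class HandoffDemo {

    /** Hypothetical slow step, standing in for a database or network call. */
    static String slowEnrich(String event) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return event.toUpperCase(Locale.ROOT);
    }

    static List<String> handleAll(List<String> events) {
        ExecutorService workers = Executors.newFixedThreadPool(2);
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (String event : events) {
                // Hand the event off and continue with the next one at once;
                // the receiving loop never waits for slowEnrich to finish.
                pending.add(CompletableFuture.supplyAsync(
                        () -> slowEnrich(event), workers));
            }
            // Joined only so that this demo can return the results; a real
            // reactive pipeline would attach a callback (e.g. thenAccept).
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> future : pending) {
                results.add(future.join());
            }
            return results;
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(handleAll(List.of("login", "payment")));
        // prints [LOGIN, PAYMENT]
    }
}
```

In a blocking servlet-style handler, the equivalent loop would call slowEnrich directly & sit idle for the full delay on every event.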
The Spring ecosystem also offers Project Reactor, the reactive library that underpins Spring WebFlux, for writing event-based apps.
Result of Migrating to a Reactive Framework
After migrating to a reactive framework, the engineering team at PayPal was able to smoothly handle billions of events per day.
CPU utilization was cut by 30%, & the team ended up with fewer lines of code & a reduced monitoring & VM footprint. Deployment time went down from 6 hours to 1.5 hours.
They observed a significant reduction in processing time & failure rate.
In the initial research for the development of the event-based system, the team considered Erlang, which seemed promising. However, since they already had several services running on the JVM, they picked Akka, with Spray as the HTTP library.
Squbs is a suite of components that standardizes Akka services in a large-scale distributed environment.
It was built with the below principles in mind:
- It had to be extremely lightweight
- The Squbs API would act as an abstraction over the Akka APIs, yet developers working with it should not need any knowledge beyond the Akka concepts.
- It had to be open source right from the first line of code with hooks available for plugging in PayPal’s modules.
Here is the GitHub repo for Squbs
Well, folks, this is pretty much it on how PayPal processes billions of messages per day with reactive streams. If you liked the write-up, share it with your folks. Consider following 8bitmen on Twitter, Facebook & LinkedIn to stay notified of new content.
I am Shivang, the author of this writeup. You can read more about me here.