Hello there!! How is it going?
Welcome to 8bitmen.com
This write-up is a comprehensive insight into Apache OpenWhisk – the open source serverless cloud platform. It answers the key questions about the platform: What is it? Why use it? How do things work under the hood? What is the underlying architecture? And why prefer it over the serverless solutions offered by cloud vendors such as Google Cloud & AWS?
So, without any further ado.
Let’s get started.
1. What is Apache OpenWhisk?
Apache OpenWhisk is an open source, distributed serverless platform which lets developers bind their functions to the external events which trigger them. Since it is open source, it can be deployed on-prem or on the cloud.
Alright, so I put my code as a stateless function on OpenWhisk. What does it do?
OpenWhisk takes care of the infrastructure & servers, makes sure your application is available & scales it automagically when traffic surges. It uses Docker containers to efficiently manage the app instances.
The backend functions which hold the business logic are called Actions. Actions are triggered by external events.
What are External Events?
An external event can be anything ranging from an HTTP request to a data feed such as audio, video, text etc. An upload of an image from the app UI can be considered as an external event which would trigger a function, which holds the business logic, running on the backend on OpenWhisk.
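To make this a bit more concrete, here is a minimal sketch of what an action reacting to an image upload might look like in Python. OpenWhisk Python actions expose a main function which receives a dictionary of parameters & returns a dictionary. The parameter names (filename, size_bytes) & the 5 MB limit are assumptions made up for this example, not something the platform prescribes.

```python
# Hypothetical action reacting to an "image uploaded" event.
# The parameter names and the size limit are illustrative assumptions.
MAX_SIZE_BYTES = 5 * 1024 * 1024


def main(args):
    """Entry point: takes a dict of event parameters, returns a dict."""
    filename = args.get("filename", "")
    size = args.get("size_bytes", 0)

    # Reject anything that does not look like a supported image file.
    if not filename.lower().endswith((".png", ".jpg", ".jpeg")):
        return {"accepted": False, "reason": "unsupported file type"}
    # Reject uploads above the (made-up) size limit.
    if size > MAX_SIZE_BYTES:
        return {"accepted": False, "reason": "file too large"}
    return {"accepted": True, "filename": filename}
```

With the wsk CLI, a file like this would typically be registered with something along the lines of wsk action create checkUpload check_upload.py, where both names are placeholders.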
The below diagram shows the OpenWhisk programming model.
OpenWhisk has a REST API based command line interface with multiple other tools & options to deploy & manage our code. I’ll get to that further down in the article but before that let’s get to the gist of what OpenWhisk is.
Since the OpenWhisk Deployment model is containerized it can be easily deployed on all the popular container frameworks such as Kubernetes, Mesos, OpenShift, Compose etc.
Which Programming Language Do I Have to Write My Code In, In Order to Run It on OpenWhisk?
OpenWhisk supports a wide array of programming languages such as NodeJS, Swift, Java, Go, Scala, Python, PHP & Ruby. So, picking a programming language for writing apps which run on OpenWhisk isn’t an issue.
In case we have any custom requirements, we can create executable Zips & run them on Docker using the Docker SDK. Other languages such as Rust & Haskell are supported on OpenWhisk using Docker.
What If I Intend to Integrate My Backend Code with Other Services, for Instance, Messaging?
With OpenWhisk, developers can easily integrate their apps with other third-party services & frameworks etc. with the help of packages.
OpenWhisk facilitates the integration of functions with Kafka queues, APIs like IBM Watson, backend schedulers, services such as push notifications for mobile, workplace messaging like Slack, code management services like Git.
Besides being run, code can be debugged in real time with the help of different development tools. Actions can be invoked synchronously or asynchronously, or their invocation can be scheduled.
2. Why OpenWhisk?
First thing, why a serverless backend solution like OpenWhisk? Why run your backend as stateless functions? Why not a regular server instance on the cloud?
If you’ve read about what serverless is, you would realize it saves developers quite some money as it only invokes code, i.e. a specific function, when triggered by external events. We don’t have to keep a server instance running round the clock, unnecessarily paying for unused server time.
Another reason is not having to worry about managing the server instances and everything on the cloud. A serverless solution abstracts everything from the developer. Just push your code, sit back & relax.
Well, this is just the gist. It would take an entire article to get into the detail of why & why not serverless.
Now, let’s talk about why OpenWhisk specifically. Why not the serverless solutions offered by the cloud vendors, like Google Cloud Functions or AWS Lambda?
2.1 OpenWhisk Vs Google Cloud Functions, AWS Lambda or Any Third-Party Cloud Vendor Serverless Solution
I’ve listed down a few points which make OpenWhisk different from the existing serverless solutions.
1. Open Source, Availability of Custom Features: OpenWhisk is an open source solution unlike the other serverless solutions available in the industry. The development is driven by the community.
In case OpenWhisk doesn’t contain a feature which we require for our business, we can always write it from scratch & share it with the community. This is how an open source product grows pretty quickly in comparison to a proprietary product. In the case of a proprietary product, on the other hand, we have to rely on the organization for a custom feature.
2. On-prem, Data Security: We can deploy OpenWhisk on-prem, in case we are not OK with our data flowing to a third-party network & want to run things on our own private network. With on-prem deployment, the data stays in our network & is not exposed to eavesdropping on a third-party network.
3. No Vendor Lock-In: Another massive advantage of using an open source solution is no vendor lock-in. Imagine a case where our tech stack is tightly locked in with a third-party vendor product & that company decides to shut down its service. We are screwed. We are left with no option other than to write everything from scratch.
3. How Does OpenWhisk Work? What Is the Underlying Architecture?
The OpenWhisk programming model is event driven. Backend functions are triggered via external events. The events can be triggered from datastores, message queues, mobile, web applications, sensors, chatbots, scheduled tasks etc. These are also known as the event sources.
The functions encapsulating the business logic are stateless & can be triggered via the OpenWhisk REST API, the command line interface, user-created APIs or the scheduled, automated triggers which I have already mentioned above.
The OpenWhisk platform is extensible & can accommodate any programming language of your choice.
Multiple actions can be clubbed together to form a longer processing pipeline called a Sequence.
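The idea of a sequence can be sketched in plain Python: the result dictionary of one action becomes the input of the next. This is only a local simulation to show the data flow, not how the platform wires sequences internally. The three tiny actions & the run_sequence helper are made up for the illustration.

```python
from functools import reduce


def strip_whitespace(args):
    # First step: clean up the raw input text.
    return {"text": args["text"].strip()}


def to_upper(args):
    # Second step: normalize the text to upper case.
    return {"text": args["text"].upper()}


def add_length(args):
    # Third step: enrich the result with the text length.
    return {"text": args["text"], "length": len(args["text"])}


def run_sequence(actions, params):
    """Feed the result dict of each action into the next one, in order."""
    return reduce(lambda result, action: action(result), actions, params)
```

Calling run_sequence([strip_whitespace, to_upper, add_length], {"text": "  hello "}) walks the input through all three steps. Keeping every step a small dict-in, dict-out function is what makes them easy to chain.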
Let’s have a look at the platform’s architecture.
The diagram below shows the internal architecture of OpenWhisk. It is powered by technologies such as Apache Kafka, CouchDB, Docker & Nginx. Together, these technologies constitute an event-driven serverless programming service.
Now let’s understand the internal flow of the serverless platform.
User Requests Hit OpenWhisk Platform
All the requests to OpenWhisk go through Nginx, an open source web server which also provides several other functionalities such as load balancing, reverse proxying, caching, media streaming etc.
The platform’s user-facing API is completely HTTP based. Even the commands run from the CLI (command line interface) hit the platform in the form of HTTP requests.
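Since everything boils down to HTTP, invoking an action can be sketched as building a plain POST request. The snippet below only constructs the request & doesn't send it. The host, namespace, action name & key are placeholders, and the path layout & the blocking flag reflect my understanding of the OpenWhisk REST API, so do check them against the docs of your deployment.

```python
import base64
import json
import urllib.request


def build_invoke_request(api_host, namespace, action, auth_key, params, blocking=True):
    """Build (but do not send) an HTTP request that invokes an action.

    auth_key is assumed to be a "user:password" style key used for
    HTTP Basic authentication.
    """
    url = f"https://{api_host}/api/v1/namespaces/{namespace}/actions/{action}"
    if blocking:
        # Ask the platform to wait for the result (synchronous invocation).
        url += "?blocking=true"
    token = base64.b64encode(auth_key.encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(params).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would fire the actual call. Leaving out the blocking flag corresponds to the asynchronous style of invocation.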
User Requests Are Intercepted by The Controller
Once the requests hit the platform, Nginx routes them to the controller. The controller is a Scala-based implementation of the REST API. It acts much like the controllers in web applications do.
User Is Authenticated & Authorized by The Platform
When the request moves forward from Nginx to the controller, the verification logic kicks in. The controller verifies if the request is valid & authentic. Before the platform executes the user request, the user should be both authenticated & authorized. The requests are verified against the records stored in the subjects database in CouchDB. CouchDB is an open source, scalable, distributed NoSQL database.
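The two checks can be sketched as follows: authentication answers "is this really the user?" & authorization answers "is the user allowed to touch this namespace?". The in-memory dictionary is just a stand-in for the subjects database, and the record layout, names & status codes are assumptions for illustration, not the controller's actual code.

```python
import hmac

# In-memory stand-in for the subjects database; the record layout is invented.
SUBJECTS = {
    "alice": {"key": "s3cret", "namespaces": {"alice", "team-a"}},
}


def authenticate(subject, key):
    """Is the caller who they claim to be?"""
    record = SUBJECTS.get(subject)
    # compare_digest avoids leaking information through timing differences.
    return record is not None and hmac.compare_digest(record["key"], key)


def authorize(subject, namespace):
    """Is the caller allowed to act in the requested namespace?"""
    record = SUBJECTS.get(subject)
    return record is not None and namespace in record["namespaces"]


def check_request(subject, key, namespace):
    """Return an HTTP-style status code for the request."""
    if not authenticate(subject, key):
        return 401  # unknown user or wrong key
    if not authorize(subject, namespace):
        return 403  # known user, but no rights on this namespace
    return 200
```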
User Request Is Verified & The Controller Invokes the Action
Once the user request is verified, the controller figures out the type of the request & triggers an action. In other words, the request executes the logic held in the function.
The action is loaded from the whisks database in CouchDB.
The Action is Invoked Via the Load Balancer Which is A Part of The Controller
Once the action is loaded from the database, the load balancer, which is a part of the controller, chooses one of the executors, also called the invokers. The executor picked by the load balancer runs the action.
The task of the load balancer is to keep an eye on the executors available & to select one of them to invoke the action.
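A toy version of this selection step could look like the snippet below. It simply picks the healthy invoker with the lowest relative load. This least-loaded strategy is my own simplification for illustration. The article doesn't describe the actual selection logic of the OpenWhisk load balancer, and the Invoker fields are invented.

```python
from dataclasses import dataclass


@dataclass
class Invoker:
    name: str
    healthy: bool
    busy_slots: int  # actions currently running on this invoker
    capacity: int    # maximum number of concurrent actions


def choose_invoker(invokers):
    """Pick the healthy invoker with the lowest relative load, or None."""
    candidates = [i for i in invokers if i.healthy and i.busy_slots < i.capacity]
    if not candidates:
        return None  # nothing available right now
    return min(candidates, key=lambda i: i.busy_slots / i.capacity)
```

When no invoker is available the function returns None, which is exactly the situation where a buffer between the controller & the invokers becomes useful.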
Managing Heavy Load Scenarios & System Failure Contingencies with Kafka
Once the executor is asked to run the action, two things could go wrong. Either the system could crash, which would result in the loss of the invocation, or it could be under such a heavy computational load from executing other concurrent actions that it cannot pick up the new one right away.
To deal with both the possibilities, OpenWhisk leverages Kafka, a distributed, high throughput messaging system. The invocations are buffered & persisted by Kafka until the action is invoked.
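The buffering idea can be sketched with a simple in-memory queue. This is only a stand-in: a real Kafka topic persists messages to disk & is accessed through a Kafka client, none of which is shown here. The class & method names are made up for the example.

```python
from collections import deque


class InvocationBuffer:
    """In-memory stand-in for a Kafka topic holding pending invocations."""

    def __init__(self):
        self._messages = deque()

    def publish(self, invocation):
        # The controller side: enqueue the invocation and move on.
        self._messages.append(invocation)

    def poll(self, max_messages):
        # The invoker side: take only as many invocations as it can handle.
        batch = []
        while self._messages and len(batch) < max_messages:
            batch.append(self._messages.popleft())
        return batch

    def pending(self):
        # Invocations still waiting for an invoker with free capacity.
        return len(self._messages)
```

A busy invoker just polls fewer messages. Whatever it doesn't take stays in the buffer until capacity frees up, so nothing gets dropped under load.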
Finally Invoking the Action with Docker
All the actions in OpenWhisk are invoked in Docker containers. When an action is to be invoked, OpenWhisk spawns a Docker container, injects the action, passes the run parameters to the action & executes it.
Once the action has been executed, the Docker container is destroyed. All the results & the metadata of the invocation are stored in the activations database in CouchDB.
If you are interested in the code, here is the GitHub repo of OpenWhisk.
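The lifecycle described above (spawn, inject, run, destroy, record) can be simulated in a few lines. No real Docker container is started here. FakeContainer, the ACTIVATIONS dictionary & the record fields are all invented stand-ins, meant only to show the order of the steps.

```python
import time
import uuid

ACTIVATIONS = {}  # stand-in for the activations database


class FakeContainer:
    """Stand-in for a Docker container; nothing is actually started."""

    def __init__(self):
        self.action = None
        self.alive = True

    def init(self, action):
        # Inject the action code into the container.
        self.action = action

    def run(self, params):
        # Execute the action with the run parameters.
        return self.action(params)

    def destroy(self):
        self.alive = False


def invoke(action, params):
    """Run an action in a throwaway container and record the activation."""
    container = FakeContainer()
    start = time.time()
    try:
        container.init(action)
        record = {"result": container.run(params), "success": True}
    except Exception as error:
        record = {"result": {"error": str(error)}, "success": False}
    finally:
        container.destroy()  # the container never outlives the invocation
    record["duration_ms"] = int((time.time() - start) * 1000)
    activation_id = uuid.uuid4().hex
    ACTIVATIONS[activation_id] = record
    return activation_id
```

The returned activation id is what a caller would later use to look up the stored result & metadata of the invocation.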
4. What is IBM Bluemix OpenWhisk?
IBM Bluemix OpenWhisk is IBM’s FaaS (Functions as a Service) programming platform, based on the open source serverless project OpenWhisk.
Naturally, the IBM OpenWhisk platform has all the features of OpenWhisk by default since it is based on it. The underlying architecture is the same. We can write the actions, i.e. the backend functions, in any programming language of our liking.
Besides the open source project’s features, IBM’s platform provides other services & add-ons such as the hardware, networking, software administration, load balancing, plugins & so on. Just write the code & let the IBM OpenWhisk platform handle all the management, monitoring & stuff.
The below diagram displays the serverless ecosystem of the IBM cloud.
Time for some typical use cases of the OpenWhisk platform. Also, if you wish to read more about the IBM Bluemix cloud platform, here is an in-depth article on it.
5. What Are the Typical Use Cases of OpenWhisk?
OpenWhisk can be used for a serverless backend, a mobile backend, event-based processing like data processing & event stream processing, tasks which run at a scheduled time & backends for IoT devices.
The project is suitable for any task which is not too complex, doesn’t run round the clock, needs to scale on the fly & for which you are unsure how much capacity to pre-allocate.
This serverless platform has the potential to save you bags of money. You don’t have to take a server instance & keep running it round the clock. Instead just run the backend function whenever required. Why pay extra for the idle server time?
6. More On the Blog
Well, Guys… This is pretty much it on the serverless platform. Do write your thoughts in the comments.
If you liked the article, do let me know in the comments.
I’ll see you in the next article.
- Distributed Systems, Scalability & System Design #1 – Heroku Client Rate Throttling
- Zero to Software/Application Architect – Learning Track
- Java Full Stack Developer – The Complete Roadmap – Part 2 – Let’s Talk
- Java Full Stack Developer – The Complete Roadmap – Part 1 – Let’s Talk
- Best Handpicked Resources To Learn Software Architecture, Distributed Systems & System Design