The decision to implement a Microservice architecture in your system comes with constraints and rules you need to follow. Sure, you may have your own vision of how it should all play together, but sooner or later you will face problems that will force you back to these fundamental principles. Let’s find out what exactly they are.
1. Independence
In the Microservices world, independent means that a service has its own runtime environment and data source, and can be scaled without any impact on the remaining services. The deployment process is also autonomous: changing a given service and putting it into production cannot depend on any other service in the stack. In general, you should always try to make services as highly cohesive and loosely coupled as possible.
2. Decentralized Governance
Have you ever participated in a project with a monolithic architecture? I’m sure you have. In such a situation, almost every aspect of development is constrained by the programming language, libraries, linters and coding standards set by the team. When updating the codebase, you have to follow all these rules so that your pull request can be merged without introducing inconsistencies or below-standard practices.
In the world of Microservices there is no centralized, predefined set of rules you have to follow. That doesn’t mean the teams working on their Microservices are supposed to do whatever they want. It is about using the right tool for the job and leaving the choice of tools to the team. If there are no strict standards for the development process and technologies, the team can pick the stack which, in their opinion and based on their experience, is the best solution available.
3. Availability
Each Microservice in the stack must always be up and running, so that the entire system works without unnecessary breaks. From the end user’s perspective, the application must be available and work without interruptions, 24/7. To maximize availability, a Microservice instance needs to be redundant: there is always some other process that serves the same purpose. It doesn’t even have to be a process running the very same program; it may be any service able to handle the incoming request the same way the original program would.
A Microservice instance becomes unavailable in two situations:
- it needs to be redeployed
- a network operation fails
The first reason is inevitable, because the program needs to receive updates. The problem is that the update process involves killing the old instance and creating a new one, so there is always a period when the Microservice would otherwise be unavailable. The solution is to keep at least two instances and balance requests between them until only new ones are running. This technique is called a Rolling Update.
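A minimal sketch of that balancing in Go, assuming two hypothetical upstream addresses (orders-v1 and orders-v2); in practice this job usually falls to a load balancer or an orchestrator such as Kubernetes:

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

func main() {
	// Two hypothetical instances of the same Microservice; during a rolling
	// update one of them is replaced while the other keeps serving traffic.
	upstreams := []*url.URL{
		mustParse("http://orders-v1.internal:8080"),
		mustParse("http://orders-v2.internal:8080"),
	}

	var next uint64
	proxy := &httputil.ReverseProxy{
		Director: func(r *http.Request) {
			// Round-robin: pick the next upstream for every incoming request.
			target := upstreams[atomic.AddUint64(&next, 1)%uint64(len(upstreams))]
			r.URL.Scheme = target.Scheme
			r.URL.Host = target.Host
		},
	}

	http.ListenAndServe(":80", proxy)
}

func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}
```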
The second reason is practically inevitable as well because, as experience shows, even the biggest cloud providers and tech companies can seriously break their entire networks. It doesn’t happen often, but when it finally does, there is nothing we can do; our systems will be unavailable globally. Putting global outages aside, the only instance of the program may also be running on a host with a malfunctioning network adapter. In that case, having more instances on different hosts protects the Microservice from becoming unavailable by providing another process able to handle the request.
4. Scalability
What you have learned so far is that Microservices are independent, have decentralized governance and are always available. That already sounds great, but a Microservice has more traits than that.
As the amount of work performed by the system increases, the overall time needed for a particular task to finish will also grow. There are two keys to successfully scaling a Microservice. The first is to be aware of where and how the workload will grow, so it can be divided into smaller, manageable pieces. The second is performance, which is a measure of how efficiently the service handles its tasks.
Growth Scales
There are two general aspects of Growth Scales: one is qualitative and concerns how the Microservice influences the system and how it is tied to it; the other gives a quantitative measurement of how much traffic the program can handle.
Qualitative Growth Scale
A Microservice is part of some business process; it doesn’t work on its own in a void. The amount of work it will do depends on how much it is driven by other services to perform tasks. Take an invoice service: it will scale with the number of paid orders placed by customers. An email-sending service, if used only to send invitation messages, will scale with the number of users being registered. But what about other events, like confirming an order or resetting a password? The more services trigger sending an email, the more scaling dependencies the email service has.
If the team responsible for maintaining the service knows these dependencies, it becomes easier for them to predict the expected number of messages to process. Measuring with the Qualitative Growth Scale shows how a particular service fits into the process and helps you predict future demand for computing power.
Quantitative Growth Scale
As the name suggests, this scale is about quantity, something that can be measured with concrete values. It is closely connected with the Qualitative Growth Scale, as it puts numbers on it. This is where we talk about requests per second, transactions per second and transaction latency, because the qualitative attributes have a direct influence on the amount of work handed to the Microservice. For example, if every paid order triggers one invoice and one confirmation email, and the shop peaks at 50 orders per second, the email service must be prepared for at least 50 messages per second from that source alone.
5. Resiliency (Fault Tolerance)
Developing a system with a Microservice architecture makes one thing certain: one or more services in the stack will eventually stop working. The question is how this will impact overall system usability. Will the system keep running with some features disabled, or will everything crash and burn? The answer tells you whether your system is resilient or not. To help you get there, let’s go through the Resilience Patterns, proven and tested ways to prevent a total disaster.
Bulkhead
This resiliency pattern is about partitioning, and thus isolating, Microservices from each other. Partitions prevent cascading failures, which otherwise lead to a global outage. Let’s take data storage as an example. If the Microservices in the stack use the same database server (even with separate schemas within a particular RDBMS) and that server crashes, then all of the programs are affected by the failure. To avoid this catastrophe, use physically separate servers, so that if one of them goes down and makes one service unavailable, the remaining processes can still use their respective databases. Thanks to the partitioning, the failing service is isolated and the other functionalities keep working without a single glitch.
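Bulkheads can also be applied inside a single service, for example by capping how many concurrent calls may go to one dependency, so a slow dependency cannot exhaust all workers. A minimal sketch in Go, using a buffered channel as a semaphore (the dependency name and the limit are made up for illustration):

```go
package main

import (
	"errors"
	"fmt"
)

// Bulkhead caps the number of concurrent calls to a single dependency,
// so a slow or failing dependency cannot exhaust every worker in the service.
type Bulkhead struct {
	slots chan struct{}
}

func NewBulkhead(maxConcurrent int) *Bulkhead {
	return &Bulkhead{slots: make(chan struct{}, maxConcurrent)}
}

var ErrBulkheadFull = errors.New("bulkhead: too many concurrent calls")

func (b *Bulkhead) Do(call func() error) error {
	select {
	case b.slots <- struct{}{}: // acquire a slot
		defer func() { <-b.slots }() // release it when the call finishes
		return call()
	default:
		// All slots busy: fail fast instead of piling up and cascading.
		return ErrBulkheadFull
	}
}

func main() {
	// Hypothetical limit: at most 10 concurrent calls to the invoice database.
	invoiceDB := NewBulkhead(10)
	err := invoiceDB.Do(func() error {
		// query the invoice database here
		return nil
	})
	fmt.Println(err)
}
```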
Circuit Breaker
It is quite common for today’s applications to be integrated with external systems like payment providers or shipment delivery ecosystems. Everything is fine as long as these external services work properly, but how will your program react to a timeout or some other failure of the current operation? Does it stop the execution and report an error? That is in fact a kind of solution, but what if there are thousands of such processes waiting to be restarted manually?
A better way of dealing with this problem is to implement a retry mechanism. Issues like timeouts are likely to be temporary, so chances are the next call will succeed. But what happens if the external service is down for two hours or more? Will your system keep resending requests until they are accepted and executed? Imagine that during the first hour your customers place ten thousand orders, and all these failed payments keep flooding the external server.
It’s not only “their” problem, because your system is also pushed to its limits by constantly trying to get the payment through. Every network call costs CPU, bandwidth and some memory, so expect your infrastructure to get strained. The requests come from your side, so it is your responsibility to put limits on retrying. These limits are what make the Circuit Breaker a resilience pattern: after a number of consecutive failures it “opens” and rejects calls immediately, and only after a cool-down period does it let a trial request through to check whether the external service has recovered. This helps you avoid overloading both systems.
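A minimal sketch of that idea in Go (the threshold and cool-down values are made up; production systems typically use a battle-tested library instead):

```go
package resilience

import (
	"errors"
	"sync"
	"time"
)

var ErrCircuitOpen = errors.New("circuit breaker is open")

// CircuitBreaker rejects calls after maxFailures consecutive errors and
// lets the next attempt through only after the cooldown period has passed.
type CircuitBreaker struct {
	mu          sync.Mutex
	failures    int
	maxFailures int
	cooldown    time.Duration
	openedAt    time.Time
}

func New(maxFailures int, cooldown time.Duration) *CircuitBreaker {
	return &CircuitBreaker{maxFailures: maxFailures, cooldown: cooldown}
}

func (cb *CircuitBreaker) Call(fn func() error) error {
	cb.mu.Lock()
	if cb.failures >= cb.maxFailures && time.Since(cb.openedAt) < cb.cooldown {
		cb.mu.Unlock()
		return ErrCircuitOpen // open: fail fast, do not hit the external system
	}
	cb.mu.Unlock()

	err := fn()

	cb.mu.Lock()
	defer cb.mu.Unlock()
	if err != nil {
		cb.failures++
		if cb.failures >= cb.maxFailures {
			cb.openedAt = time.Now() // (re)open the breaker
		}
		return err
	}
	cb.failures = 0 // a success closes the breaker again
	return nil
}
```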
Asynchronous Processing
Think of the use cases defined in your system. Could you point out those whose underlying model doesn’t have to be instantly consistent? If not, I will stick to the payment process. The store fulfilling your order is obliged to provide you with an invoice as proof that the order is paid and ready to be delivered. The merchant usually has 7 days to send you the document; doesn’t that sound like a perfect candidate for asynchronous processing?
Assuming that messages between services are handled by a durable message queueing server, the Invoice Service will eventually process every request ever sent to it. Even in case of a total failure, all produced messages will still be there, waiting to be processed once a fixed worker instance of the Microservice comes back. (The eventual stockpiling, or backlog, of unconsumed messages is left out of consideration here.)
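A rough sketch of the idea, with a hypothetical Queue interface standing in for whatever broker you actually use (RabbitMQ, Kafka, SQS, and so on; the real client API will differ):

```go
package invoicing

import (
	"encoding/json"
	"log"
)

// Queue is a hypothetical abstraction over a durable message broker.
type Queue interface {
	Publish(topic string, body []byte) error
	Consume(topic string, handle func(body []byte) error)
}

type OrderPaid struct {
	OrderID string  `json:"order_id"`
	Amount  float64 `json:"amount"`
}

// The order service only publishes the event and moves on;
// it does not wait for the invoice to be generated.
func publishOrderPaid(q Queue, e OrderPaid) error {
	body, err := json.Marshal(e)
	if err != nil {
		return err
	}
	return q.Publish("orders.paid", body)
}

// The invoice service consumes the event whenever it is up,
// even if that happens long after the order was paid.
func runInvoiceWorker(q Queue) {
	q.Consume("orders.paid", func(body []byte) error {
		var e OrderPaid
		if err := json.Unmarshal(body, &e); err != nil {
			return err
		}
		log.Printf("generating invoice for order %s", e.OrderID)
		return nil // returning nil acknowledges the message
	})
}
```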
Parallels
To make sure there is always at least one instance of your program running, you may run multiple instances of it. If one of them becomes unavailable, the backup ones are there, ready to pick up incoming tasks. The problem with this approach is that parallel processes generate costs even when they are idle. That is why, together with this pattern, you should also take care of autoscaling, so that the number of running instances matches the current system load.
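As an illustration of the autoscaling idea, here is a toy decision rule that derives a desired instance count from the current request rate (all numbers are made up; real platforms such as Kubernetes do this for you based on metrics you choose):

```go
package scaling

// desiredInstances is a toy autoscaling rule: keep roughly
// requestsPerInstance requests per second on each instance,
// but never drop below minInstances for redundancy.
func desiredInstances(currentRPS, requestsPerInstance, minInstances int) int {
	n := (currentRPS + requestsPerInstance - 1) / requestsPerInstance // ceiling division
	if n < minInstances {
		return minInstances
	}
	return n
}
```

For example, desiredInstances(950, 300, 2) returns 4, while desiredInstances(100, 300, 2) keeps the redundant minimum of 2.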
Design For Failure
The resilience principle concerns the overall infrastructure of Microservices in your system, and it says that no single service should have an impact on the availability of the remaining ones. An outage of, let’s say, the Payment Service cannot take the entire system down; the other functionalities should be fine and serving their purpose as usual. A customer can still browse products and put them into the cart, update his or her profile, or download the latest invoice. Meanwhile, the development team responsible for the failing service makes the necessary fixes.
6. Observability
Contrary to a monolithic system, Microservices introduce many independent moving parts, which makes the infrastructure complex and hard to debug when any of the services stops working. It is also hard to identify performance issues, because there are so many connections between components, and thus many places where something can go wrong.
The solution to this unpredictability is to carefully observe the infrastructure and use the collected runtime data to monitor the system’s health and predict the Microservices’ behaviour. What are the available sources of this data?
Logs
No more, no less than the list of bad things that have happened during a particular Microservice’s runtime. An exception thrown due to invalid database credentials? A timeout while connecting to an external API? Or maybe a warning saying that filesystem storage is almost full? It can all be found in the log; all you need to do is observe it. Not all the time, of course, as there are tools that will notify you about important log entries.
In the Microservices realm, logs become useful when they are aggregated in a single place where they can be searched and sorted by the time they occurred. This shows you the path a given request has travelled, so identifying errors and bottlenecks becomes much easier.
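A small sketch of what aggregation-friendly logging might look like, using Go’s standard structured logger (log/slog, available since Go 1.21; the service name, field names and values are arbitrary examples, and your log shipper defines the real conventions):

```go
package main

import (
	"log/slog"
	"os"
)

func main() {
	// JSON logs are easy for an aggregator (ELK, Loki, ...) to parse,
	// index and sort by timestamp across all services.
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil)).With(
		"service", "invoice-service", // hypothetical service name
	)

	logger.Info("invoice generated",
		"request_id", "req-5f1c", // correlation ID propagated between services
		"order_id", "A-1042",
		"duration_ms", 37,
	)
	logger.Error("database connection failed",
		"request_id", "req-5f1c",
		"error", "invalid credentials",
	)
}
```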
Metrics
Logs may tell you a lot about what your application does, but they don’t say much about the environment it runs in. They will notify you about a lack of memory or disk space only after the problem has happened, so you will not be warned about incoming trouble. Luckily for us, there is a place we can go to get precise information about the current status of a Microservice’s runtime. It is called a metrics endpoint; it may be published by the application itself (but it doesn’t have to be), and it provides measurements of selected environment components like CPU or RAM usage, as well as technology-specific values like the total number of requests or their duration.
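A bare-bones sketch of such an endpoint using only Go’s standard library; production services typically expose Prometheus-formatted metrics through a client library instead:

```go
package main

import (
	"encoding/json"
	"net/http"
	"runtime"
	"sync/atomic"
)

var requestsTotal uint64 // technology-specific metric: requests handled so far

func main() {
	http.HandleFunc("/hello", func(w http.ResponseWriter, r *http.Request) {
		atomic.AddUint64(&requestsTotal, 1)
		w.Write([]byte("hello"))
	})

	// The metrics endpoint reports runtime measurements on demand,
	// so a monitoring system can scrape them before trouble hits.
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)
		json.NewEncoder(w).Encode(map[string]uint64{
			"requests_total": atomic.LoadUint64(&requestsTotal),
			"heap_bytes":     m.HeapAlloc,
			"goroutines":     uint64(runtime.NumGoroutine()),
		})
	})

	http.ListenAndServe(":8080", nil)
}
```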
Reverse Proxy
The place where we should start collecting metrics is the entry point to the entire system, the HTTP reverse proxy. This is where the user’s request is received, matched to a target upstream and passed on to be processed by other services. Some important metrics of the proxying component (a collection sketch follows the list):
- total number of requests handled so far:
  - helps with anomaly detection
- total count of HTTP error responses:
  - quickly indicates a possible failure
  - allows configuring alarms when the count exceeds a threshold
- time needed for a request round trip:
  - counted from the first byte passed to the upstream to the last byte of the response
  - split by URL path and mapped to a time value
  - gives a histogram of response times for a given endpoint
- number of requests per second:
  - indicates current throughput
  - allows spotting anomalies or malicious attacks
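A minimal sketch of collecting a couple of these at the proxy, assuming a hypothetical upstream address; real deployments usually rely on the metrics built into NGINX, HAProxy, Envoy or similar:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
	"time"
)

var (
	requestsTotal uint64 // total requests handled so far
	errorsTotal   uint64 // responses with status >= 500
)

func main() {
	upstream, _ := url.Parse("http://catalog.internal:8080") // hypothetical upstream
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	// Record error responses coming back from the upstream.
	proxy.ModifyResponse = func(resp *http.Response) error {
		if resp.StatusCode >= 500 {
			atomic.AddUint64(&errorsTotal, 1)
		}
		return nil
	}

	http.ListenAndServe(":80", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		atomic.AddUint64(&requestsTotal, 1)
		proxy.ServeHTTP(w, r)
		// Round-trip time per path; a real setup would feed this into a histogram.
		log.Printf("path=%s roundtrip=%s total=%d errors=%d",
			r.URL.Path, time.Since(start),
			atomic.LoadUint64(&requestsTotal), atomic.LoadUint64(&errorsTotal))
	}))
}
```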
Microservice Application
Metrics collected from the reverse proxy are useful; they convey a lot of information about request and response handling, but they don’t say anything about which Microservice took the longest to complete the task. Since your Microservice applications are HTTP-based, their instances include some kind of web server in front, which passes the request to the underlying program to execute. The metrics you should collect include (see the sketch after this list):
- CPU utilization:
  - if overloaded, the instance becomes unavailable
  - alerts are sent when the load stays beyond a threshold for a given amount of time
  - lets you notice increased demand for CPU and scale the service accordingly
- RAM usage:
  - if fully occupied, it makes the service unavailable
  - spots memory leaks and inefficiencies
- time needed to handle a request:
  - from receiving the first byte of the request to sending the last byte of the response
  - allows spotting the trouble-making Microservice
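A small sketch of the last point: a middleware that measures how long each handler takes inside the service itself (the handler name, route and threshold are illustrative):

```go
package main

import (
	"log"
	"net/http"
	"time"
)

// timed wraps any handler and records how long it took to produce a response,
// from receiving the request to handing the response back to the web server.
func timed(name string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		elapsed := time.Since(start)
		log.Printf("handler=%s path=%s duration=%s", name, r.URL.Path, elapsed)
		if elapsed > 500*time.Millisecond { // illustrative threshold
			log.Printf("handler=%s is slow, consider alerting", name)
		}
	})
}

func main() {
	invoices := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("invoice list")) // placeholder business logic
	})
	http.Handle("/invoices", timed("invoices", invoices))
	http.ListenAndServe(":8080", nil)
}
```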
Data Sources
The measurements collected from applications are very useful, but it is the databases that are the real bottlenecks in the entire infrastructure. Transactions, heavy and long-running queries, or the number of executions per query are the pieces of information you will find useful when tracing slow and inefficient data management.
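One cheap way to surface database problems from the application side is to time every query and log the slow ones; a sketch (the threshold is arbitrary, and database servers usually offer their own slow-query logs as well):

```go
package storage

import (
	"context"
	"database/sql"
	"log"
	"time"
)

// slowThreshold is an illustrative cut-off; tune it to your workload.
const slowThreshold = 200 * time.Millisecond

// queryTimed runs a query and logs it when it takes suspiciously long,
// which is often the first hint of a database bottleneck.
func queryTimed(ctx context.Context, db *sql.DB, query string, args ...any) (*sql.Rows, error) {
	start := time.Now()
	rows, err := db.QueryContext(ctx, query, args...)
	if d := time.Since(start); d > slowThreshold {
		log.Printf("slow query (%s): %s", d, query)
	}
	return rows, err
}
```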
Tracing
The last observable element is the request itself and the path it takes to become a response and finally be returned to the client. To be traced, the request must be marked with a unique ID. Then, every Microservice it touches must log the fact of receiving it, together with data saying how long it took to process. Such a practice provides useful insight into the system’s behaviour and how individual services cope with the input data.
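A minimal sketch of propagating such an ID over HTTP, using a hypothetical X-Request-ID header; mature setups would rather use a standard such as W3C Trace Context via OpenTelemetry:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"log"
	"net/http"
	"time"
)

// traced makes sure every request carries a correlation ID and logs
// how long this service needed to handle it.
func traced(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Request-ID") // hypothetical header name
		if id == "" {
			buf := make([]byte, 8)
			rand.Read(buf)
			id = hex.EncodeToString(buf) // the first service in the chain mints the ID
		}
		start := time.Now()
		next.ServeHTTP(w, r)
		log.Printf("request_id=%s path=%s duration=%s", id, r.URL.Path, time.Since(start))
	})
}

// callDownstream shows how the same ID would be forwarded to the next service.
func callDownstream(id, url string) (*http.Response, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Request-ID", id)
	return http.DefaultClient.Do(req)
}

func main() {
	http.Handle("/", traced(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})))
	http.ListenAndServe(":8080", nil)
}
```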
Summary
We went through the most significant aspects of designing and running a scalable, resilient and observable infrastructure based on the Microservices architecture. At this point some of them may still seem unclear, because so far we have only listed the traits of a solid system and of the environment it runs in, without telling you how to achieve the desired state of being always ready. More on that topic in the next article.