When it comes to cloud computing, the sky is the limit

Cloud computing has changed the way businesses operate, allowing them to store and access data on a global scale. Many organizations have traded the pain of managing their own hardware and on-premises infrastructure for hosting their systems on cloud providers such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure. These vendors are not only reliable but also offer a wide range of managed services, which can save months of engineering work. That said, to remain competitive, businesses need to understand the challenges they may face and how to get the most value from their cloud-based solutions. This is especially true because providers release new features and tools every year.

Succeeding in the cloud involves a lot of moving parts, but I think it can be broken down into a few key topics:

  • Choose services: Choosing services that meet your company’s needs, and deciding between managed services and open source software
  • Handling scale: Designing your cloud architecture to be efficient and to scale with your business goals
  • Observability: How do I not only monitor my system and collect its logs, but also understand why something went wrong?
  • Safety: How do I ensure that sensitive data and internal tools and services are not compromised?
  • Developer efficiency: How do I automate deployments and keep them reliable, so my engineers can focus on building new features rather than on updating production?

Note: Since AWS is the most widely used cloud platform, our examples will mainly focus on it, but the same strategy applies to any cloud platform.

Choose services

AWS and Azure each offer over 200 services, while GCP offers well over 100. How do you choose? There are usually several combinations of services that can deliver the same result, but the final selection should be based on business needs. For example, suppose you want to build a single-page application in React. Hosting it on AWS S3 behind CloudFront ensures that your website is delivered quickly to users around the world. But suppose your application accumulates more and more complex user interface features, making load times painful for users, especially on slower computers. In that case, a better solution might be to host a React service with server-side rendering on an AWS ECS cluster.

You also need to decide whether to use a cloud provider’s managed service or an open source solution that you host on the cloud yourself. For example, compare AWS API Gateway with Lambda against an API built with a popular framework like Flask and hosted on ECS. Both solve the same problem, but with different costs and benefits. Lambda may be lightweight and cost-effective, but you now have to understand its limits, such as the 15-minute maximum execution time and memory caps, as well as nuances such as cold starts and the need for RDS Proxy to connect reliably to databases.
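To make the managed option concrete, here is a minimal sketch of the Lambda side of an API Gateway + Lambda API. It assumes Lambda proxy integration, where API Gateway passes the HTTP request in as `event` and expects a dictionary with `statusCode`, `headers`, and a string `body` in return; the greeting endpoint and its `name` parameter are hypothetical.

```python
import json


def handler(event, context):
    # With proxy integration, query parameters arrive in the event;
    # the field is None (not {}) when the request has none.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")  # hypothetical parameter
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Invoke locally with a hand-built event; no AWS account needed.
    fake_event = {"queryStringParameters": {"name": "cloud"}}
    print(handler(fake_event, None))
```

Because the handler is just a function, you can call it locally with a hand-built event. The Flask-on-ECS alternative would put the same logic behind a route in a container that you run and scale yourself.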

All of this can feel like a lot of dense information, but remember that cloud providers add new services every year, so you have to keep up to date. Doing so helps organizations adopt newer services that offer better cost savings and reliability.

Handling scale

Your organization is growing. There is a big difference between the 100 daily users you have today and the 100,000 you may have in the future. Similarly, transforming and moving megabytes of data is very different from doing the same once your data grows to gigabytes. You have to choose tools that solve not only today’s needs but also tomorrow’s. The key is to design your cloud architecture so that both performance and cost scale with demand. Load balancing and auto-scaling are popular ways to achieve this.
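Auto-scaling policies generally follow a proportional idea: if instances are busier than a target, add capacity in proportion. The sketch below is a simplified version of that rule (similar to the formula Kubernetes documents for its Horizontal Pod Autoscaler), not AWS's exact target-tracking algorithm; the CPU metric, the example target, and the instance bounds are assumptions.

```python
import math


def desired_capacity(current_instances: int,
                     current_cpu: float,
                     target_cpu: float,
                     min_instances: int = 1,
                     max_instances: int = 20) -> int:
    """Simplified proportional rule: resize the fleet so average CPU moves toward the target."""
    if current_instances < 1 or target_cpu <= 0:
        raise ValueError("need at least one instance and a positive target")
    # If 4 instances run at 90% CPU and the target is 60%,
    # we need ceil(4 * 90 / 60) = 6 instances.
    wanted = math.ceil(current_instances * current_cpu / target_cpu)
    # Clamp to the group's bounds so a spike cannot grow the fleet (and cost) without limit.
    return max(min_instances, min(max_instances, wanted))


if __name__ == "__main__":
    print(desired_capacity(4, 90.0, 60.0))  # → 6
```

The minimum and maximum bounds are the design choice that ties scaling to cost: they keep enough capacity for baseline traffic while stopping a traffic spike from growing your bill without limit.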

Observability

Setting up log collection and monitoring gives you good signals that something is going wrong, but it only gets you so far. As systems get bigger and more critical, you want to know not only whether something has failed, but also why. This is called observability. Monitoring is reactive, while observability is proactive. This matters especially in a large cloud architecture, where many resources run and depend on one another. Popular tools like Datadog and Dynatrace, as well as cloud-native services such as AWS X-Ray, can trace errors back to their origin and root cause. Best practices matter here too, such as setting up Slack and email alerts so you know immediately when a key service fails, and ideally why. Adopting observability as a practice will help your organization resolve errors quickly.
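As a small illustration of an alert that explains why something failed, not just that it failed, here is a sketch that builds a Slack incoming-webhook message containing the error and a trace ID you can look up in your tracing tool. It assumes Slack's incoming webhooks, which accept a JSON body with a `text` field; the webhook URL, service name, and trace ID are placeholders.

```python
import json
import urllib.request

# Hypothetical placeholder: Slack generates a real URL when you create an incoming webhook.
WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"


def build_alert(service: str, error: str, trace_id: str) -> urllib.request.Request:
    """Build (but do not send) a webhook request saying what failed and where to look."""
    text = (f"{service} is failing\n"
            f"Error: {error}\n"
            f"Trace ID: {trace_id} (look this up in your tracing tool)")
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        WEBHOOK_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(service: str, error: str, trace_id: str) -> None:
    # Performs a real network call: only use this with a real webhook URL.
    with urllib.request.urlopen(build_alert(service, error, trace_id), timeout=5) as resp:
        resp.read()
```

Keeping message construction separate from sending means the alert format can be tested without posting anything to Slack.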

Safety

Some data and services should never be public. This means ensuring that sensitive resources are not exposed on the public internet by restricting access policies, isolating them in private cloud networks, setting up IP allowlists and blocklists, or adding OAuth and SSO authentication layers. You will also want to scan all open source dependencies for vulnerabilities before deploying them to production; your CI/CD pipeline is a great place to run these scans. How you protect your systems will again depend on the services you use and your company’s needs.
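For the IP allowlist idea, here is a minimal sketch of the check itself using Python's standard `ipaddress` module. On AWS you would usually enforce this at the network layer instead (for example with security group rules), so treat this as an illustration of the logic; the allowed ranges are hypothetical documentation addresses.

```python
import ipaddress

# Hypothetical allowlist (RFC 5737 documentation ranges): an office subnet and a VPN egress IP.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True only if client_ip falls inside one of the allowlisted networks."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed input is rejected, not waved through
    # An IPv6 address is never "in" an IPv4 network, so it is rejected here too.
    return any(addr in net for net in ALLOWED_NETWORKS)


if __name__ == "__main__":
    print(is_allowed("203.0.113.42"), is_allowed("192.0.2.1"))
```

Failing closed on malformed input is the deliberate choice here: an allowlist should deny anything it cannot positively match.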

Developer efficiency

The last part of your strategy is to set up your system so that engineers can focus on building new features instead of worrying about how to deploy them. Two DevOps concepts are essential here. The first is infrastructure as code, which makes it easier to spin up new cloud resources and to maintain multiple environments such as development, staging, and production. The second is continuous integration and continuous deployment (CI/CD), which automatically deploys new features as soon as they are ready. Beyond these, it is also important to choose cloud technologies that do not add friction to the development process.
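Infrastructure as code is usually written with dedicated tools such as Terraform, AWS CloudFormation, or the AWS CDK. To show the core idea without extra dependencies, the sketch below generates a CloudFormation template for an S3 bucket from one Python function, so development, staging, and production stay structurally identical and differ only in their settings. The bucket, its logical name `AssetsBucket`, and the per-environment settings are hypothetical.

```python
import json

# Hypothetical per-environment settings; only these values differ between environments.
ENVIRONMENTS = {
    "development": {"versioning": "Suspended"},
    "staging": {"versioning": "Enabled"},
    "production": {"versioning": "Enabled"},
}


def bucket_template(env: str) -> dict:
    """Return a CloudFormation template defining one S3 bucket for the given environment."""
    settings = ENVIRONMENTS[env]
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Asset storage for the {env} environment",
        "Resources": {
            "AssetsBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "VersioningConfiguration": {"Status": settings["versioning"]},
                    "Tags": [{"Key": "environment", "Value": env}],
                },
            }
        },
    }


if __name__ == "__main__":
    # Print one template per environment; a CI/CD job could save and deploy each one.
    for env in ENVIRONMENTS:
        print(json.dumps(bucket_template(env), indent=2))
```

You could save each template to a file and deploy it with `aws cloudformation deploy --template-file <file> --stack-name <name>`, ideally from the CI/CD pipeline rather than by hand.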

We often see organizations choose services that add too much friction to the development process. A common example is using AWS Lambda instead of a simple cron job running on an EC2 instance. Lambda may be cheaper or easier to set up initially, but it brings the other challenges covered earlier, and if it is set up incorrectly, it can be difficult and time-consuming for engineers to test. So while some services may “cost less”, you end up paying more in engineering hours, which makes them more expensive in the long run.
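One way to reduce that testing friction, whichever option you choose, is to keep the job's logic in plain functions and make the Lambda handler or cron script a thin wrapper around it. The sketch below assumes a hypothetical job that finds expired records; the data source, the `job.py` filename, and the schedule are illustrative.

```python
import datetime as dt


def find_expired(records: list[dict], now: dt.datetime) -> list[str]:
    """Pure business logic: IDs of records whose expiry time has passed. Easy to unit test."""
    return [r["id"] for r in records if r["expires_at"] <= now]


def load_records() -> list[dict]:
    # Hypothetical data source; a real job would query a database here.
    now = dt.datetime.now(dt.timezone.utc)
    return [
        {"id": "a", "expires_at": now - dt.timedelta(days=1)},
        {"id": "b", "expires_at": now + dt.timedelta(days=1)},
    ]


def run_job() -> list[str]:
    return find_expired(load_records(), dt.datetime.now(dt.timezone.utc))


def lambda_handler(event, context):
    # Thin Lambda entry point, e.g. invoked by a scheduled EventBridge rule.
    return {"expired": run_job()}


if __name__ == "__main__":
    # Thin cron entry point, e.g. an hourly crontab line on EC2:
    # 0 * * * * python3 job.py
    print(run_job())
```

Because `find_expired` takes its inputs as arguments, engineers can test it with fixed data on their laptops, and switching between Lambda and cron only changes the few lines of wrapper code.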

All these challenges should not scare you, but inspire you. Cloud computing is an important part of modern business operations. It continues to evolve, with new technologies such as artificial intelligence, machine learning and blockchain empowering businesses to do more. Organizations must tackle the challenges of implementing cloud applications and improving cloud performance, and when they do it right, they will reap the benefits of this technology.
