DevOps: What's a CI/CD pipeline?

📚 This article is part of a series about DevOps. If you think an article is missing, you can always suggest it.

When I started working at Valiu, there was no properly defined software development lifecycle, much less a handbook for DevOps culture. So we faced a challenge: transforming ourselves from an exceptional group of hacker-rooted engineers into a highly functional, effective team of developers.

Contrary to common belief, DevOps is more than a single role in a company. It's not the team of folks who maintain the servers while holding a coffee cup all day. It's not just a developer who learned a bit too much Linux.

Instead, it's a human-centered cultural approach that should be implemented as part of digitally transforming an organisation, whether it's tech-based or not. In fact, every tech company, even the big ones such as Microsoft or Amazon (for historical reasons), at some point lacked defined DevOps practices.

The difference between the success of these organisations and those that struggle to scale, regardless of the quality of their products, and apart from a user-centered design approach, is the adoption of a culture of automated, streamlined practices whose impact can be measured, otherwise known as DevOps.

More technically, it's a set of software engineering practices. Those practices can be divided into several stages that play a key part in the lifecycle of a software-based product or service.

One of those stages is, of course, CI/CD.

The first: Why CI/CD?

Let's talk about a common scenario in the life of a tech company before CI/CD.

You are the lead developer / CTO / "dude, you gotta make this work" of a startup you just founded with a couple of friends.

You, as a good developer, write some tests to check that the systems you're building work as expected.

The MVP is ready, and right after you finish the code for the private beta, you upload the changes to the server (maybe using git). Everything works. You keep working on changes by yourself, and upload once or twice a week.

You've grown enough to have another developer in the company. One day, you assign them to modify a critical module, let's say auth, for instance.

The developer works all day long, and by the end of the day they're too tired to notice that their modifications introduced a breaking change in the way open sessions are stored.

Normally, after running some tests, you would notice that several changes to the tables would be needed to accommodate the breaking change.

However, it's a couple of hours past midnight and it's normal to feel the urge to go to sleep. Our developer might think: "well, I finished this shit. It works. It's ready. Commit… push". It's normal. We're human, after all.

The next day, a critical update is needed. A presentation for investors will be held at 2:00pm. Then, the release for alpha testers. You check your changes. Run tests. Everything works fine. Commit. You pull changes from the remote repo. No conflicts. You push your changes.

Before lunch, you merge to master. That should upgrade the production version within a really small 30-second maintenance window that you're proud you built entirely from scratch in 9 hours using git and bash.

1:00pm. Team lunch. Everyone's excited for the new features that you are releasing today. And then it happens.

You start getting LOTS OF EMAILS in your inbox. People just can't log into the platform.

You immediately run to the office and head to your computer. There's no way to roll back unless you take the entire platform down, affecting not only the users who are trying to log in (whatever), but also the users who are already working.

Fuck.

There's not enough time. You announce a maintenance window of (oh, my god) 5 minutes! You roll back. It's done.

Now, to the post-mortem. What happened? You check the logs. Oh, no! You don't have logs.

You test out locally. You run the tests.

Oh, fuck!

There's an insertion error on the session table. The model method that handles these changes fails for several cases involving social network logins.

You manage to apply the needed changes. Test locally. It works. Push the changes again. It works in production. The day is saved.

But it's too late. It's 2:45pm. The investor meeting to showcase the new features had to be moved to the next week.

Now it's time for the what-ifs.

  1. What if we had something that warned us about a failure? I mean, coders are error-prone. And yeah, tests were written! But they didn't stop the failing code from being pushed to production.
  2. What if we had a procedure for rolling back code? Those were 5 minutes during which literally nobody could use our product.

The answers to those questions lie in CI/CD practices. Now, let's get into them.

The second: Defining CI/CD

CI/CD stands for Continuous Integration and Continuous Deployment: development practices that are the starting point of any DevOps initiative, at least where code is involved.

They are essentially, well… code, in charge of ensuring that the software we release works anywhere, not just where "it works on my computer".

Continuous Integration

It's the process of continuously (hence the name) integrating your codebase as changes are added, in order to make sure failures are detected as early as possible.

Integrating implies building and testing everything. To be truly continuous, it needs to happen multiple times. But how often?

According to Joel Spolsky's 12 Steps to Better Code (a.k.a. The Joel Test), at least a daily build is needed. But that's an article from the year 2000. Today, we know the answer is as many builds as needed; more precisely, one per push. And that's what CI aims for.

Basically, every time you push changes, a script that builds your product and runs the tests should be executed. This is the minimum. There are additional steps designed to ensure code quality (code linting, code coverage, SAST, etc.).
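To make this concrete, here's a minimal sketch of what such a script could look like in GitLab CI (the tool the next article covers). The stage layout is the point here; the Node.js image and the npm commands are illustrative assumptions, not part of any real project:

```yaml
# .gitlab-ci.yml: minimal CI sketch (assumes a Node.js project; adapt the commands to your stack)
stages:
  - build
  - test

build:
  stage: build
  image: node:18          # hypothetical base image
  script:
    - npm ci              # install dependencies
    - npm run build       # build the product

test:
  stage: test
  image: node:18
  script:
    - npm test            # run the test suite; a failing test stops the pipeline here
```

Every push triggers this pipeline, so a broken build or a failing test gets caught before anyone merges, let alone deploys.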

Continuous Deployment

It's the process of continuously (I think you get it by now) deploying changes to the corresponding environments in your development pipeline.

Deploying implies building the software, packaging it, and shipping it to the corresponding servers. Additionally, we can add steps for stopping and rolling back environments.

An environment is a set of components (databases, servers, domain names) that points to a state of your code and is intended for a specific audience. Examples are development, intended for… uh, developers?; staging or qa, intended for those alpha testers; and of course production, the environment where your users work on a daily basis.

The more environments you have, the better. But that's expensive, because it means spending money on hosting them. Have at least two (dev and prod), ideally three (dev, qa, and prod).
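Sticking with the same hypothetical GitLab CI file from above, deployment can be expressed as an extra stage with one job per environment. The `./deploy.sh` script and the branch names are placeholders for whatever mechanism actually ships your code:

```yaml
# Continuation of the hypothetical .gitlab-ci.yml: the stages list grows, plus one deploy job per environment
stages:
  - build
  - test
  - deploy

deploy_qa:
  stage: deploy
  environment: qa
  script:
    - ./deploy.sh qa              # hypothetical script that packages and ships the build
  only:
    - develop                     # assumption: the develop branch feeds the qa environment

deploy_production:
  stage: deploy
  environment: production
  script:
    - ./deploy.sh production
  only:
    - master                      # assumption: master maps to production
  when: manual                    # keeps a human pressing the button; drop this for fully continuous deployment
```

With something like this in place, a bad change gets stopped at the test stage before any deploy job runs, and a rollback becomes re-running the deploy job for the last good commit instead of an improvised maintenance window.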

Wrapping up

CI/CD are the first steps because they help mitigate the risks associated with changing code over time.

Now that you know which risks can be mitigated by implementing CI/CD practices, we can start planning them out before your next release.


In the next article, I'll explain how to implement a full CI/CD pipeline using GitLab CI.