Log4j was an eye opener for many here at GoSecure. Not from a technology or security perspective (we have that covered in spades), but in just how quickly the GoSecure Titan team can respond to and remediate a vulnerability in a dependency. We are starting to appreciate the speed that CI/CD (continuous integration / continuous delivery) has brought to the normal development cycle, both for handling bugs and for bringing new features to market.
Internally, we have been touting how great a full CI/CD pipeline is for increasing our development velocity, but no one had considered what it meant for vulnerability response times. It was an eye opener for the GoSecure Titan team, and in a good way! So, I would like to share the journey that led us here, culminating in a fast response to the Log4j vulnerabilities.
A Bit of a Backstory
Last January, I started a new role within the company's engineering organization. This was a challenge for me, as I had never worked on the engineering side of a company. While I do have a bachelor's degree in computer science and a master's in information security, I had always managed to stay out of the development side of the house. This all changed!
We are building a completely new platform that will shift how we do things at the company, both internally and externally. This shift has already started with the end-user experience and is taking form in the data storage layers, which will alter how data is accessed and what data is available to each team.
Let’s Move to GitLab Too
While we did have an existing project to serve as the base for our new platform, we started the project around the same time as the decision to migrate all development over to GitLab.
The existing project leveraged the Atlassian suite, with some Jenkins and Artifactory thrown in. It was also built with a monolith mindset but using microservices, so not really optimized, but it worked! But hey, anything worth doing is worth doing right. So with the move to a new platform, not only were we completely changing many of the underlying technologies and philosophies the project was based on, we were also changing the entire toolchain. With the inception of the GoSecure Titan platform, we wanted to adopt a Cloud Native approach and break apart our builds so that we could deploy only what had changed, reducing the risk associated with deployments and making the team much more agile.
This seemed like a daunting task, but all in all it wasn't terrible. Some great scripts came out of this work, such as Migrating Issues From Jira to Gitlab, Moving Epics and Features from Aha to GitLab, the use of GitLab Triage Bot and Trending GitLab Data in Prometheus. The initial transition took us a couple of months to really get our feet under us and get used to the way GitLab does things. As of today, we have hit our stride, but like everything, there are certainly more improvements to be found.
Every developer knows that sometimes you have to hit a date, and that means working long hours. For me, the date was the Tuesday after Memorial Day weekend, and the expectation was to have our dev environment ready. So, a colleague and I worked for three days to develop the full CI/CD for the project, with cascading builds for downstream dependencies and full security scanning. We're proud of that hard work; it has since been optimized with templates and other tools, but it's all still based on our Memorial Day weekend marathon!
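Our actual pipelines are internal, but to give a sense of what cascading builds mean in GitLab CI, here is a minimal sketch of a shared library triggering a downstream service pipeline. The project path, branch and build command are placeholders, not our real configuration:

```yaml
# .gitlab-ci.yml (sketch): rebuild a downstream consumer whenever this
# shared library builds successfully on the default branch.
stages:
  - build
  - downstream

build-library:
  stage: build
  script:
    - ./gradlew build            # placeholder build command

trigger-consumer-service:
  stage: downstream
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  trigger:
    project: titan/example-consumer-service   # hypothetical project path
    branch: main
    strategy: depend             # fail this pipeline if the downstream one fails
```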
Let’s Up Our Platform Security Game
Now that we had CI/CD running like a well-oiled machine, we decided it was time to up our security posture. The first stop was enabling GitLab's built-in SAST, Dependency Scanning, Container Scanning, License Scanning, Credential Scanning, etc., in all of our CI/CD pipelines through templates. While we do have a penetration testing team that does routine security scans of our systems, we thought it would be better to get this feedback to the developers as soon as possible.
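We won't reproduce our internal templates here, but turning these scanners on in GitLab is largely a matter of including the official templates in a pipeline. A minimal sketch might look like this (the template names come from GitLab's template library and may vary by version; everything else is illustrative):

```yaml
# .gitlab-ci.yml (sketch): pull in GitLab's built-in security scanners.
include:
  - template: Security/SAST.gitlab-ci.yml
  - template: Security/Dependency-Scanning.gitlab-ci.yml
  - template: Security/Container-Scanning.gitlab-ci.yml
  - template: Security/License-Scanning.gitlab-ci.yml
  - template: Security/Secret-Detection.gitlab-ci.yml

stages:
  - test        # the scanning jobs attach to the test stage by default
```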
Of course, like every project, we started off with some vulnerabilities (according to GitLab scanning, not pen testing!), the majority of them in older or outdated dependencies. We spent some time getting rid of all the Critical and High vulnerabilities initially.
After eliminating all Critical and High vulnerabilities, we added rules to our merge requests that won't let a Critical or High vulnerability be brought in without approval from a manager. This way, the team can't just add something, let it ride, and force us to revisit that code later.
It lets us all sleep a bit better at night.
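For the curious, we set this up through GitLab's merge request approval rules; in recent GitLab versions the same idea can be expressed as a scan result policy, roughly like the sketch below (the approver name is hypothetical and the exact fields vary by GitLab version):

```yaml
# policy.yml (sketch): require an approval when a merge request
# introduces new Critical or High findings.
scan_result_policy:
  - name: Block new Critical/High vulnerabilities
    description: Manager approval required for new Critical/High findings
    enabled: true
    rules:
      - type: scan_finding
        branches: []                  # all protected branches
        scanners:
          - sast
          - dependency_scanning
          - container_scanning
        vulnerabilities_allowed: 0
        severity_levels:
          - critical
          - high
        vulnerability_states:
          - newly_detected
    actions:
      - type: require_approval
        approvals_required: 1
        user_approvers:
          - engineering.manager       # hypothetical approver
```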
Observability is the Way to Go!
One lesson we learned quickly was that we needed centralized logging and metrics. While it is now very easy to deploy the platform into Kubernetes through CI/CD, and spinning up new services is cookie cutter, logging became an issue as our sprawl continued. We needed centralized logging that allowed us to quickly and easily isolate errors and areas of improvement.
I was off to find a solution that would meet our needs. We looked at a few log-only solutions that were interesting, and then we found Grafana Loki. Loki is part of a larger ecosystem that includes Grafana, Prometheus and Tempo. This has been a lifesaver. Who would have thought pulling metrics, logs and traces into a single platform would be so satisfying? But Grafana has made it so. The icing on the cake was being able to alert the team about errors, OOMs or any other pattern we wanted, through Grafana. Now, we have alerts in the tools we currently use.
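As an illustration of the kind of alert we mean (the label names and search pattern here are hypothetical, not our actual queries), a LogQL expression like this can count OOM lines per service, and Grafana can fire an alert when the count crosses a threshold:

```
# Count log lines mentioning OutOfMemoryError per app over 5 minutes
# ("namespace" and "app" are placeholder labels for illustration).
sum by (app) (
  count_over_time({namespace="titan-dev"} |= "OutOfMemoryError" [5m])
)
```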
This all went so well, that we have decided to redo our entire ingestion platform to leverage Loki. More on that topic later.
And Then Log4j…
Pretty sure everyone reading this has heard of Log4j by now, and while many might not know what it is or what it does, you have at least heard of the associated Log4Shell vulnerability that has been widely impacting systems and applications that use Java.
When the Log4j fun started, we took stock of where Log4j was and wasn't used. We are refactoring our Java microservices to Spring Boot, but like all tech debt initiatives, it never happens as fast as you would like. Of course, our new microservices weren't vulnerable, but what about all those we hadn't converted? We found that the 90% of projects that hadn't been converted did, of course, use Log4j. Thanks to our vulnerability scanning in CI/CD, those projects had all been updated to 2.14.1 (the latest at the time), but it was now time to update to 2.15 on the Monday after the Log4j vulnerability craze started.
We managed to upgrade every affected library and service in roughly 6 hours. The reason it took us that long is that a quick assessment showed we had put Log4j version declarations everywhere to fix the last vulnerability. Remember that security scanning we talked about before? Yep, there's the reason! During this rebuild, we decided to pull those declarations out of the individual projects and into a central library we maintain, so that the next round of upgrades would be quicker. Little did we know that this would be for version 2.16, two days later.
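To illustrate the idea of centralizing the version (a hypothetical sketch of the approach, not our actual build files, and it assumes Maven), importing the Log4j BOM in one shared parent or library pins the version for every service that inherits it:

```xml
<!-- pom.xml of a hypothetical shared parent (sketch): import the Log4j BOM
     so every service picks up the pinned version, and future upgrades become
     a one-line change in one place. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-bom</artifactId>
      <version>2.17.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```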
The 2.16 upgrade took us about 4 hours to rebuild the entire application, and during that deployment we also added two new services. Then, when we had to upgrade to 2.17 a few days later, we were able to get it all done in just over an hour. Of course, these times cover all of the build and deploy time, not the time it took to modify the code.
So Yes, DevOps is Pretty Cool!
If Log4j had happened 9 months ago, there would have been no way we could have responded so fast. Granted, our exposure to the issues found in versions 2.15 and 2.16 was so low that we knew it was faster to just rebuild and deploy the entire platform than to really worry about the corner cases we would have to test for.
In fact, in GoSecure Titan we did two complete rebuilds before other teams had even determined whether they were vulnerable or not. The best part is, we are still only scratching the surface of our velocity with CI/CD. I can't wait to see what this year brings! And by the way, I plan to be a new, somewhat regular contributor to this blog, so for my updates and to hear the latest from my security colleagues, be sure to follow us on Twitter and LinkedIn.