Your CSPM remediation playbook

Crafting a strategy for fixing cloud misconfiguration
AUTHOR
Chris Reuter
PUBLISH DATE
April 24, 2025

So you bought and implemented a CSPM (Wiz, Orca, Cortex Cloud, CloudDefender, etc.) - congratulations!

After just a few hours, you can set up these tools (they’re agentless!) and start getting results. Now you have visibility into the vulnerabilities and misconfiguration across your entire fleet. You get dashboards and dashboards, lists and lists.

Now what?

Unfortunately, visibility is not the same as protection. Out of all the Resourcely customers I’ve talked to, many report they are overwhelmed by the noise that CSPMs generate.

  • Thousands of vulnerabilities & misconfigurations
  • The same vulnerabilities resurface, day after day

Whereas once the work of security teams was focused on identification, now that work has shifted to remediation.

“Fix the problem” calls out the CISO standing on the bow of the Good Ship Security, feverishly looking to the West. “At no cost, we must remediate these findings!”

And remediation is a frightening world full of manual work.

Below are some best practices and suggestions for making your remediation project successful. At Resourcely, we’ve built a tool for streamlining remediation. Check it out if you want to make this process even easier.

Crafting a CSPM remediation strategy

The remediation process is a classical human problem. In the world of DevSecOps, we are obliged to include developers in the remediation process. What should that process include?

I propose:

  • Prioritization
  • Coordinating with stakeholders
    • Gathering context/updates
    • Planning and timing
  • Remediation creation
  • Rollout
  • Rollback
  • Measuring and reporting on success

Prioritization

In this stage, security teams export findings from CSPMs or cloud APIs so they can sort and filter them and identify which resources should be fixed. Most of the time, this exercise requires input from stakeholders. Relevant context to prioritize on includes data sensitivity, region, and the application a resource supports.
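To make this concrete, here is a minimal sketch of scoring exported findings. The field names, weights, and sample findings are all assumptions for illustration, not the export format of any particular CSPM; you would agree on the real weights with your stakeholders.

```python
from dataclasses import dataclass

# Hypothetical weights: tune these with your stakeholders.
SENSITIVITY_WEIGHT = {"public": 0, "internal": 1, "confidential": 3}
SEVERITY_WEIGHT = {"low": 1, "medium": 2, "high": 4, "critical": 8}


@dataclass(frozen=True)
class Finding:
    resource_id: str
    rule: str
    severity: str          # as reported in the CSPM export
    data_sensitivity: str  # context gathered from stakeholders
    production: bool


def priority(finding: Finding) -> int:
    """Higher score means fix sooner."""
    score = SEVERITY_WEIGHT.get(finding.severity, 1)
    score += SENSITIVITY_WEIGHT.get(finding.data_sensitivity, 0)
    if finding.production:
        score *= 2
    return score


def prioritize(findings: list[Finding]) -> list[Finding]:
    # Highest score first; resource_id breaks ties so output is stable.
    return sorted(findings, key=lambda f: (-priority(f), f.resource_id))


findings = [
    Finding("bucket-logs", "public-access", "high", "internal", False),
    Finding("db-orders", "no-backup", "medium", "confidential", True),
    Finding("vm-batch", "imdsv1-enabled", "low", "public", False),
]
for f in prioritize(findings):
    print(priority(f), f.resource_id, f.rule)
```

In this sample, the medium-severity finding on a production database holding confidential data outranks the high-severity finding on a non-production bucket, which is the kind of reordering that stakeholder context is meant to produce.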

Tools used: Excel, Sheets, Dropbox, Box

🖱️Fix by Resourcely connects directly to CSPMs, importing misconfigurations and allowing for automated ingestion of context.

Coordinating with stakeholders

Gathering context/updates

When is a misconfiguration truly “mis”?

The next step in the remediation process is to collect information from developers and other stakeholders: gathering information about these resources will help guide your prioritization process, while delivering information will help you plan and execute remediation in the future.

Consider a VM that supports a production application, with processes that still rely on IMDSv1 to retrieve instance metadata. Enforcing IMDSv2 could break those processes, so in this example you would want a way to exempt this “misconfiguration” from being remediated.

Typically this is done the old-fashioned way: sending emails and Slack messages, and tracking information in spreadsheets.
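If you want something a step up from a spreadsheet, an exemption register can be sketched in a few lines. Everything here (the resource names, the rule name, the expiry date, the fields) is a hypothetical example; the one design idea worth keeping is that exemptions expire, so they get revisited rather than living forever.

```python
from datetime import date

# Hypothetical exemption register: (resource_id, rule) -> details.
exemptions = {
    ("vm-legacy-01", "imdsv1-enabled"): {
        "reason": "agent still relies on IMDSv1",
        "approved_by": "app-owner",
        "expires": date(2025, 9, 30),
    },
}


def is_exempt(resource_id: str, rule: str, today: date) -> bool:
    """True if an unexpired exemption covers this finding."""
    entry = exemptions.get((resource_id, rule))
    return entry is not None and today <= entry["expires"]


findings = [
    ("vm-legacy-01", "imdsv1-enabled"),
    ("vm-web-02", "imdsv1-enabled"),
]
today = date(2025, 6, 1)
to_fix = [f for f in findings if not is_exempt(*f, today)]
print(to_fix)
```

Recording the reason and approver next to each exemption also gives you something to show auditors later.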

Planning and timing

A subset of the coordination phase consists of planning a remediation. In many cases, fixing a misconfiguration needs to be carefully timed. You would not want to roll out a risky configuration change during prime time for a particular application.

Managing this rollout timing is particularly sticky, requiring significant back-and-forth communication and tracking for documentation purposes.

Gathering and communicating this information can be one of the most painful parts of the entire remediation process.
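Once you have collected each application’s busy hours, checking a proposed rollout time against them is simple to automate. This is a rough sketch under assumptions: the application names and hours are made up, and times are treated as naive local times with windows that do not cross midnight.

```python
from datetime import datetime, time

# Hypothetical prime-time hours per application (local time, 24h clock).
PRIME_TIME = {
    "checkout": (time(8, 0), time(22, 0)),
    "reporting": (time(6, 0), time(10, 0)),
}


def in_prime_time(app: str, when: datetime) -> bool:
    """True if `when` falls inside the app's busy hours."""
    window = PRIME_TIME.get(app)
    if window is None:
        return False  # no known busy hours; confirm with the owner
    start, end = window
    return start <= when.time() < end


def safe_to_roll_out(app: str, when: datetime) -> bool:
    return not in_prime_time(app, when)


print(safe_to_roll_out("checkout", datetime(2025, 6, 3, 14, 0)))   # mid-afternoon
print(safe_to_roll_out("checkout", datetime(2025, 6, 3, 23, 30)))  # late evening
```

A real version would need time zones and change freezes, but even this much turns “when can we ship this?” from a Slack thread into a lookup.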

Tools used: Slack, email, Jira, Google Forms

🖱️Fix by Resourcely automatically generates forms for collecting input of any kind, along with workflows for exception approval and timing coordination

Remediation creation

Once security teams and stakeholders have coordinated to identify which resources should be remediated, it is time to create the actual remediation.

Many security teams today will ask developers to manage this process. For infrastructure-as-code shops, that can mean developers writing custom Terraform that they aren’t necessarily familiar with. Other times, it can mean writing scripts to hit cloud APIs at scale or (worse) making changes in your cloud console.

This can also require communication between stakeholders and security teams, as well as modeling and testing scenarios.

Tools used: IDEs, documentation

🖱️Fix by Resourcely automatically creates fixes based on best practices for common security misconfigurations

Rollout

Congratulations, you’ve made it to the rollout stage. Are you excited? Rolling out your remediation is where the rubber meets the road.

With manual remediation projects, processes differ here: advanced security teams may hand off patch branches to Ops teams or request review from individual developers. In other cases, remediation rollout may end up as a distributed process with complex change windows, multiple PRs, or even with console-based changes.

This is really the culmination of all of your coordination work from before. Hopefully it goes well!

Tools used: version control, CI/CD, cloud consoles

🖱️Fix by Resourcely can push changes for security teams who want to write code, and can give them insight into breaking vs non-breaking changes before they are made. This allows security teams to independently make changes that aren’t impactful to production (e.g. adding tags, creating backups, adding logging).

Rollback

So you’ve rolled out your patches. Turns out, misconfiguration fixes can often cause surprise failures in production. Even though you’ve collaborated and coordinated your heart out, there’s a problem.

Consider a scenario where your security team is restricting permissions for some unused IAM users. Two weeks later, one of those IAM users comes to life and tries to access a database it no longer has access to. This was an edge case in a customer-facing application!

This speaks to two important capabilities you will want to think about:

  • The technical capability to track state and roll back quickly
  • The strategic capability to design remediations that CAN be rolled back quickly
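Both capabilities come down to recording what a resource looked like before you touched it. Below is a minimal sketch using an in-memory dictionary as a stand-in for real cloud state; the `ChangeLog` class and the resource names are hypothetical, and in practice the snapshot would live in version control or a state store.

```python
import copy


class ChangeLog:
    """Records the previous config of each resource before a fix is applied."""

    def __init__(self):
        self._snapshots = []  # (resource_id, previous_config), oldest first

    def apply(self, resources: dict, resource_id: str, changes: dict) -> None:
        # Save a deep copy first so later edits cannot alter the snapshot.
        previous = copy.deepcopy(resources[resource_id])
        self._snapshots.append((resource_id, previous))
        resources[resource_id].update(changes)

    def rollback(self, resources: dict, resource_id: str) -> bool:
        """Restore the most recent snapshot for one resource."""
        for i in range(len(self._snapshots) - 1, -1, -1):
            rid, previous = self._snapshots[i]
            if rid == resource_id:
                resources[resource_id] = previous
                del self._snapshots[i]
                return True
        return False  # nothing recorded for this resource


# In-memory stand-in for real cloud state (hypothetical).
resources = {"iam-user-batch": {"policies": ["db-read", "s3-read"]}}
log = ChangeLog()
log.apply(resources, "iam-user-batch", {"policies": []})
print(resources["iam-user-batch"])  # permissions restricted
log.rollback(resources, "iam-user-batch")
print(resources["iam-user-batch"])  # original permissions restored
```

Rolling back per resource, rather than per project, matters for cases like the IAM example above: you can restore the one user that broke without undoing every other fix.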

Tools used: CI/CD, CSPM scans, and other custom tooling

🖱️Fix by Resourcely automates scans at the resource level for remediated resources, so you can get instant insights into how your rollout is going, and instantly roll back your fix.

Measuring and reporting on success

Finally, for any remediation project (large or small), you will want to track the state of your rollout and report on that to others.

The first step is to automate some reports that keep track of the status of your rollout and the resources that are impacted. This can include re-running CSPM scans, and possibly integrating application logs and performance metrics into something like a Grafana dashboard.

The second step is to report your results to stakeholders. You’ll want a single place that your boss, developers, product managers, and your peers can go to identify how many resources were subject to remediation, how many are done, how many were excluded, etc.

It is important to note that auditors and compliance teams will be very interested in remediation status, including any decisions that were made to exclude resources from being remediated.
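The numbers stakeholders ask for (how many in scope, how many done, how many excluded) can be rolled up from a per-resource status map. The status names and resources in this sketch are assumptions for illustration; use whatever states your own process tracks.

```python
from collections import Counter

# Hypothetical per-resource statuses for one remediation project.
statuses = {
    "bucket-logs": "done",
    "db-orders": "done",
    "vm-legacy-01": "excluded",
    "vm-web-02": "pending",
    "iam-user-batch": "rolled_back",
}


def summarize(statuses: dict) -> dict:
    """Roll per-resource statuses up into the counts stakeholders ask for."""
    counts = Counter(statuses.values())
    total = len(statuses)
    return {
        "total": total,
        "done": counts["done"],
        "excluded": counts["excluded"],
        "pending": counts["pending"],
        "rolled_back": counts["rolled_back"],
        "percent_done": round(100 * counts["done"] / total, 1) if total else 0.0,
    }


print(summarize(statuses))
```

Keeping "excluded" as its own status, instead of silently dropping those resources, is what lets you answer the auditor’s question later.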

Tools used: Logging tools, CSPM scanning, dashboarding, documentation tools (Confluence, etc)

🖱️Fix by Resourcely has integrated reporting at the project and overall level, for easy tracking of success metrics.

Done!

Congratulations, your remediation battle is over…but the war is just beginning.

Measuring remediation as a function

CISOs, compliance executives, platform teams, and any leader adjacent to cloud security should care deeply about a robust, effective remediation function. While the process above is a snapshot of an individual remediation project, consider how you can measure the effectiveness of your security team (whose job is now primarily a remediation-focused one):

Mean Time to Resolution

The gold standard of fixing problems. We’ve all seen MTTR: this is how long it takes on average for you to fix the broken thing from the time you find it. If this number is going down, you’re doing a good job.

Vulnerable resources / total resources

Another good one to make go down. It is more forward-looking, and will require more coordination with your peers (those pesky developers).

Repeat Findings

How many of your findings are happening more than once? This will give your team a good idea of where to focus on preventative measures.
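All three metrics above can be computed from the same finding history. Here is a minimal sketch, assuming each record carries a resource, a rule, a found-at timestamp, and a resolved-at timestamp (or `None` if still open). The record layout and sample data are hypothetical, and `mttr` assumes at least one resolved finding.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical finding history: (resource_id, rule, found_at, resolved_at).
history = [
    ("bucket-logs", "public-access", datetime(2025, 4, 1), datetime(2025, 4, 3)),
    ("bucket-logs", "public-access", datetime(2025, 4, 10), datetime(2025, 4, 14)),
    ("db-orders", "no-backup", datetime(2025, 4, 2), None),  # still open
]


def mttr(history) -> timedelta:
    """Mean time from finding to fix, over resolved findings only."""
    durations = [done - found for _, _, found, done in history if done]
    return sum(durations, timedelta()) / len(durations)


def vulnerable_ratio(history, total_resources: int) -> float:
    """Share of all resources that have at least one open finding."""
    open_resources = {rid for rid, _, _, done in history if done is None}
    return len(open_resources) / total_resources


def repeat_findings(history) -> dict:
    """(resource, rule) pairs that were found more than once."""
    counts = Counter((rid, rule) for rid, rule, _, _ in history)
    return {key: n for key, n in counts.items() if n > 1}


print(mttr(history))
print(vulnerable_ratio(history, total_resources=50))
print(repeat_findings(history))
```

Note that open findings are left out of MTTR here, so a growing backlog will not show up in that number; that is one reason to track the vulnerable-resource ratio alongside it.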

Common use cases

If you’re looking for a place to start, consider some of these projects that are impactful:

  • Adding tags to resources
  • Automating logging on key resources
  • Setting up backups for databases/buckets/block storage
  • Removing public access
  • IAM permission cleanup

Conclusion

In the new world of agentless CSPMs, scanning and identifying issues is now a commodity. The real work of security teams now lies in fixing what is broken, and preventing misconfiguration in the first place. May the sun shine on your face, as you row relentlessly towards the shores of 0 MTTR and 0 vulnerabilities.

