Configuration Perspectives: AWS RDS

Your guide to configuring RDS with many stakeholders
AUTHOR
Chris Reuter, Nathan Glass, and Travis McPeak
PUBLISH DATE
August 29, 2024

This is the inaugural post in our “Configuration Perspectives” series, where we break down the configuration parameters of individual cloud services. We will analyze each option, identify which teams care about it, and describe the best practice (if any).

We’ll begin with AWS Relational Database Service (RDS). In this guide, we’ll dive into the key RDS configuration options, exploring how different profiles—developers, platform engineers, networking teams, and security professionals—approach these settings. By setting up RDS configuration with various stakeholders in mind, you can maximize your odds of a successful deployment without risking the ire of your peers.

Whether you’re a developer focused on application performance, a platform engineer ensuring operational stability, or a security professional safeguarding data integrity, understanding how to optimize RDS settings can make a significant difference.

This guide will help you if you want to know:

  • RDS configuration best practices
  • How to manage your RDS settings for various scenarios
  • Types of opinionated configuration for RDS

RDS is a deep, complex service, and we’re not going to cover some of its more niche parameters. If you’re trying to integrate your database with Active Directory, this blog post wishes you the best of luck.

As you follow along, you might find the AWS Terraform docs for RDS helpful.

Only developers care!

The following are RDS settings that only developers care about. Feel free to let them loose on these (probably).

Storage and Performance

allocated_storage is primarily a developer concern: they want to set a storage size that is enough to satisfy their application, relying on auto-scaling via max_allocated_storage (below) to handle increases.

storage_type is another developer-only concern, given that the choice between storage types has a relatively low cost impact. This may be organization-dependent.

iops (provisioned IOPS) is another developer-only concern, selected based on performance needs. As with instance_class, developers will typically over-provision.

Other

db_name and username are primarily a developer concern, as they are part of the application’s configuration and need to align with their naming convention. That’s it…pretty boring!
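As a rough illustration, the developer-owned settings in this section might look like the following in Terraform. This is a minimal sketch, not a recommended baseline: the identifier, instance class, sizes, and names are hypothetical placeholders, and max_allocated_storage and manage_master_user_password are included only so the resource is complete.

```hcl
resource "aws_db_instance" "app" {
  identifier     = "app-db"        # hypothetical name
  engine         = "postgres"
  instance_class = "db.m6g.large"  # placeholder size

  # Naming that has to line up with the application's own configuration.
  db_name  = "app"
  username = "app_admin"

  # Let RDS generate and store the master password in Secrets Manager.
  manage_master_user_password = true

  # Storage sizing and performance.
  allocated_storage     = 100   # initial size in GiB
  max_allocated_storage = 500   # ceiling for storage autoscaling, in GiB
  storage_type          = "io1" # provisioned IOPS SSD
  iops                  = 3000  # io1 requires an explicit IOPS value
}
```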

Developers don’t care (at first)!

There is one class of parameters that developers don’t care about initially, but eventually they learn why they want to care (especially after they’ve been bitten by an outage). This class revolves around troubleshooting.

Troubleshooting

Typically, enabled_cloudwatch_logs_exports is an SRE team priority. It is often only after an outage or performance issue that developers realize they want these logs, and the setting gets turned on after the fact. It should generally be turned on proactively.

performance_insights_enabled and monitoring_interval control performance monitoring tools (Performance Insights and Enhanced Monitoring) that SRE teams usually advocate for, and that developers want after they’ve had a performance incident. These cost money, so cost teams may also want to have a say here.
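A sketch of turning these troubleshooting settings on proactively is shown below. It assumes a PostgreSQL instance (log type names differ per engine) and a hypothetical monitoring_role_arn variable that points at an IAM role Enhanced Monitoring can assume. All names and values are illustrative.

```hcl
variable "monitoring_role_arn" {
  description = "ARN of an IAM role for Enhanced Monitoring (hypothetical input)"
  type        = string
}

resource "aws_db_instance" "app" {
  identifier                  = "app-db" # hypothetical name
  engine                      = "postgres"
  instance_class              = "db.m6g.large"
  allocated_storage           = 100
  username                    = "app_admin"
  manage_master_user_password = true

  # Ship engine logs to CloudWatch Logs. "postgresql" and "upgrade" are the
  # PostgreSQL log types; other engines use different names.
  enabled_cloudwatch_logs_exports = ["postgresql", "upgrade"]

  # Performance Insights with the shortest retention period (7 days).
  performance_insights_enabled          = true
  performance_insights_retention_period = 7

  # Enhanced Monitoring: collect OS metrics every 60 seconds.
  # A value of 0 disables it; a non-zero value needs monitoring_role_arn.
  monitoring_interval = 60
  monitoring_role_arn = var.monitoring_role_arn
}
```

Longer Performance Insights retention and shorter monitoring intervals increase cost, which is where the cost team's opinion comes in.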

Many teams care!

The largest group of parameters is here. Surprise, surprise - everyone has opinions.

Connectivity and Security

publicly_accessible is primarily a security team concern, for obvious reasons: this should generally be false. Most teams probably think the same thing, unless there’s a nuanced use case.

iam_database_authentication_enabled is an area where security teams are the primary stakeholders. It allows the use of IAM users and roles for database authentication, which security teams typically advocate for.

manage_master_user_password is another parameter where the security team should be setting a standard. It has RDS generate the master password and store it in AWS Secrets Manager, which helps reduce the risk of password exposure.
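The three access-related settings above could be expressed as follows. This is a minimal sketch with hypothetical names and sizes, meant to show the shape of a security-team standard rather than a complete policy.

```hcl
resource "aws_db_instance" "app" {
  identifier        = "app-db" # hypothetical name
  engine            = "postgres"
  instance_class    = "db.m6g.large"
  allocated_storage = 100
  username          = "app_admin"

  # Keep the instance off the public internet.
  publicly_accessible = false

  # Allow IAM users and roles to authenticate to the database.
  iam_database_authentication_enabled = true

  # RDS generates the master password and stores it in Secrets Manager,
  # so no password value appears in the Terraform configuration.
  manage_master_user_password = true
}
```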

Developers, networking teams, and security teams all care about vpc_security_group_ids. The right security group needs to be selected to allow necessary traffic to and from the database while blocking unauthorized access. Security and networking teams usually predefine these groups.

db_subnet_group_name is a primary responsibility of developers, who must choose the appropriate subnet group when configuring the instance. Platform or networking teams might be involved, especially in larger organizations where subnet groups are predefined as part of the cloud infrastructure. This should not be left to default, as inheriting the default VPC is generally not the right choice.
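One way the networking pieces might fit together is sketched below. The three variables are hypothetical inputs that a platform or networking team would typically supply, and port 5432 assumes PostgreSQL.

```hcl
variable "vpc_id" {
  type = string
}

variable "private_subnet_ids" {
  description = "Private subnets in at least two Availability Zones"
  type        = list(string)
}

variable "app_security_group_id" {
  description = "Security group of the application tier"
  type        = string
}

# Subnet group: pins the database to specific (non-default) subnets.
resource "aws_db_subnet_group" "app" {
  name       = "app-db-subnets" # hypothetical name
  subnet_ids = var.private_subnet_ids
}

# Security group: only the application tier may reach the database port.
resource "aws_security_group" "db" {
  name   = "app-db" # hypothetical name
  vpc_id = var.vpc_id

  ingress {
    description     = "PostgreSQL from the application tier only"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [var.app_security_group_id]
  }
}

resource "aws_db_instance" "app" {
  identifier                  = "app-db"
  engine                      = "postgres"
  instance_class              = "db.m6g.large"
  allocated_storage           = 100
  username                    = "app_admin"
  manage_master_user_password = true

  db_subnet_group_name   = aws_db_subnet_group.app.name
  vpc_security_group_ids = [aws_security_group.db.id]
}
```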

storage_encrypted is typically a hard yes from the security team, but developers may want it disabled for non-sensitive data, citing a performance tax. In practice the overhead is usually small.

Stability and High Availability

deletion_protection is almost always set to true, with developers, platform, and security holding similar opinions: protecting against human error is a good idea.

delete_automated_backups is wanted as a two-step deletion protection by both developers and platform teams. It acts as a second step because the instance can only be deleted once deletion_protection is off. Developers want to make sure their work isn’t accidentally deleted, and platform teams want to ensure a safety net for stability.

blue_green_update is highly desirable for both developers and platform teams, allowing for safer deployments, but not all database engines support it.
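A sketch of the layered deletion safeguards follows. In the Terraform AWS provider, delete_automated_backups defaults to true, so keeping automated backups after a deletion means setting it to false explicitly. The final snapshot settings are an extra safety net added here as an assumption, not something the discussion above requires, and all names are hypothetical.

```hcl
resource "aws_db_instance" "app" {
  identifier                  = "app-db" # hypothetical name
  engine                      = "postgres"
  instance_class              = "db.m6g.large"
  allocated_storage           = 100
  username                    = "app_admin"
  manage_master_user_password = true

  # Step one: the instance cannot be deleted while this is true.
  deletion_protection = true

  # Step two: if the instance is deleted anyway, keep its automated backups
  # instead of removing them immediately.
  delete_automated_backups = false

  # Optional extra: take a final snapshot on deletion.
  skip_final_snapshot       = false
  final_snapshot_identifier = "app-db-final" # hypothetical name
}
```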
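In the Terraform AWS provider, blue/green deployments are configured through the blue_green_update block. The sketch below assumes a MySQL instance and enables automated backups, which blue/green deployments rely on. Engine and version support varies, so check the provider and RDS documentation for your engine. All names and sizes are hypothetical.

```hcl
resource "aws_db_instance" "app" {
  identifier                  = "app-db" # hypothetical name
  engine                      = "mysql"
  instance_class              = "db.m6g.large"
  allocated_storage           = 100
  username                    = "app_admin"
  manage_master_user_password = true

  # Automated backups must be enabled for blue/green deployments.
  backup_retention_period = 7

  # Apply supported updates through a blue/green deployment
  # to reduce downtime.
  blue_green_update {
    enabled = true
  }
}
```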

replicate_source_db is another multi-stakeholder parameter. It is primarily a developer concern for applications requiring high availability across multiple regions, or when there is a need to offload read traffic to replicas.

SRE or platform teams may also care, especially if they are responsible for resiliency/stability, as will cost teams.
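A same-region read replica might be declared as in the sketch below. The db_password variable and all names are hypothetical. A plain password is used here instead of manage_master_user_password on the assumption that the Secrets Manager integration restricts read replica creation for some engines. For a cross-region replica, replicate_source_db would take the source instance's ARN rather than its identifier.

```hcl
variable "db_password" {
  description = "Master password (hypothetical input, supplied out of band)"
  type        = string
  sensitive   = true
}

resource "aws_db_instance" "primary" {
  identifier        = "app-db" # hypothetical name
  engine            = "postgres"
  instance_class    = "db.m6g.large"
  allocated_storage = 100
  username          = "app_admin"
  password          = var.db_password

  # The source instance needs automated backups enabled to have replicas.
  backup_retention_period = 7
}

resource "aws_db_instance" "replica" {
  identifier     = "app-db-replica" # hypothetical name
  instance_class = "db.m6g.large"

  # Engine, storage, and credentials are inherited from the source.
  replicate_source_db = aws_db_instance.primary.identifier

  skip_final_snapshot = true
}
```

Each replica is billed as its own instance, which is why cost teams tend to weigh in.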

multi_az is a setting developers generally have a strong opinion on, especially for production workloads. Platform teams also prioritize it for higher availability and for meeting uptime requirements. These are counterbalanced by cost teams, given that it effectively doubles storage costs and involves additional resources.

allow_major_version_upgrade and auto_minor_version_upgrade will have developer and security team stakeholders. allow_major_version_upgrade should almost always be set to false, as major version upgrades can introduce breaking changes or cause compatibility issues. Meanwhile, auto_minor_version_upgrade should be set to true from the perspective of almost all teams, so that bug fixes and security patches are picked up and the security posture is maintained.
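One common compromise between availability and cost is to enable Multi-AZ only in production. The sketch below assumes a hypothetical environment variable, and the "prod" value is a naming convention chosen for illustration.

```hcl
variable "environment" {
  description = "Deployment environment (hypothetical input)"
  type        = string
  default     = "dev"
}

resource "aws_db_instance" "app" {
  identifier                  = "app-db-${var.environment}"
  engine                      = "postgres"
  instance_class              = "db.m6g.large"
  allocated_storage           = 100
  username                    = "app_admin"
  manage_master_user_password = true

  # Pay for a standby in another Availability Zone only in production.
  multi_az = var.environment == "prod"
}
```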

engine and engine_version will draw opinions from everyone. Developers should pick from a list curated by security and architecture teams.
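A curated version list can be enforced with a Terraform variable validation, combined with the upgrade flags discussed above. This is a sketch under assumptions: the approved versions are made up for illustration, and only the major version is pinned so that automatic minor upgrades do not conflict with the configuration.

```hcl
variable "engine_version" {
  description = "PostgreSQL major version, restricted to an approved list"
  type        = string
  default     = "16"

  validation {
    # Hypothetical list curated by security and architecture teams.
    condition     = contains(["15", "16"], var.engine_version)
    error_message = "The engine_version must be one of the approved versions: 15, 16."
  }
}

resource "aws_db_instance" "app" {
  identifier                  = "app-db" # hypothetical name
  engine                      = "postgres"
  engine_version              = var.engine_version
  instance_class              = "db.m6g.large"
  allocated_storage           = 100
  username                    = "app_admin"
  manage_master_user_password = true

  # Major upgrades are a deliberate, reviewed change.
  allow_major_version_upgrade = false

  # Minor upgrades are applied automatically during the maintenance window.
  auto_minor_version_upgrade = true
}
```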

kms_key_id will be mandated by security teams, as it determines which KMS key encrypts the RDS instance (encryption itself is enabled by storage_encrypted). While developers will be responsible for creating and picking a key, security teams are the primary stakeholders.
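Encryption with a customer managed key might look like the sketch below. The key is created alongside the instance for brevity. In many organizations the security team would own the key and developers would only reference its ARN. All names are hypothetical.

```hcl
resource "aws_kms_key" "rds" {
  description         = "Customer managed key for RDS storage encryption"
  enable_key_rotation = true
}

resource "aws_db_instance" "app" {
  identifier                  = "app-db" # hypothetical name
  engine                      = "postgres"
  instance_class              = "db.m6g.large"
  allocated_storage           = 100
  username                    = "app_admin"
  manage_master_user_password = true

  # Encrypt storage at rest with the key above. kms_key_id expects the
  # key's ARN and only applies when storage_encrypted is true.
  storage_encrypted = true
  kms_key_id        = aws_kms_key.rds.arn
}
```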

Storage and Performance

maintenance_window is typically required by SREs, while developers are responsible for having an opinion. This is very application-specific: consider a database that supports a web application. You would want maintenance to happen during the lowest period of activity for your application, so as to minimize disruption.

With instance_class, developers are interested in picking the largest option possible. The only other team that is interested in controlling this is the cost/FinOps team.

max_allocated_storage sets the ceiling for storage autoscaling: the database can autoscale only if this value is greater than the initial allocated_storage. Developers are incentivized to maximize this value, but it can get expensive, at which point the cost/FinOps team will have an opinion. Storage is a non-trivial line item: it can be 10-25% of overall cost.

Other

tags are generally used for tracking, and everyone will have an opinion except developers.
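Because developers rarely volunteer tags, one option is to apply organization-wide tags at the provider level with default_tags and leave only resource-specific tags on the instance. The sketch below uses hypothetical tag keys, values, and region.

```hcl
provider "aws" {
  region = "us-east-1" # placeholder region

  # Applied to every taggable resource created through this provider.
  default_tags {
    tags = {
      Team        = "payments" # hypothetical values
      CostCenter  = "cc-1234"
      Environment = "prod"
    }
  }
}

resource "aws_db_instance" "app" {
  identifier                  = "app-db"
  engine                      = "postgres"
  instance_class              = "db.m6g.large"
  allocated_storage           = 100
  username                    = "app_admin"
  manage_master_user_password = true

  # Resource-specific tags are merged with the provider's default tags.
  tags = {
    Application = "checkout"
  }
}
```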

Developers usually set the backup_window based on the specific application needs, ensuring that backups occur during times of low activity to minimize any potential impact on performance. Platform teams are involved when there is a broader organizational policy, which can occur in larger organizations when individual developers lose context.
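A sketch of scheduling backups and maintenance for a quiet period is shown below. Both windows are expressed in UTC and must not overlap. The times, the retention period, and all names are placeholders that depend on the application's traffic pattern.

```hcl
resource "aws_db_instance" "app" {
  identifier                  = "app-db" # hypothetical name
  engine                      = "postgres"
  instance_class              = "db.m6g.large"
  allocated_storage           = 100
  username                    = "app_admin"
  manage_master_user_password = true

  # Daily automated backup, kept for 7 days. Format: hh24:mi-hh24:mi (UTC).
  backup_retention_period = 7
  backup_window           = "02:00-03:00"

  # Weekly maintenance. Format: ddd:hh24:mi-ddd:hh24:mi (UTC).
  # Chosen so that it does not overlap the backup window.
  maintenance_window = "Sun:04:00-Sun:05:00"
}
```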

Conclusion

The vast majority of configuration decisions being made for RDS have multiple stakeholders. This is the driving force behind many of the issues with creating cloud infrastructure today: it is often a slow, manual process with a significant number of one-offs.

We hope this guide serves as a tool for platform, security, networking, architecture, and cost teams to support developers in creating infrastructure while achieving their own goals. Resourcely is built for this exact scenario: facilitating the creation of properly configured infrastructure that meets the expectations of various groups.

Let us know if you found this guide helpful by tagging us on LinkedIn or Twitter!
