AWS SSO with Terraform and Secrets Manager

You can find the latest code for this project on GitHub: https://github.com/shadetree-dev/terraform-aws-sso-permission-sets-example

We'll go through a few stages in this journey to get there:

Bitch about SSO a bit and why it's such a burden to deal with
Delegate an administrator for IAM Identity Center (SSO)
Set up some AWS Secrets Manager secrets to pull dynamically from (avoid hard-coding)
Write some Terraform and apply it!

1. Bitch about SSO a bit (rightfully)

If there is one thing that I can tell you about AWS IAM Identity Center (SSO), it's probably that it is one of, if not THE worst, AWS service. I'll spare most of my disparaging comments, but here's a few examples why:

You don't give a user access to an account, you map an account to a user or group, then attach a "Permission Set" (functionally, think of it as a Role). Sounds simple enough, but the problem is, when you want to remove access, guess where you go? That's right, accounts! You'd have to unmap each and every account from a Permission Set, by going backward to the accountSSSS (plural; one-by-one) and remove it. That or automate the crap out of something Amazon really should have made suck less.
If that still seems logical, check this out: when you're assigning the Permission Sets in that flow, you can search them by name. In any other circumstance, you literally have to use the ID or ARN, which means I probably wouldn't need the f***ing name anyway at that point!

AWS SSO Assign Permission Set

Trying to find a Permission Set any other time:

AWS SSO Find Permission Set

...

WTF

So obviously there is some capability to do it somewhere, why not in all of the workflows??? This becomes especially annoying when you work for a company where you may have dozens, even hundreds of Permission Sets. It gets better.

They aren't even alphabetical by name

You get some randomly generated ID that is, apparently, the ordering index. So even if you have 26 permission sets, one starting with each letter of the English alphabet, you could very likely end up with an order that spans pages and goes like this:

BogusAWSPermissionSet
GoodGodThisIsTerrible
FfsIHateAWSSO
TerribleImplementation
MyGodIHateThisService
YouShouldHateThisService
ZeroDesignConsiderationBeforeLaunch
LaughableCredsManagementInAWS
QuitHarpingOnThisJordan
AlrightLastOne

You get the point on that one, but it's absolutely ludicrous that you don't even get alphabetical order on your side.

TL;DR working with AWS SSO in any regular capacity blows.

How do we make it better?

With Terraform! I mean, still not an amazing process, but at least we can make things a bit less painful and decoupled if we do it right.

First, some setup: you should absolutely delegate an administrator account for AWS IAM Identity Center. Why? A few reasons:

Keeping as much as possible out of the AWS Organizations management account is best practice, which is why AWS patterns so often revolve around multi-account architectures, with a narrow purpose given to each account.
It's a bit of a one-way door decision, because any Permission Sets created in the management account cannot be modified outside of it!

In essence you'd always have to use the management account (including having a separate provider definition for those creds) to manage permission sets created there, or delete and recreate them using the delegated admin.
(If you're not familiar with the one-way door / two-way door thing, you can check out Amazon's culture page. Maybe someone made a one-way door decision designing SSO for AWS and birthed the concept, or at least proved the point of it!)

Once you delegate an admin account, from then on you can consider that as your IAM Identity Center (SSO) account.
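
On the Terraform side, that roughly means pointing your provider at the delegated admin account instead of the management account. Here is a minimal sketch of what that could look like; the region, account ID, and role name are placeholders I made up for illustration, not values from the project.

```hcl
# Sketch only: run Terraform against the delegated admin account.
provider "aws" {
  # assumption: the region where your IAM Identity Center instance is enabled
  region = "us-east-1"

  assume_role {
    # hypothetical role in the delegated admin account (placeholder account ID)
    role_arn = "arn:aws:iam::111122223333:role/terraform-sso-admin"
  }
}
```

If your credentials already land in the delegated admin account, you can drop the assume_role block entirely.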

2. Delegate an administrator account for IAM Identity Center

This is a straightforward process, and may be the only part of it that is...

Log into your AWS Organizations management account and head to the IAM Identity Center. From there, navigate to Settings and select Manage on the tab in the middle of the page.

Select the Register account button to proceed and then select your account from the organization:

Register Delegated Admin Account

You should be able to just hit the radio button next to the account and hit assign, then immediately get the green banner:

With this complete, we can move on to the grueling task of how we'll structure our project.

3. Set up AWS Secrets Manager secrets

The awful thing about a lot of Terraform shops is that they hard-code values just because Terraform lets you use a pseudo-dynamic file full of variables called tfvars. This leads to risky behaviors and sloppy coding, and can mostly be avoided by using secure holding pens for your values, like Secrets Manager!

I'm going to take a few generic examples we can build out here which hopefully give you the idea of the foundations you can build on. Let's set up some key-value pairs that have:

The name we want to assign to a newly created Permission Set as the key
A comma-separated list of AWS Managed Policies that we will attach to these roles

Now we will get into that generic "persona" type setup, where we have some possible users and their AWS templated policies.

SecurityAuditors who will get, you guessed it, the SecurityAudit policy
DBAdmins who will get AmazonDynamoDBFullAccess and AmazonRDSFullAccess
FinanceBeanCounters whom will get the Billing job role policy
NOTE: This is an edge case we will need to account for, because its ARN differs from these others
ManagersWhoWontLogInAnyway only get ReadOnly because they shouldn't be doing permissive stuff!
DevOpsAdmins get the god mode permissions 😎

Go on over and stuff these into Secrets Manager like so:

I recommend saving this to a path like /aws/org/sso/managed-policies. This gives you a prefix to work off for any custom policies you want to keep in a separate secret, for instance (more on this later).

For now we are not setting any resource policy restrictions, just letting IAM policy definitions keep people out of our secrets (and limiting access to our delegated admin account!). Just click the next, next, next until your secrets are safe with AWS.
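
To make the shape concrete, here is roughly what that first secret decodes to, written as an HCL map. The keys and the first three values come straight from the personas above; ReadOnlyAccess and AdministratorAccess are my assumption for which managed policies back "ReadOnly" and "god mode". The local name is hypothetical.

```hcl
# Sketch only: the key/value shape stored at /aws/org/sso/managed-policies,
# shown as the map you get back after jsondecode.
locals {
  example_managed_policies = {
    SecurityAuditors           = "SecurityAudit"
    DBAdmins                   = "AmazonDynamoDBFullAccess,AmazonRDSFullAccess"
    FinanceBeanCounters        = "Billing"             # job-function policy, handled as an edge case later
    ManagersWhoWontLogInAnyway = "ReadOnlyAccess"      # assumption
    DevOpsAdmins               = "AdministratorAccess" # assumption
  }
}
```

Note that each value is one plain string; multiple policies are just comma-separated with no spaces.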

Now, here's where the challenge comes in. Mapping users and groups to accounts is either:

Manually done, which is fine if that is your IT operations plan and you want that additional oversight (it's not that bad in most cases to make changes once you've got Permission Sets set up)
A bit of a stitching together process, where we're going to use multiple secrets, so that we can re-run Terraform to apply new policies or update members quickly across our org

We're going to implement the latter here. Following the same process, we want to make a new secret named /aws/org/sso/account-mapping.

Now, we've only two (2) accounts, but the idea here is:

SecurityAuditors will be able to audit both accounts
DBAdmins need only the member account
FinanceBeanCounters need only the observability account (rollup billing stuff)
ManagersWhoWontLogInAnyway professed a need for being able to at least have ReadOnly access everywhere, so we can let them have it
DevOpsAdmins need both, but at the very least they are using SSO to abuse their powers
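
As a sketch, the account-mapping secret would decode to something like the map below. The account IDs are placeholders (111122223333 standing in for the member account and 222233334444 for the observability account), and the local name is hypothetical.

```hcl
# Sketch only: the key/value shape stored at /aws/org/sso/account-mapping.
locals {
  example_account_mapping = {
    SecurityAuditors           = "111122223333,222233334444"
    DBAdmins                   = "111122223333" # member account only
    FinanceBeanCounters        = "222233334444" # observability account only
    ManagersWhoWontLogInAnyway = "111122223333,222233334444"
    DevOpsAdmins               = "111122223333,222233334444"
  }
}
```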

One more set of secrets, so that we can now try to implement the logic that AWS failed to. Really what I'd like to know is "which groups should get these permissions", so let's make a mapping for that under /aws/org/sso/group-mapping, still using the Permission Set name as the key.

Each of these corresponds to a group already created in my IAM Identity Center
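
Same idea for the group mapping. The group names secops, db_admins, finance, and devops_admin show up in the examples later in this post; managers is a placeholder I invented for the remaining persona. Each value must match a group's display name in IAM Identity Center exactly.

```hcl
# Sketch only: the key/value shape stored at /aws/org/sso/group-mapping.
locals {
  example_group_mapping = {
    SecurityAuditors           = "secops"
    DBAdmins                   = "db_admins"
    FinanceBeanCounters        = "finance"
    ManagersWhoWontLogInAnyway = "managers" # hypothetical group name
    DevOpsAdmins               = "devops_admin"
  }
}
```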

Now that we have all our pieces in place, let me show you how to hack this monstrosity together!

4. Write some Terraform to use our secret plan

Next, we will want to write some Terraform code that makes use of these secrets and uses some locals.tf magic to let us have our fun.

So this gets kinda hairy, admittedly... If you think about it, we're kinda just making a shitty, split apart database to handle something. You could argue "Why not put entries into DynamoDB or something and look them up??"

One reason is that secrets can have resource policies and be restricted properly, but still give cross-account access (kinda like KMS and S3). We aren't using that right now, though. Another is that the learning curve for non-DB people (heck, even I am in that camp) is less steep, so you can have someone update only the values needed, with audit history, through the UI, and not be too worried.

I'll do my best to explain the witchcraft we're about to pull to make it work, but if you can get through this I think you'll see how you can tweak it just a lil and roll out your permissions, and then hopefully have to not think about this for a long time!

Project Structure

If you looked at my other posts, you'll know I am a fan of modules and making your main.tf simply the driver of your stack. This project is no different, and it actually comes out quite nicely (all things considered)!

.
├── Makefile
├── README.md
├── backend.tf
├── data.tf
├── locals.tf
├── main.tf
├── modules
│   └── permission-set
│       ├── data.tf
│       ├── locals.tf
│       ├── main.tf
│       ├── variables.tf
│       └── versions.tf
├── providers.tf
├── tfplan
├── variables.tf
└── versions.tf

Witchcraft with data and locals

This whooole project comes down to data manipulation. What we are going to do is:

Look up relevant values that will be used across our resources in the parent module
Normalize that data, to the degree we need, to pass in a semi-consistent data type for our variables (in the sense that we will send list(string))
Look up values in the child module, from the values in the parent module sent in (dizzying, eh?), and then normalize those
Take those normalized values and grossly mashed together data points, and iterate over them for our resource creation

Let's jump in and see if we can make sense of it.

This data source is kind of stupid, honestly. It is just an ugly way (and the Terraform-supported way, per the docs) to get the static ID that represents your SSO instance (#just-aws-things).

data "aws_ssoadmin_instances" "sso" {}

and we also get our secret values (there are three (3) but I only show the one here)

# look up our secret for managed policies
data "aws_secretsmanager_secret" "policies" {
  name = "/aws/org/sso/managed-policies"
}

# get the latest version of it, so we can get the actual data
data "aws_secretsmanager_secret_version" "policies" {
  secret_id = data.aws_secretsmanager_secret.policies.id
}
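
For completeness, the other two lookups presumably follow the exact same pattern. The data source names accounts and groups are inferred from how locals.tf references them, and the secret paths are the ones we created earlier; treat this as a sketch rather than a copy of the repo.

```hcl
# Sketch only: the remaining two secret lookups, mirroring the policies one.
data "aws_secretsmanager_secret" "accounts" {
  name = "/aws/org/sso/account-mapping"
}

data "aws_secretsmanager_secret_version" "accounts" {
  secret_id = data.aws_secretsmanager_secret.accounts.id
}

data "aws_secretsmanager_secret" "groups" {
  name = "/aws/org/sso/group-mapping"
}

data "aws_secretsmanager_secret_version" "groups" {
  secret_id = data.aws_secretsmanager_secret.groups.id
}
```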

Whether or not you are versioning your secrets, you have to use the secret_version here to get the actual value. Now we turn to our locals.tf for the fun.

locals {

  sso_instance = tolist(data.aws_ssoadmin_instances.sso.arns)[0]
  idstore = tolist(data.aws_ssoadmin_instances.sso.identity_store_ids)[0]

  # get the values from our secrets and jsondecode
  policies = jsondecode(data.aws_secretsmanager_secret_version.policies.secret_string)
  accounts = jsondecode(data.aws_secretsmanager_secret_version.accounts.secret_string)
  groups   = jsondecode(data.aws_secretsmanager_secret_version.groups.secret_string)

  full_map = nonsensitive({
    for k, v in local.policies :
    k => {
      policies = split(",", v)
      accounts = split(",", local.accounts[k])
      groups   = split(",", local.groups[k])
    }
  })

  # set some standard tags we can pass to resources
  tags = {
    Purpose    = "iam:sso"
    InspiredBy = "shadetree.dev"
  }
}

Let's go bit by bit, because it's kinda BS honestly that we have to do this!

sso_instance and idstore are just static values, but they get returned in a list, so we pull out the first element once in locals so that we don't eff up the [0] somewhere or something. This is just good practice for that reason
all our jsondecode stuff simply lets us get the key/value data from our secret, so we need that to work with it for each of them (plus it's ugly to have jsondecode in a bunch of places, let's leave it in the locals)
Now the full_map is where it gets silly. This ONLY works if you've set things up like in this blog post: you need to have the same set of keys in each of your secrets (which we set up this way, by always having our key be the Permission Set name)! If a key from policies is missing in accounts or groups, the lookup by that key fails.
This means we could honestly use any of them for the key, but we'll just use policies because we started there anyway!
- nonsensitive just allows values to be passed / used that are from things like Secrets Manager. Terraform is cautious because, well, you could make stuff pretty insecure if you accidentally, say, uploaded your .tfstate to a public code repo!
  In our case, we have some generic strings and nothing that is unique to our org (hopefully) in things like groups; at least nothing that should be hackable or damaging in the same way full ARNs, account IDs, passwords, etc. are
- for k, v in local.policies :
  we are iterating over every key and value from this retrieved secret
- k => { policies = split(",", v)
  here we are setting the new key of our data construct to have the value of the following set, where policies (coming from our decoded policy list remember) is now equal to our values -> this becomes the list of policies in our secret
  We use split because we just have a plain, dumb 'ol string in Secrets Manager, delimited by commas, so we can make it a list by splitting on every comma
- accounts = split(",", local.accounts[k])
  here we do the same as above, split on commas, but the reference is a little different, because we didn't start the loop on this item. That means we still have to look it up by its key (k)
- groups = split(",", local.groups[k])
  this one works exactly the same as accounts above

After that we close our loop, and we should end up with something that looks like this to pass to our child module:

{
  SecurityAuditors = {
    policies = ["SecurityAudit"]
    accounts = ["ACCOUNT1", "ACCOUNT2"]
    groups   = ["secops"]
  },
  DBAdmins = {
    policies = ["AmazonDynamoDBFullAccess", "AmazonRDSFullAccess"]
    accounts = ["ACCOUNT1"]
    groups   = ["db_admins"]
  }
}

and so on. It is messy in the locals but comes out kinda nice in what Terraform gets out of it!
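
If you want a little more safety than "hope the keys line up", here is a hedged sketch of a more defensive variant. It is not what the project does: it uses literal stand-in maps (all demo_ names are hypothetical) so it can be read on its own, trims stray whitespace around each comma-separated item, and computes which keys are missing from the other two secrets.

```hcl
# Sketch only: a defensive take on full_map, using stand-in data.
locals {
  # hypothetical stand-ins for the three decoded secrets
  demo_policies = {
    SecurityAuditors = "SecurityAudit"
    DBAdmins         = "AmazonDynamoDBFullAccess, AmazonRDSFullAccess" # note the stray space
  }
  demo_accounts = {
    SecurityAuditors = "111122223333,222233334444"
    DBAdmins         = "111122223333"
  }
  demo_groups = {
    SecurityAuditors = "secops"
    DBAdmins         = "db_admins"
  }

  # keys present in policies but absent from accounts or groups;
  # this is empty when the three secrets line up
  demo_missing_keys = setunion(
    setsubtract(keys(local.demo_policies), keys(local.demo_accounts)),
    setsubtract(keys(local.demo_policies), keys(local.demo_groups)),
  )

  # same structure as full_map, with whitespace trimmed from every item
  demo_full_map = {
    for k, v in local.demo_policies :
    k => {
      policies = [for p in split(",", v) : trimspace(p)]
      accounts = [for a in split(",", local.demo_accounts[k]) : trimspace(a)]
      groups   = [for g in split(",", local.demo_groups[k]) : trimspace(g)]
    }
  }
}
```

You could surface demo_missing_keys in an output while debugging, so a mismatched secret is obvious before the key lookup blows up.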

Now we can pass each of these things to our child module, which can construct ALL of them iteratively (and fairly quickly). Here's what our parent main.tf looks like.

module "permission-set" {
  source       = "./modules/permission-set"
  for_each     = local.full_map
  sso_instance = local.sso_instance
  idstore      = local.idstore
  name         = each.key
  policies     = each.value["policies"]
  accounts     = each.value["accounts"]
  groups       = each.value["groups"]
}

So then let's take the SecurityAuditors again. We are basically sending:

name     = "SecurityAuditors"
policies = ["SecurityAudit"]
accounts = ["ACCOUNT1", "ACCOUNT2"]
groups   = ["secops"]
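
The child module's variables.tf isn't shown in this post, so here is a minimal sketch of what it could contain to accept those arguments. The variable names match what the module code references; the types and the defaults for session_duration and tags are my assumptions.

```hcl
# Sketch only: input variables for modules/permission-set.
variable "sso_instance" {
  type        = string
  description = "ARN of the IAM Identity Center instance"
}

variable "idstore" {
  type        = string
  description = "Identity store ID used for group lookups"
}

variable "name" {
  type        = string
  description = "Name of the Permission Set"
}

variable "policies" {
  type        = list(string)
  description = "AWS managed policy names to attach"
}

variable "accounts" {
  type        = list(string)
  description = "Account IDs to assign the Permission Set to"
}

variable "groups" {
  type        = list(string)
  description = "Group display names that receive the Permission Set"
}

variable "session_duration" {
  type        = string
  default     = "PT1H" # assumption: ISO-8601 duration, one hour
  description = "Session length for the Permission Set"
}

variable "tags" {
  type        = map(string)
  default     = {} # assumption: optional, so the parent call shown above still works
  description = "Tags to apply to the Permission Set"
}
```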

Now let's jump into the child module to see what happens with it!

More locals tomfoolery

Here in the modules/permission-set code, we've got more fun. We have one (1) data source that lets us retrieve each group's ID from its display name, which is required to assign accounts.

data "aws_identitystore_group" "groups" {
  count             = length(var.groups)
  identity_store_id = var.idstore

  alternate_identifier {
    unique_attribute {
      attribute_path = "DisplayName"
      attribute_value = var.groups[count.index]
    }
  }
}

We can dynamically look up n groups by doing this, which is cool, but the trouble you get into is that you may have a disproportionate number of accounts vs groups, and Terraform is expecting a 1:1 relationship. Here's the snippet from our child main.tf that dictates that:

resource "aws_ssoadmin_account_assignment" "accounts" {
  count              = length(local.group_assignments)
  instance_arn       = var.sso_instance
  principal_id       = local.group_assignments[count.index].group_id
  principal_type     = "GROUP"
  target_id          = local.group_assignments[count.index].account
  target_type        = "AWS_ACCOUNT"
  permission_set_arn = aws_ssoadmin_permission_set.ps.arn
}

If we did nothing and had, say, two (2) accounts and six (6) groups, you'd end up with a broken loop, because by the third iteration the shorter list has run out of elements to pair up. So how do we get around that? locals!!!

There are two things we have to normalize for all this to work still:

  policies = [
    for policy in var.policies :
    policy == "Billing" ? "arn:aws:iam::aws:policy/job-function/Billing" : "arn:aws:iam::aws:policy/${policy}"
  ]

Here we are making sure our ARN for our policy is correct. Remember, we are just passing in the names of these policies. The ARN is standard for the most part, but Billing is unique. It has an additional prefix before the policy name, so we use a conditional here. In Terraform, this is the equivalent of saying:

"If the policy is 'Billing' then my ARN is 'arn:aws:iam::aws:policy/job-function/Billing', else my ARN is 'arn:aws:iam::aws:policy/${policy}'" (where policy is our looped variable). In other words, we handle this edge case in locals.

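Billing is not the only AWS managed policy that lives under the job-function/ path, so if you ever add more of them, a lookup list scales better than chaining conditionals. This is a sketch of that idea, not what the project does; all demo_ names are hypothetical, and the list of job-function policies is deliberately partial.

```hcl
# Sketch only: generalize the Billing edge case to any job-function policy.
locals {
  # hypothetical input, standing in for var.policies
  demo_policy_names = ["Billing", "SecurityAudit"]

  # partial list of managed policies that sit under the job-function/ path
  demo_job_function_policies = ["Billing", "SupportUser", "ViewOnlyAccess"]

  demo_policy_arns = [
    for policy in local.demo_policy_names :
    contains(local.demo_job_function_policies, policy)
    ? "arn:aws:iam::aws:policy/job-function/${policy}"
    : "arn:aws:iam::aws:policy/${policy}"
  ]
}
```
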
Let's break the next part down:

  group_assignments = nonsensitive(flatten([
    for group in data.aws_identitystore_group.groups : [
      for account in var.accounts : {
        group_id = group.group_id
        account  = account
      }
    ]
  ]))

nonsensitive is again used, because we don't have anything incredibly sensitive we are passing; it just happens to be retrieved from secrets
flatten takes multiple lists, then takes the elements and stores them in a singular list. This is so that you'd get something like:
["1", "2", "3", "4", "5"] vs multiple lists ending up like [ ["1"], ["2", "3"], ["4"], ["5"]] and confusing the s*** out of Terraform (and the rest of us!)
for group in data.aws_identitystore_group.groups : [
we are again iterating over some values; we want each group to be looked at from all our groups (from our dynamic data lookup!)
for account in var.accounts : {
inner loop again to construct with both the group and the account
group_id = group.group_id
account = account
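here we are creating one object per group and account pair; once flattened, we have a single list whose length is the number of groups times the number of accounts, so every element carries both a group_id and an account that we can call dynamically.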

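If we had started with something like:
groups = ["devops_admin", "finance"]
and
accounts = ["111122223333", "222233334444", "333344445555"]

We end up with six elements, one per group and account pair (group names stand in here for the real group IDs that the data source returns):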

[
  {
    group_id = "devops_admin"
    account  = "111122223333"
  },
  {
    group_id = "devops_admin"
    account  = "222233334444"
  },
  .
  .
  .
  {
    group_id = "finance"
    account  = "333344445555"
  }
]
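
Here is the same cross-product as a standalone sketch you can poke at in terraform console, using literal values instead of the data source. The second local is an optional variation, not something the project uses: it keys each pair by a stable string, which would let you drive the resource with for_each instead of count so that adding a group doesn't shift every index. All demo_ names are hypothetical.

```hcl
# Sketch only: the group x account cross-product with stand-in values.
locals {
  demo_group_ids = ["devops_admin", "finance"] # stand-ins for real group IDs
  demo_accounts  = ["111122223333", "222233334444", "333344445555"]

  # one object per (group, account) pair, flattened into a single list
  demo_assignments = flatten([
    for group_id in local.demo_group_ids : [
      for account in local.demo_accounts : {
        group_id = group_id
        account  = account
      }
    ]
  ])

  # optional: the same pairs keyed by "group/account" for use with for_each
  demo_assignments_by_key = {
    for a in local.demo_assignments : "${a.group_id}/${a.account}" => a
  }
}

# 2 groups x 3 accounts gives 6 assignments
output "demo_assignment_count" {
  value = length(local.demo_assignments)
}
```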

Now all we need to do is tell Terraform to iterate over, and then access, these elements.

Here's the final code to actually create stuff!

resource "aws_ssoadmin_permission_set" "ps" {
  name             = var.name
  description      = "Permission Set created via Terraform for ${var.name} business function"
  instance_arn     = var.sso_instance
  session_duration = var.session_duration
  tags             = var.tags
}

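resource "aws_ssoadmin_managed_policy_attachment" "attach" {
  count = length(local.policies)
  # explicit dependency so the attachment is destroyed before the account
  # assignment, since removing an attachment re-provisions the permission set
  depends_on         = [aws_ssoadmin_account_assignment.accounts]
  instance_arn       = var.sso_instance
  managed_policy_arn = local.policies[count.index]
  permission_set_arn = aws_ssoadmin_permission_set.ps.arn
}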

resource "aws_ssoadmin_account_assignment" "accounts" {
  count              = length(local.group_assignments)
  instance_arn       = var.sso_instance
  principal_id       = local.group_assignments[count.index].group_id
  principal_type     = "GROUP"
  target_id          = local.group_assignments[count.index].account
  target_type        = "AWS_ACCOUNT"
  permission_set_arn = aws_ssoadmin_permission_set.ps.arn
}

It all looks so small and simple without the context, doesn't it??

Test the code!

Now we can test it! How will we know if it works or not? Well, go add yourself to some groups, and run it, of course! In prep for this, I created and added myself to each of the groups but have not run the code. My SSO start page looks like this:

Then we run the good ol' init, plan, apply (bonus points for using my Makefile to do it 🙂). As soon as it is done applying, you can refresh the UI, where you should see something like:

Conclusion

If you stuck it out with me, THANK YOU, and also I'm sorry because it means you're also living through the hell that is AWS SSO.

If you happen to work for a company without mature IdP solutions (or with mismanaged ones, which is probably any place) then you have a head start on being able to operationalize this and get back to fun work. Because, let me tell you, IAM + SSO becomes torturous over a long period of time if you don't try to automate some of it away.

Hopefully you've learnt something useful and can make use of this with a few small tweaks! Feel free to take and mangle my code to fit your needs; this should at least give you a jumping off point :)