When I first looked into SCIM, the RFCs and endless tutorials were pretty dry. I wanted a playground to try things out, see what’s possible, and test my integrations. So I built one: https://scim.dev
(You’ll also see that the author is right: “The SCIM specification is basically good, but has some subtle details”)
Something I wish SCIM did better was break apart group memberships from the user resource. In the realm of SCIM's schema with the ability to have write-only, read/write, and read-only properties it makes a ton of sense to have a user's group memberships read-only and available to look at easily. But sometimes populating the list of groups a member is in can be taxing depending on your user/group db (or SaaS) solution. Especially because this data is not paginated.
SCIM allows clients to ignore the group membership lists via `?excludedAttributes=groups` (or `members` on the group resource). But not all clients send that by default. In my experience, Entra does well to only ask for a list of groups or members on resources when it's really needed.
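For reference, a minimal sketch of what such a request URL looks like. The base URL and resource ids are made up for illustration; the only part taken from RFC 7644 is the `excludedAttributes` query parameter, which takes a comma-separated list of attribute names.

```python
from urllib.parse import urlencode


def resource_url(base: str, resource: str, resource_id: str, excluded: list[str]) -> str:
    """Build a SCIM GET URL that asks the server to omit the given attributes."""
    url = f"{base.rstrip('/')}/{resource}/{resource_id}"
    if excluded:
        # RFC 7644 expects a comma-separated list of attribute names here.
        url += "?" + urlencode({"excludedAttributes": ",".join(excluded)})
    return url


# Hypothetical base URL and ids, purely for illustration.
user_url = resource_url("https://example.com/scim/v2", "Users", "2819c223", ["groups"])
group_url = resource_url("https://example.com/scim/v2", "Groups", "e9e30dba", ["members"])
```

Whether a given IdP actually sends this parameter is up to the IdP, so a server probably can't rely on it.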
Some enterprise customers use SCIM with tons of users. Querying for the users themselves is simple because querying users is paginated and you can constrain the results. But returning a single group with 10,000 users in a single response can be a lot. It only really contains the user's identifier and optionally their display name, but if you have to pull this data from a paginated API it'll take a while to respond. Or it could still be taxing on some databases.
It'd be nice to query `/Users/:id/groups` or `/Groups/:id/members` in a paginated fashion similar to `/Users`.
Wild that they don't mention in the article that they offer SCIM, apparently at no extra cost? Most auth providers do not offer that, typically they will charge you per end customer/tenant. For example WorkOS charges $125/month for SCIM (on top of $125/month for SSO).
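To make the wish concrete, here is a rough sketch of what a paginated members response could look like if it reused the `startIndex`/`count` conventions and the `ListResponse` envelope that SCIM already defines for `/Users`. To be clear, `/Groups/:id/members` is not a real SCIM endpoint, and `members_page` is a hypothetical helper.

```python
LIST_RESPONSE = "urn:ietf:params:scim:api:messages:2.0:ListResponse"


def members_page(member_ids: list[str], start_index: int = 1, count: int = 100) -> dict:
    """Return one page of group members in a SCIM-style ListResponse envelope."""
    start_index = max(start_index, 1)  # SCIM pagination is 1-based
    count = max(count, 0)
    page = member_ids[start_index - 1 : start_index - 1 + count]
    return {
        "schemas": [LIST_RESPONSE],
        "totalResults": len(member_ids),
        "startIndex": start_index,
        "itemsPerPage": len(page),
        "Resources": [{"value": member_id} for member_id in page],
    }


# A 250-member group fetched 100 at a time; the last page holds 50 entries.
ids = [f"u{i}" for i in range(1, 251)]
last_page = members_page(ids, start_index=201, count=100)
```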
Stytch is also $125/month, and frontegg is similar.
(I wrote the article)
I genuinely didn't mean it to be much of an ad for our company. I was just surprised how few people had written a simple explainer of reasonable quality.
Yeah thanks for that. I think it's reasonable and not cringe to add a little note like that at the end. As somebody who previously tried to buy SCIM for my startup I would have wanted to know your solution exists!
That’s probably because the official website, and the specs themselves, do a pretty good job already.
https://scim.cloud/
Another point: the SCIM schema can be confusing. The RFCs make it seem like you can define your schema however you like, and they provide a default schema on which the examples in other parts of the RFCs are based.
In reality, most systems expect you to have the full default schema present without modifications and might complain when items are missing. Do you provide SCIM support without passwords (only SSO)? Okta will send a password anyway (random and unused). Does your application not differentiate between username and email? IdPs will complain if they can't set them separately. Do you not store the user's `costCenter`? IdPs will get mad and keep trying to set it because it never sticks.
Some of the time, you'll have to store SCIM attributes on your user objects which have no effect on your system at all.
The other side is making custom schema items. SCIM has you present these in the `/Schemas` endpoint. But no system (that I know of) actually looks at your schema to autopopulate items for mapping. Entra and Okta are great at letting you provide a mapping from an IdP user to a SCIM user, and then you map SCIM users back to your app's users. But you typically have to follow app documentation to map things properly if it's not using the default schema entirely.
To really support Entra in particular, you have to reference Entra's implicit spec, which is roughly documented here:
https://github.com/AzureAD/SCIMReferenceCode/tree/master/Mic...
One way this comes up: because of the way those C# objects serialize, there are properties that Microsoft will send you in the form `"key": { "value": "xxx" }`, but which they expect you to read back to them in the form `"key": "xxx"`.
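One defensive way to cope with that is to accept both shapes on the way in. This is only a sketch: `unwrap_value` is a hypothetical helper, and which properties are actually affected is something you'd have to confirm against Entra's reference code.

```python
def unwrap_value(raw):
    """Return the scalar whether it arrives bare or wrapped as {"value": ...}."""
    # Only unwrap dicts whose single key is "value", so real complex
    # attributes (e.g. an email with "type" and "primary") are left alone.
    if isinstance(raw, dict) and set(raw) == {"value"}:
        return raw["value"]
    return raw


wrapped = unwrap_value({"value": "xxx"})
bare = unwrap_value("xxx")
```

When serializing the resource back out, you would then emit the bare form.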
It's best to not take the SCIM RFCs too literally.
The RFC is very clear about how extensions are supposed to be registered with IANA, which is always how RFC extensions in general work. You cannot have interoperability without a central registry.
https://datatracker.ietf.org/doc/html/rfc7643#section-10.3
There's the RFC way and then there's the real way.
IMO, many folks want SCIM to support only two providers: Azure AD/Entra and Okta.
I guess there's a third: a homegrown system an enterprise has that "supports SCIM". That one is always going to be weird.
So in reality those two vendors get to determine acceptable behavior for SCIM clients (the identity providers that push data into SCIM service providers like Tesseral).
You're right. Section 10.4 does make that more clear as well for the default schemas.
Pure chance - what should get assigned to me mere hours after reading this article but a ticket regarding SCIM users not being disabled with Entra...
> It turns out that Microsoft’s default behavior sends a boolean value as a string
> You can force Microsoft to send you the proper JSON if you use a certain feature flag (aadOptscim062020), but that’s really not an obvious solution!
Boom. That was exactly my issue.
> This sort of stuff is very time-consuming and demoralizing to resolve.
I would have sunk hours into this - thanks Ned O'Leary - you made my day!
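For anyone who can't flip the feature flag, a tolerant parser on the receiving side is one possible workaround. A minimal sketch follows; `coerce_bool` is a hypothetical helper and the PATCH operation is only an illustrative payload, not a captured Entra request.

```python
def coerce_bool(raw) -> bool:
    """Accept a real JSON boolean or the strings "true"/"false" in any casing."""
    if isinstance(raw, bool):
        return raw
    if isinstance(raw, str) and raw.strip().lower() in ("true", "false"):
        return raw.strip().lower() == "true"
    raise ValueError(f"not a boolean: {raw!r}")


# Illustrative operation with the boolean sent as a string.
patch_op = {"op": "Replace", "path": "active", "value": "False"}
active = coerce_bool(patch_op["value"])
```

Raising on anything else (rather than treating it as truthy) avoids silently leaving an account enabled.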
ha! I'm glad to hear
Anyone have experience self-hosting? Maybe with https://www.npmjs.com/package/scimgateway running standalone, or proxying to an existing user store? Curious if there be dragons— do you need a few hundred users before encountering them?
I have a question: (which I admit I could probably research myself)
It sounds like this is a sort of 'change-events' pub-subscribe approach, where dependent systems keep their state updated by doing deltas against what they currently store (if I misunderstood this part, the rest here probably is moot).
If that is true, is there anything in the system that guards against a subscriber failing to ingest an event, for whatever technical reason (bug or systems/execution failure, e.g. network or overload), i.e. some sort of 'transaction' approach where a subscriber will keep receiving/being due an event until it has reported back an "ACK I have processed that" (*)?
I am asking because otherwise it could lead to things like a user account staying open/active, even though an event said it should be deleted.
Some alternative mechanisms with other trade-offs could be things/events like "group X has changed, its current member list is now [...]". In that case, the mirrored group might eventually become consistent on a later update (as opposed to carrying an eternal memory left over by a missed update).
(*) I am vaguely aware true consistency guarantee is impossible in a distributed system, something about confused generals.
In practice, identity providers (Okta, Entra, etc.) will retry for a bit before reporting to the IdP admin that their SCIM connection to the SaaS vendor is unhealthy. From there, things get fixed ad-hoc.
Okta and Entra have different request patterns, and so have differing artifacts if the SaaS vendor's state diverges from the desired state. Okta tends to be more stable, because they usually GET-then-PUT (cf. compare-and-set). Entra likes to PATCH, which leads to dead-reckoning artifacts.
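A toy model of why the two patterns age differently after a missed update. This is a deliberate simplification with group membership as a plain set; it is not a description of how Okta or Entra are actually implemented, and `apply_put`/`apply_patch` are made-up names.

```python
def apply_put(_local: set[str], desired: set[str]) -> set[str]:
    """Full-state replace: whatever was stored locally is overwritten."""
    return set(desired)


def apply_patch(local: set[str], add: set[str], remove: set[str]) -> set[str]:
    """Delta update: the result depends on the local state being correct."""
    return (local | add) - remove


idp = {"alice", "bob"}
put_copy, patch_copy = set(idp), set(idp)

# Change 1: bob is removed at the IdP, but delivery to the app fails.
idp = {"alice"}

# Change 2: carol is added, and this time delivery succeeds.
idp = {"alice", "carol"}
put_copy = apply_put(put_copy, idp)  # sends the whole desired state
patch_copy = apply_patch(patch_copy, add={"carol"}, remove=set())  # sends only the delta
```

After the second change the PUT-style copy matches the IdP again, while the PATCH-style copy still contains bob until something else corrects it.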
What you're describing is an interesting and hard problem in computer science, but SCIM is not trying that hard to get it right.
My understanding of it is that it's just a bunch of well known HTTP endpoints that a server will call as things happen (eg: user is disabled).
I think the general intention is that these can/should be retried until the caller receives a successful response indicating the other system has updated its records.
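Sketched from the caller's side, that retry idea might look roughly like this. Everything here is an assumption for illustration: `deliver` and its parameters are hypothetical, `send` stands in for whatever performs the HTTP request and returns a status code, and real IdPs have their own retry schedules.

```python
import time
from typing import Callable


def deliver(
    send: Callable[[], int],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() until it returns a 2xx status or attempts run out."""
    for attempt in range(attempts):
        status = send()
        if 200 <= status < 300:
            return True
        if attempt < attempts - 1:
            # Exponential backoff between tries: 1s, 2s, 4s, ...
            sleep(base_delay * 2**attempt)
    return False


# Simulated endpoint that fails twice and then accepts the update.
responses = iter([503, 503, 204])
delays: list[float] = []
ok = deliver(lambda: next(responses), sleep=delays.append)
```

Note that once `deliver` gives up and returns `False`, the two systems stay out of sync until someone or something reconciles them, which is the gap the question above is about.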
(I'm not an expert in SCIM but I've played around with it, so could be wrong)
Something I found odd / unnecessary whilst building a SCIM client was that fields are supposed to be case-insensitive:
> Attribute names are case insensitive and are often "camel-cased" (e.g., "camelCase")
Whilst it's not a huge deal to support this, to me this feels like complexity/flexibility for the sake of it - I'd prefer more rigidity and one correct way.
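For what it's worth, supporting it can be as small as a lookup helper. A minimal sketch, with `get_attr` being a hypothetical name:

```python
def get_attr(resource: dict, name: str, default=None):
    """Look up a SCIM attribute by name, ignoring the casing of the key."""
    wanted = name.casefold()
    for key, value in resource.items():
        if key.casefold() == wanted:
            return value
    return default


user = {"UserName": "joe", "name": {"FAMILYNAME": "Bloggs"}}
username = get_attr(user, "userName")
family = get_attr(get_attr(user, "name", {}), "familyName")
```

The annoyance is more that every read path has to go through something like this instead of plain key access.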
One thing I haven't completed for my SCIM client implementation is a decent grammar for parsing the filter parameters. Does anyone know of a comprehensive one, preferably peggy/pegjs?
When it comes to security, you are probably better off comparing things case insensitively, like email providers also tend to do. Otherwise it would be too easy to send a message to Joe@acme.org when you meant joe@acme.org, which can be very problematic.
About filters, yeah that’s one of the bigger problems implementing SCIM. I implemented it myself but am not aware of any open source implementation. Look out for literal strings as they must be valid JSON strings which means they may use JSON escaping rules.
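On the string-literal point, one shortcut in a hand-written tokenizer is to let a JSON decoder consume the literal instead of scanning for the closing quote yourself. A small sketch using the standard library's `json.JSONDecoder.raw_decode`; `read_string_literal` is a hypothetical helper and this is nowhere near a full filter grammar.

```python
import json

_decoder = json.JSONDecoder()


def read_string_literal(filter_text: str, pos: int) -> tuple[str, int]:
    """Decode the JSON string starting at pos; return (value, index just past it)."""
    if filter_text[pos] != '"':
        raise ValueError("expected a string literal")
    # raw_decode parses one JSON value and reports how many characters it used,
    # so escaped quotes inside the literal are handled by the JSON rules.
    value, consumed = _decoder.raw_decode(filter_text[pos:])
    return value, pos + consumed


filter_text = r'userName eq "O\"Brien" and active eq true'
value, end = read_string_literal(filter_text, filter_text.index('"'))
rest = filter_text[end:]
```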
Agree on the emails (even just from mobile devices having a habit of putting capital letters in unintentional places), but I was more meaning the attribute keys, eg "username", "familyName" being case-insensitive. I'd be happy enough with any casing convention here, but would prefer one case sensitive one.
I suspect in practice most systems just use camelCase, but they could use TitleCase / ALL CAPS / etc which bugs me as it feels like a committee couldn't agree and decided "why not all of them".
There's a good chance there's historical context I'm missing, though I'd like to imagine any SCIM V3 might have stricter rules on that kind of thing to reduce implementation complexity.
One of the issues we had with implementing a SCIM handler was that our clients didn't use the bulk processing. This meant that when they added <insert large number> of people to our system, we'd be hit with a large number of individual POSTs, and they literally all came together.
In hindsight we could've done many things differently, but the usage of this service is generally very spiky, so hard to scale, but wasteful to have additional servers just idling.
I like the way IR remotes work for your aircon: they send the full state on every key press. I think that is often a lot better than supporting PATCH.
Isn’t this also what federation is for? And managing those "relationships" on only one domain. I struggle to see what SCIM brings additionally?
It matters if 1) you have long-lived sessions on the application (RP) or 2) the application (RP) does something on behalf of the users and you need to stop that.
For example, say I leave the company. My account in SSO (IdP) gets deactivated, so I can no longer use SSO to log in to the company GitLab. However, without some way to let GitLab know that I'm gone, I might still be able to access repos with my SSH keys or access tokens, and scheduled jobs in my account will still run. Deprovisioning not only lets GitLab know I won't be logging in through SSO again, but also that it should stop the scheduled jobs in my account, and block other access methods.
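In code terms, the point is that a deprovisioning event should fan out to every access path, not just the login. A toy sketch with an in-memory `Account`; the class, its fields, and `deprovision` are all hypothetical, and a real application would revoke these in its own stores.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    active: bool = True
    sessions: list[str] = field(default_factory=list)
    access_tokens: list[str] = field(default_factory=list)
    ssh_keys: list[str] = field(default_factory=list)
    scheduled_jobs: list[str] = field(default_factory=list)


def deprovision(account: Account) -> None:
    """Handle a SCIM deactivation by closing every way the user can still act."""
    account.active = False
    account.sessions.clear()        # end long-lived sessions
    account.access_tokens.clear()   # revoke API/personal access tokens
    account.ssh_keys.clear()        # block key-based access
    account.scheduled_jobs.clear()  # stop work running on the user's behalf


me = Account(sessions=["s1"], access_tokens=["t1"], ssh_keys=["k1"], scheduled_jobs=["nightly"])
deprovision(me)
```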
You're right that depending on what your application does, you might not need it at all.
The main benefit IMHO is deprovisioning.
With straight up fed you can autocreate users etc but as the relying party you don't know when their access has been revoked. Sometimes this doesn't matter, but... often it does. People come and go all the time in enterprise. You can more accurately show user state in your UI and (crucially) accurately adjust license seats. You can also de-activate API tokens or alternative auth methods the user might have configured to close the loop on security.
Big orgs will pay more for SCIM to avoid having to do user access reviews in the UIs of individual products they have bought. It directly translates to person-hours of busywork and admin burden of executing / tracking that work.
SCIM is much cleaner.
Deprovisioning is huge. Provisioning is also awfully nice. For example:
1. New employee joins the team.
2. SCIM creates their account in the ticket tracking system.
3. Employee's boss can create onboarding tickets and assign them to the new person before they've even logged into the system.
That's not such a big deal for small companies that don't use a gazillion services. It's huge for large companies with hundreds of vendors where you want everyone in an employee group ("engineering", "sales", "everyone") to have an account in some of those services.
I wrote an article about this for my employer[0]. There's also a nice list of use cases in the RFC[1], including (text in parens mine):
* Migration of the Identities
* Single Sign-On (SSO) Service (allowing provisioning instead of JIT)
* Provisioning of the User Accounts for a Community of Interest (with complex infra)
* Transfer of Attributes to a Relying Party's Website (so profile data)
* Change Notification (of profile data and access)
0: https://fusionauth.io/articles/identity-basics/what-is-scim
1: https://datatracker.ietf.org/doc/html/rfc7642#section-3
Nice to see an article of this type not titled "What every developer needs to know about X"
Nothing mentioned about how you can charge extra like the SSO tax??
No need. SCIM is usually offered with SAML, which is usually the highest "taxed" SSO method anyway
there's so many parties involved in SCIM, you're basically bound to see a tax pop up in some shape or form down the line. it just takes a few c-guys asking how to extract more money