Platform Architecture – Building the back ends and systems that support AS services. State? Scale? Price? Persistence?

From IIW

Thursday 11G

Convener: Adam Hampton

Notes-taker(s): Neil Thomson, Query Vision

Tags for the session - technology discussed/ideas considered:

Performance, User vs. app/service Authentication, Scalability

Discussion notes, key understandings, outstanding questions, observations, and, if appropriate to this discussion: action items, next steps:

The application context is a multi-tenant SaaS cloud application using OpenID Connect, with RPs, an AS, and Resource (API) Servers. It uses hardware load balancing, plus service run-time data caching and persistence, where multiple instances of the same service component exist.

Also assumed was a Continuous Development/Deployment cycle where new software instances are hot swapped. The Load Balancer (LB) does the swap-out by redirecting all traffic away from the instance to be updated, allowing all in-flight requests to complete, swapping in the updated instance, and waiting for it to indicate to the LB that it is ready. It then does the same with the other instances.
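The swap-out sequence described above can be sketched as a small simulation. This is an illustrative sketch only, not something shown in the session: the `Instance`, `LoadBalancer` and `rolling_update` names are hypothetical, and draining is simulated by counting down in-flight requests rather than waiting on real traffic.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    version: str
    in_flight: int = 0      # requests still being served
    ready: bool = True      # what the instance reports to the load balancer


@dataclass
class LoadBalancer:
    instances: list
    draining: set = field(default_factory=set)

    def routable(self):
        """Instances that may receive new requests."""
        return [i for i in self.instances
                if i.ready and i.name not in self.draining]


def rolling_update(lb, new_version, log):
    """Swap every instance to new_version, one at a time."""
    for inst in lb.instances:
        # 1. Redirect new traffic away from the instance.
        lb.draining.add(inst.name)
        # Refuse to drain the last instance that can take traffic.
        if not lb.routable():
            lb.draining.discard(inst.name)
            raise RuntimeError("no other instance can take traffic")
        # 2. Let in-flight requests complete (simulated here).
        while inst.in_flight > 0:
            inst.in_flight -= 1
        # 3. Swap in the updated software; it is not ready until it says so.
        inst.ready = False
        inst.version = new_version
        log.append((inst.name, [i.name for i in lb.routable()]))
        # 4. The instance signals readiness and the LB resumes routing to it.
        inst.ready = True
        lb.draining.discard(inst.name)
```

The `log` list records which instances were still routable while each one was being swapped, which makes it easy to check that traffic always had somewhere to go.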

The issue with swap-out is the preservation of state and related data. This is typically handled with a RAM cache on a separate server, with disk backing and replication across instances of the same service. Replication, sync … of the state cache is the responsibility of the cache service. Examples include Memcached and Redis.

Scalability notes

  • For JS clients, including SPAs, which can be treated as static pages, solutions like Cloudflare can significantly improve performance.
  • The scalability model is more boxes (horizontal) rather than bigger boxes for app, cache and non-relational DB components. If a relational database is used for the persistence model, scaling is mostly vertical (bigger boxes) with up to 2 instances (or 2 pairs if a primary/hot standby structure is used).

Between the RP, AS, caching and persistence servers, in 2019, all communication is via HTTPS or (where possible) Mutual TLS. Terminating HTTPS at the RP (the tip of the “stack”) is no longer sufficient.

The service and cache stack should be pairwise for each instance: each service instance is paired with its own cache.

Each service should be responsible for its own persistence, which should be independent of other services (and can use a different technology).

An example of scalability via caching (see green on the diagram) is where a (reference) token is validated with the AS, which returns either the actual access_token in JWT form or the JWT contents via the token_inspection_service. The service should cache the result, with an expiry that matches that of the access token.
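A minimal sketch of that caching step follows. It assumes the AS response is a dictionary carrying the `active` and `exp` fields defined for OAuth 2.0 token introspection (RFC 7662); the `IntrospectionCache` class and the injected `introspect` callable are hypothetical names used only for illustration.

```python
import time


class IntrospectionCache:
    """Caches validation results until the access token expires."""

    def __init__(self, introspect, clock=time.time):
        self._introspect = introspect   # callable: token -> claims dict
        self._clock = clock             # injectable for testing
        self._entries = {}              # token -> claims

    def validate(self, token):
        """Return the claims for an active token, or None."""
        claims = self._entries.get(token)
        if claims is not None:
            if claims["exp"] > self._clock():
                return claims           # cache hit, no call to the AS
            del self._entries[token]    # cached entry has expired
        claims = self._introspect(token)
        if not claims.get("active") or claims.get("exp", 0) <= self._clock():
            return None
        # The cache entry lives exactly as long as the access token.
        self._entries[token] = claims
        return claims
```

In a multi-instance deployment the dictionary would be replaced by the shared cache service (for example Redis with a per-key expiry), but the expiry rule stays the same.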

An unresolved issue is the “big red button” revocation of a user’s sessions (or multiple users’ sessions) in the case of a lost/stolen device or credential compromise. OpenID Connect does not currently define a full-featured workflow, capabilities, etc. for the revoke scenario.

It was noted that it is not sufficient to remove the OIDC tokens: the application must also immediately purge all data, data presentation and transactions, all of which may be beyond the OpenID Connect workflow and control (app-specific behavior).
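One way to picture the split between token removal and app-specific cleanup is a small fan-out, sketched below. Nothing here is defined by OpenID Connect or was proposed in the session; `RevocationHub` and its methods are hypothetical, and the purge hooks stand in for whatever each application must clear (views, local data, open transactions).

```python
class RevocationHub:
    """Drops cached tokens for revoked users, then runs app cleanup hooks."""

    def __init__(self):
        self._tokens_by_user = {}   # user id -> set of cached tokens
        self._purge_hooks = []      # app-specific cleanup callables

    def remember(self, user, token):
        """Record that a token for this user is cached locally."""
        self._tokens_by_user.setdefault(user, set()).add(token)

    def on_purge(self, hook):
        """Register an app-specific cleanup callable taking a user id."""
        self._purge_hooks.append(hook)
        return hook

    def is_known(self, user, token):
        return token in self._tokens_by_user.get(user, set())

    def revoke(self, users):
        """The "big red button": works for one user or many."""
        for user in users:
            # Removing tokens alone is not enough ...
            self._tokens_by_user.pop(user, None)
            # ... so every registered app hook is also invoked.
            for hook in self._purge_hooks:
                hook(user)
```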

Possibly CAEP has a mechanism (or should develop one) to cover this scenario.

Different persistent store technologies were reviewed, including SQL and NoSQL, the latter including Key/Value, Document DB (MongoDB) and Graph DB.

Key requirements for technology selection and scalability:

  • The “query” profile: what types of queries (extract) and create/update/delete operations are needed
  • The volume of each (over time, including peak times)
  • Read/Write balance (e.g. mostly read or equal read/write)

A cautionary tale of not knowing the impact of operations on performance/scalability: Kafka topics have a very high cost to create and delete. This was not known until after implementation, which had consequences.

Authentication “Reach”: User authentication should only be used as deep in the stack as the user-related privileges apply. Past that point, service/machine accounts should be used (service to service).
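The boundary can be illustrated with a toy two-layer stack. This is a sketch under assumptions: the `Credential` type, the `orders_api` and `storage_service` functions, and the scope names are all hypothetical, and real deployments would use signed tokens or Mutual TLS rather than plain objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    kind: str          # "user" or "service"
    subject: str
    scopes: frozenset


def storage_service(cred, owner_id):
    """Deep in the stack: accepts service accounts only."""
    if cred.kind != "service":
        raise PermissionError("user tokens are not accepted here")
    return f"records for {owner_id}"


# Hypothetical machine account used by the orders API.
ORDERS_SERVICE = Credential("service", "orders-api",
                            frozenset({"storage:read"}))


def orders_api(user_cred):
    """Last layer where user privileges apply."""
    if user_cred.kind != "user" or "orders:read" not in user_cred.scopes:
        raise PermissionError("user lacks orders:read")
    # Past this point the call is service to service; the user id
    # travels as plain data, not as a credential.
    return storage_service(ORDERS_SERVICE, user_cred.subject)
```

The user token stops at `orders_api`; the storage layer never sees it and would reject it if it did.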

<Photo of Whiteboard Below>

IIW29 TH 11G Platform Architecture-Building The Back Ends & Systems That Support AS Servers(1).jpg