Shared Compute Resources in Workspaces

Certain tools and apps within a workspace run on compute resources that are shared across the entire hub: Apps (such as the built-in apps RStudio and Jupyter Notebooks), Shiny apps, Data Table Analytics modules, and the R console.

How the setup works

A node runs on a Virtual Machine that is always available and attached to the hub, and all nodes are hosted on Azure Kubernetes Service (AKS). The tools and apps run in containers, which can be thought of as isolated execution environments; Kubernetes is the service that coordinates and manages those containers on Azure compute resources. The Virtual Machine that makes up a node can be set to different sizes depending on the client's needs: the bigger the Virtual Machine, the higher the cost of running it. The standard node size is 8 cores and 16 GB RAM, provided by a Microsoft Azure Standard_F8s Virtual Machine.
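
To make the size/cost trade-off concrete, here are a few Azure Fs-series sizes; Standard_F8s is the one named above, and the neighbouring sizes are illustrative alternatives, not a statement of what the platform offers:

```python
# Illustrative Azure Fs-series VM sizes (cores, RAM). A larger VM gives a
# node more capacity, and with it a higher running cost.
AZURE_VM_SIZES = {
    "Standard_F4s":  {"cores": 4,  "ram_gb": 8},
    "Standard_F8s":  {"cores": 8,  "ram_gb": 16},  # the standard node size
    "Standard_F16s": {"cores": 16, "ram_gb": 32},
}
```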

Pods on the nodes

When you start one of the tools or apps, a pod is created on the node and allocated a slice of the compute resources from the Virtual Machine that hosts the node. The tool/app runs inside this pod, and the user uses the compute within the pod to execute their tasks. The standard guaranteed size of a pod is 0.75 cores and 1 GB RAM. If the user needs more compute during their session, the pod can automatically grow to a maximum size of 1.5 cores and 10 GB RAM, provided there are enough resources left on the node. Both the guaranteed size and the maximum size of a pod can be set per tool/app. To request a change to the pod sizes, please contact our Service Desk.
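
In Kubernetes terms, the guaranteed size corresponds to a pod's resource requests and the maximum size to its limits. Below is a minimal, hypothetical sketch using the official Kubernetes Python client with the standard sizes above; the pod name and image are invented for illustration, and in practice the platform creates these pods for you:

```python
from kubernetes import client

# "requests" is the guaranteed slice; "limits" is the ceiling the pod may
# grow to when the node has spare resources.
resources = client.V1ResourceRequirements(
    requests={"cpu": "750m", "memory": "1Gi"},   # guaranteed: 0.75 cores, 1 GB
    limits={"cpu": "1500m", "memory": "10Gi"},   # maximum: 1.5 cores, 10 GB
)

pod = client.V1Pod(
    api_version="v1",
    kind="Pod",
    metadata=client.V1ObjectMeta(name="rstudio-user-1"),  # invented name
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="rstudio",
                image="example.com/rstudio:latest",        # invented image
                resources=resources,
            )
        ]
    ),
)
```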

Node pool and limits

When another user starts a tool/app, a new pod is added to the node and compute resources are allocated to that pod. When all the resources on the node have been allocated, a new node is created, i.e. a new Virtual Machine is spun up, and any new pods are created on the new node without delay. How many pods can fit on one node before a new one is created depends on the size of the node and the sizes of the pods on it. The collection of all nodes that have been spun up to run these tools/apps is known as a node pool. As standard, a total of 10 nodes can be created in the node pool. This limit is in place to keep the cost of running the nodes from becoming too high, but it can be increased if the need arises.
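
For a rough feel for the numbers, here is a sketch of the capacity arithmetic under the standard sizes above. It ignores the capacity Kubernetes reserves for its own system components, so real numbers will be somewhat lower:

```python
# Rough capacity arithmetic for the standard node and pod sizes.
NODE_CORES, NODE_RAM_GB = 8, 16      # one Standard_F8s node
POD_CORES, POD_RAM_GB = 0.75, 1.0    # guaranteed pod size
MAX_NODES = 10                       # standard node pool limit

# Pods are scheduled against their guaranteed size, so a node holds as many
# pods as the scarcer resource allows.
pods_per_node = min(int(NODE_CORES // POD_CORES), int(NODE_RAM_GB // POD_RAM_GB))
print(pods_per_node)              # 10 -- CPU is the binding constraint here
print(pods_per_node * MAX_NODES)  # 100 pods across a full node pool
```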

Even though pods may share a node, they are completely isolated and cannot communicate with one another. Each pod has access only to the storage of the workspace where it was started, so although the pods on a node can belong to different workspaces, no data can be shared between them, keeping workspace security intact.
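
As an illustration of the general technique (not necessarily how this platform implements it), pod-to-pod isolation is commonly enforced in Kubernetes with a "default deny" NetworkPolicy per workspace namespace; the namespace name below is invented:

```python
from kubernetes import client

# A default-deny policy: the empty pod selector matches every pod in the
# namespace, and listing "Ingress" with no ingress rules blocks all
# incoming pod-to-pod traffic.
deny_all_ingress = client.V1NetworkPolicy(
    api_version="networking.k8s.io/v1",
    kind="NetworkPolicy",
    metadata=client.V1ObjectMeta(name="default-deny-ingress",
                                 namespace="workspace-a"),  # invented namespace
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(),
        policy_types=["Ingress"],
    ),
)

# client.NetworkingV1Api().create_namespaced_network_policy("workspace-a", deny_all_ingress)
```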

The different tools

The tools behave slightly differently depending on how they are set up.

For Shiny apps: One pod is created per app per workspace, and all users within that workspace connect to it. If one user starts the app, a pod is created to run it in; if another user then starts the same app in the same workspace, they are connected to the same pod as the first user.

For Apps, Data Table Analytics modules, and the R console: Each user gets their own pod on the node for each tool they start. There can therefore be multiple pods for the same tool from the same workspace if different users start it: if a second user starts the same tool as the first user, in the same workspace, they run that tool in a separate pod.

Note: Apps can be configured to work in the same way as a Shiny app if required, on a per-app basis (see the sketch below).
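
A minimal sketch of the two allocation behaviours, assuming a hypothetical pod registry: shared tools key their pod on the workspace alone, while per-user tools also key on the user.

```python
# Hypothetical sketch of the two pod-allocation behaviours described above.
pods: dict[tuple, str] = {}  # pod registry keyed by identity tuple

def start_tool(workspace: str, tool: str, user: str, shared: bool) -> str:
    """Return the pod for this session, creating one on first use."""
    # Shared tools ignore the user in the key, so every user in the
    # workspace lands in the same pod.
    key = (workspace, tool) if shared else (workspace, tool, user)
    if key not in pods:
        pods[key] = f"pod-{len(pods) + 1}"  # stand-in for real pod creation
    return pods[key]

# Two users starting a Shiny app in the same workspace share a pod...
assert start_tool("ws-1", "shiny-app", "alice", shared=True) == \
       start_tool("ws-1", "shiny-app", "bob", shared=True)

# ...while the same two users starting RStudio each get their own pod.
assert start_tool("ws-1", "rstudio", "alice", shared=False) != \
       start_tool("ws-1", "rstudio", "bob", shared=False)
```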

This setup gives each user access to scalable compute, reduces the need for separate Virtual Machines within a workspace, and in turn lowers computing costs across the entire hub.

[Example diagram: Node-Pool.png]

Updated on November 3, 2022
