SHAPE – Scalable Highly Augmented Physical Environment

This screenshot is an example of a virtual environment augmented with proxy objects created using an older system called rt-xr. However, this was always intended to be a VR precursor for an AR solution now called SHAPE – Scalable Highly Augmented Physical Environment. The difference is that, in SHAPE, the kinds of virtual objects used to augment the virtual environment shown above (whiteboards, status displays, sticky notes, camera screens and other static virtual objects in this case) are instead used to augment real physical environments, with a primary focus on scalability and local collaboration for physically present occupants.

The vision is that people move from SHAPE-enabled physical space to SHAPE-enabled physical space while wearing an AR headset running a standard SHAPE app. As users enter a SHAPE-enabled physical space, the virtual objects that have been placed in that space are downloaded to the user’s application and located correctly in the physical space. These objects (proxy objects) can be manipulated by remote servers to have a wide range of dynamic capabilities, such as displaying real-time data and interacting with users. The alternative of developing a custom application for every space is seen as a massive obstacle to widespread use of AR. The universal SHAPE app moves custom development into servers with a much more familiar programming environment, opening up proxy object development to almost anyone.

Some of the features of SHAPE are:

  • SHAPEs are designed for collaboration. Multiple AR device users present in the same space are able to interact with virtual objects just like real objects, with consistent state maintained for all users.
  • SHAPE users can be grouped so that they see different virtual objects in the same space depending on their assigned group. A simple example of this would be where virtual objects are customized for language support – the virtual object set instantiated would then depend on the language selected by a user.
  • SHAPEs are scalable because they minimize the loading on AR devices. Complex processing is performed using a local edge server or remote cloud. Each virtual object is either static (just for display) or connected to a server function that drives the virtual object and also receives interaction inputs that may modify its state. This leaves the AR device to display objects and pass on interaction events rather than performing complex functions on-device. Reducing the AR device loading in this way extends battery life and reduces heat, allowing devices to be used for longer sessions.
  • There is a natural fit between SHAPE and artificial intelligence/machine learning. As virtual objects are connected to off-device server functions, they can make use of inference results or supply data for machine learning derived from user interactions while leveraging much more powerful capabilities than are practical on-device.
  • A single universal app can be used for all SHAPEs. Any virtual objects needed for a particular space are downloaded at run time from an object server. However, there would be nothing stopping the creation of a customized app that included hard-coded assets while still leveraging the rest of SHAPE – this might be useful in some applications.
  • New virtual objects can be instantiated by users of the space, configured appropriately (including connection to remote server function) and then made persistent in location by registering with the object server.
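
The group feature in the list above can be illustrated with a minimal sketch. Everything here is hypothetical: the `ProxyObjectInstance` fields, the `objects_for_user` helper and the asset paths are assumptions made for illustration, not part of SHAPE itself. The idea is simply that the object set sent to a user is filtered by that user's assigned group, with ungrouped objects visible to everyone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyObjectInstance:
    # Hypothetical record for one object placed in a space.
    name: str
    asset_path: str
    groups: frozenset = frozenset()  # empty set: visible to every group


def objects_for_user(instances, user_group):
    """Return the instances that a user in user_group should instantiate."""
    return [obj for obj in instances
            if not obj.groups or user_group in obj.groups]


# A space with one shared object and two language-specific signs.
space = [
    ProxyObjectInstance("Clock", "assets/clock"),
    ProxyObjectInstance("Sign_EN", "assets/sign_en", frozenset({"en"})),
    ProxyObjectInstance("Sign_FR", "assets/sign_fr", frozenset({"fr"})),
]

french_view = [obj.name for obj in objects_for_user(space, "fr")]
print(french_view)
```

In this sketch the filtering happens before anything is downloaded, so a device never fetches assets for objects its user cannot see.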

A specific goal is to be able to support large scale physical environments such as amusement parks or sports stadiums, where there may be a very large number of users distributed over a very large space. The SHAPE system is being designed to support this level of scalability while being highly responsive to interaction.

In order to turn this into reality, the SHAPE concept requires low cost, lightweight AR headsets that can be worn for extended periods of time, perform reliable spatial localization in changing outdoor environments and provide high quality, wide angle augmentation displays. The technology isn’t there yet, so development will initially use iPads as the AR devices and ARKit for localization. Using iPads for this purpose isn’t ideal ergonomically but does allow all of the required functionality to be developed. When suitable headsets do become available, SHAPE will hopefully be ready to take advantage of them.

SHAPE Architecture

The current architectural concept is shown in the (somewhat messy) diagram above. SHAPE functionality is divided into four regions:

  • Core. Core functions are those that may involve significant amounts of data and processing but do not have tight latency requirements. Core functions could be implemented in a remote cloud for example. CoreUniverse manages all of the spatial maps, proxy object instances, spatial anchors and server configurations for the entire system and can be replicated for redundancy and load sharing. In order to ensure eventual consistency, Apache Kafka is used to keep a permanent record of updates to the space configuration (data flowing along the red arrows), allowing easy recovery from failures along with high reliability and scalability. 
  • Proxy. The proxy region contains the servers that drive the proxy objects (i.e. the AR augmentations) in the space. There are two types of servers in this region: asset servers and function servers. Asset servers contain the assets that form the proxy object – a Unity assetbundle for example. Users go directly to the asset servers (blue arrows – only a few shown for clarity) to obtain assets to instantiate. Function servers interact with the instantiated proxy objects in real time (via EdgeAccess as described below). For example, in the case of the famous analog clock proxy object (my proxy object equivalent of the classic Utah teapot), the function server drives the hands of the clock by supplying updated angles to the sub-objects within the analog clock asset.
  • Edge. The edge functions consist of those that have to respond to users with low latency. The first point of contact for SHAPE users is EdgeAccess. During normal operation, all real-time interaction takes place over a single link to an instance of EdgeAccess, which makes management, control and status on a per-user basis very easy. EdgeAccess then makes ongoing connections to EdgeSpace servers and proxy function servers. A key performance enhancement is that EdgeAccess is able to multicast data from function servers if the data has not been customized for a specific proxy object instance. Function server data that can be multicast in this way is called undirected data; function server data intended for a specific proxy object instance is called directed data. The analog clock server generates undirected data, whereas a server that is interacting directly with a user (via proxy object interaction support) has to use directed data. EdgeSpace acts as a sort of local cache for CoreUniverse. Each EdgeSpace instance supports a sub-space of the entire universe. It caches the local spatial maps, object instances and anchors for the sub-space so that users located within that sub-space experience low latency updates. These updates are also forwarded to Kafka so that CoreUniverse instances will eventually correctly reflect the state of the local caches. EdgeSpace instances sync with CoreUniverse at startup and periodically during operation to ensure consistency.
  • User. In this context, users are SHAPE apps running on AR headsets. An important concept is that a standard SHAPE app can be used in any SHAPE universe. The SHAPE app establishes a single connection (black arrows) to an EdgeAccess instance. EdgeAccess provides the user app with the local spatial map to use, proxy object instances, asset server paths and spatial anchors. The user app then fetches the assets from one or more asset servers to populate its augmentation scene. In addition, the user app registers with EdgeAccess for each function server required by the proxy object instances. EdgeAccess is responsible for setting up any connections to function servers (green arrows – only a few shown for clarity) that aren’t already in existence.
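
The distinction between undirected and directed data can be sketched as a small routing table inside EdgeAccess. This is only an illustration under stated assumptions: the `EdgeAccessRouter` class, its method names and the idea of keying registrations by a (user, instance) pair are all hypothetical, and the real EdgeAccess may organize this quite differently.

```python
from collections import defaultdict


class EdgeAccessRouter:
    """Hypothetical table mapping a function server to its registrations."""

    def __init__(self):
        # server name -> list of (user_id, instance_id) registrations
        self.subscribers = defaultdict(list)

    def register(self, server, user_id, instance_id):
        """Record that a user's proxy object instance needs this server."""
        self.subscribers[server].append((user_id, instance_id))

    def route(self, server, payload, target=None):
        """Return the (user_id, instance_id, payload) deliveries for a message.

        target=None marks undirected data, which is fanned out to every
        registration. Otherwise the data is directed at one registration.
        """
        return [(user, inst, payload)
                for user, inst in self.subscribers[server]
                if target is None or (user, inst) == target]


router = EdgeAccessRouter()
router.register("clock", "alice", "clock-1")
router.register("clock", "bob", "clock-1")

# Undirected: one message from the clock server reaches both users.
undirected = router.route("clock", {"y": 222})
# Directed: only bob's instance receives the message.
directed = router.route("clock", {"y": 222}, target=("bob", "clock-1"))
print(len(undirected), len(directed))
```

The benefit of undirected data in this sketch is that the function server sends a single message no matter how many users are registered, and the fan-out cost stays at the edge.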

As an example of operation, consider a set of users physically present in the same sub-space. They may be connected to SHAPE via different EdgeAccess instances but will all use the same EdgeSpace. If one user makes a change to a proxy object instance (rotates it for example), the update information will be sent to EdgeSpace (via EdgeAccess) and then broadcast to the other users in the sub-space so that the changes are reflected in their augmentation scenes in real-time. The updates are also forwarded to Kafka so that CoreUniverse instances can track every local change.
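
The flow in this example can be sketched in a few lines. This is a simplified model, not the actual implementation: the `EdgeSpace` class, `apply_update` and `rebuild_from_log` are hypothetical names, and a plain Python list stands in for the Kafka topic. It shows an update being cached locally, recorded in the log and reported to the other users in the sub-space, and then a CoreUniverse-style recovery that replays the log.

```python
class EdgeSpace:
    """Hypothetical cache for one sub-space; `log` stands in for Kafka."""

    def __init__(self, log):
        self.instances = {}   # instance id -> latest state
        self.users = set()    # users currently in this sub-space
        self.log = log        # append-only record of updates

    def apply_update(self, sender, instance_id, state):
        """Cache the change, record it and return who must be notified."""
        self.instances[instance_id] = state
        self.log.append((instance_id, state))
        return sorted(self.users - {sender})


def rebuild_from_log(log):
    """Replay the log in order, as CoreUniverse might after a failure."""
    instances = {}
    for instance_id, state in log:
        instances[instance_id] = state
    return instances


log = []
edge_space = EdgeSpace(log)
edge_space.users.update({"alice", "bob", "carol"})

# alice rotates a whiteboard; bob and carol must see the change.
notified = edge_space.apply_update("alice", "whiteboard-1", {"yaw": 90})
edge_space.apply_update("bob", "whiteboard-1", {"yaw": 120})

recovered = rebuild_from_log(log)
print(notified, recovered)
```

Because every update is appended in order, replaying the log reproduces the same final state as the local cache, which is the sense in which the two become eventually consistent.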

Proxy Objects

A proxy object is a conventional Unity GameObject hierarchy that has certain specially named child nodes. By itself, there’s nothing special about the Unity asset part of a proxy object – it could be an asset included in the app or an asset downloaded from a server using Unity’s asset bundle system. Either way, these specially named nodes can be linked to external servers. In this case, the SharingServer generates an analog clock stream that animates the clock hands. The clock definition is contained in the space definition file that instantiates all the other parts of the scene.

In principle, any component of the Unity GameObject could be manipulated remotely via the proxy object API. For example, a virtual fireplace could be created where the flames are animated by constantly varying the textures being displayed. The system is still simplistic, however, as there are no mechanisms for controlling transitions (such as lerping between positions or fading between textures), but these could certainly be added without too much difficulty.
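
To give a sense of what a transition mechanism might involve, here is a minimal sketch of interpolating between two orientation angles along the shortest arc. It is not part of the current system. The function name `lerp_angle` is an assumption, and Python is used only to show the arithmetic; in practice this would run in the Unity app.

```python
def lerp_angle(start, end, t):
    """Interpolate from start to end (degrees) along the shortest arc.

    t runs from 0 (at start) to 1 (at end). The result is in [0, 360).
    """
    # Signed shortest difference between the two angles, in [-180, 180).
    delta = (end - start + 180.0) % 360.0 - 180.0
    return (start + delta * t) % 360.0


# Halfway between 350 and 10 degrees is 0, not 180.
print(lerp_angle(350, 10, 0.5))
```

Taking the shortest arc matters for something like a clock hand, where a naive interpolation from 350 to 10 degrees would sweep backwards through most of the dial.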

As an example, the stream message for the analog clock (as seen in the image above) looks like this:

{
    "type": "proxyobject",
    "updateList": [
        {
            "name": "PO_AnalogClock_Second",
            "orientation": {
                "x": 0,
                "y": 222,
                "z": 0
            },
            "orientationValid": true
        },
        {
            "name": "PO_AnalogClock_Minute",
            "orientation": {
                "x": 0,
                "y": 342,
                "z": 0
            },
            "orientationValid": true
        },
        {
            "name": "PO_AnalogClock_Hour",
            "orientation": {
                "x": 0,
                "y": 568,
                "z": 0
            },
            "orientationValid": true
        }
    ]
}

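Here the y value encodes the relevant hand angle in degrees. The hour angle is greater than 360 degrees because the system uses a 24 hour clock, but this makes no visible difference: a rotation of 568 degrees leaves the hand in the same position as a rotation of 208 degrees.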
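
As a rough sketch of how a function server might build this message, the code below derives the three angles from a time of day. The formulas are inferred from the example values rather than taken from the SharingServer source, so treat them as assumptions: 6 degrees per second and per minute, and 30 degrees per hour on a 24 hour count with the minutes included. The `clock_update` function name is hypothetical, and 18:57:37 is simply a time that reproduces the values above.

```python
import json
from datetime import time


def clock_update(now):
    """Build an analog clock proxy object message for a time of day."""
    # Assumed mapping from time to hand angles, in degrees about y.
    angles = {
        "PO_AnalogClock_Second": now.second * 6,
        "PO_AnalogClock_Minute": now.minute * 6,
        "PO_AnalogClock_Hour": int((now.hour + now.minute / 60) * 30),
    }
    return {
        "type": "proxyobject",
        "updateList": [
            {
                "name": name,
                "orientation": {"x": 0, "y": y, "z": 0},
                "orientationValid": True,
            }
            for name, y in angles.items()
        ],
    }


message = clock_update(time(18, 57, 37))
hand_angles = [u["orientation"]["y"] for u in message["updateList"]]
print(json.dumps(message, indent=4))
```

With this mapping the hour angle only exceeds 360 degrees after midday, and reducing it modulo 360 gives the equivalent 12 hour position.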