Linking AR augmentations to physical space using the ArUco marker system

Following on from the earlier work with ArUco markers, rt-ispace can now associate ArUco markers with augmentations in a space. The image above shows two glTF sample models attached to two different ArUco marker codes (23 and 24 in this case). Since these models are animated, a video also seems appropriate!

The image and video were obtained using an iPad Pro running the rt-ispace app that forms the front end for the rt-ispace system. A new server, EdgeAnchor, receives the AR video stream from the iPad via the assigned EdgeAccess and detects any ArUco markers that may be in view. The video stream also carries the iPad camera intrinsics and AR camera pose, which allows EdgeAnchor to determine the physical pose of each marker relative to the camera view. The marker detection results are sent back to the iPad app (via EdgeAccess), which then matches the ArUco IDs to instantiated augmentations and calculates the world-space pose for each augmentation. There are some messy calculations in there but it actually works very well.
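For anyone curious what that marker-to-world-pose step looks like, here is a minimal sketch in Python using OpenCV's aruco module. It is not the EdgeAnchor code and the function names are mine; it just illustrates detecting markers, solving for each marker's pose relative to the camera, and lifting that into world space with the AR camera pose supplied by the device. Note that the aruco entry points have moved around between OpenCV releases, so the exact calls may differ for your version.

```python
import cv2
import numpy as np

MARKER_LENGTH = 0.10  # marker side length in metres (assumed value)

# 3D corners of a square marker centred on its own origin, lying in the z=0 plane,
# in the order detectMarkers reports them (top-left, top-right, bottom-right, bottom-left)
_OBJ_POINTS = np.array([
    [-MARKER_LENGTH / 2,  MARKER_LENGTH / 2, 0],
    [ MARKER_LENGTH / 2,  MARKER_LENGTH / 2, 0],
    [ MARKER_LENGTH / 2, -MARKER_LENGTH / 2, 0],
    [-MARKER_LENGTH / 2, -MARKER_LENGTH / 2, 0],
], dtype=np.float32)

def detect_marker_world_poses(frame, camera_matrix, dist_coeffs, T_world_camera):
    """Return {marker_id: 4x4 world-space pose} for every ArUco marker in the frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)
    poses = {}
    if ids is None:
        return poses
    for marker_corners, marker_id in zip(corners, ids.flatten()):
        # Pose of the marker relative to the camera, from its four detected corners
        ok, rvec, tvec = cv2.solvePnP(_OBJ_POINTS, marker_corners.reshape(4, 2),
                                      camera_matrix, dist_coeffs)
        if not ok:
            continue
        R, _ = cv2.Rodrigues(rvec)
        T_camera_marker = np.eye(4)
        T_camera_marker[:3, :3] = R
        T_camera_marker[:3, 3] = tvec.flatten()
        # Lift into world space using the AR camera pose reported by the device
        poses[int(marker_id)] = T_world_camera @ T_camera_marker
    return poses
```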

The examples shown are set up to instantiate the augmentation based on a horizontal marker. However, the augmentation configuration allows a 6-dof offset relative to the marker. This means that markers can be hung on walls, with augmentations placed either on the wall or floating in front of it, for example.
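As a concrete illustration of the offset idea, the sketch below applies a hypothetical 6-dof offset (translation plus Euler rotation) to a marker's world pose to get the augmentation's pose. The field names are made up and the real rt-ispace configuration format may look quite different.

```python
from scipy.spatial.transform import Rotation
import numpy as np

def augmentation_world_pose(T_world_marker, offset):
    """Compose a marker's 4x4 world pose with a configured 6-dof offset."""
    T_marker_aug = np.eye(4)
    T_marker_aug[:3, :3] = Rotation.from_euler(
        'xyz', offset['rotation_deg'], degrees=True).as_matrix()
    T_marker_aug[:3, 3] = offset['translation']
    return T_world_marker @ T_marker_aug

# e.g. float the model 0.3 m out from a wall-mounted marker and stand it upright
wall_offset = {'translation': [0.0, 0.0, 0.3], 'rotation_deg': [-90.0, 0.0, 0.0]}
```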

A single EdgeAnchor instance can be shared among many rt-ispace users as no state is retained between frames, which allows the system to scale very nicely. Also, there is nothing specific to ArUco markers: in principle, EdgeAnchor could support multiple marker types, providing great flexibility. The only requirement is that marker detection yields a 6-dof pose relative to the camera.
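To make the "nothing ArUco-specific" point concrete, the interface this design implies is roughly the following (illustrative only, not actual rt-ispace code): any detector that maps a frame plus intrinsics to per-marker camera-relative poses could slot in behind EdgeAnchor.

```python
from typing import Dict, Protocol
import numpy as np

class MarkerDetector(Protocol):
    """Anything that maps a frame plus camera intrinsics to camera-relative marker poses."""
    def detect(self, frame: np.ndarray, camera_matrix: np.ndarray,
               dist_coeffs: np.ndarray) -> Dict[int, np.ndarray]:
        """Return {marker_id: 4x4 pose of the marker relative to the camera}."""
        ...
```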

Previously, I had been resistant to the use of markers, preferring to use the spatial mapping capabilities of the user device to provide spatial lock and placement of augmentations. However, these systems have many limitations, especially where there is very little visual texture or depth variation to act as a natural anchor. Adding physical anchors means that augmentations can be reliably placed even in very featureless spaces, which is a big plus in terms of creating a pleasant user experience.

Real time OpenPose on an iPad…with the help of remote inference and rendering

I wanted to use the front camera of an iPad as the input to OpenPose so that I could track pose in real time, the original idea being to leverage CoreML to run pose estimation on the device. There are a few iOS implementations of OpenPose (such as this one) but they are really designed for offline processing as they are pretty slow. I did try a different pose estimator that runs in real time on my iPad Pro, but its estimation is not as good as OpenPose's.

So the question was how to run OpenPose from an iPad in real time in some way; compromise was necessary! I do have an OpenPose SPE as part of rt-ai that runs very nicely, so an obvious solution was to run rt-ai OpenPose on a server and just use the iPad as an input and output device. A nice plus of the new iOS app, called iOSEdgeRemote, is that it really doesn’t care what kind of remote processing is being used. Frames from the camera are sent to an rt-ai Edge Conductor connected to an OpenPose pipeline.
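I have not documented the rt-ai Edge transport here, but the general pattern is simple enough to sketch: compress each frame, ship it to the remote pipeline, and display whatever comes back. The endpoint and field names below are hypothetical, not the actual Conductor protocol.

```python
import cv2
import numpy as np
import requests

CONDUCTOR_URL = "http://edge-conductor.local:8080/frame"  # hypothetical endpoint

cap = cv2.VideoCapture(0)  # stand-in for the iPad camera feed
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # JPEG-compress the frame and send it to the remote pipeline
    _, jpeg = cv2.imencode('.jpg', frame, [cv2.IMWRITE_JPEG_QUALITY, 80])
    resp = requests.post(CONDUCTOR_URL, data=jpeg.tobytes(),
                         headers={'Content-Type': 'image/jpeg'})
    # Display whatever annotated frame the pipeline sends back
    annotated = cv2.imdecode(np.frombuffer(resp.content, np.uint8), cv2.IMREAD_COLOR)
    cv2.imshow('remote OpenPose', annotated)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```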

The rt-ai design for this test is shown above. The pipeline optionally annotates the video and returns the annotated frames and the pose metadata to the iPad for display. However, the pipeline could be doing anything, provided it returns some sort of video back to the iPad.
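The server side of that same hypothetical pattern might look like the following, with the pose estimation step stubbed out (the real pipeline uses the rt-ai OpenPose SPE). Flask and the endpoint shape are assumptions for illustration only.

```python
import json
import cv2
import numpy as np
from flask import Flask, request, make_response

app = Flask(__name__)

def estimate_pose(frame):
    """Placeholder for the OpenPose step: returns a list of (x, y) body keypoints."""
    return []  # the real pipeline would run OpenPose here

@app.route('/frame', methods=['POST'])
def handle_frame():
    frame = cv2.imdecode(np.frombuffer(request.data, np.uint8), cv2.IMREAD_COLOR)
    keypoints = estimate_pose(frame)
    for (x, y) in keypoints:
        cv2.circle(frame, (int(x), int(y)), 4, (0, 255, 0), -1)  # annotate the frame
    _, jpeg = cv2.imencode('.jpg', frame)
    resp = make_response(jpeg.tobytes())
    resp.headers['Content-Type'] = 'image/jpeg'
    resp.headers['X-Pose-Metadata'] = json.dumps(keypoints)  # pose metadata alongside video
    return resp

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```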

The results are shown in the screen captures above. Using a GTX 1080 Ti GPU, I was getting around 19fps with just body pose processing turned on and around 9fps with face pose also turned on. Latency is not noticeable with body pose estimation, and even with face pose estimation turned on it is entirely usable.

Remote inference and rendering has a lot of advantages over trying to squeeze everything into the iPad and using CoreML for inference, provided a low-latency server is available; 5G communications is an obvious enabler of this kind of remote inference and rendering in a wide variety of situations. The intrinsic performance of the iPad also matters far less, as it is not doing anything too difficult, leaving plenty of resources for other processing. The previous Unity/ARKit object detector uses a similar idea, but it does use more iPad resources and is not general purpose. If Unity and ARKit aren’t needed, iOSEdgeRemote with remote inference and rendering is a very powerful system.

Another nice aspect of this is that I believe future mixed reality headsets will be very lightweight devices that avoid complex processing in the headset (unlike the HoloLens, for example) and don’t require cables to an external processor (unlike the Magic Leap One, for example). The headset provides cameras, SLAM of some sort, displays and radios. All other complex processing will be performed remotely, with video used to drive the displays. This might be the only way to enable MR headsets that can run for 8 hours or more without a recharge and be light enough (and run cool enough) to be worn for extended periods.