Real time OpenPose on an iPad…with the help of remote inference and rendering

I wanted to use the front camera of an iPad to act as the input to OpenPose so that I could track pose in real time with the original idea being to leverage CoreML to run pose estimation on the device. There are a few iOS implementations of OpenPose (such as this one) but they are really designed for offline processing as they are pretty slow. I did try a different pose estimator that runs in real time on my iPad Pro but the estimation is not as good as OpenPose.

So the question was how to run iPad OpenPose in real time in some way – compromise was necessary! I do have an OpenPose SPE as part of rt-ai that runs very nicely so an obvious solution was to run rt-ai OpenPose on a server and just use the iPad as an input and output device. The nice plus of this new iOS app called iOSEdgeRemote is that it really doesn’t care what kind of remote processing is being used. Frames from the camera are sent to an rt-ai Edge Conductor connected to an OpenPose pipeline.

The rt-ai design for this test is shown above. The pipeline optionally annotates the video and returns that and the pose metadata to the iPad for display. However, the pipeline could be doing anything provided it returns some sort of video back to the iPad.

The results are show in the screen captures above. Using a GTX 1080 ti GPU, I was getting around 19fps with just body pose processing turned on and around 9fps with face pose also turned on. Latency is not noticeable with body pose estimation and even with face pose estimation turned on it is entirely usable.

Remote inference and rendering has a lot of advantages over trying to squeeze everything into the iPad and use CoreML  for inference if there is a low latency server available – 5G communications is an obvious enabler of this kind of remote inference and rendering in a wide variety of situations. Intrinsic performance of the iPad is also far less important as it is not doing anything too difficult and leaves lots of resource for other processing. The previous Unity/ARKit object detector uses a similar idea but does use more iPad resources and is not general purpose. If Unity and ARKit aren’t needed, iOSEdgeRemote with remote inference and rendering is a very powerful system.

Another nice aspect of this is that I believe that future mixed reality headset will be very lightweight devices that avoid complex processing in the headset (unlike the HoloLens for example) or require cables to an external processor (unlike the Magic Leap One for example). The headset provides cameras, SLAM of some sort, displays and radios. All other complex processing will be performed remotely and video used to drive the displays. This might be the only way to enable MR headsets that can run for 8 hours or more without a recharge and be light enough (and run cool enough) to be worn for extended periods.

Integrating Core ML with Unity on iOS

The latest iPads and iPhones have some pretty serious edge neural network capabilities that are a natural fit with ARKit and Unity. AR and Unity go together quite nicely as AR provides an excellent way of communicating back to the user the results of intelligently processing sensor data from the user, other users and static (infrastructure) sensors in a space. The screen capture above was obtained from code largely based on this repo which integrates Core ML models with Unity. In this case, Inceptionv3 was used. While it isn’t perfect, it does ably demonstrate that this can be done. Getting the plugin to work was quite straightforward – you just have to include the mlmodel file in XCode via the Files -> Add Files menu option rather than dragging the file into the project. The development cycle is pretty annoying as the plugin won’t run in the Unity Editor and compile (on my old Mac Mini) is painfully slow but I guess a decent Mac would do a better job.

This all brings up the point that there seem to be different perceptions of what the edge actually is. rt-ai can be perceived as a local aggregation and compute facility for inference-capable or conventional mobile and infrastructure devices (such as security cameras) – basically an edge compute facility supporting edge devices. A particular advantage of edge compute is that it is possible to integrate legacy devices (such as dumb cameras) into an AI-enhanced system by utilizing edge compute inference capabilities. In a sense, edge compute is a local mini-cloud, providing high capacity compute and inference a short distance in time away from sensors and actuators. This minimizes backhaul and latency, not to mention securing data in the local area rather than dispersing it in a cloud. It can also be very cost-effective when compared to the costs of running multiple cloud CPU instances 24/7.

Given the latest developments in tablets and smart phones, it is essential that rt-ai Edge be able to incorporate inference-capable devices into its stream processing networks. Inference-capable, per user devices make scaling very straightforward as capability increases in direct proportion to the number of users of an edge system. The normal rt-ai Edge deployment system can’t be used with mobile devices which requires (at the very least) framework apps to make use of AI models within the devices themselves. However, with that proviso, it is certainly possible to incorporate smart edge devices into edge networks with rt-ai Edge.