Adding ArUco marker detection to rt-ai

There are many situations where it is necessary to establish the spatial relationship between a camera in a space and 3D points within the same space. One particular application of interest is the ability to use markers to accurately locate holograms so that AR headset users see the holograms locked in the space, even as they look or move around. OpenCV includes ArUco marker detection (in its contrib modules), so that seemed like a good place to start. The screen capture above shows the rt-ai ArUco marker detector identifying the pose of a few example markers.
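
To give a sense of how little OpenCV code the detection itself needs, here is a minimal Python sketch using the classic cv2.aruco API. The dictionary choice and camera index are assumptions for illustration, not necessarily what the ArUco SPE uses:

```python
import cv2

# Assumed marker dictionary -- the actual markers could come from any predefined set
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_6X6_250)
parameters = cv2.aruco.DetectorParameters_create()

cap = cv2.VideoCapture(0)  # hypothetical camera index
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary, parameters=parameters)
    if ids is not None:
        cv2.aruco.drawDetectedMarkers(frame, corners, ids)
    cv2.imshow('aruco', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
```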

This is the simple rt-ai test design with the new ArUcoDetect stream processing element (SPE). The UVC camera was running at 1920 x 1080, 30 fps, and the ArUco SPE had no trouble keeping up with this.

This screen capture demonstrates the kind of thing that might be useful in an AR application. The relative pose of the marker has been detected, allowing a 3D application to replace the marker with an associated hologram.
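
The pose itself comes from the classic API's estimatePoseSingleMarkers(). A minimal sketch of that step, with the marker side length and intrinsics as placeholders rather than calibrated values:

```python
import cv2
import numpy as np

MARKER_LENGTH = 0.05  # marker side in metres (placeholder)

# Placeholder intrinsics -- real values must come from camera calibration
camera_matrix = np.array([[1000.0, 0.0, 960.0],
                          [0.0, 1000.0, 540.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)

def marker_poses(corners, frame):
    # rvec/tvec give each marker's rotation and translation in the camera
    # frame -- the transform a 3D application needs to place a hologram
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, MARKER_LENGTH, camera_matrix, dist_coeffs)
    for rvec, tvec in zip(rvecs, tvecs):
        # newer OpenCV versions use cv2.drawFrameAxes instead
        cv2.aruco.drawAxis(frame, camera_matrix, dist_coeffs,
                           rvec, tvec, MARKER_LENGTH * 0.5)
    return rvecs, tvecs
```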

While the detection is quite stable, the ArUco SPE implements a configurable filter to help eliminate occasional artifacts, especially with the blue z-axis, which can swing around quite a bit in some circumstances due to the pose ambiguity problem. The trick is to tune the filter to eliminate any residual pose jitter while maintaining adequate response to headset movement.
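
The SPE's actual filter isn't reproduced here, but one plausible shape for it is an exponential low-pass that blends translations linearly and rotations via slerp, which damps sudden z-axis flips rather than averaging them. The class name and default alpha below are my own inventions:

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

class PoseFilter:
    """Hypothetical exponential low-pass filter for marker poses.

    alpha near 1.0 tracks the raw pose closely; smaller values smooth
    more aggressively at the cost of lag behind headset movement.
    """
    def __init__(self, alpha=0.3):
        self.alpha = alpha
        self.tvec = None
        self.rot = None

    def update(self, rvec, tvec):
        new_rot = Rotation.from_rotvec(np.asarray(rvec).reshape(3))
        new_tvec = np.asarray(tvec).reshape(3)
        if self.tvec is None:
            self.rot, self.tvec = new_rot, new_tvec
        else:
            # Blend translation linearly; slerp the rotation so that
            # sudden z-axis flips are damped rather than averaged
            self.tvec = (1 - self.alpha) * self.tvec + self.alpha * new_tvec
            both = Rotation.from_rotvec(
                np.stack([self.rot.as_rotvec(), new_rot.as_rotvec()]))
            self.rot = Slerp([0.0, 1.0], both)(self.alpha)
        return self.rot.as_rotvec(), self.tvec
```

Tuning comes down to the single alpha parameter: raising it tracks headset movement more tightly, lowering it suppresses more jitter, which is exactly the trade-off described above.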

One challenge here is management of camera intrinsic parameters. In this case, I was using a Logitech C920 webcam for which calibration intrinsics had been determined using a version of the ChArUco calibration sample here. It wouldn’t be hard for the CUVCCam SPE to include camera intrinsic parameters in the JSON associated with each frame, assuming it could detect the type of UVC camera and pick up a pre-determined matrix for that type. Whether that’s adequate remains TBD. In other situations, where the video source is able to supply calibration data, the problem goes away. Anyway, more work needs to be done in this area.
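
For illustration only, per-frame metadata carrying intrinsics might look something like this; the field names and calibration numbers are hypothetical, not the actual CUVCCam JSON:

```python
import json

# Hypothetical per-frame metadata -- field names are illustrative only
frame_metadata = {
    "source": "CUVCCam",
    "timestamp": 1567890123.456,
    "width": 1920,
    "height": 1080,
    "intrinsics": {
        # Placeholder values standing in for a ChArUco calibration result
        "camera_matrix": [[1400.0, 0.0, 960.0],
                          [0.0, 1400.0, 540.0],
                          [0.0, 0.0, 1.0]],
        "dist_coeffs": [0.1, -0.2, 0.0, 0.0, 0.0]
    }
}
print(json.dumps(frame_metadata, indent=2))
```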

Since rt-ai stream processing networks (SPNs) can be integrated with SHAPE via the Conductor SPE (an example of the Conductor is here), an AR headset running the SHAPE application could stream front-facing video to the ArUco SPN, which would then return the relative poses of detected markers that have previously been associated with SHAPE virtual objects. This would allow the headset to correctly instantiate the SHAPE virtual objects in the space and avoid the problems of relying on inside-out tracking alone (such as in a spatial environment with a repeating texture that prevents unique identification).
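
To make the idea concrete, the message the ArUco SPN returns to the SHAPE application might carry something like the following per detected marker; every field name here is hypothetical, not the actual rt-ai/SHAPE protocol:

```python
# Hypothetical pose message from the ArUco SPN back to the SHAPE app
pose_response = {
    "markers": [
        {
            "id": 23,                      # detected ArUco marker id
            "shapeObject": "whiteboard",   # virtual object associated with the marker
            "rvec": [0.01, -1.57, 0.02],   # rotation (Rodrigues vector), camera frame
            "tvec": [0.12, -0.05, 1.80],   # translation in metres, camera frame
        }
    ]
}
```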