Adding ArUco marker detection to rt-ai

There are many situations where it is necessary to establish the spatial relationship between a camera in a space and 3D points within the same space. One particular application of interest is the ability to use markers to accurately locate holograms in a space so that AR headset users see the holograms locked in the space, even as they look or move around the space. OpenCV has the ArUco marker detection included so that seemed like a good place to start. The screen capture above shows the rt-ai Aruco marker detector identifying the pose of a few example markers.

This is the simple rt-ai test design with the new ArUcoDetect stream processing element (SPE). The UVC camera was running at 1920 x 1080, 30 fps, and the ArUco SPE had no trouble keeping up with this.

This screen capture is a demonstration of the kind of thing that might be useful to do in an AR application. The relative pose of the marker has been detected, allowing the marker to be replaced by an associated hologram by a 3D application.

While the detection is quite stable, the ArUco SPE implements a configurable filter to help eliminate occasional artifacts, especially regarding the blue (z-axis) which can swing around quite a bit under some circumstances due to the pose ambiguity problem. The trick is to tune the filter to eliminate any residual pose jitter while maintaining adequate response to headset movement.

One challenge here is management of camera intrinsic parameters. In this case, I was using a Logitech C920 webcam for which calibration intrinsics had been determined using a version of the ChArUco calibration sample here. It wouldn’t be hard for the CUVCCam SPE to include camera intrinsic parameters in the JSON associated with each frame, assuming it could detect the type of UVC camera and pick up a pre-determined matrix for that type. Whether that’s adequate is a TBD. In other situations, where the video source is able to supply calibration data, the problem goes away. Anyway, more work needs to be done in this area.

Since rt-ai stream processing networks (SPNs) can be integrated with SHAPE via the Conductor SPE (an example of the Conductor is here), an AR headset running the SHAPE application could stream front facing video to the ArUco SPN, which would then return the relative pose of detected markers that have previously been associated with SHAPE virtual objects. This would allow the headset to correctly instantiate the SHAPE virtual objects in the space and avoid problems relying on inside out tracking alone (such as in a spatial environment with a repeating texture that prevents unique identification).

Using shared memory for rt-ai inter-SPE transfers

The screen capture above couldn’t have been obtained previously as it is passing uncompressed (RGB888) video between rt-ai SPEs on the same node (a Jetson Nano in this case). The CVideoView window is showing the output of the simple network using the CSSDJetson SPE to classify objects and also computes the frames per second and latency of received frames. The source of the frames is a Logitech C920 webcam running at 1280 x 720, 30fps. It shows that the latency is around 128mS at around 15fps.

This screen capture shows what happens when shared memory isn’t used. Actually, the latency here is misleading as it seems to be the link from the CUVCCam SPE to the MQTT server that is causing the bottleneck when running uncompressed video. The latency goes monotonically upwards until there is no memory left as there is no throttling on that interface since normally it isn’t a problem.

There doesn’t seem to be much benefit when passing smaller messages between SPEs.

This screen capture above shows shared memory being used when transferring JPEG frames. The one below is with shared memory support turned off.

This just shows that bouncing off the MQTT server within the same node is pretty efficient, at least when compared to the latency of the inference.

Being able to pass large messages around efficiently, even if only point to point within the same node, is quite a step forward by itself. For example, it makes it practical to create networks that pass RGBD frames around.

Shared memory support in rt-ai2 uses the Qt QSharedMemory and QSystemSemaphore wrappers to make things simple. When a design is generated, rtaiDesigner determines if shared memory has been enabled for the network, if the publisher and subscriber are on the same node and if the connection is point to point (i.e. exactly one subscriber). If so, the publisher and subscriber SPEs are told to use shared memory instead of MQTT for that particular connection. The SPE configuration file for the publisher SPE also includes the shared memory slot size to use and how big the pending transmission queue should be. The system is set up at the moment to always use three shared memory slots forming a rotating buffer. The shared memory slots are created by the publisher and attached by the subscriber.

To minimize latency, every time the publisher places a new message in the next shared memory slot, it releases a QSystemSempahore to unblock a thread in the subscriber that can then extract the message, free the shared memory slot and process the received message.

This implementation of shared memory seems to work very well and is highly reliable. In principle, it could be extended to support multiple subscribers by replicating the shared memory slot structure for each subscriber.

Jetson Nano SSD-Mobilenet-v2 SPE for rt-ai

Following on from the earlier work with the Jetson Nano, the SSD-Mobilenet-v2 model is now running as an rt-ai Stream Processing Element (SPE) for Jetson and so is fully integrated with the rt-ai system. Custom models created using transfer learning can also be used – it’s just a case of setting the model name in the SPE’s configuration and placing the required model files on the rt-ai data server. Since models are automatically downloaded at runtime if necessary, it’s pretty trivial to change the model being used on an existing Stream Processing Network (SPN).

The screen capture above shows the rt-ai design that generated the implementation. Here I am using the UVCCam SPE so that the video is sourced from a webcam but any of the other rt-ai video sources (such as RTSPCam) could be used, simply by replacing the camera SPE in the design using the graphical editor – originally this design used RTSPCam in fact.

Using 1280 x 720 video frames, the SSDJetson SPE processes around 17fps. This is not bad but less than the 21fps achieved by the monolithic example code. The problem is that, in order to achieve the one to many and many to one, heterogeneous multi-node network graphical design capability, rt-ai currently uses MQTT brokers to move data and provide multicast as necessary. Even when the broker and the SPEs are running on the same node, it is obviously less efficient than pointer passing within monolithic code.

This “inefficiency of generality” isn’t really visible on powerful x86 machines but has an impact on devices like the Jetson Nano and Raspberry Pi. The solution to this is to recognize such local links and side-step the MQTT broker interface using shared memory. This optimization will be done automatically in rtaiDesigner when it generates the configurations for each SPE in an SPN, flagging appropriate nodes as sources or sinks of shared memory links when both source and sink SPEs reside on the same node.

Jetson Nano and rt-ai

The Jetson Nano is an obvious platform for rt-ai to support, to go with the existing Intel NCS2 and Coral edge platforms. One nice plus is that the Jetson Nano comes basically ready to go, all set up for inference.

The screen capture above shows the Nano running the detectnet-camera example code using a webcam as the source generating 1280 x 720 frames and SSD-Mobilenet-v2 as the model. Performance was not bad at 21fps running at 10W, 16fps running at 5W. The heatsink did get very hot in a pretty short space of time however!

Installing the rt-ai runtime was no problem at all and it was easy to utilize the H.264 accelerated pipeline in rt-ai’s RTSP camera capture module. The screen capture above shows this running along with a viewer, demonstrating basic rt-ai funtionality.

Next up is to roll the detection code into an rt-ai Stream Processing Element (SPE). This will generate identical metadata to the existing SSD detectors, allowing full compatibility between server GPU, Jetson, NCS 2 and Coral SSD detectors.

rt-ai will enable Intelligent Spaces

The idea of creating spaces that understand the needs of the people moving within them – Intelligent Spaces – has been a long term personal goal. Our ability today to create sensor data (video, audio, environmental etc) is incredible. Our ability to make practical use of this enormous body of data is minimal. The question is: how can ubiquitous sensing in a space be harnessed to make the space more functional for people within it?

rt-ai could be the basis of an answer to this question. It is designed to receive large volumes of multi-sensor data, extract meaningful information and then take control actions as necessary. This closes the local loop without requiring external cloud server interaction. This is important because creating a space with ubiquitous sensing raises all kinds of privacy issues. Because rt-ai keeps all raw data (such as video and audio) within the space, privacy is much less of a concern.

I believe that a key to making a space intelligent is to harness artificial intelligence concepts such as online learning of event sequences and anomaly detection. It is not practical for anyone to sit down and program a system to correctly recognize normal behavior in a space and what actions might be helpful as a result. Instead, the system needs to learn what is normal and develop strategies that might be helpful. Reinforcement via user feedback can be used to refine responses.

A trivial example would be someone moving through a dark space at night. It might be helpful to provide light at a suitable intensity to safely help a person navigate the space. The system could deduce this by having experienced other people moving though the space, turning on and off lights as they go. Meanwhile, face recognition could be employed to see if the person is known to the space and, if not, an assessment could be made if an alert needs to be generated. Finally, a video record could be put together of the person moving through the space, using assembled clips from all relevant cameras, and stored (on-site) for a time in case it is useful.

Well that’s a trivial example to describe but not at all trivial to implement. However, my goal is to see if AI techniques can be used to approach this level of functionality. In practical terms, this means developing a series of rt-ai modules using TensorFlow or similar to perform feature extraction, anomaly detection and sequence prediction that are then glued together with sensor and control modules to perform a complete system requiring minimal supervised training to perform useful functions.

rt-ai: real time stream processing and inference at the edge enables intelligent IoT

Real-time inference at the edge enables decision making in the local loop with low latency and no dependence on the cloud. rt-ai includes a flexible and intuitive infrastructure for joining together stream processing pipelines in distributed, restricted processing power environments. It is very easy for anyone to add new pipeline elements that fully integrate with rt-ai pipelines.

Edge processing and control is essential if there is to be scalable use of intelligent IoT. I believe that dumb IoT, where everything has to be sent to a cloud service for processing, is a broken and unscalable model. The bandwidth requirements alone of sending all the data back to a central point will rapidly become unworkable. Latency guarantees are difficult to impossible in this model. Two advantages of rt-ai (keeping raw data at the edge where it belongs and only upstreaming salient information to the cloud along with minimizing required CPU cycles in power constrained environments) are the keys to scalable intelligent IoT.

The ghost in the AI machine

The driveway monitoring system has been running full time for months now and it’s great to know if a vehicle or a person is moving on the driveway up to the house. The only bad thing is that it will give occasional false detections like the one above. This only happens at night and I guess there’s enough correct texture to trigger the “person” response with a very high confidence. Those white streaks might be rain or bugs being illuminated by the IR light. It also only seems to happen when the trash can is out for collection – it is in the frame about half way out from the center to the right.

It is well known that the image recognition capabilities of convolutional networks aren’t always exactly what they seem and this is a good example of the problem. Clearly, in this case, MobileNet feature detectors have detected things in small areas with a particular spatial relationship and added these together to come to the completely wrong conclusion. My problem is how to deal with these false detections. A couple of ideas come to mind. One is to use a different model in parallel and only generate an alert if both detect the same object at (roughly) the same place in the frame. Or instead of another CNN, use semantic segmentation to detect the object in a somewhat different way.

Whatever, it is a good practical demonstration of the fact that these simple neural networks don’t in any way understand what they are seeing. However, they can certainly be used as the basis of a more sophisticated system which adds higher level understanding to raw detections.

Object detection on the Raspberry Pi 4 with the Neural Compute Stick 2


Following on from the Coral USB experiment, the next step was to try it out with the NCS 2. Installation of OpenVINO on Raspbian Buster was straightforward. The rt-ai design was basically the same as for the Coral USB experiment but with the CoralSSD SPE replaced with the OpenVINO equivalent called CSSDPi. Both SPEs run ssd_mobilenet_v2_coco object detection.

Performance was pretty good – 17fps with 1280 x 720 frames. This is a little better than the Coral USB accelerator attained but then again the OpenVINO SPE is a C++ SPE while the Coral USB SPE is a Python SPE and image preparation and post processing takes its toll on performance. One day I am really going to use the C++ API to produce a new Coral USB SPE so that the two are on a level playing field. The raw inference time on the Coral USB accelerator is about 40mS or so meaning that there is plenty of opportunity for higher throughputs.

MobileNet SSD object detection using the Intel Neural Compute Stick 2 and a Raspberry Pi

I had successfully run ssd_mobilenet_v2_coco object detection using an Intel NCS2 running on an Ubuntu PC in the past but had not tried this using a Raspberry Pi running Raspbian as it was not supported at that time (if I remember correctly). Now, OpenVINO does run on Raspbian so I thought it would be fun to get this working on the Pi. The main task consisted of getting the CSSD rt-ai Stream Processing Element (SPE) compiling and running using Raspbian and its version of OpenVINO rather then the usual x86 64 Ubuntu system.

Compiled rt-ai SPEs use Qt so it was a case of putting together a different .pro qmake file to reflect the particular requirements of the Raspbian environment. Once I had sorted out the slight link command changes, the SPE crashed as soon as it tried to read in the model .xml file. I got stuck here for quite a long time until I realized that I was missing a compiler argument that meant that my binary was incompatible with the OpenVINO inference engine. This was fixed by adding the following line to the Raspbian .pro file:

QMAKE_CXXFLAGS += -march=armv7-a

Once that was added, the code worked perfectly. To test, I set up a simple rt-ai design:


For this test, the CSSDPi SPE was the only thing running on the Pi itself (rtai1), the other two SPEs were running on a PC (default). The incoming captured frames from the webcam to the CSSDPi SPE were 1280 x 720 at 30fps. The CSSDPi SPE was able to process 17 frames per second, not at all bad for a Raspberry Pi 3 model B! Incidentally, I had tried a similar setup using the Coral Edge TPU device and its version of the SSD SPE, CoralSSD, but the performance was nowhere near as good. One obvious difference is that CoralSSD is a Python SPE because, at that time, the C++ API was not documented. One day I may change this to a C++ SPE and then the comparison will be more representative.

Of course you can use multiple NCS 2s to get better performance if required although I haven’t tried this on the Pi as yet. Still, the same can be done with Coral with suitable code. In any case, rt-ai has the Scaler SPE that allows any number of edge inference devices on any number of hosts to be used together to accelerate processing of a single flow. I have to say, the ability to use rt-ai and rtaiDesigner to quickly deploy distributed stream processing networks to heterogeneous hosts is a lot of fun!

The motivation for all of this is to move from x86 processors with big GPUs to Raspberry Pis with edge inference accelerators to save power. The driveway project has been running for months now, heating up the basement very nicely. Moving from YOLOv3 on a GTX 1080 to MobileNet SSD and a Coral edge TPU saved about 60W, moving the entire thing from that system to the Raspberry Pi has probably saved a total of 80W or so.

This is the design now running full time on the Pi:


CPU utilization for the CSSDPi SPE is around 21% and it uses around 23% of the RAM. The raw output of the CSSDPi SPE is fed through a filter SPE that only outputs a message when a detection has passed certain criteria to avoid false alarms. Then, I get an email with a frame showing what triggered the system. The View module is really just for debugging – this is the kind of thing it displays:


The metadata displayed on the right is what the SSDFilter SPE uses to determine whether the detection should be reported or not. It requires a configurable number of sequential frames with a similar detection (e.g. car rather than something else) over a configurable confidence level before emitting a message. Then, it has a hold-off in case the detected object remains in the frame for a long time and, even then, requires a defined gap before that detection is re-armed. It seems to work pretty well.

One advantage of using CSSD rather than CYOLO as before is that, while I don’t get specific messages for things like a USPS van, it can detect a wider range of objects:


Currently the filter only accepts all the COCO vehicle classes and the person class while rejecting others, all in the interest of reducing false detection messages.

I had expected to need a Raspberry Pi 4 (mine is on its way 🙂 ) to get decent performance but clearly the Pi 3 is well able to cope with the help fo the NCS 2.

Detailed remote node status and distributed logging in rt-ai

Something that had been missing from rt-ai was the ability to see easily the state of remote nodes. That’s now been corrected with the new node status display. Each node has its own tab displaying key real-time usage information and the plan is to add more information to this in future versions – such as precisely which modules are running and also to support multiple GPUs.


Another missing feature was any sort of distributed logging, so that an app could receive and process log messages generated by any module in a design. This is now working and a module log display has been added to the design GUI in rtaiDesigner. As the logging system uses the same communications infrastructure as the rest of the management data, it will be easy to add redundant persistent storage and review of historic log information.

Just as an irrelevant footnote, the node status code came from the BioTestBench, something that I wrote some years ago when I was interested in bioinformatics. I was working on whole genome alignment and wanted a convenient single window that displayed all needed resource utilization information. Haven’t used this for ages but I am glad that some of the code finally came in handy for something else.