Extreme edge depth video processing: Intel L515 LiDAR + Raspberry Pi and Stereolabs ZED + Jetson Nano

Depth cameras are an important component of rt-ispace, but things just aren’t going to scale if each one needs a server with a GPU just to generate useful data. This means that the extreme edge, consisting of hopefully low-cost components that can be widely distributed, needs to be able to interface to depth cameras and make the data available to the wider network.

I have been testing with a Stereolabs ZED camera connected to a Jetson Nano and an Intel L515 LiDAR connected to a Raspberry Pi 4. The depth video stream generated by the rt-ai capture code is 1280 x 720 pixels, JPEG encoded, along with uncompressed 640 x 360 16-bit depth data, with a target frame rate of 15fps. Both systems seem quite capable of capturing and transmitting the data streams, as shown above. The rt-ai design being used is this:

The rtai0 node is the Raspberry Pi 4. In this case, the depth video streams from the cameras are not being processed further on the extreme edge systems. The depth data views, which display data coming directly from the extreme edge systems, show that the Jetson Nano is generating frames at the target rate while the Raspberry Pi 4 is achieving about half that. Both are usable rates for many applications.
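For concreteness, here is a minimal sketch, not the actual rt-ai capture code, of how the L515 side might produce the two parts of the depth video stream using pyrealsense2 and OpenCV: JPEG-encoded 1280 x 720 colour plus uncompressed 640 x 360 16-bit depth. The stream profiles, JPEG quality, frame decimation towards 15fps and the message framing are all assumptions.

```python
# Hypothetical L515 capture loop: colour is JPEG encoded, depth is aligned to the
# colour frame and downscaled to 640 x 360 16-bit values.
import numpy as np
import cv2
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.color, 1280, 720, rs.format.bgr8, 30)
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)
align = rs.align(rs.stream.color)   # map depth pixels onto the colour frame

try:
    count = 0
    while True:
        frames = align.process(pipeline.wait_for_frames())
        count += 1
        if count % 2:                # crude decimation from 30fps towards 15fps
            continue
        color = np.asanyarray(frames.get_color_frame().get_data())
        depth = np.asanyarray(frames.get_depth_frame().get_data())   # uint16 depth units

        ok, jpeg = cv2.imencode('.jpg', color, [cv2.IMWRITE_JPEG_QUALITY, 80])
        depth_small = cv2.resize(depth, (640, 360), interpolation=cv2.INTER_NEAREST)

        # jpeg.tobytes() and depth_small.tobytes() would be framed into the rt-ai
        # depth video stream message here (omitted)
finally:
    pipeline.stop()
```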

The depth video frames are also passed to the OpenPoseGPU Stream Processing Element (SPE). This is an implementation of OpenPose that uses the desktop GPU (a GTX 1080 Ti) to perform pose estimation. The OpenPoseGPU SPE can work with standard video streams but, if given a depth video stream, will work out the depth of each identified joint and add that to the generated metadata.
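The per-joint depth lookup amounts to mapping each joint’s colour-frame coordinates into the smaller depth map and reading off the 16-bit value. The sketch below illustrates the idea; the function names and metadata layout are mine, not the OpenPoseGPU SPE’s actual code.

```python
# Hypothetical helper: attach a depth value (in depth units, e.g. mm) to each joint
# detected on the 1280 x 720 colour frame, using the 640 x 360 16-bit depth map.
import numpy as np

def add_joint_depth(joints, depth, color_size=(1280, 720), depth_size=(640, 360)):
    """joints: list of dicts {'name': str, 'x': int, 'y': int} in colour coordinates.
       depth: uint16 array of shape (depth_size[1], depth_size[0])."""
    sx = depth_size[0] / color_size[0]
    sy = depth_size[1] / color_size[1]
    out = []
    for j in joints:
        dx = min(int(j['x'] * sx), depth_size[0] - 1)
        dy = min(int(j['y'] * sy), depth_size[1] - 1)
        d = int(depth[dy, dx])               # 0 means no valid depth reading
        out.append({**j, 'depth': d})
    return out
```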

The total throughput of the OpenPoseGPU SPE is around 14fps. As can be seen in the rt-ai design, the depth video streams are multiplexed into the OpenPoseGPU SPE so that this capability is shared between the two streams. The FanOut SPE separates the output streams, which are then sent to viewers. Because the two sources together exceed the OpenPoseGPU SPE’s throughput, each stream’s frame rate is roughly halved.
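The multiplex / fan-out pattern itself is simple: tag each frame with its source, push it through the shared stage, and route the result back out by tag. This is an illustrative sketch only, not the rt-ai SPE implementation, with made-up stream names.

```python
# Two depth video sources share one rate-limited pose estimation stage; results are
# fanned back out by source tag. With ~14fps total, each source sees roughly half.
import queue

shared_in = queue.Queue(maxsize=4)                       # multiplexed input to the shared stage
outputs = {"zed_nano": queue.Queue(), "l515_pi4": queue.Queue()}

def capture(source_id, frame_iter):
    for frame in frame_iter:
        shared_in.put((source_id, frame))                # tag each frame with its source

def pose_worker(estimate_pose):
    while True:
        source_id, frame = shared_in.get()
        result = estimate_pose(frame)                    # shared, throughput-limited stage
        outputs[source_id].put(result)                   # fan out: route back by source tag
```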

So this design, where OpenPose processing is offloaded from the extreme edge, works fine, but it would be far more interesting to run pose estimation at the extreme edge itself.

The screen capture above shows pose estimation running at the extreme edge using an Intel NCS 2 to run inference as implemented by the OpenPoseVINO SPE on the rtai0 Raspberry Pi 4 node. This does work pretty well but can only achieve 2fps. This might be ok for some applications but it would be nice to get to around 10fps.
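For reference, running a pose model on the NCS 2 from Python looks roughly like the sketch below, using the OpenVINO IECore API of that era. The model (human-pose-estimation-0001 from the Open Model Zoo) and the pre/post-processing are assumptions; the actual OpenPoseVINO SPE may use a different network and pipeline.

```python
# Hypothetical NCS 2 inference: load an IR model onto the MYRIAD device and run one frame.
import cv2
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="human-pose-estimation-0001.xml",
                      weights="human-pose-estimation-0001.bin")
exec_net = ie.load_network(network=net, device_name="MYRIAD")   # MYRIAD = NCS 2

input_name = next(iter(net.input_info))
_, _, h, w = net.input_info[input_name].input_data.shape

frame = cv2.imread("frame.jpg")                 # stand-in for one depth video colour frame
blob = cv2.resize(frame, (w, h)).transpose(2, 0, 1)[np.newaxis].astype(np.float32)
result = exec_net.infer({input_name: blob})     # heatmaps + PAFs, decoded into joints elsewhere
```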

I also tried running trt_pose on the Jetson Nano to attempt extreme edge pose estimation there, but this was not successful. It may be that running the ZED camera and trt_pose on the same Nano is just asking too much. Moving to a Xavier NX would probably make sense as it has double the memory and more power in general, but it is a fair bit more expensive than the Nano and so somewhat less relevant to the extreme edge application.

Work is now moving on to a new architecture using distributed inference to relieve the load on the extreme edge while still achieving usable pose estimation frame rates.