I have been trying out a TensorFlow application called DeepLab that uses deep convolutional neural nets (DCNNs) along with some other techniques to segment images into meaningful objects and than label what they are. Using a script included in the DeepLab GitHub repo, the Pascal VOC 2012 dataset is used to train and evaluate the model. One of the results is shown above. It has managed to extract some pretty ugly furniture from a noisy background quite nicely. Here are couple more examples:
Incidentally, to get the local_test.sh to work on Ubuntu 16.04 I had to change the call to download_and_convert_voc2012.sh to use bash instead of sh otherwise it generated an error. Also, I needed to install cuDNN 7.0.4 for Cuda 9.0 rather than cuDNN 7.1.1 in order to get the Jupyter notebook example operating.
What I would like to do now is to create an rt-ai Edge Stream Processing Element (SPE) based on this code to act as a preprocessor stage in order to isolate and identify salient objects in a video stream in real time. One of my interests is understanding behaviors from video and this could be a valuable component in that pipeline by allowing later stages to focus on what’s important in each frame.