No GPU Required: Real-Time Inference with Optimized Networks in kvSonata

The rise of deep learning ran in parallel with advancements in GPU technology. As GPUs got faster, fueled largely by the high demand for powerful graphics cards in the gaming world, researchers began to train the high-capacity neural networks that quickly evolved into the well-known CNN architectures of today. However, fast GPUs remain expensive and power-hungry, and serve as a barrier to entry for many would-be AI ventures. Fortunately, there are now effective alternatives to using GPUs for inference in video analytics applications. These involve using software to optimize trained networks for other devices such as CPUs, VPUs, or FPGAs. In this blog post we will show how Intel's OpenVino SDK can be used to optimize models. Furthermore, we show how OpenVino's inference engine is easily integrated into KickView's kvSonata framework for effective real-time inference applications without leveraging an expensive GPU!

OpenVino is Intel's deep learning inference acceleration package, specifically targeted at computer vision applications. The SDK includes tools for optimizing and running models on CPUs, GPUs, FPGAs, and the Intel Neural Compute Stick 2 (a VPU, which we will discuss later in this post). OpenVino is essentially Intel's answer to NVIDIA's TensorRT package. At KickView, we have spent lots of time optimizing and deploying models with TensorRT, so we were very excited to try out OpenVino and see how it compares. We were also excited at the prospect of being able to run our powerful computer vision models without a GPU, while still maintaining real-time speed requirements. As we will show at the end of this post, we were able to achieve surprisingly good results when applying OpenVino to some of our object detection models.
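To give a flavor of what running an optimized model looks like, here is a minimal sketch using OpenVino's Python Inference Engine API. The file names and the `preprocessed_frame` variable are placeholders, and the exact class names (`IECore`, `IENetwork`, etc.) have varied across OpenVino releases, so treat this as illustrative rather than definitive:

```python
from openvino.inference_engine import IECore, IENetwork

ie = IECore()
# Load the Intermediate Representation produced by the Model Optimizer
net = IENetwork(model="model.xml", weights="model.bin")
# device_name="CPU" runs on the host; "MYRIAD" targets the NCS2's VPU
exec_net = ie.load_network(network=net, device_name="CPU")

input_blob = next(iter(net.inputs))
output_blob = next(iter(net.outputs))

# preprocessed_frame is a placeholder for an image resized/normalized
# to the network's expected input shape
result = exec_net.infer({input_blob: preprocessed_frame})
detections = result[output_blob]
```

Switching the same model between CPU and VPU is a one-argument change, which is what makes the device-swapping workflows described below so convenient.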

Figure 1: Diagram of the OpenVino acceleration process. The green boxes correspond to our internal training and kvSonata deployment processes; the blue boxes represent the OpenVino SDK.

At KickView, our goal is not only to produce fast and accurate models for effective applications, but to enable our customers to quickly build and deploy their own AI applications and solutions. As we have discussed in previous blog posts, kvSonata is a hardware-agnostic framework. In addition to TensorRT support in the native kvSonata environment, the latest release also includes native support for the OpenVino SDK. We also include example object detection modules in Python and C++. Therefore, from the user's perspective, running your model on the CPU, or on an edge device such as the Intel NCS2's VPU (vision processing unit), is as simple as following these four steps:

  1. Use the OpenVino SDK (which is preinstalled in the sonata-dev docker image) to optimize your model into an Intermediate Representation (IR) file. Intel's documentation provides a number of intuitive examples of how to do this.
  2. Select and configure a kvSonata module that uses the SDK. For many detection tasks, you may be able to use one of the examples provided out of the box.
  3. Create a workflow definition YAML file that includes the OpenVino detection module.
  4. Run kvSonata.
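
As a sketch of step 3, a workflow definition might look something like the following. The module names and parameter keys here are illustrative placeholders, not kvSonata's actual schema:

```yaml
# Illustrative workflow definition -- module names and keys are hypothetical
workflow:
  - module: video_source
    params:
      uri: rtsp://camera/stream
  - module: openvino_detector
    params:
      model_xml: /models/detector.xml
      model_bin: /models/detector.bin
      device: CPU        # or MYRIAD to run on an NCS2
  - module: display_sink
```

The key point is that the detection module is just one entry in the workflow, so swapping devices or detectors is a small edit to this file rather than a code change.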
Figure 2: Example of converting a TensorFlow detection workflow on the GPU into an identical OpenVino optimized detection workflow on the CPU.
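
The conversion itself is handled by OpenVino's Model Optimizer. As an example, a frozen TensorFlow detection graph can be turned into an IR pair (.xml/.bin) with a command along these lines; the paths are hypothetical and the flags reflect the 2019-era releases, so check the Model Optimizer documentation for your version:

```shell
# Hypothetical paths; flags may differ by OpenVino version
python3 mo_tf.py \
  --input_model frozen_inference_graph.pb \
  --tensorflow_object_detection_api_pipeline_config pipeline.config \
  --data_type FP16 \
  --output_dir ./ir_model
```

Using FP16 is what allows the same IR to run on the NCS2, which operates in half precision.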

Once you develop some familiarity with optimizing models via the OpenVino SDK, the process of changing a TensorFlow detection workflow (GPU) into an OpenVino workflow (CPU or VPU) is straightforward. Furthermore, because the workflows are structurally identical, you now have the flexibility to swap the modules at will while building your application. You can also get more creative by running multiple detection modules, each on a different device. With the included fan-out and fan-in modules, it is easy to split data across multiple devices.
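The fan-out/fan-in pattern itself is simple to reason about. The sketch below is a hypothetical stand-in for kvSonata's modules, using plain Python threads: frames are dispatched round-robin to one worker per device, and results are merged back in frame order:

```python
import queue
import threading

def fan_out_fan_in(frames, devices, infer):
    """Distribute frames round-robin across per-device workers, then
    gather results in frame order. `infer(device, frame)` is a
    placeholder for a real inference call."""
    tasks = [queue.Queue() for _ in devices]
    results = {}
    lock = threading.Lock()

    def worker(dev, q):
        while True:
            item = q.get()
            if item is None:          # sentinel: no more frames
                return
            idx, frame = item
            out = infer(dev, frame)
            with lock:
                results[idx] = out

    threads = [threading.Thread(target=worker, args=(d, q))
               for d, q in zip(devices, tasks)]
    for t in threads:
        t.start()
    for i, frame in enumerate(frames):        # fan-out: round-robin dispatch
        tasks[i % len(devices)].put((i, frame))
    for q in tasks:
        q.put(None)
    for t in threads:
        t.join()
    return [results[i] for i in range(len(frames))]   # fan-in: ordered merge
```

Because each device has its own queue, a slow device simply drains its queue more slowly without blocking the others; a work-stealing variant would balance load even better.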

Figure 3: kvSonata workflow that uses four Neural Compute Sticks stacked together to perform inference in parallel. kvSonata makes it easy to use as many, or as few, devices as needed for your particular workflow.

In the process of integrating OpenVino into kvSonata, we had the opportunity to run a variety of speed tests on some of the model architectures that we frequently use in our own projects. We found that OpenVino acceleration yields a dramatic speed boost when running on a CPU, with recorded speeds up to 6x faster than unoptimized models. We also found that many models run fast enough on the NCS2 to allow for real-time object detection at the edge. At less than 1/5 the cost, and with a powerful SDK behind it, the NCS2 is clearly a strong rival to NVIDIA's Jetson TX2 and other GPU-based embedded devices (though NVIDIA does have its own device in this price range, the Jetson Nano, which we have not yet tested). The figure below shows speed-test results for two of the most common detection architectures, SSD and Faster R-CNN, run on various devices with and without OpenVino optimization.
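For readers who want to reproduce this kind of comparison, a rough timing harness is all that is needed. This is a generic sketch (not our internal benchmark code); `infer` and `frames` are placeholders for your own model call and input data:

```python
import time

def measure_fps(infer, frames, warmup=3):
    """Rough frames-per-second measurement for an inference callable.
    A few warm-up runs are excluded so lazy initialization costs
    don't skew the result."""
    for f in frames[:warmup]:
        infer(f)
    start = time.perf_counter()
    for f in frames:
        infer(f)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed
```

Comparing the FPS of the same model before and after Model Optimizer conversion, on each target device, yields exactly the kind of table summarized in Figure 4.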

Figure 4: Results of speed tests of two common detection networks on various devices, with and without OpenVino optimization.

We are excited about the recent advances in hardware that continue to provide options for deploying state-of-the-art intelligent video applications. KickView's real-time video analytics framework, kvSonata, will help you take advantage of all these new technologies (both GPU and non-GPU), while providing an easy-to-use platform that enables you to deploy the most powerful algorithms for edge computing. Contact us to learn more.