IR Facial Detection and Recognition

Facial detection and recognition are being integrated into devices we use every day, from cell phones to ATMs. Extensive facial datasets and innovative model architectures have pushed the state of the art to near-human accuracy in the visible domain. What happens, though, when the lights go out?

Some devices already use sensors that see beyond the visible spectrum. Home security systems and some cell phones use near-infrared (NIR) light to see in the dark, but these sensors still rely on light reflecting off surfaces, even though that light is invisible to the naked eye. For applications where emitting any light is not an option, you must move to longer infrared (IR) wavelengths, where the sensor images the energy emitted by the objects themselves rather than reflected light.

Mid-wave IR (MWIR) and long-wave IR (LWIR), sometimes referred to as thermal infrared, capture the longer-wavelength energy that a source emits as radiated heat, and thus require no external illumination. Because of this, the features a machine learning model can recognize differ from those in visible-spectrum images (Figure 1). They also vary with the subject's condition: an individual looks different after exercising than after sitting in a cold environment, because the radiated heat fluctuates (e.g., a nose may appear light or dark depending on whether it is hot or cold).
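As a rough illustration of why a face is visible in the thermal bands without any illumination, Wien's displacement law gives the wavelength at which a blackbody's emission peaks. The sketch below assumes skin behaves approximately like a blackbody at about 33 °C; the temperature and the function name are assumptions made for this example only.

```python
# Wien's displacement constant, in micrometre-kelvin.
WIEN_B_UM_K = 2897.77


def peak_emission_wavelength_um(temp_celsius: float) -> float:
    """Wavelength (micrometres) at which a blackbody at this temperature emits most strongly."""
    temp_kelvin = temp_celsius + 273.15
    if temp_kelvin <= 0:
        raise ValueError("temperature must be above absolute zero")
    return WIEN_B_UM_K / temp_kelvin


# Assumed skin surface temperature of 33 degrees Celsius.
skin_peak_um = peak_emission_wavelength_um(33.0)
print(f"Peak emission near {skin_peak_um:.1f} um")
```

Under these assumptions the peak falls at roughly 9.5 µm, inside the range commonly quoted for LWIR sensors (about 8 to 14 µm).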

Unlike visible imagery, for which many large public datasets are available for download, the MWIR and LWIR domains have few available datasets. Our kvSonata real-time AI framework lets us build capabilities into reusable modules, then string those modules together into processing pipelines we call workflows. We created a workflow that uses the strengths of the visible domain to automatically label data in the MWIR/LWIR domain, so we can label MWIR/LWIR data without dragging and drawing bounding boxes for hours on end.
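One way to picture this kind of label transfer: run a face detector on the visible frame, then map each resulting box into the time-synced IR frame. The following is a minimal sketch of that mapping step only, not the kvSonata workflow itself. It assumes the two cameras are registered by a known 3x3 homography, and the matrix values, frame size, and function names are all hypothetical.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def map_point(h: Sequence[Sequence[float]], x: float, y: float) -> Tuple[float, float]:
    """Apply a 3x3 homography to a single pixel coordinate."""
    xh = h[0][0] * x + h[0][1] * y + h[0][2]
    yh = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return xh / w, yh / w


def transfer_box(h: Sequence[Sequence[float]], box: Box,
                 ir_size: Tuple[int, int]) -> Optional[Box]:
    """Map a visible-frame box into the IR frame; return None if it falls outside."""
    x0, y0, x1, y1 = box
    corners = [map_point(h, x, y) for x, y in ((x0, y0), (x1, y0), (x1, y1), (x0, y1))]
    xs = [c[0] for c in corners]
    ys = [c[1] for c in corners]
    width, height = ir_size
    # Axis-aligned bounds of the warped corners, clipped to the IR frame.
    bx0, by0 = max(0.0, min(xs)), max(0.0, min(ys))
    bx1, by1 = min(float(width), max(xs)), min(float(height), max(ys))
    if bx1 <= bx0 or by1 <= by0:
        return None
    return (bx0, by0, bx1, by1)


# Hypothetical registration: IR pixels = 0.5 * visible pixels, shifted by (-100, -20).
H_VIS_TO_IR = [[0.5, 0.0, -100.0],
               [0.0, 0.5, -20.0],
               [0.0, 0.0, 1.0]]
IR_SIZE = (640, 512)

ir_label = transfer_box(H_VIS_TO_IR, (400.0, 300.0, 600.0, 500.0), IR_SIZE)
outside = transfer_box(H_VIS_TO_IR, (0.0, 0.0, 100.0, 30.0), IR_SIZE)
print(ir_label, outside)
```

Faces detected in a part of the visible frame that the IR sensor does not cover are dropped rather than clipped to a sliver, which keeps degenerate boxes out of the training set.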

Figure 1. LWIR (left) and visible (right) time synced imagery.

With some labeled data, we can train a network for facial detection. We started by experimenting with existing networks proven robust in the visible domain. These networks did not perform as expected, either because our IR dataset was too small or because the IR features were sufficiently different; many reached only 65-75% validation accuracy before overfitting. Deciding to revisit these networks after our IR dataset grew, we instead trained a slightly modified Multi-task Cascaded Convolutional Network (MTCNN)[1]. This network's design allows it to achieve usable accuracy and a low false-positive rate with relatively small amounts of training data.
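Cascaded detectors in the MTCNN family generate many overlapping candidate boxes and prune them with non-maximum suppression (NMS) between stages. The sketch below is a generic greedy NMS, shown only to illustrate that pruning step; it is not the modified network described here, and the 0.5 overlap threshold and function names are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Return indices of boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any already-kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


candidates = [(10.0, 10.0, 50.0, 50.0),      # strong detection
              (12.0, 12.0, 52.0, 52.0),      # near-duplicate of the first
              (100.0, 100.0, 140.0, 140.0)]  # separate face
kept = nms(candidates, scores=[0.9, 0.8, 0.7])
print(kept)
```

Here the second candidate overlaps the first with an IoU of about 0.82, so it is suppressed and only the first and third boxes survive.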

Figure 2. MWIR faces run through facial detection

With IR facial detection working (Figure 2) and running as a reusable module in kvSonata, we started working on recognition. Our network needed to determine the identity of individuals across modalities. This would enable a visible image to be used to recognize someone in zero-light situations (e.g., for verification).

Figure 3. Example of face verification between the visible and LWIR images. The differences between these modalities can be observed (e.g., hot vs. cold nose).

We achieved good results using Siamese networks, which compare two input images by extracting features present in both visible and IR and using those features to determine whether the two images contain the same face (Figure 3).
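The defining property of a Siamese network is that both inputs pass through the same encoder with the same weights, and the decision is made from the similarity of the two embeddings. The toy sketch below illustrates only that structure. The linear "encoder", its weights, the four-value "images", and the 0.8 similarity threshold are placeholders invented for the example, not the trained network or settings used here.

```python
import math
from typing import List, Sequence, Tuple


def embed(pixels: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Shared encoder: a toy linear projection followed by L2 normalization."""
    raw = [sum(w * p for w, p in zip(row, pixels)) for row in weights]
    norm = math.sqrt(sum(v * v for v in raw)) or 1.0  # avoid division by zero
    return [v / norm for v in raw]


def same_identity(img_a: Sequence[float], img_b: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  threshold: float = 0.8) -> Tuple[bool, float]:
    """Embed both inputs with the SAME weights and compare by cosine similarity."""
    emb_a = embed(img_a, weights)
    emb_b = embed(img_b, weights)
    cosine = sum(x * y for x, y in zip(emb_a, emb_b))
    return cosine >= threshold, cosine


# Placeholder weights standing in for a trained encoder.
SHARED_WEIGHTS = [[1.0, 0.0, 0.5, 0.0],
                  [0.0, 1.0, 0.0, 0.5]]

visible_face = [0.9, 0.1, 0.3, 0.5]    # stand-in for a visible image
ir_same_person = [0.8, 0.2, 0.9, 0.1]  # stand-in for an IR image of the same person
ir_other_person = [0.1, 0.9, 0.4, 0.4]

match, match_score = same_identity(visible_face, ir_same_person, SHARED_WEIGHTS)
mismatch, mismatch_score = same_identity(visible_face, ir_other_person, SHARED_WEIGHTS)
print(match, round(match_score, 3), mismatch, round(mismatch_score, 3))
```

In a real system the encoder would be a trained convolutional network and the threshold would be tuned on held-out visible/IR pairs to balance false accepts against false rejects.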

Figure 4. Facial recognition workflow running on MWIR video

We are actively expanding our IR (MWIR and LWIR) facial datasets and improving our detection and recognition capabilities for specific applications. The kvSonata real-time AI framework greatly simplifies creating workflows, both for processing and for generating labeled IR data. This lets us spend more time on structuring and training networks, building additional capabilities, and creating applications.