Modulation Recognition Using Deep Learning

In previous blog posts, we introduced the idea of using deep learning to detect chirp and other signals in degraded conditions using spectrogram images. Deep learning techniques can be powerful tools in the digital communications world because of their ability to extract features from data that may not be explicitly picked up by conventional signal processing. In this blog post, we'll talk about how to apply deep learning to modulation recognition, a challenging problem with many applications in digital communications.

At KickView, we are always interested in exploring how we can use multiple software packages and frameworks to solve the problem at hand. While walking through the modulation recognition problem, we'll show how to create and train a deep neural network from scratch using Google TensorFlow.

Introduction to Analog & Digital Modulation

Modulation is simply the process of encoding the data you wish to send (e.g. digital bit stream, analog audio signal, etc.) inside another signal that can be transmitted via a physical medium. Put another way, modulation is the process of changing the properties of a periodic waveform (called a carrier signal) with a signal that contains the information you wish to transmit (called a modulating signal).

Radio broadcast systems commonly use Frequency Modulation (FM) or Amplitude Modulation (AM) to transmit content over-the-air. In AM, the amplitude of the carrier signal is varied with the instantaneous amplitude of the modulating signal. In FM, the concept is similar, except the frequency of the carrier signal is varied in accordance with the instantaneous amplitude of the modulating signal. AM and FM are analog modulation schemes, meaning that they transmit analog data streams over a channel at a specified frequency. There are many variants of AM and FM that are used to achieve different spectral properties and fit the requirements of the system at hand.
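To make the two schemes concrete, here's a quick NumPy sketch of AM and FM signal generation. The carrier frequency, deviation constant, and modulating tone below are arbitrary illustration values, not parameters from any real broadcast system:

```python
import numpy as np

fs = 10_000                       # sample rate (Hz), illustrative
t = np.arange(0, 0.1, 1 / fs)     # 100 ms of samples
fc = 1_000                        # carrier frequency (Hz)
m = np.sin(2 * np.pi * 50 * t)    # 50 Hz modulating signal

# AM: the carrier's amplitude follows the modulating signal
am = (1 + 0.5 * m) * np.cos(2 * np.pi * fc * t)

# FM: the carrier's instantaneous frequency deviates with the
# modulating signal; integrate frequency to get the phase
k_f = 200                         # frequency deviation (Hz)
phase = 2 * np.pi * np.cumsum(fc + k_f * m) / fs
fm = np.cos(phase)
```

Note the cumulative sum in the FM case: frequency is the derivative of phase, so modulating the frequency means integrating the modulating signal into the carrier's phase.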

In contrast to analog modulation, digital modulation schemes modulate an analog carrier signal by a discrete modulating signal. In digital modulation systems, digital-to-analog conversion occurs at the transmitter side (because radio hardware can only transmit analog waveforms over a physical interface), and analog-to-digital conversion happens at the receiver side where the data is processed and decoded, or demodulated. Most digital modulation schemes are based on keying, where the modulating signal is restricted to a fixed set of predetermined values at all times:

  • Phase Shift Keying (PSK) - uses a pre-defined set of phases
  • Frequency Shift Keying (FSK) - uses a pre-defined set of frequencies
  • Amplitude Shift Keying (ASK) - uses a pre-defined set of amplitudes
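Here's a minimal NumPy sketch of all three keying families mapping the same bit stream to symbols; the frequencies and samples-per-symbol count are illustrative choices, not from any real standard:

```python
import numpy as np

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=8)            # random bit stream

# PSK (binary): each bit selects one of two phases, 0 or pi,
# giving complex symbols +1 / -1
bpsk = np.exp(1j * np.pi * bits)

# FSK (binary): each bit selects one of two tone frequencies
f0, f1, fs, sps = 1_000, 2_000, 10_000, 20   # Hz, Hz, Hz, samples/symbol
t = np.arange(sps) / fs
fsk = np.concatenate([np.cos(2 * np.pi * (f1 if b else f0) * t)
                      for b in bits])

# ASK (on-off keying): each bit selects one of two amplitudes, 0 or 1
ask = bits.astype(float)
```

With more than two values in each set (e.g. four phases for QPSK), each symbol carries multiple bits, which is how the more complex schemes buy their higher data rates.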

There are many types of analog and digital modulation schemes that are used in real communication systems. As a general rule of thumb, more complex modulation schemes offer higher data rates, but are harder to demodulate in noisy environments. Many real systems will adapt their modulation scheme on-the-fly to maximize throughput in their instantaneous RF environment. A good example is WiFi - your wireless router at home is constantly monitoring its RF environment, changing its modulation scheme and switching channels in order to maximize throughput.

The Modulation Recognition Problem

Now that we've introduced the concepts behind analog and digital modulation, let's talk about the modulation recognition problem. If you took a spectrum analyzer and walked around the Denver Tech Center (or any urban area for that matter), you would see a flurry of RF activity in almost every frequency band. Our RF environment today is incredibly congested due to the ubiquity of wireless devices of every kind. This poses many challenges for spectral situational awareness. Local governments might want to monitor RF activity to make sure that unlicensed transmitters are not operating in a particular region. Intelligence agencies might want to scan a region to look for a very particular threat, such as an IED in a combat zone or an adversary communicating via a push-to-talk radio. Simply put, modulation recognition is imperative to conduct spectral situational awareness operations.

Conventionally, this problem has been attacked by looking for features in signal data derived from advanced signal processing. We won't go into the details of these methods here, but some candidate approaches include template matching, cyclostationary processing and analysis of higher-order statistics. These methods can be very powerful, but often require custom fine-tuning to a specified signal set. They do not generalize well to dynamic RF environments.
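To give a flavor of the higher-order statistics approach, below is a small NumPy sketch of the fourth-order cumulant C42, a classic modulation-discriminating feature. The constellations and sample counts are illustrative; in the noise-free case C42 works out to -2 for BPSK and -1 for QPSK, which is exactly the kind of hand-derived separation these methods rely on:

```python
import numpy as np

def c42(x):
    """Fourth-order, two-conjugate cumulant C42 of a zero-mean,
    unit-power complex symbol sequence."""
    m20 = np.mean(x ** 2)              # second-order moment, no conjugate
    m21 = np.mean(np.abs(x) ** 2)      # power
    m42 = np.mean(np.abs(x) ** 4)      # fourth-order moment
    return m42 - np.abs(m20) ** 2 - 2 * m21 ** 2

rng = np.random.default_rng(1)
bpsk = 2.0 * rng.integers(0, 2, 10_000) - 1                     # {-1, +1}
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, 10_000)))
```

The catch, as noted above, is that thresholds on features like this must be re-tuned whenever the signal set or channel conditions change.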

Let's see if we can use deep learning to tackle the modulation recognition problem.

Dataset Generation and Pre-Processing

We generated a dataset of four different modulation schemes (AM, BPSK, FSK and FM) using GNURadio. We varied the signal-to-noise ratio from 0 to 18 dB. We also introduced some random perturbations to the data using a simple channel model with randomized parameters. As a simple pre-processing step, we normalized the data to be between 0 and 1.
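Our actual GNURadio pipeline isn't shown here, but the two key steps, noise at a target SNR and min-max normalization, can be sketched in NumPy as follows (the helper names are ours, for illustration only):

```python
import numpy as np

def add_awgn(x, snr_db, rng=None):
    """Add complex white Gaussian noise to x at a target SNR in dB."""
    rng = np.random.default_rng(0) if rng is None else rng
    sig_power = np.mean(np.abs(x) ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    # Split the noise power evenly between I and Q
    noise = np.sqrt(noise_power / 2) * (rng.standard_normal(x.shape)
                                        + 1j * rng.standard_normal(x.shape))
    return x + noise

def normalize(x):
    """Min-max scale real-valued samples into [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())
```

A realistic channel model would also add frequency offset, timing drift and multipath on top of the noise, which is what the randomized perturbations above are standing in for.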

Important sidenote: A good rule of thumb with machine learning (and deep learning in particular) is that your algorithms can only ever be as good as your data. While our synthetic dataset works well for demonstration and experimentation, it is imperative to train, test and evaluate these approaches using collected real-world data in order for them to generalize well.

Before we get any further, let's take a look at some snippets of the data we generated (shown below in Figure 1). In a previous blog post, we looked at using spectrograms of chirp signals as time/frequency images. Now, let's view the complex I/Q samples as an image, where the first row is the in-phase component, and the second row is the quadrature component of the signal.

Figure 1: In-Phase/Quadrature Images of AM, FSK, BPSK and FM Modulation Types

Visually, it's easy to see that the signals look quite different from each other in this input space. By representing the input to our neural network as an I/Q image, we are inherently using both amplitude and phase information from each signal snapshot. This is quite a different input space than a traditional image, where only magnitude (e.g. grayscale values) is used. This visualization of a signal is also not something that's commonly done in traditional signal processing, where you might be looking at the power spectrum or the cycle frequencies.
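Building this input representation is a one-liner: stack the real and imaginary parts of the complex sample vector into a 2 x N array. A minimal sketch:

```python
import numpy as np

def iq_image(x):
    """Stack a complex sample vector into a 2 x N 'image':
    row 0 = in-phase (real part), row 1 = quadrature (imaginary part)."""
    return np.stack([x.real, x.imag])

# Example: a unit-amplitude complex tone becomes a (2, 128) image
x = np.exp(1j * 2 * np.pi * 0.05 * np.arange(128))
img = iq_image(x)
```

Because both rows are kept, the network sees the full complex sample, so amplitude and phase are both available to the convolutional filters.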

Before moving on, it's important to note that this is only one particular input space. At KickView, we are experimenting with various signal transformations and pre-processing steps to maximize performance in our deep learning algorithms.

Neural Network Training and Testing

With our I/Q signal images, we can use convolutional neural networks to pull out salient features. This is comparable to how we used a variation of AlexNet to classify chirp signals in spectrogram images. This time, let's create a neural network from scratch using Google TensorFlow. Note that the code snippets below are for illustration, and therefore won't run on their own.

Let's build a neural network using TensorFlow that has a few convolutional layers, a few fully connected layers, and a readout Softmax layer for our four output classes:

    # Create the RFNet
    # RFData object - manages RF data access (not shown)
    # Xform object - manages RF transformations (not shown)
    RFNet = RFNN(RFData=RF, Xform=Xform)

    def build_net(RFNet):
        # First convolutional layer with max pooling
        # (the wrapper's layer-adding method names below are illustrative)
        RFNet.add_conv_layer(patch_size=[8, 2],
                             conv_strides=[1, 2, 2, 1],
                             ksize=[1, 2, 2, 1],
                             pool_strides=[1, 2, 2, 1])

        # Second convolutional layer, no pooling
        RFNet.add_conv_layer(patch_size=[4, 1],
                             conv_strides=[1, 1, 1, 1])

        # Fully connected layers and the Softmax readout layer
        # for our four output classes (not shown)

At KickView, we've created our own wrappers around TensorFlow to manage the graph and tensor connections between layers. This makes it easy for us to quickly experiment with different architectures.

Let's prepare our session for training and testing. As part of this process, let's define our minimization function tensor and our tensors for measuring performance of our network:

    def prepare(self, learning_rate=1e-4):
        """Sets up our function to be minimized and initializes TensorFlow session"""

        # Define our local output tensor from the final layer
        self.y_out = self.layers[self.num_layers - 1].outputTensor

        # Set up minimization, training, testing & accuracy tensors
        diff = tf.nn.softmax_cross_entropy_with_logits(labels=self.y_, logits=self.y_out)
        self.cross_entropy = tf.reduce_mean(diff)

        self.train_step = tf.train.AdamOptimizer(learning_rate).minimize(self.cross_entropy)

        self.correct_prediction = tf.equal(tf.argmax(self.y_out, 1), tf.argmax(self.y_, 1))
        self.accuracy = tf.reduce_mean(tf.cast(self.correct_prediction, tf.float32))

        self.predictions = tf.argmax(self.y_out, 1)

        # Initialize graph and all variables

Now, we're ready to train and test on our dataset. Below is an example of what the training process looks like in TensorFlow:

    def train(self, batch_size=50):
        """Loads in data in batches, and executes training in TensorFlow"""
        # Data batch training loop
        done = False
        batch = 0
        while not done:
            # Load in a batch of training data
            data_batch, label_batch, done = self.RFData.load_train_batch(batch_size, normalize=False)

            # Reshape the data and labels for TensorFlow processing
            train_data = np.reshape(data_batch, (len(data_batch), self.input_shape))
            train_labels = np.zeros((len(data_batch), self.num_output_classes))
            for idx, label in enumerate(label_batch):
                train_labels[idx, int(label)] = 1

            # Execute training step
            _, acc = self.sess.run([self.train_step, self.accuracy],
                                   feed_dict={self.x:  train_data,
                                              self.y_: train_labels})

            print('Batch {b} - Training Acc: {acc}'.format(b=batch, acc=acc))
            batch += 1

After training and testing, how well did we do? In Figure 2 below, we show the confusion matrix for the overall dataset (full SNR range) as well as for the highest SNR case at 18 dB.

Figure 2: Modulation Confusion Matrices

The results are not perfect, but quite encouraging for a first pass! At high SNR (shown in the figure on the right), the classification rate is perfect. This is great news, because it means that there was enough discriminating information between these signal types in the I/Q image input space to create an accurate decision boundary. Most of the errors (reflected in the overall confusion matrix to the left) are at lower SNR closer to 0 dB, where conventional approaches also start to break down.
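For readers who want to reproduce this kind of evaluation, a confusion matrix is simple to compute from the predicted and true class labels. A minimal NumPy sketch (the class ordering is illustrative, not necessarily the label order we used):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=4):
    """Count predictions per class: rows = true class, cols = predicted."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

classes = ['AM', 'BPSK', 'FSK', 'FM']   # illustrative label order
```

A perfect classifier puts all counts on the diagonal; off-diagonal entries show which modulation types get mistaken for which, and at which SNRs the errors concentrate.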


Hopefully this blog post provided you with some tools to get started with deep learning applied to RF signals using Google TensorFlow. There is lots of room to experiment with the types of transformations that can be applied to raw RF data before being passed into a deep neural network. As stated above, there are also many different architectures to try and a large hyperparameter space to explore. If you liked this blog post, please leave us a note! We are always open to hearing from people and collaborating with the greater machine learning and signal processing community.