Ham Radio Blog by AG1LE: Probabilistic Neural Network Classifier for Morse Code

One of the challenges in Morse Code decoder is how to build a well working adaptive classifier for incoming symbols given the variability in "dit/dah" timing in real world CW traffic received in ham bands. I have discussed these problems in my previous blog posting.

OVERVIEW

I have collected data from different ham QSOs in order to understand the underlying patterns and timing relationships. One good way to represent this data is to think Morse Code as "symbols" instead of "dits" and "dahs". A symbol is a ["tone", " silence"] duration pair. For example the letter A ( . - ) could be represented as two symbols ["dit" - "ele"] + ["dah", "chr"] where "dit" and "dah" represent short and long tone; respectively "ele" represent inter-element space and "chr" represents inter-character space. Morse code can then be represented as a sequence of symbols - letter "A" would be {S0, S4} - I am using the following definitions as shorthand for symbols:

S0 = [dit, ele] // dit and inter-element space
S1 = [dit, chr] // dit and inter-character space
S2 = [dit, wrd] // dit and inter-word space
S3 = [dah, ele] // dah and inter-element space
S4 = [dah, chr] // dah and inter-character space
S5 = [dah, wrd] // dah and inter-word space

CLASSIFICATION EXAMPLES

If we plot thousands of these symbols as data vectors in (x, y) chart where x-axis is tone duration ("mark") and y-axis is silence duration ("space") we get a picture like figure 1 below. "Dit" duration is scaled to 0.1 in the picture and "dah" duration is 0.3 respectively. Figure 1. was created from a recorded test audio file with -3 dB SNR by courtesy of Dave W1HKJ.

You can easily see different symbol clusters and there is clean separation between them. In the "mark" dimension (horizontal x-axis) "dit" and "dah" follow roughly 1:3 ratio as expected.

Dark blue and green areas represent S0 and S3 symbols - these are symbols within Morse letters as "space" value is around 0.1.

Red and brown areas represent S1 and S4 symbols - longer inter-character "space" value ~0.3 means that these symbols are the last symbol in a letter.

Yellow and light blue areas represent S2 and S5 symbols - long "space" value around 0.6 - 0.7 shows that these are the last symbol in a word.

Figure 1. Morse code symbols ( -3 dB SNR test audio file)

While figure 1. looks nice and neat please take a look at figure 2. below. Using the same exact classifier settings I recorded several stations working on 7 Mhz band and collected some 1800 symbols. This picture shows data from multiple stations - therefore the variation for each symbol is much larger.

There is much more variation in the "space" dimension. When listening the Morse code this is consistent with people having "thinking pauses" as they form an idea of the next words to send. Sometimes these pauses can be even longer - you can see those when a new subject is introduced in the QSO such as asking a question or responding to something that other station sent.
I have capped the duration to 1.0 ( this is 10 * "dit" duration) for implementation reasons. In real life these pauses can be few seconds.

I created another symbol S6 that I have used for noise spike detection. These random spikes are typically less than 1/2 "dit" duration. Having a separate symbol for noise makes it easier to remove them when decoding characters.

Figure 2. Morse code symbols (7 Mhz band ham traffic)

PNN AS A CLASSIFIER

Finding a classifier algorithm that would be able to identify each symbol in a consistent manner was not very easy. I used Weka software to test different algorithms trying to find a classifier that would classify the above symbols with over 90% accuracy. After spending many hours of running different data sets through Weka I was quite disappointed. None of the built-in classifiers seemed to handle the task well or I was not able to provide good enough parameters as a starting point.

I was exchanging some emails with Dave, N7AIG on how to build a classifier for Morse code symbols while he suggested that I should look at Probabilistic Neural Network algorithm.

I started reading articles on PNN and discovered that it has several great benefits. PNN is very fast in learning new probability distributions, in fact it is so called "one-pass" learning algorithm. This is a great benefit as many of the more traditional neural network algorithms require hundreds or even thousands of training cycles before they converge to a solution. With PNN you just have to show a few examples of each symbol class and it does the classification almost immediately.

PNN networks are also relatively insensitive to outliers. So few bad examples does not kill the classification performance. PNN networks also approach Bayes optimal classification and are guaranteed to converge to an optimal classifier as the size of the representative training set increases. Training samples can also be added or removed without extensive retraining.

PROTOTYPING WITH EXCEL

PNN has only one parameter "sigma" that needs tuning. I built an Excel sheet (copy in Google Sheets) to validate the PNN algorithm and to see how different "sigma" values impact the classification. By changing the "sigma" value I could easily see how the classifier handles overlapping distributions. Sigma basically determines the Gaussian shape of the PNN equation:

Figure 3. PNN equation

Experimenting with different "sigma" values I was able to discover that 0.03 seems to be a good compromise value. If you increase the value much higher you tend to get more classification errors due to overlaps. If you make it much smaller you get more classification errors due to too narrow distribution.

See figure 4. on how the probability distribution function (PDF) looks like for "dit", "dah" and "word space" with sigma = 0.03. The scale has been set so that "dit" duration corresponds to 0.1 like in the previous pictures.

Figure 4. PNN tuning

IMPLEMENTATION IN C++ FOR FLDIGI

The final part of this work was to create PNN algorithm in C++ and integrate it with FLDIGI software. The alpha version of the classifier code is in Appendix below - I have also included the Examples[] data I have used to classify results in Figure 1 and 2.

Figure 5. Probabilistic Neural Network

Figure 5. demonstrates the role of different "layers" in PNN algorithm. Input layer gets the symbol ("mark", "space") values that are scaled between 0.0 and 1.0. The pattern layers calculates the match against each symbol class training example. In this example there are two training examples per symbol class. The summation layer calculates the sum of probability distribution function (PDF - see Fig 3) for each class. Finally the decision layer compares all these results and selects the largest sum value that is the best matching symbol class.

Part of my current work has been also to completely rewrite the state machine in FLDIGI CW.CXX module that handles the conversion from KEY_DOWN and KEY_UP events to "mark" and "space" durations.

I have also rewritten the Morse decoder part - new decoder gets stream of classified symbols from PNN() function and runs through a state machine to come up with the correct character. I am still in process of getting this state machine to behave like a Viterbi decoder.

For this I would need the a-priori probabilities for each state and traverse through the states using the symbols and their probabilities to calculate the optimal decoding path. I am struggling a bit with this step currently - it is not obvious to me how I should use the constraints in Morse code to derive the best path. If you have experience in Viterbi decoders I would like to chat with you to get better insight how this algorithm should really work.

73
Mauri AG1LE

APPENDIX - C++ SOURCE CODE

#define Classes 7 // Symbols S0 ... S5 - 6 different classes + S6 for noise
#define NEPC 2 // Number of Examples per Class
#define Dimensions 2 // dimensions - use mark & space here - could add more features like AGC

const float Examples[Classes][NEPC][Dimensions] = {
{{0.1, 0.1}, // S0 = dit-ele
{0.08, 0.12}}, // S0 = dit-ele

{{0.1, 0.3}, // S1 = dit-chr
{0.08, 0.23}}, // S1 = dit-chr

{{0.1, 0.7}, // S2 = dit-wrd
{0.1, 0.6}}, // S2 = dit-wrd

{{0.3, 0.1}, // S3 = dah-ele
{0.34, 0.16}}, // S3 = dah-ele

{{0.34, 0.4}, // S4 = dah-chr
{0.29, 0.22 }}, // S4 = dah-chr

{{0.23, 0.54}, // S5 = dah-wrd
{0.23, 0.9}}, // S5 = dah-wrd

{{0.01, 0.01}, // S6 = noise-noise
{0.05, 0.05}} // S6 = noise-noise

};

//=======================================================================
// (C) Mauri Niininen AG1LE
// PNN() is a Probabilistic Neural Network algorithm to find best matching Symbol
// given received Mark / Space pair. Mark and Space are scaled to [0 ... 1.0] where
// Mark: standard 'dit' length is 0.1, 'dah' is 0.3
// Space: standard element space is 0.1, character space 0.3 and word space 0.7
// Examples[] contains 2 examples for each symbol class
//=======================================================================
int symbol::PNN (float mark, float space)
{
float sigma = 0.03; // SIGMA determines the Gaussian shape z = f(x,y) = exp (-((x-x0)^2+(y-y0)^2)/2*sigma^2)
int classify = -1;
float largest = 0;
float sum [Classes];
float test_example[2];

if (abs(mark) > 1) mark = 1;
if (abs(space) > 1) space = 1;

test_example[0] = mark;
test_example[1] = space;

// OUTPUT layer - computer PDF for each class c
for (int k=0; k < Classes; k++)
{
sum[k] = 0;

// SUMMATION layer - accumulated PDF for each example from particular class k
for (int i=0;i < NEPC; i++)
{
float product = 0;
// PATTERN layer - multiplies test examples by the weights
for (int j=0; j < Dimensions; j++)
{
product += (test_example[j] - Examples[k][i][j])*(test_example[j] - Examples[k][i][j]);
}
product = -product/(2*(sigma * sigma));
product= exp(product);
sum[k] += product;
}

sum[k] /= NEPC;
}
// sum[k] has accumulated PDF for each class k
for (int k=0; k<Classes; k++)
{
if ( sum[k] > largest)
{
largest = sum[k];
classify = k;
}

}

learn (classify, mark, space);

return classify;
}

4 comments:

John TarboxFebruary 6, 2013 at 12:18 PM
Mauri,

I would be interested to discussing the statistical processing with you.

John Tarbox WA1KLI
john (at) pimboat.com
Guenael, VA2GKAFebruary 10, 2013 at 8:51 PM
Really interesting job all along these last posts!
Neuronal networks works fine in my brain for CW decoding, so I guess they could work fine on computer :)
Hey, I bet you can write a more accurate CW decoder than VE3NEA, using a NN ;)

73,

Ham Radio Blog by AG1LE

Monday, February 4, 2013

Probabilistic Neural Network Classifier for Morse Code