Friday, September 27, 2013

New Morse Decoder - Part 1

NEW BAYESIAN MORSE DECODER

I have been working on a new Bayesian Morse decoder algorithm. This work is based on Dr. Bell's doctoral thesis, which is one of the best and most comprehensive documents on this topic I have found so far. While it contains a lot of advanced mathematics, the thesis also gives a very thorough description of the problems involved in automatically transcribing hand-keyed Morse code with an acceptable error rate. It covers a lot of ground and describes with mathematical rigor how to model and solve each of these problems. It also includes software examples (in Fortran) and test results comparing performance to the theoretically optimal solution as well as to human performance under different conditions.

Using the thesis as a starting point, I implemented the algorithms in the C language. The current version has some 3335 lines of code and is posted on GitHub as open source with verbal permission from Dr. Bell. I managed to get the software working to the point where I am able to run some performance tests. There is still a lot of work ahead to improve and clean up the code, but I decided to publish some early results for those who might be interested in this work.

CORRELATOR-ESTIMATOR TESTING

As described in the thesis, this algorithm is a "correlator-estimator" technique: all possible sequences of keystate transitions are hypothesized and correlated with the incoming signal, and the most likely sequence is output as the best estimate. To illustrate this, I plotted a segment of the incoming signal containing the letters "QUICK B", shown in Figure 1 below. The probability estimates of the various keystates [P(dit), P(dah), P(el-spc), P(chr-spc), P(wrd-spc), P(pause)] are plotted underneath the incoming signal. I used Kst Data Viewer to create this plot.


Figure 1. Correlator-estimator probabilities

The Morse audio file used in this test had a signal-to-noise ratio (SNR) of 8 dB in a 2 kHz bandwidth. The figure nicely shows the time-varying probability of each keystate, as well as how each keystate correlates with the "mark"/"space" changes in the incoming signal.

CER vs. SNR TESTING

To test the decoder performance in the presence of noise, I created a set of Morse sound files with SNR from -10 dB to +20 dB in a 2 kHz bandwidth, using a modified version of Rob Frohne's (KL7NA) morse.m Octave software. Each file contains 200 words, each consisting of 5 random letters and numbers. The sound files are available here: Morse sound files -10 dB ... 20dB SNR @2kHz BW. For algorithm testing I also created corresponding text files in which each sample is a real number on a separate line. These are easier to manipulate and plot with standard Linux tools such as gnuplot.

I ran the algorithm on files with different SNR levels and saved the decoder text output. To get the Character Error Rate (CER), I created a small utility program that calculates the Levenshtein distance using this algorithm. Comparing the original text to the decoded text gives the character error rate. When plotting the results, it was quite interesting to see a steep reduction in CER once the SNR rises above 6 dB. Note that the SNR figure includes the noise over the whole 2 kHz bandwidth.
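
A Levenshtein-based CER calculation can be sketched in C as follows. Normalizing the edit distance by the length of the reference text is an assumption (it is a common convention), and levenshtein() and cer() are illustrative names for a stand-in, not the utility program mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Levenshtein edit distance (insertions, deletions and substitutions each
 * cost 1), using two rolling rows of the dynamic-programming table.
 * Returns SIZE_MAX if memory allocation fails. */
size_t levenshtein(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t n = strlen(a), m = strlen(b);
    size_t *prev = malloc((m + 1) * sizeof *prev);
    size_t *cur = malloc((m + 1) * sizeof *cur);
    if (prev == NULL || cur == NULL) {
        free(prev);
        free(cur);
        return SIZE_MAX;
    }
    for (size_t j = 0; j <= m; j++)
        prev[j] = j;                        /* "" -> first j chars of b */
    for (size_t i = 1; i <= n; i++) {
        cur[0] = i;                         /* first i chars of a -> "" */
        for (size_t j = 1; j <= m; j++) {
            size_t sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            size_t del = prev[j] + 1;
            size_t ins = cur[j - 1] + 1;
            size_t best = sub < del ? sub : del;
            cur[j] = best < ins ? best : ins;
        }
        size_t *tmp = prev;                 /* current row becomes previous */
        prev = cur;
        cur = tmp;
    }
    size_t d = prev[m];
    free(prev);
    free(cur);
    return d;
}

/* CER = edit distance / number of characters in the reference text. */
double cer(const char *reference, const char *decoded)
{
    size_t len = strlen(reference);
    return len ? (double)levenshtein(reference, decoded) / (double)len : 0.0;
}
```

For example, comparing the reference "QUICK" with the decoded "QUIK" gives one deletion, so the CER is 1/5 = 0.2.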

Figure 2. CER to SNR test results

The CER should drop to zero as the SNR improves; however, the graph shows a base error rate of about 3 %. I need to study this in more detail to find the root cause. At higher noise levels the curve shape looks a bit closer to what I expected.

NEXT STEPS 

As mentioned before, there is still a lot of work ahead to make this software useful.

I did some initial testing to integrate this decoder with a modified FLDIGI package and got the software partially working as an external program connected to FLDIGI via a Linux FIFO (named pipe).
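
Setting up the FIFO side of such a connection takes only a few POSIX calls. In this sketch the pipe path, the direction of data flow (the decoder reading one sample per line from the pipe, the same format as the test text files) and the helper names ensure_fifo() and read_samples() are assumptions; the FLDIGI side is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create a named pipe at path (mode 0666, reduced by the umask) unless a
 * FIFO already exists there. Returns 0 on success, -1 on error. */
int ensure_fifo(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (mkfifo(path, 0666) == 0)
        return 0;
    if (errno == EEXIST && stat(path, &st) == 0 && S_ISFIFO(st.st_mode))
        return 0;                           /* reuse an existing FIFO */
    return -1;                              /* other error, or not a FIFO */
}

/* Read one real-valued sample per line from the pipe and hand each one to
 * process(). fopen() blocks until a writer opens the other end.
 * Returns the number of samples read, or -1 if the pipe cannot be opened. */
long read_samples(const char *path, void (*process)(double))
{
    FILE *f = fopen(path, "r");
    double x;
    long n = 0;

    if (f == NULL)
        return -1;
    while (fscanf(f, "%lf", &x) == 1) {
        process(x);
        n++;
    }
    fclose(f);
    return n;
}
```

Reading ends when the writer closes its end of the pipe, so the external decoder exits cleanly when the other program stops sending samples.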

I am also working on automated test scripts for timing and speed-variation testing. I would really like to find the limits where the algorithm breaks down. This requires more work in creating synthetic tests similar to those described in the thesis.
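
For synthetic speed and timing tests, element durations can be generated from the standard Morse timing ratios (dit 1 unit, dah 3, element space 1, character space 3, word space 7) and the PARIS convention of 50 units per word, which gives a dit length of 1200/WPM ms. The uniform plus/minus jitter model and the function names below are illustrative assumptions, not the test procedure from the thesis.

```c
#include <assert.h>
#include <stdlib.h>

/* Nominal element lengths in dit units (standard Morse timing ratios). */
enum { DIT_UNITS = 1, DAH_UNITS = 3, EL_SPC_UNITS = 1,
       CHR_SPC_UNITS = 3, WRD_SPC_UNITS = 7 };

/* Dit length in ms: PARIS = 50 units per word, so 60000 / (50 * wpm). */
double dit_ms(double wpm)
{
    assert(wpm > 0.0);
    return 1200.0 / wpm;
}

/* One element duration with random timing error: the nominal length is
 * scaled by (1 + jitter * u), u uniform in [-1, 1]; jitter 0.1 = +/-10 %. */
double keyed_duration_ms(int units, double wpm, double jitter)
{
    double u = 2.0 * rand() / (double)RAND_MAX - 1.0;
    return units * dit_ms(wpm) * (1.0 + jitter * u);
}

/* Fill out[] with alternating mark/space durations for a dot-dash pattern
 * such as ".-" (letter A), ending with a character space. out[] must hold
 * 2 * strlen(pattern) values. Returns the number of values written. */
size_t char_timing(const char *pattern, double wpm, double jitter, double *out)
{
    size_t n = 0;
    for (const char *p = pattern; *p != '\0'; p++) {
        out[n++] = keyed_duration_ms(*p == '-' ? DAH_UNITS : DIT_UNITS, wpm, jitter);
        out[n++] = keyed_duration_ms(p[1] != '\0' ? EL_SPC_UNITS : CHR_SPC_UNITS,
                                     wpm, jitter);
    }
    return n;
}
```

At 20 WPM, for example, a dit is 60 ms and a dah 180 ms. Sweeping jitter upward or changing wpm partway through a message would be one simple way to look for the point where decoding breaks down.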

A third area of work is testing the algorithm's performance against signal fading. This would show its limits on real-world signals and help to optimize the model parameters.

If you are interested in advancing the state of the art in Morse decoding, feel free to download the software and help test and improve it. Please provide your feedback and post comments below.

73
Mauri AG1LE




4 comments:

  1. Appreciate your work.
    73, Guido
    pe1nnz

  2. Hi Mauri,

    I have some recordings from the ARRL CW contest last weekend. Each is about 20 kHz wide and 3 minutes long and contains a lot of CW signals.
    Just let me know if you are interested in using them to test your decoder.

    regards
    Mario

    1. Hi Mario
      Yes - I would be interested in your recordings as test material. I will send you a private email reply.
      73 Mauri

    2. mario.dh5ym@gmail.com, December 17, 2013 at 2:39 AM

      Hi Mauri,

      OK, once I have an email from you, I will send you information on where you can download the snippets. Each is 15 MB, and I also have a recording of about 192 kHz in IQ format, but that's probably too much ;) The 20 kHz recordings are in real format (mono file).
      Please send a mail to mario.dh5ym@gmail.com or dh5ym@darc.de

      regards
      Mario, DH5YM
