Sunday, May 20, 2012

FLDIGI: Matched Filter and SOM decoder - new ideas

In my previous blog entry I described experiments adding Self Organized Maps (SOM)  based Morse code decoder into FLDIGI.  

I have some new test results on the experimental FLDIGI code  as well as some ideas how to improve SOM decoder.


I did a new comparison test with a FLDIGI patch on 3.22RC that Dave W1HKJ provided.

Dave is saying:
" I completed the fast-convolution-fft implementation of the matched filter for
the CW decoder. Also changed the decimate in time processing in the main Rx
process loop. No special attention is needed for audio stream replay. This
version of your mod works very well and is very cpu efficient. The combination
of matched filter and SOM decoding give excellent results!"

The  9 test wav files with s/n ranging from -3 dB to +6 dB used are in here  and Dave's test results are in here .  I added the test audio files in a playlist ordered from -3dB to +3dB signal to noise ratio (SNR)  and played them in that order with AlsaPlayer.

To demonstrate the improvements I took two screen captures of the experimental software.

Figure 1 shows the new experimental matched filter and SOM decoder features enabled.   The new features enable CW decoding from -3dB SNR upward relatively error free.

Figure 1. Matched Filter and SOM decoder enabled

Figure 2 shows the new experimental matched filter and SOM decoder features disabled. The legacy CW decoder is relatively error free  from +1dB  SNR upwards.
Figure 2. Matched Filter and SOM decoder disabled


There has been also discussion in Linuxham mailing list on the SOM decoder.
Several people have shared ideas on how we could improve CW recognition / decoding using different approaches, like  Hidden Markov Models   and Scale Invariant Transforms.

My focus in this work has been to improve CW decoding in the presence of noise. However, there are multiple other problems to solve in the CW decoding space. Here are some examples: 

  • Decoding CW from those irregular fists and all the bug users that have very short dits and long dahs, way different than the standard 1:3  dit:dah timing ratio
  • Decoding variable speed  CW  - I have heard stations who give other station's call sign manually at 15 WPM  and  give rest of the message from elbug memory at 25 WPM. For human operators this may be OK if the message is standard exhange like RST and serial number. However, for decoding software this rapidly varying speed may be a problem.
  • Decoding multiple CW stations simultaneously - all with different SNR, speed and timing.
The timing problem is obviously also quite important as there are still many hams working CW manually and timing errors can also be generated by noise, operating practices etc.

I added some debugging code to FLDIGI to be able to extract the timing of decoded characters.  Figure 3 below shows the distribution of "dits" and "dahs" taken from the above 9 computer generated noisy wav files.  The speed in the above example is 24 WPM  so  "dit" length should be 1.2/24 = 50 ms.  Accordingly  "dah" length should be 150 ms.

As you can see from figure 3   noise creates jitter in timing.  The sample size was only 267 decoded characters but you can already observe a bell shaped normal distribution being formed below.

Figure 3.  Dit/Dah timing histogram (horizontal axis in milliseconds). 

You can also see some outliers around 250 ms and 350 ms area. We have  4/267 outliers in 250 ms range and 2/267 in 350 ms range.  We are talking about  6/267  cases so approximately   2.2%  of incorrectly decoded characters due to noise in this sample of 267 characters.

 When looking at these outliers I found  letter Q misinterpreted as W  in the first CQ  sequence:

Case 1:
== duration (ms)==== CHAR
164 62  158 60 0 0 0 C     //  dah dit dah dit = "C"
156 172 258 0  0 0 0 W     //  dit dah dah   = "W" (should be "Q")

SOM algorithm looked at this pattern and tried to find best match to what looks like   dit - dah - dah  sequence (W) after normalization.  However,   what apparently happened here was that  letter  Q  ( dah - dah - dit - dah)  got scrambled by noise and the last dit - dah was merged to one extra long dah with 258 ms duration. 

Below is another example.  W1HKJ  was misdecoded as  WFHKJ. When looking at the details it looks like a noise spike merged two 'dahs' into one extra long  'dah' with 360 ms duration.

Case 2:
== dit/dah duration (ms) = CHAR
70  164 154 0   0 0 0 W
58  152 360 158 0 0 0 F       //  dit-dit-dah-dit   = "F" (should be "1")
62  56  56  50  0 0 0 H
162 76  160 0   0 0 0 K
70  158 158 170 0 0 0 J

These examples above give some hints how to improve SOM decoder even further.  Right now  SOM decoder treats every character separately.  After normalization it just tries to find "Best Matching Unit"  (BMU) based on input vector  (dit/dah durations) by calculating Euclidian distance to codebook entries.  The codebook entry with minimum distance is the "winner". In the current version the SOM codebook is fixed and does not use SOM learning algoritms.


We could maintain a table that learns the  frequency (or probability) of dit and dah durations.  If the incoming pattern of dits and dahs  fits under the normal distribution  we can proceed to BMU algorithm and find the winner.  However,  if the incoming pattern has an outlier  we need to figure out most likely alternative before we pass the pattern to BMU.

For example in the above case 1  - 258 ms is an outlier.  Since dit time mean is 50 ms  ( ~ space = 50ms, ~ dah = 150 ms) we have the following potential alternatives to replace the outlier:

Alt1: dit - dah         =  50ms + 50ms + 150ms  = 250 ms 
Alt2: dah - dit         = 150ms + 50ms + 50 ms  = 250 ms 
Alt3: dit - dit - dit   =  5 x 50 ms = 250 ms

Input: 156 172 258 0  0 0 0   // 258 is outlier, try alternatives 
Alt1:  156 172 50 150 0 0     //  = "Q"
Alt2:  156 172 150 50 0 0     //  = not in codebook
Alt3:  156 172 50 50 50 0     // = "7"

In the above case 2  -  360 ms is an outlier.  We have the following potential alternatives

Alt1:  dah - dah        =  150ms + 50ms + 150ms  = 350 ms
Alt2:  dit - dit - dah  =  50 + 50 + 50 + 50 + 150 = 350 ms 
Alt3:  dah - dit - dit  =  150 + 50 + 50 + 50 + 50 = 350 ms 

Input: 58  152 360 158 0   0   0   //  360 is outlier, try alternatives
Alt1:  58  152 150 150 158 0   0   // = "1"    
Alt1:  58  152 50  50  150 158 0   // = not in codebook
Alt2:  58  152 150 50  50  158 0   // = not in codebook  

 These alternatives could be arranged in some sort of probability order. I am not sure how to measure likelyhood of one noise spike  vs. multiple noise spikes impacting the same  character. Intuitively one noise spike would be more probably but this depends on morse speed, noise type and other factors I assume.

Alternatively, if we know typical outlier alternatives we can just add the most likely cases in the SOM codebook and let the BMU algorithm to match incoming vector (with outlier)  with the SOM codebook.  This makes the processing really simple and focus shifts to maintaining an up-to-date  SOM  codebook that includes error handling cases.  We just need to collect enough data to learn about most typical  error cases.  This could be done also via some kind of web service where FLDIGI clients send error cases with enough context  and in return will get updated SOM codebook.  This way  FLDIGI community would improve the decoder & codebook by exposing the SOM algorithm to a large variety of real world cases.

Update May 27, 2012:  
 I added a new histogram collecting function in the case CW_KEYUP_EVENT:  section of  cw::handle_event() function.  I also added a new histogram display to FLDIGI  to visualize cumulative distribution of detected signals. As the program runs and detects CW signals it builds a table of possible "dit" and "dah" durations.  Figure 4 below shows an example distribution collected from over 20 minutes period. Notice that current experimental prototype does not include logic  to accomodate Morse speed changes - it uses the same cumulative distribution for all stations. We should zero the distribution table every time we change the station we are listening, if the speed varies a lot.

Figure 4.  Cumulative distribution of "dit" and "dah" durations. 

Mauri  AG1LE

Popular Posts