I have some new test results on the experimental FLDIGI code, as well as some ideas on how to improve the SOM decoder.
NEW TEST RESULTS
I did a new comparison test with a FLDIGI patch on 3.22RC that Dave W1HKJ provided.
Dave is saying:
" I completed the fast-convolution-fft implementation of the matched filter for
the CW decoder. Also changed the decimate in time processing in the main Rx
process loop. No special attention is needed for audio stream replay. This
version of your mod works very well and is very cpu efficient. The combination
of matched filter and SOM decoding give excellent results!"
The 9 test wav files with S/N ranging from -3 dB to +6 dB are here, and Dave's test results are here. I added the test audio files to a playlist ordered from -3 dB to +3 dB signal-to-noise ratio (SNR) and played them in that order with AlsaPlayer.
To demonstrate the improvements I took two screen captures of the experimental software.
Figure 1 shows the new experimental matched filter and SOM decoder features enabled. The new features enable CW decoding from -3dB SNR upward relatively error free.
|Figure 1. Matched Filter and SOM decoder enabled|
Figure 2 shows the new experimental matched filter and SOM decoder features disabled. The legacy CW decoder is relatively error free from +1dB SNR upwards.
|Figure 2. Matched Filter and SOM decoder disabled|
IMPROVEMENT IDEAS FOR SOM DECODER
There has also been discussion on the Linuxham mailing list about the SOM decoder.
Several people have shared ideas on how we could improve CW recognition / decoding using different approaches, like Hidden Markov Models and Scale Invariant Transforms.
My focus in this work has been to improve CW decoding in the presence of noise. However, there are multiple other problems to solve in the CW decoding space. Here are some examples:
- Decoding CW from irregular fists and from bug users who send very short dits and long dahs, far from the standard 1:3 dit:dah timing ratio
- Decoding variable-speed CW - I have heard stations that send the other station's call sign manually at 15 WPM and the rest of the message from elbug memory at 25 WPM. For human operators this may be OK if the message is a standard exchange like RST and serial number. However, for decoding software this rapidly varying speed may be a problem.
- Decoding multiple CW stations simultaneously - all with different SNR, speed and timing.
I added some debugging code to FLDIGI to be able to extract the timing of decoded characters. Figure 3 below shows the distribution of "dits" and "dahs" taken from the above 9 computer generated noisy wav files. The speed in the above example is 24 WPM so "dit" length should be 1.2/24 = 50 ms. Accordingly "dah" length should be 150 ms.
As you can see from Figure 3, noise creates jitter in the timing. The sample size was only 267 decoded characters, but you can already observe a bell-shaped normal distribution forming below.
|Figure 3. Dit/Dah timing histogram (horizontal axis in milliseconds).|
You can also see some outliers in the 250 ms and 350 ms areas. We have 4/267 outliers in the 250 ms range and 2/267 in the 350 ms range, i.e. 6/267 cases, or approximately 2.2% of characters incorrectly decoded due to noise in this sample of 267 characters.
When looking at these outliers I found letter Q misinterpreted as W in the first CQ sequence:
== duration (ms) == CHAR
164 62 158 60 0 0 0 C // dah dit dah dit = "C"
156 172 258 0 0 0 0 W // dit dah dah = "W" (should be "Q")
The SOM algorithm looked at this pattern and, after normalization, tried to find the best match to what looks like a dit - dah - dah sequence (W). However, what apparently happened here is that the letter Q (dah - dah - dit - dah) got scrambled by noise and the final dit - dah was merged into one extra-long dah with a 258 ms duration.
Below is another example. W1HKJ was misdecoded as WFHKJ. Looking at the details, it appears a noise spike merged two 'dahs' into one extra-long 'dah' with a 360 ms duration.
== dit/dah duration (ms) = CHAR
70 164 154 0 0 0 0 W
58 152 360 158 0 0 0 F // dit-dit-dah-dit = "F" (should be "1")
62 56 56 50 0 0 0 H
162 76 160 0 0 0 0 K
70 158 158 170 0 0 0 J
These examples give some hints on how to improve the SOM decoder even further. Right now the SOM decoder treats every character separately. After normalization it simply tries to find the "Best Matching Unit" (BMU) for the input vector (dit/dah durations) by calculating the Euclidean distance to each codebook entry. The codebook entry with the minimum distance is the "winner". In the current version the SOM codebook is fixed and does not use SOM learning algorithms.
OUTLIER PROCESSING IDEA
We could maintain a table that learns the frequency (or probability) of dit and dah durations. If the incoming pattern of dits and dahs fits under the normal distribution, we can proceed to the BMU algorithm and find the winner. However, if the incoming pattern has an outlier, we need to figure out the most likely alternatives before we pass the pattern to the BMU.
For example, in case 1 above the 258 ms element is an outlier. Since the mean dit time is 50 ms (so space ~ 50 ms and dah ~ 150 ms), we have the following potential alternatives to replace the outlier:
Alt1: dit - dah = 50 ms + 50 ms + 150 ms = 250 ms
Alt2: dah - dit = 150 ms + 50 ms + 50 ms = 250 ms
Alt3: dit - dit - dit = 5 x 50 ms = 250 ms
Input: 156 172 258 0 0 0 0 // 258 is outlier, try alternatives
Alt1: 156 172 50 150 0 0 // = "Q"
Alt2: 156 172 150 50 0 0 // = not in codebook
Alt3: 156 172 50 50 50 0 // = "7"
In case 2 above, 360 ms is an outlier. We have the following potential alternatives:
Alt1: dah - dah = 150ms + 50ms + 150ms = 350 ms
Alt2: dit - dit - dah = 50 + 50 + 50 + 50 + 150 = 350 ms
Alt3: dah - dit - dit = 150 + 50 + 50 + 50 + 50 = 350 ms
Input: 58 152 360 158 0 0 0 // 360 is outlier, try alternatives
Alt1: 58 152 150 150 158 0 0 // = "1"
Alt2: 58 152 50 50 150 158 0 // = not in codebook
Alt3: 58 152 150 50 50 158 0 // = not in codebook
These alternatives could be ranked in some sort of probability order. I am not sure how to measure the likelihood of one noise spike vs. multiple noise spikes impacting the same character. Intuitively one noise spike would be more probable, but I assume this depends on Morse speed, noise type, and other factors.
Alternatively, if we know the typical outlier alternatives, we can just add the most likely cases to the SOM codebook and let the BMU algorithm match the incoming vector (with outlier) against the SOM codebook. This makes the processing really simple, and the focus shifts to maintaining an up-to-date SOM codebook that includes error-handling cases. We just need to collect enough data to learn the most typical error cases. This could also be done via some kind of web service where FLDIGI clients send error cases with enough context and in return get an updated SOM codebook. This way the FLDIGI community would improve the decoder & codebook by exposing the SOM algorithm to a large variety of real-world cases.
Update May 27, 2012:
I added a new histogram collecting function in the case CW_KEYUP_EVENT: section of the cw::handle_event() function. I also added a new histogram display to FLDIGI to visualize the cumulative distribution of detected signals. As the program runs and detects CW signals, it builds a table of possible "dit" and "dah" durations. Figure 4 below shows an example distribution collected over a 20+ minute period. Note that the current experimental prototype does not include logic to accommodate Morse speed changes - it uses the same cumulative distribution for all stations. We should zero the distribution table every time we change the station we are listening to, if the speed varies a lot.
|Figure 4. Cumulative distribution of "dit" and "dah" durations.|