Wednesday, September 3, 2014

Morse Learning Machine - Challenge

MACHINE LEARNING CHALLENGE

I was astonished to get an email acknowledgement today that my Kaggle Morse Challenge was approved. I have spent the last few days preparing materials and editing the description and rules for this competition.

The goal of this competition is to build a machine that learns how to decode audio files containing Morse code.


For humans, it takes many months of effort to learn Morse code, and after years of practice the most proficient operators can decode Morse code at 60 words per minute or even faster. Humans also have an extraordinary ability to adapt quickly to varying conditions, speed and rhythm.

I want to find out if it is possible to create a machine learning algorithm that exceeds human performance and adaptability in Morse decoding. I shared some of these ideas at the New England Artificial Intelligence meetup about a year ago and got enthusiastic feedback from the participants.




WHY KAGGLE?   

Kaggle is a platform for predictive modelling and analytics competitions, on which companies and researchers post their data, and statisticians and data miners from all over the world compete to produce the best models. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task, and it is impossible to know at the outset which technique or analyst will be most effective. Kaggle aims to make data science a sport.

Kaggle's community of data scientists comprises tens of thousands of PhDs from quantitative fields such as computer science, statistics, econometrics, maths and physics, and industries such as insurance, finance, science, and technology. They come from over 100 countries and 200 universities. In addition to the prize money and data, they use Kaggle to meet, learn, network and collaborate with experts from related fields.

For the Morse Learning Machine competition, I hope to attract people from the Kaggle community who are interested in solving new, difficult challenges using their expertise in predictive data modeling, computer science and machine learning. For students, this challenge provides a great opportunity to put theoretical concepts into practice and see how they can solve tough problems by applying knowledge gained in classrooms.


COMPETITION DETAILS


During the competition, the participants build a learning system capable of decoding Morse code. To that end, they get development data consisting of 200 .WAV audio files containing short sequences of randomized Morse code. The data labels are provided for a training set so the participants can self-evaluate their systems. To evaluate their progress and compare themselves with others, they can submit their prediction results on-line to get immediate feedback. A real-time leaderboard shows participants their current standing based on their validation set predictions.
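The post does not spell out how submitted predictions are scored. A common way to compare decoded text with the training labels is edit (Levenshtein) distance, where 0.0 means a perfect decode. The sketch below assumes that kind of metric and averages it over files; the file names and the averaging are illustrative assumptions, not the competition's official scoring code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # Only the previous row of the dynamic-programming table is kept.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]


def mean_edit_distance(predictions: dict, labels: dict) -> float:
    """Average edit distance over all labelled files (0.0 = every file perfect)."""
    total = sum(levenshtein(predictions[name], labels[name]) for name in labels)
    return total / len(labels)


if __name__ == "__main__":
    # Hypothetical file names and texts, for illustration only.
    labels = {"file1": "CQ DE AG1LE", "file2": "73"}
    predictions = {"file1": "CQ DE AG1LT", "file2": "73"}
    print(mean_edit_distance(predictions, labels))
```

Here one substituted character in the first file and a perfect second file give an average of 0.5. Running this over the labelled training files is one way to self-evaluate a decoder before submitting.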

I have also provided a sample Python Morse decoder to make it easier to get started. While this software is purely an experimental version, it has some features of the FLDIGI Morse decoder, but it is implemented in Python instead of C++.
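Most decoders start by turning the audio into a signal envelope, so that key-down and key-up periods can be told apart. The sketch below is not the sample decoder's code: it uses Python's standard `wave` module to read a file and computes a block-wise RMS envelope. It assumes the files are 16-bit mono PCM (an assumption about the data, not something stated here), and the 5 ms block length is a hypothetical choice.

```python
import array
import math
import sys
import wave


def read_mono_wav(source):
    """Return (sample_rate, samples) from a 16-bit mono PCM WAV file.

    `source` may be a path or a binary file-like object.
    """
    with wave.open(source, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = array.array("h")   # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":   # WAV sample data is little-endian
        samples.byteswap()
    return rate, samples


def rms_envelope(samples, rate, block_ms=5.0):
    """Root-mean-square level of consecutive, non-overlapping blocks.

    Key-down periods show up as runs of high values, key-up periods as
    runs near the noise floor.
    """
    n = max(1, int(rate * block_ms / 1000))
    env = []
    for start in range(0, len(samples) - n + 1, n):
        block = samples[start:start + n]
        env.append(math.sqrt(sum(s * s for s in block) / n))
    return env
```

Usage would look like `rate, samples = read_mono_wav("example.wav")` followed by `rms_envelope(samples, rate)`, where `example.wav` stands for any of the development files. Shorter blocks follow fast keying better but give a noisier envelope.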

You can of course leverage the experimental multichannel CW decoder I recently implemented in FLDIGI, or the standalone version of the Bayesian decoder written in C++. There are also some new tools I posted to GitHub.

Please help me to spread this message to attract participants for the Morse Learning Machine challenge!

73
Mauri AG1LE





9 comments:

  1. Dan KB6NU has posted a story about Morse Learning Machine Challenge! http://www.kb6nu.com/ag1le-challenges-developers-to-come-up-with-better-morse-code-reader/

  2. Bas PE4BAS has published a story about Morse Learning Machine Challenge! http://pe4bas.blogspot.nl/2014/09/morse-learning-machine-challenge.html

  3. ARRL headline news this morning!

    http://www.arrl.org/news/morse-learning-machine-challenge-catching-on-with-hams

  4. I like the idea, but not the environment. Randomized CW text? Humans do not decode randomized text. The context is part of the decoding process, i.e. words and phrases in some agreed-upon language. Teaching machines to recognize long and short beeps is trivial and not novel.

  5. I agree that Morse code with natural language text content would add more challenge to this competition. It would require some NLP techniques to be applied to decoded symbol stream.

    In fact I was considering this, but I wanted to keep the MLM v1 challenge focused on the basics. There are only two variables in this version 1 contest that participants need to figure out a solution for, namely
    - Signal-to-Noise Ratio (SNR varies from -12 dB to +20 dB)
    - Speed (varies from 12 to 60 WPM)
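To put these two variables in concrete terms: under the common PARIS convention (50 dit units per word), a dit lasts 1.2/WPM seconds, so 12 to 60 WPM means dits from 100 ms down to 20 ms, with dahs three times as long. An SNR in decibels corresponds to a power ratio of 10^(dB/10), so -12 dB means the signal carries only about 6% of the noise power. The snippet below only does this arithmetic; the bandwidth in which the contest's SNR is measured is not stated here.

```python
def dit_seconds(wpm: float) -> float:
    """Dit length under the PARIS convention (50 dit units per word)."""
    return 1.2 / wpm


def snr_db_to_power_ratio(snr_db: float) -> float:
    """Convert a signal-to-noise ratio in dB to a linear power ratio."""
    return 10 ** (snr_db / 10)


if __name__ == "__main__":
    for wpm in (12, 60):
        dit_ms = dit_seconds(wpm) * 1000
        print(f"{wpm} WPM: dit {dit_ms:.0f} ms, dah {3 * dit_ms:.0f} ms")
    for snr in (-12, 20):
        print(f"{snr:+d} dB: signal/noise power ratio {snr_db_to_power_ratio(snr):.3g}")
```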

    > Teaching machines to recognize long and short beeps is trivial and not novel.

    I haven't yet seen anybody claim a perfect score of 0.0 in this challenge. The SNR / WPM combinations were selected so that it would not be trivial to find a perfect solution. You need some filtering, but the right bandwidth depends on the Morse speed. Higher speeds (60 WPM) need a wider filter, which lets in more noise, and noise degrades decoding accuracy. Handling Morse code at different speeds and SNRs therefore requires an adaptive algorithm that optimizes filtering for decoding accuracy. Speed adaptation is another challenge that sounds simple, but adaptive algorithms often miss the first few characters after a rapid speed change.
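The speed-adaptation problem can be illustrated with a small dit/dah classifier whose threshold comes from a two-means split of the last few mark durations. This is a hypothetical sketch, not the algorithm used in FLDIGI or in the sample decoder; `initial_wpm` and `window` are illustrative parameters. Because old marks stay in the window for a while, the first dahs after a jump from 20 to 40 WPM are read as dits, which is the effect described above.

```python
from collections import deque


class WindowedDitDahClassifier:
    """Label Morse mark durations as dits or dahs with a speed-adaptive threshold.

    The threshold sits midway between the mean "short" and mean "long" mark
    among the last `window` marks, found by an exact 1-D two-means split.
    """

    def __init__(self, initial_wpm: float = 20.0, window: int = 8):
        dit = 1.2 / initial_wpm          # PARIS dit length in seconds
        self.threshold = 2.0 * dit       # midway between 1-unit dit and 3-unit dah
        self.recent = deque(maxlen=window)

    def _update_threshold(self) -> None:
        marks = sorted(self.recent)
        # Too little spread to contain both dits and dahs: keep the old threshold.
        if len(marks) < 2 or marks[-1] < 2.0 * marks[0]:
            return
        best = None
        # Try every split point of the sorted marks; keep the one with the
        # smallest within-cluster squared error.
        for k in range(1, len(marks)):
            short, long_ = marks[:k], marks[k:]
            mean_s = sum(short) / len(short)
            mean_l = sum(long_) / len(long_)
            err = (sum((x - mean_s) ** 2 for x in short)
                   + sum((x - mean_l) ** 2 for x in long_))
            if best is None or err < best[0]:
                best = (err, mean_s, mean_l)
        _, mean_s, mean_l = best
        self.threshold = (mean_s + mean_l) / 2.0

    def classify(self, mark_s: float) -> str:
        """Return "." or "-" for a mark lasting `mark_s` seconds."""
        self.recent.append(mark_s)
        self._update_threshold()
        return "." if mark_s < self.threshold else "-"


if __name__ == "__main__":
    clf = WindowedDitDahClassifier(initial_wpm=20)
    # "A N" keyed at 20 WPM (dit 60 ms, dah 180 ms)...
    first = [clf.classify(m) for m in (0.06, 0.18, 0.18, 0.06)]
    # ...then "N" repeated at 40 WPM (dit 30 ms, dah 90 ms).
    after = [clf.classify(m) for m in (0.09, 0.03) * 8]
    print("".join(first), "".join(after))
```

A longer window gives a steadier threshold but takes longer to follow a speed change; a shorter one adapts faster but is more easily fooled by a run containing only dits, which is why the sketch keeps the old threshold when the recent marks show too little spread.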

    While this may sound like a trivial and not very novel exercise, it is still a prerequisite for more complex optimization cases. In subsequent challenges we can add more variables that are more aligned with real-life Morse signals, such as

    - variable rhythm/speed within a single character sequence set
    - rapidly varying signal amplitude caused by propagation effects
    - variable frequency caused by Doppler, ionospheric or other effects
    - variable vocabulary of words

    If you claim that this is a trivial exercise, please participate in the challenge, provide your solution and make it open source. With that solution we can then move on to the next level of challenge.

    Thanks for comments!
    73
    Mauri AG1LE


  6. Back in the early 1990s, I discovered a CW decoding program for my VIC 20 (or Commodore 64). It was a 6502 machine language routine (well under 1K bytes!) that ran on an 8-bit Commodore machine to decode Morse! (I either got it from a BBS or typed it in from a magazine.)

    It had a VERY NICE adaptive algorithm, and would decode incoming Morse, even with significant (human-sent) speed variations! I even put it to the test by deliberately varying my speed of "sending" into it. As long as there was a reasonable distinction between the dits and dahs, it nailed it! It used the SHIFT key as its input. This made it EASY to interface, because the VIC20 (and Commodore 64) used a physical switch for shift-lock which had exposed wires.

    I built a basic op-amp audio bandpass and comparator circuit to convert the incoming CW tones into switch contact closures. This Rube Goldberg worked EXTREMELY WELL at decoding incoming CW on the ham bands... even the sloppily sent code from beginners... like me!
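In software, the same bandpass-plus-comparator idea can be sketched with the Goertzel algorithm, which measures the energy at a single tone frequency block by block, followed by a comparator with hysteresis. This is an illustrative sketch, not the original VIC-20 routine; the 600 Hz tone, 80-sample blocks and amplitude thresholds are hypothetical values chosen for a clean signal normalized to amplitude 1.

```python
import math


def goertzel_power(block, rate, freq):
    """Squared magnitude of the DFT bin nearest `freq`, via the Goertzel recurrence."""
    n = len(block)
    k = round(n * freq / rate)                # index of the nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in block:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def key_states(samples, rate, freq=600.0, block_len=80, on_amp=0.3, off_amp=0.15):
    """One key-down (True) / key-up (False) decision per block of samples.

    The Goertzel filter plays the role of the audio bandpass; the two
    thresholds form a comparator with hysteresis, so noise hovering near a
    single threshold does not make the key chatter.
    """
    states, key_down = [], False
    for start in range(0, len(samples) - block_len + 1, block_len):
        block = samples[start:start + block_len]
        # For a sine of amplitude A centred on the bin, power is (A * n / 2) ** 2.
        amp = 2.0 * math.sqrt(max(goertzel_power(block, rate, freq), 0.0)) / block_len
        if not key_down and amp > on_amp:
            key_down = True
        elif key_down and amp < off_amp:
            key_down = False
        states.append(key_down)
    return states
```

At 8 kHz sampling, an 80-sample block lasts 10 ms and resolves roughly 100 Hz, short enough to fit inside a 20 ms dit at 60 WPM. Shorter blocks follow faster keying but widen the effective filter and admit more noise.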

    Static, obviously, was a problem with such a simple audio bandpass-based receiving system... but "FFT" and "DSP" software weren't even pipe dreams in those days! ;)

    I lost that little utility YEARS ago... and I hope I can find it again on some dusty old Commodore floppy disk! (I just have to seriously commit to the search!) Obviously, this doesn't meet the criteria set forth here, but I thought it was interesting enough to mention. :)

    73!

    Willie...
    N1NKM

  7. Hi Willie
    thanks for sharing the story! My first Morse decoder attempt was on the Telmac 1800 (http://www.hobbylabs.org/telmac.htm), an RCA CDP1802-based single-board computer kit, programmed in machine language. It had a massive amount of RAM: 2 kBytes. Eventually I also built a teletype interface to get a printer. My recollection is that the program's decoding accuracy was decent with hand-keyed Morse when a Morse key was connected to one of the input lines. Unfortunately I didn't save any of my notebooks from those days.

    37+ years of Moore's law has certainly made a big difference in CPU speeds. Latest multicore CPUs and GPUs have very impressive performance. You can do real time signal processing and FFTs using normal CPUs (even ones in smartphones) these days.

    I hope this MLM challenge will bring some new people & ideas to this old problem.

    73
    Mauri AG1LE

  8. I have tried to register for a Kaggle account, but I haven't received a confirmation email. Has anyone else had this problem? Is the issue with using a Yahoo email account? I have tried to contact Kaggle twice about this, but again I have not gotten a response...

    Thanks in advance.

    John Branthoover – WA3YWU

  9. Hi John
    I used my Gmail account and got confirmation email almost immediately. Have you tried any other email address to register?

    73
    Mauri
