So tomorrow is the big day. Tomorrow is the day that I present my research at the 40th Annual Meeting of the Statistical Society of Canada¹. W00t! For those of you who may have forgotten (or aren’t in the know), my presentation is titled Signal Processing for Species Identification. The general idea: develop mathemagical and statistical methods to take a mass of biological goo – which could contain DNA fragments from multiple species – and determine which species are present, and how much of each.
To do this, the biological goo will be fed into a machine that goes Bing!² The machine will convert the input-goo into a bunch of DNA chains, each chain being a sequence of bases identified as A, C, T, and G. I’ll forgo the scientific bio-geekery required to describe how this goo-to-A-C-T-G-chains conversion occurs, because the mathemagic I’m interested in occurs after this stage of the game.
What is the mathemagical part?
Well, imagine you’ve picked up a set of books for a friend from the library – but you are so busy you don’t pay attention to what you’ve picked up. Imagine also that the library has all of the books they own listed on their website. Imagine further that you accidentally stumble with this set of books and drop them into a shredder. All that remains are a few shreds of paper that have sentence fragments written on them. The problem – can you identify the books that were in the set based on the sentence fragments (knowing only the full list of books in the library), so that you can replace them before anyone is the wiser about your unfortunate mishap?
That’s what I’m trying to do. Except in my case the mass of shredded sentence fragments would be the biological goo, and the books would represent the species. The library – that’s simply a database of all possible species.
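For the code-inclined, here is a minimal sketch of the shredded-book problem in Python. It is only a toy and not the method from my talk: the function names `identify` and `proportions` are made up for illustration, the "library" is assumed to be a dictionary mapping each species name to one reference sequence, and a fragment only counts if it appears word-for-word in exactly one reference.

```python
from collections import Counter


def identify(fragments, library):
    """Count, per species, the fragments found verbatim in its reference.

    `library` is a hypothetical dict: species name -> reference sequence.
    Fragments that match no species, or more than one, are skipped because
    they cannot tell the candidates apart.
    """
    hits = Counter()
    for fragment in fragments:
        matches = [name for name, ref in library.items() if fragment in ref]
        if len(matches) == 1:
            hits[matches[0]] += 1
    return hits


def proportions(hits):
    """Turn raw hit counts into a crude 'how much of each species' estimate."""
    total = sum(hits.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in hits.items()}
```

Feeding it a made-up library and a handful of fragments gives a count per species, and `proportions` rescales those counts so they sum to one. Skipping ambiguous fragments is a deliberate simplification; a real analysis would want to make use of them.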
But, that’s not all. To make things even more complicated, the machine that goes Bing! isn’t perfect – sometimes it changes up the letters in the sentence fragment (replacing them with something else), or it cuts chunks out of the sentence fragment. So, imagine that the shredder also rewrites some of the sentence fragments, or chops a fragment into pieces and rejoins it. Can we still identify the books?
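To give a flavour of what coping with a sloppy shredder might look like, here is a small Python sketch that swaps exact matching for approximate matching: it computes the smallest edit distance (substitutions, insertions, deletions) between a fragment and any stretch of a reference sequence, using a standard dynamic-programming table. This is an illustration under my own assumptions, not the approach from the presentation; `best_match_distance`, `closest_species`, and the `max_errors` threshold are hypothetical names chosen for the example.

```python
def best_match_distance(fragment, reference):
    """Smallest edit distance between `fragment` and any substring of `reference`."""
    # Row 0 is all zeros: a match may start anywhere in the reference for free.
    prev = [0] * (len(reference) + 1)
    for i, f in enumerate(fragment, start=1):
        curr = [i]  # aligning i fragment letters against nothing costs i edits
        for j, r in enumerate(reference, start=1):
            curr.append(min(
                prev[j - 1] + (f != r),  # letters agree, or one was rewritten
                prev[j] + 1,             # extra letter in the fragment
                curr[j - 1] + 1,         # letter cut out of the fragment
            ))
        prev = curr
    # The match may also end anywhere in the reference.
    return min(prev)


def closest_species(fragment, library, max_errors=1):
    """Return the single best-matching species, or None if unclear.

    `library` is a hypothetical dict: species name -> reference sequence.
    """
    if not library:
        return None
    scores = {name: best_match_distance(fragment, ref)
              for name, ref in library.items()}
    best = min(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    if best <= max_errors and len(winners) == 1:
        return winners[0]
    return None
```

A fragment with one rewritten letter, or with one letter chopped out, still lands on the right reference as long as `max_errors` allows it. Ties and fragments that are too mangled come back as `None` rather than as a guess.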
Anyway, because I’ve been running around crazy this week, I have yet to finish my presentation. Of course, I probably could have worked on that instead of writing this blog post, but that’s just crazy talk.
For now, we’re off to the banquet. Because presentation building requires food. And perhaps a very statistical beer or two.
1 It’s also National Running Day, which means I’m going to go for a run. And after all of the crazy running around the last few days, plus session after session of statistical nerdery, I need to start pounding the pavement to a) process what I’ve seen, and b) decompress.
2 All good science requires a machine that goes Bing.