On GitHub acabunoc / pyLadies-june-2014
Why do I work in research with biological data?
(short answer: interesting and important)
I work at the Ontario Institute for Cancer Research, where I'm the lead developer on the WormBase project. I have a background in Computer Science.
The past decade has seen a dramatic increase in the data produced by the biological research community
Sequencing the Human Genome
What sort of data do we have?
Complex disease - thousands of subtypes involving a huge number of different combinations of mutations
Soon, cancer genome sequencing will be common in clinical practice
How will all this data be useful?
multi-national collaboration
Goal: Identify the common patterns of mutation in all major cancer types
10K+ donors
4M+ somatic mutations
Goal: understand what’s going on in the 95% of the cancer genome that isn’t protein-coding
Resources: 2K whole genome tumor/normal pairs from ICGC
Cloud-based approach - six cloud compute centres in the USA, Europe, and Asia (bring code to data)
When the Pan-Cancer project is done (~1 year)
Recently funded
Long-lived private cloud compute centre, pre-populated with ICGC datasets
Initially two physical data centres (w/ Grossman in Chicago, & OICR in Toronto), connected by a high-speed link
Biology is at a critical moment where researchers need developers to implement best practices and build tools to handle and analyze all this information
I want to be a part of this
Abigail Cabunoc / @abbycabs
oicr.on.ca/careers