Why Biology Needs More Developers





My talk for PyLadies

On Github acabunoc / pyLadies-june-2014

Abigail Cabunoc / @abbycabs

Why do I work in research with biological data?

(short answer: interesting and important)

Hi, I'm Abby!

I work at the Ontario Institute for Cancer Research where I'm the lead developer on the WormBase project. I have a background in Computer Science.

Interesting Problem Part 1

The past decade has seen a dramatic increase of data produced by the biological research community

Sequencing the Human Genome

  • Apr 2003: 13 years, $3,000,000,000
  • Jan 2014: 3 days, $1,000

Moore's Law
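
A rough back-of-the-envelope check of the sequencing figures against Moore's law. This is only a sketch: it assumes Moore's law means one doubling every two years (the slides do not give a rate) and treats the cost decline as a steady exponential between the two dates. The function and constant names are made up for illustration.

```python
import math

# Figures quoted on the "Sequencing the Human Genome" slide.
COST_2003_USD = 3_000_000_000   # Apr 2003
COST_2014_USD = 1_000           # Jan 2014
ELAPSED_YEARS = 10 + 9 / 12     # Apr 2003 -> Jan 2014

# Assumption, not from the talk: Moore's law as one doubling every two years.
MOORE_DOUBLING_YEARS = 2.0


def fold_change(before: float, after: float) -> float:
    """How many times smaller `after` is than `before`."""
    return before / after


def moore_fold_change(years: float, doubling_years: float = MOORE_DOUBLING_YEARS) -> float:
    """Improvement expected over `years` at one doubling per `doubling_years`."""
    return 2 ** (years / doubling_years)


def implied_halving_years(before: float, after: float, years: float) -> float:
    """Halving period that would explain the drop, assuming a steady exponential decline."""
    return years / math.log2(before / after)


cost_drop = fold_change(COST_2003_USD, COST_2014_USD)
moore_drop = moore_fold_change(ELAPSED_YEARS)
halving = implied_halving_years(COST_2003_USD, COST_2014_USD, ELAPSED_YEARS)

print(f"Observed cost drop: {cost_drop:,.0f}x")
print(f"Two-year doubling over the same period: {moore_drop:,.0f}x")
print(f"Implied cost-halving period: {halving * 12:.1f} months")
```

Under these assumptions the quoted cost fell 3,000,000-fold, against roughly 40-fold for a two-year doubling, which works out to the cost halving about every six months.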

Interesting Problem Part 2

What sort of data do we have?

WormBase

  • 2001: 1 Species (C. elegans)
  • 2014: >25 Species

Cancer Genomes

Complex disease - thousands of subtypes involving a huge number of different combinations of mutations

Soon, cancer genome sequencing will be common in clinical practice

Sample Experimental design

  • Take normal (blood) and tumour (biopsy) samples from a series of donors
  • Sequence
  • Identify cancer-related mutations
  • Translate this knowledge -> improved diagnosis & treatment

Interesting Problem Part 3

How will all this data be useful?

WormBase

www.wormbase.org

International Cancer Genome Consortium (ICGC)

multi-national collaboration

Goal: Identify the common patterns of mutation in all major cancer types

The data is available to the public at dcc.icgc.org

10K+ donors

4M+ somatic mutations

Problems

Sharing this data
  • Too big for disk
  • Privacy concerns
Standardization - research silos

Pan-Cancer Whole Genome Analysis Project

Goal: understand what’s going on in the 95% of the cancer genome that isn’t protein-coding

Resources: 2K whole-genome tumour/normal pairs from ICGC

Cloud based approach - six cloud compute centres in USA, Europe, Asia (bring code to data)

The Cancer Genome Collaboratory

When the Pan-Cancer project is done (in ~1 year)

Recently funded

Long-lived private cloud compute centre, pre-populated with ICGC datasets

Initially two physical data centres: Chicago (w/ Grossman) and OICR in Toronto, connected by a high-speed link

Biology is at a critical moment where researchers need developers to implement best practices and build tools to handle and analyze all this information

I want to be a part of this

Thank You!

Abigail Cabunoc / @abbycabs

oicr.on.ca/careers