What we talk about when we talk about networks
A grammar of network methods in biology
This is a reveal.js presentation.
Topics
- Introduction: the gap
- Solution and sentence examples
- Detailed explanation
- Examples
- Implementation as a DSL
- Research outputs
- Conclusion
Year    Publications
2006    295
2007    325
2008    327
2009    355
2010    367
2011    423
2012    492
2013    642
Nature Group publications containing 'network' by year
- Basic metrics suggest that network biology is an active area of research
- Medline searches for 'network' show a steady increase in both number and relative percentage, up to about 2% in 2014
- Searches for 'network' in Nature Group publications from 2006-2014 show a steady increase
- Subjectively we have seen high-profile publications from 2006 onward - Califano, the DREAM consortium, Barabasi, Uri Alon, etc.
'Network' biology can describe different things
'modelling networks' ≠ 'network modelling'
This work is about building graph models to gain biological insight
Different methods for making networks from biological data
Association-based methods: network edges represent association or similarity between variables
Bayesian methods: network edges represent conditional probabilities
Regression-based methods (e.g. MIKANA): network edges represent terms in a system of equations
Key concept: the biological meaning of a network model comes from:
1. The source data
2. The operations used to make it
Which is the best method?
- Dialogue on Reverse Engineering Assessment and Methods (DREAM) attempts to benchmark and evaluate methods competitively
- But different methods create networks representing quite different things
- Choose approaches based on a detailed understanding of the data and possible hypotheses
- Evaluating differences between approaches is very difficult.
- Current methods are designed and applied ad hoc, and each uses its own idiosyncratic language
- The meaning of a network and its elements depends on the generating data, and on the fine detail of each step in its construction
- Each problem requires different applications of different methods
The problem of disparate approaches with different language has persisted for the last 6+ years
“..performance is related more to the details of implementation than the choice of underlying methodology”
Marbach et al. (2012). Wisdom of crowds for robust gene network inference. Nat Meth
..but we have no language to describe the 'details of implementation'!
The consequences
- Lots of methods, but few principles
- We can make complex mathematical objects
- But we cannot reason from the details of their construction, to their meaning
- We know more data will improve things..
- But we can't say what 'improvement' means in biological terms
Progress since 2006 has been slower than expected
Not a network
(Magritte picture here)
Not a network
- In this description, a 'network' is a 'graph model' built from biological data
- We are interested in the relationship between graph properties and biological meaning
Uncovering meaning
How do we uncover meaning in language?
¿Donde esta el zapateria?
But then
I began to understand the grammar
¿Donde esta el zapateria?
Where is the shoestore?
Concept of a grammar
- A set of principles that govern the construction and interpretation of phrases
- Meaning (semantics) arises from context (situation) and syntax (rules that govern the structure of sentences)
- Two ways to get to semantics from context and syntax: formal, and empirical
So
To get to biological meaning from networks, we need a grammar of graph methods
A grammar of graph methods
Find common features in all network methods
Work out a language to describe them
- Must be general enough to be comprehensive (able to describe all analyses of this type)
- But specific enough to be useful (able to provide meaningful insight)
The focus
Three common steps:
1. Data -> Graph
2. Graph -> Graph
3. Graph (element) -> Hypothesis
Influenced by previous work
- Wilkinson, L. (2010). The grammar of graphics.
- Wickham, H. (2008). ggplot2: an Implementation of the Grammar of Graphics.
- Wickham, H. dplyr: A Grammar of Data Manipulation
- Declarative SQL-type syntax - this is a specification, so it must describe what things are
- Cypher, the graph language for Neo4j
- Gremlin, a graph traversal language
But quite different
1. Focus on process, not data visualisation
2. Not a data standard (e.g. 'minimum information')
3. Not an ontology/controlled vocabulary - represents process, not knowledge. Syntax carries meaning
4. Divisions, syntax and vocabulary are completely new
Empirical grammar, not formal grammar
(translation: constructed by observation and experiment)
A grammar of graph methods
- Elements (nouns)
- Operators (verbs)
'Parts of speech'
1. Data -> Graph
2. Graph -> Graph
3. Graph (element) -> Hypothesis
'Sentences' represent analyses
Expressing analyses using a structured grammar lets us:
1. Recognise difference and similarity between approaches
2. Link biological conclusions to the details of an analysis
-> no more black boxes
Data -> Graph
DATATRANS
Put data in 'long' format - less intuitive for non-statisticians, but allows us to be really precise about preprocessing and data manipulation before building a network. A very good example in R is dplyr.
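To make the 'long' format concrete, here is a minimal sketch in plain Python (not the R/dplyr route mentioned above). The table, the probe and sample names, and the helper `to_long` are all made up for illustration.

```python
# Hypothetical wide-format expression table: one entry per probe,
# with one value per sample.
wide = {
    "probe_1": {"s1": 2.0, "s2": 4.0, "s3": 6.0},
    "probe_2": {"s1": 1.0, "s2": 2.0, "s3": 3.5},
}


def to_long(wide_table):
    """Return one record per (probe, sample) observation."""
    return [
        {"probeID": probe, "sample": sample, "value": value}
        for probe, samples in wide_table.items()
        for sample, value in samples.items()
    ]


long_rows = to_long(wide)

# Preprocessing becomes an explicit step on the long table,
# e.g. dropping one sample before any network is built.
filtered = [row for row in long_rows if row["sample"] != "s3"]
```

Each preprocessing decision is then a visible operation on rows rather than something hidden inside a matrix.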
What do we need to make a graph?
1. Index variable
2. Generative element
3. Operation
DATATRANS: Index variable
The index variable is the variable that differentiates nodes in the network
It partitions the dataset
Examples: probeID/protein ID for transcriptomic/proteomic networks, or metabolite ID for metabolic networks
Typically not unique in 'long' format data (multiple observations)
Always unique in the network (because each node is different)
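A small sketch of the partitioning idea, assuming long-format records with a `probeID` column. The data and the helper `partition_by_index` are hypothetical and not part of the DSL.

```python
from collections import defaultdict

# Hypothetical long-format data: the index variable 'probeID' repeats
# across observations.
long_rows = [
    {"probeID": "p1", "sample": "s1", "value": 2.0},
    {"probeID": "p1", "sample": "s2", "value": 4.0},
    {"probeID": "p2", "sample": "s1", "value": 1.0},
    {"probeID": "p2", "sample": "s2", "value": 3.0},
]


def partition_by_index(rows, index):
    """Group observed values by the index variable."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[index]].append(row["value"])
    return dict(parts)


partitions = partition_by_index(long_rows, "probeID")

# Each distinct value of the index variable becomes exactly one node.
nodes = sorted(partitions)
```

The index variable appears four times in the data but only twice in the node set, which is the uniqueness point made above.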
DATATRANS: Generative element
The generative element is the smallest part of the graph required to build the network
Get the partitioned data associated with each generative element
New result: methods with the same generative element respond in a similar way to changes in the data
DATATRANS: Generative element
What is the generative element for an association network (e.g. ARACNE/CLR/WGCNA)?
Answer: the edge
DATATRANS: Generative element
What is the generative element for a dynamical systems network?
Answer: the subgraph
Different generative element = different response to changes in data
DATATRANS: Operation
- The operation is performed on the data associated with the generative element
- It returns a property (scalar or vector) which gets assigned to the generative element
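The three ingredients (index variable, generative element, operation) can be sketched as follows for the case where the generative element is the edge. This is an illustration only: Pearson correlation stands in for a real association measure such as mutual information, and `datatrans_edges` is a hypothetical helper, not the DSL's own function.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation, used here as a stand-in operation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def datatrans_edges(partitions, operation):
    """Edge as generative element: apply the operation to the data
    behind each pair of nodes and assign the result to that edge."""
    return {
        (u, v): operation(partitions[u], partitions[v])
        for u, v in combinations(sorted(partitions), 2)
    }


# Hypothetical data, already partitioned by the index variable.
partitions = {
    "p1": [1.0, 2.0, 3.0, 4.0],
    "p2": [2.0, 4.0, 6.0, 8.0],
    "p3": [4.0, 3.0, 2.0, 1.0],
}

edges = datatrans_edges(partitions, pearson)
```

Because each edge value depends only on the data for its two nodes, it is easy to say which observations are responsible for which edge.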
Graph -> Graph
GRAPHTRANS
- Often several steps in an analysis
- Typically feature one or more select() and group_by() operations
- summarise() allows us to choose from a graph set
- Literature networks start at this point
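A rough sketch of a GRAPHTRANS step over a graph set, written in plain Python rather than the DSL. The graphs, the threshold and the helper names are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical graph set: each graph maps an edge (node pair) to a weight.
graph_set = [
    {("a", "b"): 0.9, ("b", "c"): 0.2, ("a", "c"): 0.7},
    {("a", "b"): 0.8, ("b", "c"): 0.6},
    {("a", "b"): 0.4, ("a", "c"): 0.9},
]


def select_filter_edges(graph, threshold):
    """select('edges') then filter: keep edges at or above the threshold."""
    return {edge: w for edge, w in graph.items() if w >= threshold}


def count_edges(graphs):
    """group_by(graph set, 'edge') then summarise('count')."""
    return Counter(edge for graph in graphs for edge in graph)


filtered = [select_filter_edges(g, 0.5) for g in graph_set]
counts = count_edges(filtered)

# Choose from the graph set: keep edges seen in at least two graphs.
consensus = {edge for edge, n in counts.items() if n >= 2}
```

Both steps take graphs in and give graphs (or graph summaries) out, which is what distinguishes GRAPHTRANS from DATATRANS.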
Graph -> Hypothesis
ASSERT
- Assertions involve some properties of a graph element, and some other data
- We often use select() %.% filter() to get the element
- Downstream computation often happens outside the grammar
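As an illustration of an assertion, the sketch below selects the edges of one node and compares its targets with an external gene set. The edge list, the gene set and both helper functions are hypothetical; a real analysis would normally follow this with a statistical test outside the grammar.

```python
# Hypothetical directed graph after GRAPHTRANS: (regulator, target) -> weight.
edges = {
    ("tf1", "g1"): 0.9,
    ("tf1", "g2"): 0.8,
    ("tf1", "g3"): 0.7,
    ("tf2", "g3"): 0.6,
}

# Hypothetical "other data": an externally defined gene signature.
signature = {"g1", "g2", "g4"}


def targets_of(graph_edges, node):
    """select('edges') then filter on the source node."""
    return {target for (source, target) in graph_edges if source == node}


def overlap_with(graph_edges, node, gene_set):
    """Combine a property of a graph element with external data."""
    targets = targets_of(graph_edges, node)
    shared = targets & gene_set
    fraction = len(shared) / len(targets) if targets else 0.0
    return {"node": node, "n_targets": len(targets),
            "n_shared": len(shared), "fraction": fraction}


result = overlap_with(edges, "tf1", signature)
```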
Putting it all together
DATATRANS{index('probeID'),
          ge('edge'),
          operation('mi','value')}
GRAPHTRANS{select('edges') %.%
           filter %.%
           select('triplets') %.%
           filter}
ASSERT{select('edges') %.%
       filter}
Califano, Iavarone et al. Nature (2010)
EMT in glioma
ge('edge'), operation('mi')
select('edges') %.% filter
select('triplets') %.% filter
group_by(graph set,'edge') %.% summarise('count')
select('edges') %.% filter
Carro, M. S., Lim, W. K., Alvarez, M. J., Bollo, R. J., Zhao, X., Snyder, E. Y., Iavarone, A. (2010). The transcriptional network for mesenchymal transformation of brain tumours. Nature, 463(7279), 318-25. doi:10.1038/nature08712
Ideker et al. Science (2010)
Differential network analysis
ge('edge'), operation('interaction_score > +/-2.0')
group_by(graph set,'edge') %.%
summarise('count')
select('all edges') %.%
filter
Bandyopadhyay, S., Mehta, M., Kuo, D., Sung, M.-K., Chuang, R., Jaehnig, E. J., Ideker, T. (2010). Rewiring of genetic networks in response to DNA damage. Science (New York, N.Y.), 330, 1385-1389. doi:10.1126/science.1195618
Alon et al. Nature Genetics (2002)
Persistent network motifs
select('motif') %.%
filter
group_by(graph set,'edge') %.%
summarise('count')
select('edges.count') %.%
filter
Shen-Orr, S. S., Milo, R., Mangan, S., & Alon, U. (2002). Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics, 31(1), 64-68. doi:10.1038/ng881
Recap of results
- The grammar unifies work which appears different
- Complete and explicit specification of methods
- Concept of the 'generative element' makes clear what data is responsible for what elements
- New result: different generative element = different response to data changes
- The grammar allows us to work backward, as well as forward, through an analysis
- Key point: understanding syntax helps us get to meaning in a principled way
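One way to see how explicit sentences expose similarity between approaches is to write the three example analyses as data and intersect their steps. The tuple encoding below is an assumption made for this sketch, not the DSL's internal representation; the steps themselves are copied from the three examples.

```python
# Each analysis is a 'sentence': an ordered list of (operator, argument) steps.
carro_2010 = [
    ("ge", "edge"), ("operation", "mi"),
    ("select", "edges"), ("filter",),
    ("select", "triplets"), ("filter",),
    ("group_by", "edge"), ("summarise", "count"),
    ("select", "edges"), ("filter",),
]
bandyopadhyay_2010 = [
    ("ge", "edge"), ("operation", "interaction_score > +/-2.0"),
    ("group_by", "edge"), ("summarise", "count"),
    ("select", "all edges"), ("filter",),
]
shen_orr_2002 = [
    ("select", "motif"), ("filter",),
    ("group_by", "edge"), ("summarise", "count"),
    ("select", "edges.count"), ("filter",),
]


def shared_steps(*sentences):
    """Return the steps that occur in every sentence."""
    common = set(sentences[0])
    for sentence in sentences[1:]:
        common &= set(sentence)
    return common


def has_datatrans(sentence):
    """True if the sentence builds its graph from data (has a 'ge' step)."""
    return any(step[0] == "ge" for step in sentence)


common = shared_steps(carro_2010, bandyopadhyay_2010, shen_orr_2002)
```

In this encoding all three analyses share the group_by/summarise/filter core, while only the first two build their graph from data.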
Implementation
- This is a practical tool as well as a conceptual framework
- We have embedded the grammar in a domain-specific language
- Prototype implementation running in R, to be extended to Python, based on the igraph library
- Construct analyses directly using the grammar
Work in progress:
- Inheritance of properties
- Iterators for common tasks (shortcuts)
- Store the generative history of elements
- Caching layer
- Forward chaining (infix) operators (= pipes)
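Two of the items above, forward chaining operators and stored generative history, could look roughly like this in Python. This is a speculative sketch, not the prototype's code: the `Graph` and `Step` classes, the use of `>>` as the pipe, and `filter_edges` are all assumptions.

```python
class Graph:
    """Edges plus the ordered history of steps that produced them."""

    def __init__(self, edges, history=()):
        self.edges = dict(edges)
        self.history = tuple(history)


class Step:
    """A named graph operation that can be chained with '>>'."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def __rrshift__(self, graph):
        # Called for 'graph >> step' because Graph defines no __rshift__.
        return Graph(self.func(graph.edges), graph.history + (self.name,))


def filter_edges(threshold):
    """Build a step that keeps edges at or above the threshold."""
    return Step(
        f"filter(weight >= {threshold})",
        lambda edges: {e: w for e, w in edges.items() if w >= threshold},
    )


# Hypothetical starting graph.
start = Graph({("a", "b"): 0.9, ("b", "c"): 0.2, ("a", "c"): 0.6})

result = start >> filter_edges(0.5) >> filter_edges(0.8)
```

Carrying the history along with the graph is one way to support working backward from an element to the steps that generated it.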
Finally
- The grammar fills a specific gap in network biology
- Conventional graph theory doesn't: it starts with a graph, and focuses mostly on traversals
- Computational biology generally doesn't: it treats each method as a separate entity
- The grammar lets us draw results directly relevant to experimentation
Acknowledgements
Systems Biology Lab, University of Melbourne: Prof Edmund Crampin, Dr Melissa Davis, Dr Joe Cursons, David Budden, Michael Pan
Regulators of melanoma proliferation project: Prof Cristin Print, Dr Anita Muthukaruppan, Dr Li Wang, Dr Sunali Mehta
Funding sources: University of Auckland, Faculty of Medical and Health Sciences; Maurice Wilkins Centre; New Zealand Genomics Ltd; NZ Health Research Council; NZ Marsden Fund; Auckland District Health Board
Drug mechanism of action project: Prof Bill Wilson, Dr Frederik Pruijn, Dr Moana Tercel, Francis Hunter
Structural insight into endothelial cell regulation project: Prof Cristin Print, Prof Satoru Miyano, Dr Stephen Charnock-Jones, Dr Muna Affara, Dr Yoshinori Tamada, Dr Hiromitsu Araki, Sally Humphreys