grammarofnetworks

What we talk about when we talk about networks

A grammar of network methods in biology

This is a reveal.js presentation. Press Space to advance the slides, or use the arrow keys or the mouse. Press O to zoom out, and Space or Left-Mouse to zoom back in. If you're on a phone, you can probably swipe too.

Topics

Introduction: the gap
Solution and sentence examples
Detailed explanation
Examples
Implementation as a DSL
Research outputs
Conclusion

Introduction

Networks and network concepts are very common in cellular and molecular biology

“Networks have long been instrumental in representing and visualizing biological relationships. The challenge now is to transform these networks into representations that capture the multiscale modularity inherent in all biological systems.”

Ideker et al. (2013). A gene ontology inferred from molecular networks. Nature Biotechnology

“..network-based methodologies may be increasingly valuable entries...for the identification and mechanistic elucidation of genetic determinants of physiological and disease-related phenotypes..”

Califano et al. (2014). Identification of Causal Genetic Drivers of Human Disease through Systems-Level Analysis of Regulatory Networks. Cell

Year    Publications
2006    295
2007    325
2008    327
2009    355
2010    367
2011    423
2012    492
2013    642

Nature Group publications containing 'network' by year

And basic metrics suggest that it is an active area of research
Medline searches for 'network' show steady increase in number and relative percentage, up to about 2% in 2014
Searches for 'network' in Nature Group publications from 2006-2014 show steady increase.
Subjectively we have seen high-profile publications from 2006 onward - Califano, the DREAM consortium, Barabasi, Uri Alon, etc.

'Network' biology can describe different things

Mechanistic modelling of known processes (e.g. cell-cycle, glycogenolysis)
Experimentation on known processes (e.g. synthetic biology)
Analysing interactions from literature (e.g. IPA, Reactome, STRING)
Making 'graph' models from biological data

'modelling networks' ≠ 'network modelling'

This work is about building graph models to gain biological insight

Different methods for making networks from biological data

Association-based methods: network edges represent association or similarity between variables

Bayesian methods: network edges represent conditional probabilities

Regression-based methods (e.g. MIKANA): network edges represent terms in a system of equations

Key concept: the biological meaning of a network model comes from:

1. The source data
2. The operations used to make it

Which is the best method?

Dialogue on Reverse Engineering Assessment and Methods (DREAM) attempts to benchmark and evaluate methods competitively
But different methods create networks representing quite different things
Choose approaches based on a detailed understanding of the data and possible hypotheses

Evaluating differences between approaches is very difficult.
Current methods are designed and applied ad hoc, and each uses its own idiosyncratic language
The meaning of a network and its elements depends on the generating data, and on the fine detail of each step in its construction
Each problem requires different applications of different methods

The problem of disparate approaches with different language has persisted for the last 6+ years

“..performance is related more to the details of implementation than the choice of underlying methodology”

Marbach et al. (2012). Wisdom of crowds for robust gene network inference. Nature Methods

..but we have no language to describe the 'details of implementation'!

The consequences

Lots of methods, but few principles
We can make complex mathematical objects
But we cannot reason from the details of their construction, to their meaning
We know more data will improve things..
But we can't say what 'improvement' means in biological terms

Progress since 2006 has been slower than expected

Not a network

(Magritte picture here)

Not a network

In this description, a 'network' is a 'graph model' built from biological data
We are interested in the relationship between graph properties and biological meaning

Uncovering meaning

How do we uncover meaning in language?

¿Donde esta el zapateria?

But then

I began to understand the grammar

¿Donde esta el zapateria?

Where is the shoestore?

Concept of a grammar

A set of principles that govern the construction and interpretation of phrases
Meaning (semantics) arises from context (situation) and syntax (rules that govern the structure of sentences)
Two ways to get to semantics from context and syntax: formal, and empirical

So

To get to biological meaning from networks, we need a grammar of graph methods

A grammar of graph methods

Find common features in all network methods

Work out a language to describe them

Must be general enough to be comprehensive (able to describe all analyses of this type)
But specific enough to be useful (able to provide meaningful insight)

The focus

Three common steps:

1. Data -> Graph

2. Graph -> Graph

3. Graph (element) -> Hypothesis

Influenced by previous work

Wilkinson, L. (2010). The grammar of graphics.
Wickham, H. (2008). ggplot2: an Implementation of the Grammar of Graphics.
Wickham, H. dplyr: A Grammar of Data Manipulation
Declarative SQL-type syntax - this is a specification, so it must describe what things are
Cypher, the graph language for Neo4j
Gremlin, a graph traversal language

But quite different

1. Focus on process, not data visualisation

2. Not a data standard (e.g. 'minimum information')

3. Not an ontology/controlled vocabulary: it represents process, not knowledge. Syntax carries meaning

4. Divisions, syntax and vocabulary are completely new

Empirical grammar, not formal grammar

(translation: constructed by observation and experiment)

A grammar of graph methods

Elements (nouns)
Operators (verbs)

'Parts of speech'

1. Data -> Graph

2. Graph -> Graph

3. Graph (element) -> Hypothesis

'Sentences' represent analyses

Expressing analyses using a structured grammar lets us:

1. Recognise differences and similarities between approaches

2. Link biological conclusions to the details of an analysis

-> no more black boxes

Data -> Graph

DATATRANS

Put data in 'long' format - less intuitive for non-statisticians, but allows us to be really precise about preprocessing and data manipulation before building a network. A very good example in R is dplyr.
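As a rough illustration of what 'long' format buys us, here is a minimal Python sketch. The probe names, sample names and values are invented for the example, and `to_long` is a hypothetical helper, not part of the prototype package.

```python
# Hypothetical wide-format table: one row per probe, one column per sample.
wide = {
    "probeA": {"s1": 2.0, "s2": 4.0, "s3": 6.0},
    "probeB": {"s1": 1.0, "s2": 2.0, "s3": 3.5},
}

def to_long(wide_table):
    """Return one record per (probeID, sample) observation."""
    return [
        {"probeID": probe, "sample": sample, "value": value}
        for probe, by_sample in wide_table.items()
        for sample, value in by_sample.items()
    ]

long_rows = to_long(wide)

# In long format, preprocessing is an explicit row-wise step,
# for example excluding one sample before any network is built.
kept = [row for row in long_rows if row["sample"] != "s3"]
```

Every preprocessing decision is then visible as a filter or transformation on rows, which is the precision the slide refers to.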

What do we need to make a graph?

1. Index variable

2. Generative element

3. Operation

DATATRANS: Index variable

The index variable is the variable that differentiates nodes in the network

It partitions the dataset

Examples: probeID/protein ID for transcriptomic/proteomic networks, or metabolite ID for metabolic networks

Typically not unique in 'long' format data (multiple observations)

Always unique in the network (because each node is different)
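A small Python sketch of the index variable partitioning a long-format dataset. The records and the helper `partition_by_index` are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical long-format records: the index variable 'probeID' repeats.
long_rows = [
    {"probeID": "probeA", "sample": "s1", "value": 2.0},
    {"probeID": "probeA", "sample": "s2", "value": 4.0},
    {"probeID": "probeB", "sample": "s1", "value": 1.0},
    {"probeID": "probeB", "sample": "s2", "value": 2.0},
]

def partition_by_index(rows, index):
    """Group observed values by the index variable; each key becomes one node."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[index]].append(row["value"])
    return dict(parts)

nodes = partition_by_index(long_rows, "probeID")
```

The index value appears several times in the rows but exactly once among the keys of `nodes`, matching the non-unique/unique distinction above.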

DATATRANS: Generative element

The generative element is the smallest part of the graph required to build the network

Get the partitioned data associated with each generative element

New result: methods with the same generative element respond in a similar way to changes in the data

DATATRANS: Generative element

What is the generative element for an association network (e.g. ARACNE/CLR/WGCNA)?

Answer: the edge

DATATRANS: Generative element

What is the generative element for a dynamical systems network?

Answer: the subgraph

Different generative element (ge) = different response to changes in data

DATATRANS: Operation

The operation is performed on the data associated with the generative element
It returns a property (scalar or vector) which gets assigned to the generative element
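The three ingredients can be sketched together in Python for the case where the generative element is the edge. This is only an illustration: Pearson correlation stands in for mutual information ('mi') to keep the code short, and the data and the names `pearson` and `build_edges` are assumptions, not the prototype's API.

```python
from itertools import combinations
from math import sqrt

# Hypothetical data already partitioned by the index variable.
partitions = {
    "probeA": [1.0, 2.0, 3.0, 4.0],
    "probeB": [2.0, 4.0, 6.0, 8.0],
    "probeC": [4.0, 3.0, 2.0, 1.0],
}

def pearson(x, y):
    """Operation: returns a scalar property for one pair of partitions."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def build_edges(parts, operation):
    """Generative element = edge: apply the operation to the data of each node pair."""
    return {
        (u, v): operation(parts[u], parts[v])
        for u, v in combinations(sorted(parts), 2)
    }

edges = build_edges(partitions, pearson)
```

Each edge's property depends only on the data of its two endpoints, so changing one probe's data alters only the edges that touch it.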

Graph -> Graph

GRAPHTRANS

Often several steps in an analysis
Typically features one or more select() and group_by() operations
summarise() allows us to choose from a graph set
Literature networks start at this point
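A possible reading of group_by(graph set, 'edge') followed by summarise('count'), sketched in Python. The graph set, the threshold and the function names are invented for the example.

```python
from collections import Counter

# Hypothetical graph set: three graphs over the same nodes, each a set of edges.
graph_set = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("a", "c")},
    {("a", "b"), ("b", "c")},
]

def count_edges(graphs):
    """group_by edge, then summarise('count'): in how many graphs does each edge occur?"""
    counts = Counter()
    for graph in graphs:
        counts.update(graph)
    return counts

def filter_edges(counts, min_count):
    """filter: keep the edges that occur in at least min_count graphs."""
    return {edge for edge, n in counts.items() if n >= min_count}

counts = count_edges(graph_set)
consensus = filter_edges(counts, min_count=2)
```

The input and the output are both graphs (or graph sets), which is what places this step in GRAPHTRANS rather than DATATRANS.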

Graph -> Hypothesis

ASSERT

Assertions involve some properties of a graph element, and some other data
We often use select() %.% filter() to get the element
Often downstream computation outside the grammar
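As a hedged sketch of an assertion step in Python: selected edges are filtered on a property and then combined with other data to give candidate hypotheses. The edge weights, the gene set and both helper functions are hypothetical.

```python
# Hypothetical edge properties (regulator, target) -> score.
edges = {("tf1", "g1"): 0.9, ("tf1", "g2"): 0.2, ("tf2", "g3"): 0.8}

# Hypothetical 'other data': genes belonging to some signature of interest.
signature = {"g1", "g3"}

def select_filter(edge_props, threshold):
    """select('edges') then filter: keep edges whose property passes the threshold."""
    return {edge: w for edge, w in edge_props.items() if w >= threshold}

def assert_regulators(edge_props, gene_set):
    """Combine graph elements with other data: which regulators target the gene set?"""
    hypotheses = {}
    for (regulator, target) in edge_props:
        if target in gene_set:
            hypotheses.setdefault(regulator, []).append(target)
    return hypotheses

strong = select_filter(edges, threshold=0.5)
candidates = assert_regulators(strong, signature)
```

Anything done with `candidates` afterwards (enrichment tests, experimental design) would be the downstream computation that sits outside the grammar.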

Putting it all together

DATATRANS{index('probeID'),
          ge('edge'),
          operation('mi','value')}
GRAPHTRANS{select('edges') %.%
           filter %.%
           select('triplets') %.%
           filter}
ASSERT{select('edges') %.%
       filter}
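The select('triplets') then filter step is not spelled out on the slide. One plausible reading, in the style of the data processing inequality pruning used by ARACNE, is sketched below in Python. This is an assumption about what the filter does, and the edge weights and function names are invented.

```python
from itertools import combinations

# Hypothetical undirected edges with a mutual-information-like weight.
edges = {
    frozenset(("a", "b")): 0.9,
    frozenset(("b", "c")): 0.8,
    frozenset(("a", "c")): 0.3,
    frozenset(("c", "d")): 0.5,
}

def select_triplets(edge_props):
    """select('triplets'): yield the three edges of every fully connected node triple."""
    nodes = sorted({node for edge in edge_props for node in edge})
    for trio in combinations(nodes, 3):
        sides = [frozenset(pair) for pair in combinations(trio, 2)]
        if all(side in edge_props for side in sides):
            yield sides

def filter_weakest(edge_props):
    """filter: mark the weakest edge of each triplet, then drop all marked edges."""
    weakest = {min(sides, key=edge_props.get) for sides in select_triplets(edge_props)}
    return {edge: w for edge, w in edge_props.items() if edge not in weakest}

pruned = filter_weakest(edges)
```

Written this way, the triplet step is just another select followed by a filter, which is the point of expressing the analysis as a sentence.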

Some examples

Califano, Iavarone et al. Nature (2010) EMT in glioma

ge('edge'), operation('mi')
select('edges') %.% filter
select('triplets') %.% filter
group_by(graph set,'edge') %.% summarise('count')
select('edges') %.% filter

Carro, M. S., Lim, W. K., Alvarez, M. J., Bollo, R. J., Zhao, X., Snyder, E. Y., Iavarone, A. (2010). The transcriptional network for mesenchymal transformation of brain tumours. Nature, 463(7279), 318-25. doi:10.1038/nature08712

Ideker et al. Science (2010) Differential network analysis

ge('edge'), operation('interaction_score > +/-2.0')
group_by(graph set,'edge') %.%
summarise('count')
select('all edges') %.%
filter

Bandyopadhyay, S., Mehta, M., Kuo, D., Sung, M.-K., Chuang, R., Jaehnig, E. J., Ideker, T. (2010). Rewiring of genetic networks in response to DNA damage. Science (New York, N.Y.), 330, 1385-1389. doi:10.1126/science.1195618

Alon et al. Nature Genetics (2002) Persistent network motifs

select('motif') %.%
filter
group_by(graph set,'edge') %.%
summarise('count')
select('edges.count') %.%
filter
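To make select('motif') concrete, here is a minimal Python sketch that picks out feed-forward loops from a small directed graph. It only illustrates the selection step. It is not the procedure of Shen-Orr et al., who also compare motif frequencies against randomised networks, and the graph and function name are invented.

```python
from itertools import permutations

# Hypothetical directed regulatory edges (regulator, target).
edges = {("x", "y"), ("y", "z"), ("x", "z"), ("z", "w")}

def select_feed_forward_loops(directed_edges):
    """Return ordered triples (a, b, c) where a->b, b->c and a->c are all present."""
    nodes = sorted({node for edge in directed_edges for node in edge})
    return [
        (a, b, c)
        for a, b, c in permutations(nodes, 3)
        if (a, b) in directed_edges
        and (b, c) in directed_edges
        and (a, c) in directed_edges
    ]

motifs = select_feed_forward_loops(edges)
```

The selected motifs can then be passed to the same group_by and summarise steps used in the other examples.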

Shen-Orr, S. S., Milo, R., Mangan, S., & Alon, U. (2002). Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics, 31(1), 64-68. doi:10.1038/ng881

Recap of results

The grammar unifies work which appears different
Complete and explicit specification of methods
Concept of the 'generative element' makes clear what data is responsible for what elements
New result: different generative element = different response to data changes
The grammar allows us to work backward, as well as forward, through an analysis
Key point: understanding syntax helps us get to meaning in a principled way

Implementation

This is a practical tool as well as a conceptual framework
We have embedded the grammar in a domain-specific language
Prototype implementation running in R and based on the igraph library, to be extended to Python
Construct analyses directly using the grammar

Work in progress:

Inheritance of properties
Iterators for common tasks (shortcuts)
Store the generative history of elements
Caching layer
Forward chaining (infix) operators (= pipes)
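Two of these items, forward chaining operators and stored generative history, can be sketched in a few lines of Python. This is a toy under stated assumptions: the `Step` class, the `>>` operator standing in for `%.%`, and the `select` and `filter_by` helpers are all hypothetical and do not describe the prototype.

```python
class Step:
    """A named graph operation that can be chained with >> (a stand-in for %.%)."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.history = [name]  # record of the steps that generated the result

    def __rshift__(self, other):
        # Compose left to right: run this step first, then the next one.
        combined = Step(
            f"{self.name} >> {other.name}",
            lambda graph: other.func(self.func(graph)),
        )
        combined.history = self.history + other.history
        return combined

    def __call__(self, graph):
        return self.func(graph)

def select(kind):
    return Step(f"select('{kind}')", lambda graph: dict(graph[kind]))

def filter_by(threshold):
    return Step(
        f"filter(>={threshold})",
        lambda items: {key: w for key, w in items.items() if w >= threshold},
    )

graph = {"edges": {("a", "b"): 0.9, ("b", "c"): 0.1}}
pipeline = select("edges") >> filter_by(0.5)
result = pipeline(graph)
```

Keeping `history` alongside the composed step is one simple way to work backward from a result to the operations that produced it.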

Finally

The grammar fills a specific gap in network biology
Conventional graph theory doesn't: it starts with a graph, and focuses mostly on traversals
Computational biology generally doesn't: it treats each method as a separate entity
Draw results directly relevant to experimentation

Further resources

Systems Biology Lab website

Detailed explanation via arXiv

Prototype R package

Back to start

Acknowledgements

Systems Biology Lab, University of Melbourne
Prof Edmund Crampin
Dr Melissa Davis
Dr Joe Cursons
David Budden
Michael Pan

Regulators of melanoma proliferation project
Prof Cristin Print
Dr Anita Muthukaruppan
Dr Li Wang
Dr Sunali Mehta

Funding sources
University of Auckland, Faculty of Medical and Health Sciences
Maurice Wilkins Centre
New Zealand Genomics Ltd
NZ Health Research Council
NZ Marsden Fund
Auckland District Health Board

Drug mechanism of action project
Prof Bill Wilson
Dr Frederik Pruijn
Dr Moana Tercel
Francis Hunter

Structural insight into endothelial cell regulation project
Prof Cristin Print
Prof Satoru Miyano
Dr Stephen Charnock-Jones
Dr Muna Affara
Dr Yoshinori Tamada
Dr Hiromitsu Araki
Sally Humphreys


danielghurley
