What we talk about when we talk about networks





On GitHub: danielghurley / grammarofnetworks


A grammar of network methods in biology

This is a reveal.js presentation. Press Space to advance the slides, or use the arrow keys or the mouse. Press O to zoom out, and Space or Left-Mouse to zoom back in. If you're on a phone, you can probably swipe too.

Topics

  • Introduction: the gap
  • Solution and sentence examples
  • Detailed explanation
  • Examples
  • Implementation as a DSL
  • Research outputs
  • Conclusion

Introduction

Networks and network concepts are very common in cellular and molecular biology

“Networks have long been instrumental in representing and visualizing biological relationships. The challenge now is to transform these networks into representations that capture the multiscale modularity inherent in all biological systems.”

Ideker et al. (2013). A gene ontology inferred from molecular networks. Nature Biotechnology

“..network-based methodologies may be increasingly valuable entries...for the identification and mechanistic elucidation of genetic determinants of physiological and disease-related phenotypes..”

Califano et al. (2014). Identification of Causal Genetic Drivers of Human Disease through Systems-Level Analysis of Regulatory Networks. Cell

Year    Publications
2006    295
2007    325
2008    327
2009    355
2010    367
2011    423
2012    492
2013    642

Nature Group publications containing 'network' by year

  • And basic metrics suggest that it is an active area of research
  • Medline searches for 'network' show a steady increase in both number and relative percentage of publications, up to about 2% in 2014
  • Searches for 'network' in Nature Group publications from 2006-2014 show a steady increase
  • Subjectively we have seen high-profile publications from 2006 onward - Califano, the DREAM consortium, Barabasi, Uri Alon, etc.

'Network' biology can describe different things

'modelling networks' ≠ 'network modelling'

This work is about building graph models to gain biological insight

Different methods for making networks from biological data

Association-based methods: network edges represent association or similarity between variables

Bayesian methods: network edges represent conditional probabilities

Regression-based methods (e.g. MIKANA): network edges represent terms in a system of equations

Key concept: the biological meaning of a network model comes from:

1. The source data

2. The operations used to make it

Which is the best method?

  • Dialogue on Reverse Engineering Assessment and Methods (DREAM) attempts to benchmark and evaluate methods competitively
  • But different methods create networks representing quite different things
  • Choose approaches based on a detailed understanding of the data and possible hypotheses
  • Evaluating differences between approaches is very difficult.
  • Current methods are designed and applied ad hoc, and each uses its own idiosyncratic language
  • The meaning of a network and its elements depends on the generating data, and on the fine detail of each step in its construction
  • Each problem requires different applications of different methods

The problem of disparate approaches with different language has persisted for the last 6+ years

“..performance is related more to the details of implementation than the choice of underlying methodology” Marbach et al. (2012). Wisdom of crowds for robust gene network inference. Nature Methods

..but we have no language to describe the 'details of implementation'!

The consequences

  • Lots of methods, but few principles
  • We can make complex mathematical objects
  • But we cannot reason from the details of their construction, to their meaning
  • We know more data will improve things..
  • But we can't say what 'improvement' means in biological terms

Progress since 2006 has been slower than expected

Not a network

(Magritte picture here)

Not a network

  • In this description, a 'network' is a 'graph model' built from biological data
  • We are interested in the relationship between graph properties and biological meaning

Uncovering meaning

How do we uncover meaning in language?

¿Donde esta el zapateria?

But then

I began to understand the grammar

¿Donde esta el zapateria?

Where is   the shoestore?

Concept of a grammar

  • A set of principles that govern the construction and interpretation of phrases
  • Meaning (semantics) arises from context (situation) and syntax (rules that govern the structure of sentences)
  • Two ways to get to semantics from context and syntax: formal, and empirical

So

To get to biological meaning from networks, we need a grammar of graph methods

A grammar of graph methods

Find common features in all network methods

Work out a language to describe them

  • Must be general enough to be comprehensive (able to describe all analyses of this type)
  • But specific enough to be useful (able to provide meaningful insight)

The focus

Three common steps:

1. Data -> Graph

2. Graph -> Graph

3. Graph (element) -> Hypothesis

Influenced by previous work

  • Wilkinson, L. (2010). The grammar of graphics.
  • Wickham, H. (2008). ggplot2: an Implementation of the Grammar of Graphics.
  • Wickham, H. dplyr: A Grammar of Data Manipulation
  • Declarative SQL-type syntax - this is a specification, so it must describe what things are
  • Cypher, the graph language for Neo4j
  • Gremlin, a graph traversal language

But quite different

1. Focus on process, not data visualisation

2. Not a data standard (e.g. 'minimum information')

3. Not an ontology/controlled vocabulary: it represents process, not knowledge. Syntax carries meaning

4. Divisions, syntax and vocabulary are completely new

Empirical grammar, not formal grammar

(translation: constructed by observation and experiment)

A grammar of graph methods

  • Elements (nouns)

  • Operators (verbs)

'Parts of speech'

1. Data -> Graph

2. Graph -> Graph

3. Graph (element) -> Hypothesis

'Sentences' represent analyses

Expressing analyses using a structured grammar lets us:

1. Recognise difference and similarity between approaches

2. Link biological conclusions to the details of an analysis

-> no more black boxes

Data -> Graph

DATATRANS

Put data in 'long' format - less intuitive for non-statisticians, but allows us to be really precise about preprocessing and data manipulation before building a network. A very good example in R is dplyr.

What do we need to make a graph?

1. Index variable

2. Generative element

3. Operation

DATATRANS: Index variable

The index variable is the variable that differentiates nodes in the network

It partitions the dataset

Examples: probeID/protein ID for transcriptomic/proteomic networks, or metabolite ID for metabolic networks

Typically not unique in 'long' format data (multiple observations)

Always unique in the network (because each node is different)
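A minimal sketch of how an index variable partitions 'long' format data into nodes. The rows, the column names and the helper partition_by_index are hypothetical and chosen only for illustration; they are not taken from the prototype package.

```python
from collections import defaultdict

# Hypothetical 'long' format data: one row per observation.
long_data = [
    {"probeID": "p1", "sample": "s1", "value": 2.0},
    {"probeID": "p1", "sample": "s2", "value": 4.0},
    {"probeID": "p2", "sample": "s1", "value": 1.0},
    {"probeID": "p2", "sample": "s2", "value": 3.0},
]


def partition_by_index(rows, index):
    """Group rows by the index variable; each distinct value becomes one node."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[index]].append(row)
    return dict(partitions)


partitions = partition_by_index(long_data, "probeID")
nodes = sorted(partitions)  # the index value repeats in the data, but is unique here
print(nodes)
```

The index value 'p1' appears in two rows of the data but yields exactly one node.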

DATATRANS: Generative element

The generative element is the smallest part of the graph required to build the network

Get the partitioned data associated with each generative element

New result: methods with the same generative element respond in a similar way to changes in the data

DATATRANS: Generative element

What is the generative element for an association network (e.g. ARACNE/CLR/WGCNA)?

Answer: the edge

DATATRANS: Generative element

What is the generative element for a dynamical systems network?

Answer: the subgraph

Different generative element (ge) = different response to changes in data
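One way to read this, sketched under a stated assumption: the property of a generative element depends only on the data of the nodes it contains. Under that assumption, changing the data for one node touches only the elements that include it. The node names and the helper affected are hypothetical.

```python
from itertools import combinations

nodes = ["A", "B", "C"]

# Generative element = edge: one element per node pair.
edge_elements = list(combinations(nodes, 2))

# Generative element = subgraph: here, a single element covering all three nodes.
subgraph_elements = [tuple(nodes)]


def affected(elements, changed_node):
    """Elements whose property must be recomputed when one node's data changes."""
    return [element for element in elements if changed_node in element]


print(affected(edge_elements, "C"))      # [('A', 'C'), ('B', 'C')]
print(affected(subgraph_elements, "C"))  # [('A', 'B', 'C')]
```

With edges, the A-B edge is untouched by a change in C; with the subgraph, the whole element is recomputed.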

DATATRANS: Operation

  • The operation is performed on the data associated with the generative element
  • It returns a property (scalar or vector) which gets assigned to the generative element
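A minimal sketch of a Data -> Graph step with the edge as generative element. Pearson correlation stands in for the operation purely to keep the example short; it is a swapped-in choice, not the 'mi' operation used later in the slides. The data and the names node_data, pearson and datatrans_edges are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Hypothetical data, already partitioned by the index variable:
# node id -> values ordered by sample.
node_data = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def datatrans_edges(data, operation):
    """Apply the operation to the data of each node pair and
    assign the returned scalar to that edge."""
    return {
        (u, v): operation(data[u], data[v])
        for u, v in combinations(sorted(data), 2)
    }


edges = datatrans_edges(node_data, pearson)
for edge, value in edges.items():
    print(edge, round(value, 3))
```

Swapping in a different operation changes the edge property but leaves the index variable and generative element as they were.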

Graph -> Graph

GRAPHTRANS

  • Often several steps in an analysis
  • Typically feature one or more select() and group_by() operations
  • summarise() allows us to choose from a graph set
  • Literature networks start at this point
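A minimal sketch of a Graph -> Graph step over a graph set, roughly in the spirit of group_by(graph set, 'edge') followed by summarise('count') and a filter. The graph set, the threshold and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical graph set: each graph is a set of edges.
graph_set = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("c", "d")},
    {("a", "b"), ("b", "c"), ("c", "d")},
]


def count_edges(graphs):
    """Group by edge across the graph set and count occurrences."""
    return Counter(edge for graph in graphs for edge in graph)


def filter_edges(counts, min_count):
    """Keep the edges that appear in at least min_count graphs."""
    return {edge for edge, n in counts.items() if n >= min_count}


counts = count_edges(graph_set)
consensus = filter_edges(counts, min_count=3)
print(sorted(consensus))
```

The result is itself a graph (a set of edges), so further Graph -> Graph steps can follow.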

Graph -> Hypothesis

ASSERT

  • Assertions involve some properties of a graph element, and some other data
  • We often use select() %.% filter() to get the element
  • Often downstream computation outside the grammar
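A minimal sketch of an assertion: select edges, filter them on a property, then compare the result with some other data. The edge weights, the gene signature and both helper functions are hypothetical, and the statistic is deliberately simple.

```python
# Hypothetical graph: (regulator, target) edge -> property value.
edge_weights = {
    ("tf1", "g1"): 0.9,
    ("tf1", "g2"): 0.2,
    ("tf2", "g1"): 0.7,
}

# Hypothetical 'other data': a gene set from an independent experiment.
signature_genes = {"g1"}


def select_filter(edges, threshold):
    """Select the edges whose property meets the threshold."""
    return {edge: w for edge, w in edges.items() if w >= threshold}


def fraction_targets_in(edges, gene_set):
    """Fraction of selected edges whose target is in the gene set."""
    if not edges:
        return 0.0
    hits = sum(1 for (_, target) in edges if target in gene_set)
    return hits / len(edges)


selected = select_filter(edge_weights, threshold=0.5)
support = fraction_targets_in(selected, signature_genes)
print(support)
```

In practice the comparison would usually be a statistical test carried out downstream, outside the grammar.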

Putting it all together

DATATRANS{index('probeID'),
ge('edge'),
operation('mi','value')}
GRAPHTRANS{select('edges') %.%
filter %.%
select('triplets') %.%
filter}
ASSERT{select('edges') %.%
filter}
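A minimal sketch of holding such a sentence as plain data, so that analyses can be stored and compared step by step. The Sentence class and its render method are hypothetical and do not reflect how the prototype package represents sentences.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    """One analysis, written as three ordered lists of steps."""
    datatrans: list
    graphtrans: list
    assertion: list

    def render(self):
        """Write the sentence out in the block notation used on the slide."""
        return (
            "DATATRANS{" + ", ".join(self.datatrans) + "} "
            + "GRAPHTRANS{" + " %.% ".join(self.graphtrans) + "} "
            + "ASSERT{" + " %.% ".join(self.assertion) + "}"
        )


sentence = Sentence(
    datatrans=["index('probeID')", "ge('edge')", "operation('mi','value')"],
    graphtrans=["select('edges')", "filter", "select('triplets')", "filter"],
    assertion=["select('edges')", "filter"],
)
print(sentence.render())
```

Because each part is an ordinary list, two analyses can be compared one step at a time.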

Some examples

Califano, Iavarone et al. Nature (2010) EMT in glioma

ge('edge'), operation('mi')
select('edges') %.% filter
select('triplets') %.% filter
group_by(graph set,'edge') %.% summarise('count')
select('edges') %.% filter

Carro, M. S., Lim, W. K., Alvarez, M. J., Bollo, R. J., Zhao, X., Snyder, E. Y., Iavarone, A. (2010). The transcriptional network for mesenchymal transformation of brain tumours. Nature, 463(7279), 318-25. doi:10.1038/nature08712

Ideker et al. Science (2010) Differential network analysis

ge('edge'), operation('interaction_score > +/-2.0')
group_by(graph set,'edge') %.%
summarise('count')
select('all edges') %.%
filter

Bandyopadhyay, S., Mehta, M., Kuo, D., Sung, M.-K., Chuang, R., Jaehnig, E. J., Ideker, T. (2010). Rewiring of genetic networks in response to DNA damage. Science (New York, N.Y.), 330, 1385-1389. doi:10.1126/science.1195618

Alon et al. Nature Genetics (2002) Persistent network motifs

select('motif') %.%
filter
group_by(graph set,'edge') %.%
summarise('count')
select('edges.count') %.%
filter

Shen-Orr, S. S., Milo, R., Mangan, S., & Alon, U. (2002). Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics, 31(1), 64-68. doi:10.1038/ng881
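A minimal sketch of what a motif selection could return, using the feed-forward loop (one of the motifs reported by Shen-Orr et al.) as the example. The toy graph and the function feed_forward_loops are hypothetical; this is a brute-force illustration, not the paper's detection algorithm.

```python
from itertools import permutations

# Hypothetical directed regulatory graph: (regulator, target) edges.
edges = {("x", "y"), ("y", "z"), ("x", "z"), ("z", "w")}


def feed_forward_loops(edge_set):
    """Return (x, y, z) triples where x->y, y->z and x->z are all edges."""
    nodes = {node for edge in edge_set for node in edge}
    return [
        (x, y, z)
        for x, y, z in permutations(sorted(nodes), 3)
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set
    ]


motifs = feed_forward_loops(edges)
print(motifs)  # [('x', 'y', 'z')]
```

Each returned triple is a subgraph that later steps can filter, group and count.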

Recap of results

  • The grammar unifies work which appears different
  • Complete and explicit specification of methods
  • Concept of the 'generative element' makes clear what data is responsible for what elements
  • New result: different generative element = different response to data changes
  • The grammar allows us to work backward, as well as forward, through an analysis
  • Key point: understanding syntax helps us get to meaning in a principled way

Implementation

  • This is a practical tool as well as a conceptual framework
  • We have embedded the grammar in a domain-specific language
  • Prototype implementation runs in R and is based on the igraph library; an extension to Python is planned
  • Construct analyses directly using the grammar

Work in progress:

  • Inheritance of properties
  • Iterators for common tasks (shortcuts)
  • Store the generative history of elements
  • Caching layer
  • Forward chaining (infix) operators (= pipes)
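A minimal sketch of what a forward chaining (pipe) operator could look like on the Python side, using operator overloading so that steps read left to right like %.% in R. The Step class and the two example steps are hypothetical.

```python
class Step:
    """Wrap a function so that `value | step` calls the function on value."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, value):
        # Python falls back to this reflected operator when the left
        # operand cannot handle `|` with a Step on the right.
        return self.func(value)


# Hypothetical steps: select the edges of a graph, then filter them by weight.
select_edges = Step(lambda graph: graph["edges"])
keep_strong = Step(lambda edges: {e: w for e, w in edges.items() if w > 0.5})

graph = {"edges": {("a", "b"): 0.8, ("b", "c"): 0.1}}
result = graph | select_edges | keep_strong
print(result)  # {('a', 'b'): 0.8}
```

Reading the chain left to right matches the order in which the steps are applied to the graph.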

Finally

  • The grammar fills a specific gap in network biology
  • Conventional graph theory doesn't: it starts with a graph, and focuses mostly on traversals
  • Computational biology generally doesn't: it treats each method as a separate entity
  • Draw results directly relevant to experimentation

Further resources

Systems Biology Lab website

Detailed explanation via arXiv

Prototype R package


Acknowledgements

  • Systems Biology Lab, University of Melbourne: Prof Edmund Crampin, Dr Melissa Davis, Dr Joe Cursons, David Budden, Michael Pan
  • Regulators of melanoma proliferation project: Prof Cristin Print, Dr Anita Muthukaruppan, Dr Li Wang, Dr Sunali Mehta
  • Drug mechanism of action project: Prof Bill Wilson, Dr Frederik Pruijn, Dr Moana Tercel, Francis Hunter
  • Structural insight into endothelial cell regulation project: Prof Cristin Print, Prof Satoru Miyano, Dr Stephen Charnock-Jones, Dr Muna Affara, Dr Yoshinori Tamada, Dr Hiromitsu Araki, Sally Humphreys
  • Funding sources: University of Auckland, Faculty of Medical and Health Sciences; Maurice Wilkins Centre; New Zealand Genomics Ltd; NZ Health Research Council; NZ Marsden Fund; Auckland District Health Board