Bionode demo for a wurmlab meeting

Bionode - Modular and universal bioinformatics

Pipeable UNIX command-line tools and JavaScript/Node.js APIs for bioinformatic analysis workflows on the server and in the browser. #bionode

Problem: Too much data

Reproducibility crisis

Reproducibility layers



Bionode also collaborates with BioJS

Bionode - list of modules

Name          Type           Status  People
ncbi          Data access
fasta         Parser
seq           Wrangling              IM
ensembl       Data access
blast-parser  Parser

Name                  Type           Status  People
template              Documentation
JS pipeline           Documentation
Gasket pipeline       Documentation
Dat/Bionode workshop  Documentation

Name  Type      Status  People
sra   Wrappers
bwa   Wrappers
sam   Wrappers
bbi   Parser

Name      Type         People
ebi       Data access
semantic  Data access
vcf       Parser
gff       Parser
bowtie    Wrappers
sge       Wrappers     badryan
blast     Wrappers

Name     Type      People
vsearch  Wrappers
khmer    Wrappers
rsem     Wrappers
gmap     Wrappers
star     Wrappers
go       Wrappers  badryan

Dat workshop


npm install -g bionode
bionode ncbi download gff bacteria
bionode ncbi download sra arthropoda | bionode sra fastq-dump

npm install -g bionode-ncbi
bionode-ncbi search assembly formicidae | dat import --json
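The tools in these commands pass NDJSON to each other: one JSON object per line, matching the "Streams and NDJSON" interface listed under the wrapper slides. As a rough sketch of what a consumer such as `dat import --json` presumably has to do with that text, here is a hypothetical `ndjsonParse()` built only on Node's `stream` and `string_decoder` modules; it is not code from bionode or dat.

```javascript
const { Transform } = require('stream')
const { StringDecoder } = require('string_decoder')

// Hypothetical helper (not from bionode or dat): bytes of NDJSON text in,
// one JavaScript object out per non-empty line.
function ndjsonParse () {
  const decoder = new StringDecoder('utf8') // keeps multi-byte chars intact across chunks
  let pending = '' // partial line carried over to the next chunk

  // Parse each entry of `lines` as JSON; blank lines are ignored.
  function pushLines (stream, lines) {
    for (const line of lines) {
      if (line.trim() !== '') stream.push(JSON.parse(line))
    }
  }

  return new Transform({
    readableObjectMode: true, // the output side emits objects
    transform (chunk, encoding, callback) {
      const lines = (pending + decoder.write(chunk)).split('\n')
      pending = lines.pop() // the last piece may be an unfinished line
      try {
        pushLines(this, lines)
        callback()
      } catch (err) {
        callback(err) // malformed JSON errors the stream
      }
    },
    flush (callback) {
      // A last line without a trailing newline is still a record
      try {
        pushLines(this, [pending + decoder.end()])
        callback()
      } catch (err) {
        callback(err)
      }
    }
  })
}
```

For example, `process.stdin.pipe(ndjsonParse()).on('data', obj => console.log(obj.uid))` would print the `uid` of each search result piped in from `bionode-ncbi search`.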

Some problems I faced during my research:

  • For web projects, needed to implement the same functionality on browser and server
  • Difficulty getting relevant descriptions and datasets from NCBI API using bio* libs
  • Difficulty writing scalable, reproducible and complex bioinformatic pipelines

Solution: JavaScript everywhere

Difficulty getting relevant descriptions and datasets from NCBI API using bio* libs

Python example: what is the URL for the Acromyrmex assembly?
import xml.etree.ElementTree as ET
from Bio import Entrez
Entrez.email = "you@example.com"  # NCBI asks for a contact address (placeholder)
esearch_handle = Entrez.esearch(db="assembly", term="Acromyrmex")
esearch_record = Entrez.read(esearch_handle)
for id in esearch_record['IdList']:
  esummary_handle = Entrez.esummary(db="assembly", id=id)
  esummary_record = Entrez.read(esummary_handle)
  documentSummarySet = esummary_record['DocumentSummarySet']
  document = documentSummarySet['DocumentSummary'][0]
  # 'Meta' is an XML fragment with no single root element, so wrap it first
  metadata_XML = document['Meta']
  metadata = ET.fromstring('<root>' + metadata_XML + '</root>')
  for entry in metadata[1]:
    print(entry.text)

Solution: bionode-ncbi

var bio = require('bionode')
bio.ncbi.urls('assembly', 'Acromyrmex', function(urls) { // callback style
  console.log(urls)
})
bio.ncbi.urls('assembly', 'Acromyrmex').on('data', printGenomeURL) // stream style
function printGenomeURL(urls) {
  console.log(urls.genomic.fna)
}
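The same `urls` call is used both with a callback and as a stream. One way a module can offer both interfaces is to always return an object-mode stream and, when a callback is passed, also buffer the results into an array for it. The sketch below assumes that design; `fakeUrls` and its records are hypothetical and only stand in for a real NCBI query.

```javascript
const { Readable } = require('stream')

// Hypothetical stand-in for a call like ncbi.urls(db, term[, callback]).
// It always returns an object-mode stream; if a callback is given, the
// results are also buffered and passed to it as an array at the end.
function fakeUrls (db, term, callback) {
  // Invented records standing in for results fetched from a remote API
  const records = [
    { uid: '1', db, term },
    { uid: '2', db, term }
  ]
  const stream = Readable.from(records) // object mode by default

  if (typeof callback === 'function') {
    const results = []
    stream.on('data', obj => results.push(obj))
    stream.on('end', () => callback(results))
    // Errors are still reported only through the stream's 'error' event
  }
  return stream
}
```

Returning the stream in both cases keeps one code path: the callback form is just a consumer that collects everything the stream emits.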

var ncbi = require('bionode-ncbi')
var ndjson = require('ndjson')
ncbi.urls('assembly', 'Acromyrmex')
  .pipe(ndjson.stringify()) // one line of JSON per result object
  .pipe(process.stdout)
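`ndjson.stringify()` turns the object stream into text, so the same results can be printed for other command-line tools to read. A minimal equivalent using only Node's built-in `stream` module could look like this; `ndjsonStringify` is a hypothetical helper, not the ndjson package's source.

```javascript
const { Transform } = require('stream')

// Hypothetical equivalent of what ndjson.stringify() is used for:
// objects in, one line of JSON text out per object.
function ndjsonStringify () {
  return new Transform({
    writableObjectMode: true, // accepts objects, emits text
    transform (obj, encoding, callback) {
      callback(null, JSON.stringify(obj) + '\n')
    }
  })
}
```

Usage mirrors the slide: `someObjectStream.pipe(ndjsonStringify()).pipe(process.stdout)`.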


bionode-ncbi urls assembly Acromyrmex |
tool-stream extractProperty genomic.fna
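`tool-stream extractProperty genomic.fna` keeps one field from each object. The sketch below is a hypothetical re-implementation, assuming the argument is a dot-separated path into nested objects (`genomic`, then `fna`) and that objects without the field are dropped; tool-stream's actual behaviour may differ, and the record shapes in the test are invented.

```javascript
const { Transform } = require('stream')

// Follow a dot-separated path such as 'genomic.fna' into a nested object.
// Returns undefined as soon as any step along the path is missing.
function getPath (obj, path) {
  return path.split('.').reduce(
    (value, key) => (value == null ? undefined : value[key]),
    obj
  )
}

// Hypothetical extractProperty-style stream: objects in, the value found
// at `path` out. Missing values are skipped (pushing null would end the stream).
function extractProperty (path) {
  return new Transform({
    objectMode: true,
    transform (obj, encoding, callback) {
      const value = getPath(obj, path)
      if (value != null) this.push(value)
      callback()
    }
  })
}
```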

Difficulty writing scalable, reproducible and complex bioinformatic pipelines.

Solution: Node.js Streams everywhere

var ncbi = require('bionode-ncbi')
var tool = require('tool-stream')
var through = require('through2')
var fork1 = through.obj()
var fork2 = through.obj()

ncbi.search('sra', 'Solenopsis invicta')
  .pipe(tool.extractProperty('uid'))
  .pipe(ncbi.link('sra', 'pubmed'))
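`fork1` and `fork2` are `through2` object-mode pass-through streams, presumably so that one stream of search results can feed more than one downstream branch. The sketch swaps in Node's built-in `PassThrough` in object mode for `through.obj()`; `fanOut` and `collect` are hypothetical helpers, and the test records are made up.

```javascript
const { PassThrough, Writable } = require('stream')

// Hypothetical sink: gathers every object it receives and calls
// done(items) once the upstream stream has ended.
function collect (done) {
  const items = []
  return new Writable({
    objectMode: true,
    write (obj, encoding, callback) {
      items.push(obj)
      callback()
    },
    final (callback) {
      done(items)
      callback()
    }
  })
}

// Hypothetical helper: route `source` through an object-mode PassThrough
// (the built-in counterpart of through2.obj()) and pipe it into every branch.
function fanOut (source, ...branches) {
  const fork = new PassThrough({ objectMode: true })
  source.pipe(fork)
  for (const branch of branches) fork.pipe(branch)
  return fork
}
```

A readable stream can be piped to several destinations at once: each branch receives every object, and the source slows to the pace of the slowest branch.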

bionode-ncbi search genome Guillardia theta |
tool-stream extractProperty assemblyid |
bionode-ncbi download assembly |
tool-stream collectMatch status completed |
tool-stream extractProperty uid |
bionode-ncbi link assembly bioproject |
tool-stream extractProperty destUID |
bionode-ncbi link bioproject sra |
tool-stream extractProperty destUID |
bionode-ncbi download sra |
bionode-sra fastq-dump |
tool-stream extractProperty destFile |
bionode-bwa mem 503988/GCA_000315625.1_Guith1_genomic.fna.gz |
tool-stream collectMatch status finished |
tool-stream extractProperty sam |
bionode-sam
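Several stages in this pipeline use `tool-stream collectMatch status completed` (or `finished`), apparently so that only the object announcing that a download or alignment is done gets passed on. A hypothetical filter with that behaviour, assuming it simply keeps objects whose top-level property equals the given string, can be chained with Node's `stream.pipeline`, which wires the stages together and reports an error from any of them through one callback. The progress records in the test are invented.

```javascript
const { Transform, Readable, Writable, pipeline } = require('stream')

// Hypothetical collectMatch-style filter: only objects whose top-level
// `key` property strictly equals `value` are passed on.
function collectMatch (key, value) {
  return new Transform({
    objectMode: true,
    transform (obj, encoding, callback) {
      if (obj[key] === value) this.push(obj)
      callback()
    }
  })
}

// Feed records through the filter and collect the uids that get through.
// stream.pipeline connects the stages and calls back once, on finish or error.
function completedUids (records, done) {
  const uids = []
  pipeline(
    Readable.from(records),
    collectMatch('status', 'completed'),
    new Writable({
      objectMode: true,
      write (obj, encoding, callback) {
        uids.push(obj.uid)
        callback()
      }
    }),
    err => done(err, uids)
  )
}
```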

{
  "import-data": [
    "bionode-ncbi search genome eukaryota",
    "dat import --json --primary=uid"
  ],
  "search-ncbi": [
    "dat cat",
    "grep Guillardia",
    "tool-stream extractProperty assemblyid",
    "bionode-ncbi download assembly -",
    "tool-stream collectMatch status completed",
    "tool-stream extractProperty uid",
    "bionode-ncbi link assembly bioproject -",
    "tool-stream extractProperty destUID",
    "bionode-ncbi link bioproject sra -",
    "tool-stream extractProperty destUID",
    "grep 35526",
    "bionode-ncbi download sra -",
    "tool-stream collectMatch status completed",
    "tee > metadata.json"
  ],
  "index-and-align": [
    "cat metadata.json",
    "bionode-sra fastq-dump -",
    "tool-stream extractProperty destFile",
    "bionode-bwa mem **/*fna.gz"
  ],
  "convert-to-bam": [
    "bionode-sam 35526/SRR070675.sam"
  ]
}
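Each named entry in this definition is an ordered list of commands meant to be connected with pipes. The helpers below are a hypothetical sketch of how such a definition could be turned into shell command lines and run with `sh -c`; they are not dat's own pipeline runner.

```javascript
const { spawn } = require('child_process')

// Hypothetical helper: { name: [command, ...] } -> { name: 'cmd1 | cmd2 | ...' }
function toShellPipelines (definition) {
  const commandLines = {}
  for (const [name, stages] of Object.entries(definition)) {
    commandLines[name] = stages.join(' | ')
  }
  return commandLines
}

// Run one joined command line with sh, sharing this process's stdio.
// done(err, code) is called exactly once; code is the last stage's exit code.
function runShell (commandLine, done) {
  let called = false
  const finish = (err, code) => {
    if (!called) {
      called = true
      done(err, code)
    }
  }
  const child = spawn('sh', ['-c', commandLine], { stdio: 'inherit' })
  child.on('error', err => finish(err)) // e.g. sh could not be started
  child.on('close', code => finish(null, code))
}
```

For instance, `runShell(toShellPipelines(definition)['import-data'], (err, code) => console.log(err || code))`. Because `sh` reports only the last stage's exit code, a failure earlier in the pipe can go unnoticed.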

pipeline main
  run pipeline import

pipeline import
  run foobar | run dat import --json

Extra slides

Bionode - Why wrappers?

  • Same interface between modules (Streams and NDJSON)
  • Easy installation with NPM
  • Semantic versioning
  • Add tests
  • Abstract away complexity / More user-friendly
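The first point, a shared interface of Streams and objects, is what a wrapper adds around an existing command-line program. A minimal sketch, assuming a hypothetical `wrapCommand` that turns each stdout line into an object and finishes with a status object (loosely echoing the `status` fields filtered in the pipelines above); real bionode wrappers such as bionode-bwa may report different fields.

```javascript
const { spawn } = require('child_process')
const { Readable } = require('stream')
const readline = require('readline')

// Hypothetical wrapper: run an external program and expose it as an
// object-mode stream, so it can be piped like any other module.
function wrapCommand (command, args) {
  const out = new Readable({ objectMode: true, read () {} }) // pushed to manually
  // No stdin; capture stdout; let stderr pass straight through to ours
  const child = spawn(command, args, { stdio: ['ignore', 'pipe', 'inherit'] })

  // One object per line the program prints
  readline.createInterface({ input: child.stdout })
    .on('line', line => out.push({ status: 'running', line }))

  child.on('error', err => out.destroy(err)) // e.g. program not found
  child.on('close', code => {
    // Final object summarising the run, then end the stream
    out.push({ status: code === 0 ? 'finished' : 'failed', code })
    out.push(null)
  })
  return out
}
```

Because the result is an ordinary object stream, it can be followed by a filter on `status` or an NDJSON serializer exactly like the bionode modules.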

Bionode - Why Node.js?

Same code on the client and server side

Reusable, small and tested modules

Benefit from other JS projects
