On GitHub: bmpvieira/wurmlab-meeting15a
Bionode.io - Modular and universal bioinformatics
Pipeable UNIX command-line tools and JavaScript/Node.js APIs for bioinformatic analysis workflows on the server and in the browser.
#bionode gitter.im/bionode/bionode
npm install -g bionode
bionode ncbi download gff bacteria
bionode ncbi download sra arthropoda | bionode sra fastq-dump

npm install -g bionode-ncbi
bionode-ncbi search assembly formicidae | dat import --json
Need to reimplement the same code for the browser and for the server.
Solution: JavaScript everywhere
Difficulty getting relevant descriptions and datasets from the NCBI API using bio* libraries
Python example: what is the URL for the Acromyrmex assembly?
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000188075.1_Si_gnG
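import xml.etree.ElementTree as ET
from Bio import Entrez

Entrez.email = "mail@bmpvieira.com"  # NCBI asks for a contact address

# Find assembly records matching the search term
esearch_handle = Entrez.esearch(db="assembly", term="Acromyrmex")
esearch_record = Entrez.read(esearch_handle)

for uid in esearch_record['IdList']:
    # Fetch the document summary for each assembly
    esummary_handle = Entrez.esummary(db="assembly", id=uid)
    esummary_record = Entrez.read(esummary_handle)
    documentSummarySet = esummary_record['DocumentSummarySet']
    document = documentSummarySet['DocumentSummary'][0]
    # 'Meta' is an XML fragment with several top-level elements,
    # so wrap it in a single root element before parsing
    metadata_XML = document['Meta']
    metadata = ET.fromstring('<root>' + metadata_XML + '</root>')
    # Print the text of every entry in the second metadata element
    for entry in metadata[1]:
        print(entry.text)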
Solution: bionode-ncbi
Example: what is the URL for the Acromyrmex assembly?
http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000204515.1_Aech_3.9/GCA_000204515.1_Aech_3.9_genomic.fna.gz
JavaScript
var bio = require('bionode')

// Callback style: the callback receives an array of results
bio.ncbi.urls('assembly', 'Acromyrmex', function (urls) {
  console.log(urls[0].genomic.fna)
})

// Stream style: 'data' fires once per result object
bio.ncbi.urls('assembly', 'Acromyrmex').on('data', printGenomeURL)
function printGenomeURL(url) {
  console.log(url.genomic.fna)
}
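The two calling styles above differ in what the handler receives: the callback gets the whole result array, while the stream emits one object per `'data'` event. A minimal sketch of a function that supports both styles, built on Node's `stream.Readable.from` (object mode by default). `urlsSketch` and its single hard-coded record are hypothetical stand-ins, not bionode-ncbi's implementation; no NCBI request is made.

```javascript
const { Readable } = require('stream')

// Hypothetical stand-in data; the URL is the Acromyrmex assembly URL from the slide.
const RECORDS = [
  {
    genomic: {
      fna: 'http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000204515.1_Aech_3.9/GCA_000204515.1_Aech_3.9_genomic.fna.gz'
    }
  }
]

// Hypothetical API: always returns an object-mode stream of results.
// If a callback is passed, it is also called once with all results in an array.
// `db` and `term` are ignored here; a real client would query NCBI with them.
function urlsSketch(db, term, callback) {
  const stream = Readable.from(RECORDS)
  if (callback) {
    const results = []
    stream.on('data', (obj) => results.push(obj))
    stream.on('end', () => callback(results))
  }
  return stream
}

// Callback style: one call with the whole array.
urlsSketch('assembly', 'Acromyrmex', (urls) => console.log(urls[0].genomic.fna))

// Stream style: one 'data' event per result object.
urlsSketch('assembly', 'Acromyrmex').on('data', (url) => console.log(url.genomic.fna))
```

Backing both styles with one stream keeps a single code path; the callback is just a consumer that buffers everything, which gives up the memory benefit of streaming for large result sets.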
JavaScript
var ncbi = require('bionode-ncbi')
var ndjson = require('ndjson')

ncbi.urls('assembly', 'Acromyrmex')
  .pipe(ndjson.stringify())
  .pipe(process.stdout)
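`ndjson.stringify()` turns each object in the stream into one line of JSON, which is what lets the output flow into line-oriented UNIX tools. A minimal sketch of an equivalent transform using only Node's built-in `stream` module; `toNDJSON` is a hypothetical name, not the `ndjson` package's code.

```javascript
const { Transform } = require('stream')

// Hypothetical equivalent of ndjson.stringify(): objects go in on the
// writable side, newline-terminated JSON text comes out on the readable side.
function toNDJSON() {
  return new Transform({
    writableObjectMode: true,
    transform(obj, _encoding, callback) {
      callback(null, JSON.stringify(obj) + '\n')
    }
  })
}

// Usage: write objects, read newline-delimited JSON.
const ndjsonOut = toNDJSON()
ndjsonOut.pipe(process.stdout)
ndjsonOut.write({ uid: '1' })
ndjsonOut.end({ uid: '2' })
```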
BASH
bionode-ncbi urls assembly Acromyrmex | tool-stream extractProperty genomic.fna
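`tool-stream extractProperty genomic.fna` pulls a single nested field out of each object in the stream. A minimal sketch of that idea as an object-mode `Transform`, assuming dotted paths address nested properties; `getPath` and `extractProperty` here are hypothetical reimplementations, not tool-stream's source, and dropping objects that lack the property is a choice made for this sketch.

```javascript
const { Readable, Transform } = require('stream')

// Follow a dotted path such as 'genomic.fna' into nested objects.
// Returns undefined as soon as a step along the path is missing.
function getPath(obj, path) {
  return path
    .split('.')
    .reduce((value, key) => (value == null ? undefined : value[key]), obj)
}

// Hypothetical reimplementation (not tool-stream's code): emit only the
// value found at `path`, dropping objects where it is missing or null.
function extractProperty(path) {
  return new Transform({
    objectMode: true,
    transform(obj, _encoding, callback) {
      const value = getPath(obj, path)
      if (value == null) callback()
      else callback(null, value)
    }
  })
}

// Usage with made-up records: only the first one has genomic.fna.
Readable.from([{ genomic: { fna: 'a_genomic.fna.gz' } }, { other: 1 }])
  .pipe(extractProperty('genomic.fna'))
  .on('data', (value) => console.log(value))
```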
Difficulty writing scalable, reproducible and complex bioinformatic pipelines.
Solution: Node.js Streams everywhere
var ncbi = require('bionode-ncbi')
var tool = require('tool-stream')
var through = require('through2')

// Pass-through object streams used to fork one stream into several branches
var fork1 = through.obj()
var fork2 = through.obj()

// `dat` (with .reads, .samples and .papers) is assumed to be an object of
// writable streams set up elsewhere; it is not defined on the slide.
ncbi
  .search('sra', 'Solenopsis invicta')
  .pipe(fork1)
  .pipe(dat.reads)

fork1
  .pipe(tool.extractProperty('expxml.Biosample.id'))
  .pipe(ncbi.search('biosample'))
  .pipe(dat.samples)

fork1
  .pipe(tool.extractProperty('uid'))
  .pipe(ncbi.link('sra', 'pubmed'))
  .pipe(ncbi.search('pubmed'))
  .pipe(fork2)
  .pipe(dat.papers)
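The forking above relies on one readable stream being piped into several destinations, each of which receives every object. A minimal self-contained sketch with Node's `PassThrough` in object mode; the record fields (`uid`, `title`), the `collector` helper and the two arrays are hypothetical stand-ins for `dat.reads` and the NCBI branches, so nothing touches the network.

```javascript
const { PassThrough, Readable, Transform, Writable } = require('stream')

// Hypothetical helper: a Writable that stores every object it receives
// (a stand-in for a dat table).
function collector(items) {
  return new Writable({
    objectMode: true,
    write(obj, _encoding, callback) {
      items.push(obj)
      callback()
    }
  })
}

const reads = [] // hypothetical stand-in for dat.reads
const uids = []  // hypothetical stand-in for the uid branch

// One pass-through stream feeding two branches, like fork1 on the slide.
const fork = new PassThrough({ objectMode: true })

// Branch 1: keep the full records.
fork.pipe(collector(reads))

// Branch 2: map each record to its uid before collecting.
fork
  .pipe(new Transform({
    objectMode: true,
    transform(obj, _encoding, callback) {
      callback(null, obj.uid)
    }
  }))
  .pipe(collector(uids))

// Hypothetical input records.
Readable.from([
  { uid: '1', title: 'run A' },
  { uid: '2', title: 'run B' }
]).pipe(fork)
```

Because of backpressure, the source only advances as fast as the slowest branch, so a stalled branch (for example a slow PubMed lookup) also slows down the branch that stores reads.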
Difficulty writing scalable, reproducible and complex bioinformatic pipelines.
bionode-ncbi search genome Guillardia theta |
  tool-stream extractProperty assemblyid |
  bionode-ncbi download assembly |
  tool-stream collectMatch status completed |
  tool-stream extractProperty uid |
  bionode-ncbi link assembly bioproject |
  tool-stream extractProperty destUID |
  bionode-ncbi link bioproject sra |
  tool-stream extractProperty destUID |
  bionode-ncbi download sra |
  bionode-sra fastq-dump |
  tool-stream extractProperty destFile |
  bionode-bwa mem 503988/GCA_000315625.1_Guith1_genomic.fna.gz |
  tool-stream collectMatch status finished |
  tool-stream extractProperty sam |
  bionode-sam
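Steps such as `tool-stream collectMatch status completed` keep only the objects whose property has a given value, presumably so that only finished downloads move on to the next tool. A minimal filter sketch under that assumption; `collectMatch` here is a hypothetical reimplementation and tool-stream's actual semantics may differ.

```javascript
const { Readable, Transform } = require('stream')

// Hypothetical reimplementation (not tool-stream's code): pass an object
// through only when obj[key] strictly equals `value`, drop it otherwise.
function collectMatch(key, value) {
  return new Transform({
    objectMode: true,
    transform(obj, _encoding, callback) {
      if (obj[key] === value) callback(null, obj)
      else callback()
    }
  })
}

// Usage with made-up progress events: only the completed one gets through.
Readable.from([
  { status: 'downloading', progress: 0.5 },
  { status: 'completed', path: 'reads.sra' }
])
  .pipe(collectMatch('status', 'completed'))
  .on('data', (obj) => console.log(obj.path))
```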
{
  "import-data": [
    "bionode-ncbi search genome eukaryota",
    "dat import --json --primary=uid"
  ],
  "search-ncbi": [
    "dat cat",
    "grep Guillardia",
    "tool-stream extractProperty assemblyid",
    "bionode-ncbi download assembly -",
    "tool-stream collectMatch status completed",
    "tool-stream extractProperty uid",
    "bionode-ncbi link assembly bioproject -",
    "tool-stream extractProperty destUID",
    "bionode-ncbi link bioproject sra -",
    "tool-stream extractProperty destUID",
    "grep 35526",
    "bionode-ncbi download sra -",
    "tool-stream collectMatch status completed",
    "tee > metadata.json"
  ],
  "index-and-align": [
    "cat metadata.json",
    "bionode-sra fastq-dump -",
    "tool-stream extractProperty destFile",
    "bionode-bwa mem **/*fna.gz"
  ],
  "convert-to-bam": [
    "bionode-sam 35526/SRR070675.sam"
  ]
}
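Each named entry in the JSON pipeline definition is a list of commands that are presumably chained with pipes, in order. A minimal sketch, assuming that interpretation, which turns one entry into a single shell command line. `toShellPipeline` is a hypothetical helper; how the original tool actually runs these stages is not shown on the slides.

```javascript
// Hypothetical helper: join the commands of one named stage into a
// single shell pipeline string, in the order they are listed.
function toShellPipeline(config, name) {
  const commands = config[name]
  if (!Array.isArray(commands) || commands.length === 0) {
    throw new Error('Unknown or empty pipeline: ' + name)
  }
  return commands.join(' | ')
}

// The "import-data" entry from the JSON definition.
const config = {
  'import-data': [
    'bionode-ncbi search genome eukaryota',
    'dat import --json --primary=uid'
  ]
}

console.log(toShellPipeline(config, 'import-data'))
// bionode-ncbi search genome eukaryota | dat import --json --primary=uid
```

The resulting string could then be handed to a shell, for example with Node's `child_process.spawn('sh', ['-c', cmd], { stdio: 'inherit' })`, which keeps the streaming behaviour of the individual tools.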
pipeline main
  run pipeline import

pipeline import
  run foobar | run dat import --json
Same code client/server side