VERSE: a novel approach to detect virus integration in host genomes through reference genome customization




Wang et al, Genome Medicine (2015)

Ino de Bruijn / @inodb

VirusFinder 1.0 (2013)

  • detect viruses of any type
  • integrated/unintegrated
  • detect virus integration sites
  • WGS, RNA-Seq, targeted sequencing (TS); paired-end reads only

VERSE aka VirusFinder 2.0 (2015)

  • Tries to fix
    • host genomic instability
    • high virus mutation rates
  • Host and virus genome customization with iCORN
    • (Otto et al, 2010)
    • Iterative Correction of Reference Nucleotides
  • iCORN detects SNPs and indels
    • adjust references iteratively
    • increase coverage over each base
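
The iterative idea above can be illustrated with a toy loop. This is only a sketch of "call variants, apply them to the reference, repeat until nothing changes"; it is not iCORN itself. The names `apply_snps`, `customize_reference` and `make_toy_caller` are hypothetical, the variant caller is a stand-in that compares against a known sample sequence, and only SNPs are handled (indels shift coordinates and are left out on purpose).

```python
def apply_snps(reference, snps):
    """Return a copy of `reference` with SNPs applied.

    `snps` maps a 0-based position to the replacement base.
    """
    seq = list(reference)
    for pos, base in snps.items():
        seq[pos] = base
    return "".join(seq)


def customize_reference(reference, call_variants, max_iterations=10):
    """Repeatedly call variants against the current reference and apply them.

    Stops when a round yields no variants or after `max_iterations` rounds.
    Returns the customized reference and the number of rounds that changed it.
    """
    current = reference
    rounds = 0
    for _ in range(max_iterations):
        snps = call_variants(current)
        if not snps:
            break
        current = apply_snps(current, snps)
        rounds += 1
    return current, rounds


def make_toy_caller(sample, max_calls_per_round=2):
    """Hypothetical stand-in for read mapping plus SNP calling.

    It reports at most `max_calls_per_round` differences per round, mimicking
    the assumption that each round only recovers part of the variation.
    """
    def call_variants(current):
        diffs = [(i, s) for i, (c, s) in enumerate(zip(current, sample)) if c != s]
        return dict(diffs[:max_calls_per_round])
    return call_variants


reference = "ACGTACGTAC"
sample = "ACCTACGAAT"  # differs from the reference at positions 2, 7 and 9
customized, rounds = customize_reference(reference, make_toy_caller(sample))
print(customized, rounds)
```

In a real pipeline the caller would remap the reads to the updated reference in every round, which is where the coverage gain mentioned above would come from.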

VERSE (2015)

  • Two classes of predicted integration sites
    • high confidence: sufficient soft-clipped reads for CREST
    • low confidence: locus within region predicted by SVDetect that has most soft-clipped reads
      • distance between regions at least 10 bp
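
To make the role of soft-clipped reads concrete, here is a small sketch that derives clip positions from SAM `POS`/`CIGAR` values and labels the best-supported locus. The function names and the `min_support=3` threshold are assumptions for illustration only; the slides do not state how many soft-clipped reads CREST needs, and VERSE's actual logic is not reproduced here.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_breakpoints(pos, cigar):
    """Return 1-based reference positions adjacent to soft-clipped read ends.

    `pos` is the SAM POS field (leftmost aligned base). A leading clip is
    reported at `pos`, a trailing clip at the last aligned reference base.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    core = [o for o in ops if o[1] != "H"]  # hard clips carry no sequence
    breakpoints = []
    if len(core) > 1 and core[0][1] == "S":
        breakpoints.append(pos)
    if len(core) > 1 and core[-1][1] == "S":
        ref_len = sum(n for n, op in core if op in "MDN=X")
        breakpoints.append(pos + ref_len - 1)
    return breakpoints


def classify_site(reads, min_support=3):
    """Pick the locus with the most soft-clipped reads and label it.

    `reads` is an iterable of (pos, cigar) pairs from one candidate region.
    `min_support` is a hypothetical cut-off, not a value from the paper.
    """
    counts = Counter(bp for pos, cigar in reads
                     for bp in soft_clip_breakpoints(pos, cigar))
    if not counts:
        return None, 0, "low"
    locus, support = counts.most_common(1)[0]
    label = "high" if support >= min_support else "low"
    return locus, support, label


reads = [(100, "60M30S"), (120, "40M50S"), (130, "30M60S"), (200, "90M")]
print(classify_site(reads))
```

Three of the four toy reads are clipped after reference position 159, so that locus is returned with a support of 3.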

Benchmark

  • WGS of 13 hepatocellular carcinomas (HCC)
    • Illumina 2x 90bp
    • mean cov 31.7x to 121.2x
    • 20 HBV validated integration sites (PCR, Sanger)
      • disregard 2 closer than 10 bp
  • RNA-Seq of 4 HCC cell lines
    • Illumina 2x 101 bp
    • mean 127M reads per sample
    • 11 HBV validated chimeric transcripts (Sanger)

Benchmark

  • TS of 2 Merkel cell carcinomas
    • 2x 100 bp
    • 3.9M and 5M reads
  • Simulated data
    • Illumina 2x 75 bp
    • mean cov 30x
    • mutated HPV-16 + hg19
      • SNPs, indels, SVs

Results

  • Virus genome customization
    • Simulated data
    • 91% of SNPs and 67% of indels corrected after 1 iteration
    • 99.9% identity with consensus HPV-16
      • compared to 99.1% reference

Results

Host genome customization

Sample 26T from WGS HCC

Virus integration site detection

Data type   Known integration sites   VERSE      VirusFinder   VirusSeq
WGS         20                        16         13            -
RNA-Seq     11                        9          8             7
TS          3                         3          2             3
Total       34                        28 (82%)   23 (68%)      -
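
The percentages in the Total row are sensitivities (detected known sites divided by all known sites). A minimal sketch recomputing them from the counts above; the dictionary layout and the `sensitivity` helper are illustrative only, and VirusSeq is left out because it has no WGS value.

```python
# Counts copied from the table above.
known = {"WGS": 20, "RNA-Seq": 11, "TS": 3}
detected = {
    "VERSE": {"WGS": 16, "RNA-Seq": 9, "TS": 3},
    "VirusFinder": {"WGS": 13, "RNA-Seq": 8, "TS": 2},
}


def sensitivity(detected_by_type, known_by_type):
    """Fraction of known integration sites that were detected."""
    total_detected = sum(detected_by_type[t] for t in known_by_type)
    total_known = sum(known_by_type.values())
    return total_detected / total_known


for tool, counts in detected.items():
    print(f"{tool}: {sensitivity(counts, known):.0%}")
# VERSE: 82%
# VirusFinder: 68%
```

Sensitivity alone says nothing about false positives, which requires knowing how many reported sites were not among the validated ones.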

State of the software

  • Difficult to install
    • many dependencies are old versions
  • The code is not in an online code repository
    • Difficult to propose changes
  • After getting help, still a lot of warnings when running the software on a test set
  • Excellent support right here at MSKCC

Comments

  • Potential problems of host/viral genome customization are not discussed
    • What is the theoretical best iCORN can do?
  • Viral integration sites
    • Are the reported sites at 1 bp resolution?
    • Only sensitivity is reported?
    • What proportion are false positives and false negatives?
    • How do low-confidence and high-confidence cases compare?
  • Only paired-end reads are supported; long single-end reads would be useful
  • Multiple virus detection is not automatic

Questions?

Presentation at bit.ly/inotalks