Exploring the diversity of unmapped reads from human deep sequencing – Amin Saffari – Data and Preparation






On Github khikho / Presentation

Exploring the diversity of unmapped reads from human deep sequencing

Amin Saffari

13-08-12

Aligning

  • Mapped region
  • Unmapped region
    • Very high or low GC-content
    • Sequencing error
    • Repeat elements
    • Currently not discovered
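As a rough illustration of the GC-content point above, the sketch below computes the GC fraction of a read and flags extreme values. The function names and the 0.25/0.75 thresholds are assumptions chosen for illustration; they do not come from the original analysis.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T).

    Returns 0.0 when the sequence has no unambiguous bases.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def is_extreme_gc(seq: str, low: float = 0.25, high: float = 0.75) -> bool:
    """Flag reads whose GC fraction lies outside [low, high].

    The default thresholds are hypothetical, not values from the talk.
    """
    gc = gc_content(seq)
    return gc < low or gc > high
```

A read such as "ATATATAT" (GC fraction 0) would be flagged, while "ATGC" (GC fraction 0.5) would not.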

Data and Preparation

Workflows

Pros and Cons

  • Strategy 1:
    • More reads lead to better assembly
    • Higher N50
    • Slow
    • Needs a lot of memory
  • Strategy 2:
    • Fast
    • Some contigs disappeared

Assembly Results

Higher N50 or longer assembly length (asmLg)?
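The two indicators can pull in different directions. The sketch below uses the standard N50 definition (the contig length at which the cumulative sum over length-sorted contigs reaches half of the total assembly length). The function names and the toy contig lengths are illustrative assumptions, not results from these assemblies.

```python
def assembly_length(contig_lengths):
    """Total number of bases across all contigs."""
    return sum(contig_lengths)


def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


# Toy example (made-up lengths): B has the higher N50, A the longer assembly.
assembly_a = [100, 100, 100, 100]
assembly_b = [300, 20, 20, 20]
print(n50(assembly_a), assembly_length(assembly_a))
print(n50(assembly_b), assembly_length(assembly_b))
```

In this toy case one long contig raises the N50 of B even though A contains more assembled sequence, which is why neither indicator alone settles the choice of parameters.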

Compare Assemblies

Assembly parameters (kmer-value/coverage-cutoff) as a function of indicators (asmlength/N50)

Every k-mer/coverage-cutoff (cvCut) combination is useful

  • Combine assembly results
  • Blood - Blood
  • Saliva - Saliva
  • Blood - Saliva
  • Extract non-redundant contigs
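A minimal sketch of one way to extract non-redundant contigs after pooling several assemblies. It removes only exact duplicates and exact reverse complements; the slides do not say which tool or similarity criterion was used, so this is an assumption for illustration, and all function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only; other symbols pass through)."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Strand-independent key: the smaller of the sequence and its reverse complement."""
    seq = seq.upper()
    return min(seq, reverse_complement(seq))


def non_redundant(contigs):
    """Keep the first occurrence of each contig, ignoring strand and letter case."""
    seen = set()
    unique = []
    for contig in contigs:
        key = canonical(contig)
        if key not in seen:
            seen.add(key)
            unique.append(contig)
    return unique
```

For example, pooling "AAAC" from one assembly and "GTTT" from another keeps only the first, since they are reverse complements. Near-identical or contained contigs would need a clustering step, which this sketch does not attempt.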

Combining blood results

  • ~113 million contigs
  • ~4 million non-redundant contigs
  • ¼ annotated against the nr_DB

Top 3

  • Pongo abelii (Sumatran orangutan)
  • Female
  • Chromosome X
  • Blood
Pongo abelii chromosome X unlocalized genomic

Top 3

  • Toxoplasma gondii (Parasite)
    • Carried by many warm-blooded animals
    • 30%-65% of humans have antibodies
  • Malassezia globosa (fungus)
    • Naturally found on the skin surfaces of many animals, including humans

Combining Saliva results

  • ~510 million contigs
  • ~3 million non-redundant contigs
  • ¾ annotated against the nr_DB

Unannotated part

Comparing the length of unannotated contigs

Blood & Saliva

  • ~113 million contigs (Blood)
  • ~510 million contigs (Saliva)
  • 60171 non-redundant contigs
  • 1/6 annotated
DNA should be more or less the same across different tissues

Blood & Saliva

  • All annotated are bacteria
    • Streptococcus mitis SK579
    • Prevotella

Taxonomic distribution and frequency of blood results

Chimp data

Data sampled from gut

Human & Chimp

  • ~105 million contigs (Gut)
  • 60171 non-redundant contigs (Blood & Saliva)
  • 52113 non-redundant contigs (Blood, Saliva & Gut)
  • 1/8 annotated

Taxonomic distribution and frequency of human & chimp results

Database

Conclusions & Future Work

  • Some sequences mapped to closely related species
  • A large fraction of bacteria live with us
  • Look into the unannotated contigs
  • Look into the orphan reads

Acknowledgements

Thank You!