ANCHORED ASSEMBLY
Accurate Structural Variant Detection
Using Short-Read Data
Bruestle, J.J. and Shekar, S.N.
Methodology
[Figure: Anchored Assembly methodology. Panel titles: Anchoring; Anchor Assemblies; Read Overlap Assembly (reads labeled R1, R2, R3, ...); Remove Reference Reads; Read Correction (A* error correction); K-mer Quality Score Distribution (axes: Total K-mer Quality Score, K-mer Count).]
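The slides name a read overlap assembly step but do not say how overlaps are found. As a minimal sketch only, assuming exact suffix-to-prefix matching between reads (the function names, the toy reads, and the `min_len` threshold are all hypothetical and not taken from the deck), an overlap graph can be built like this:

```python
from itertools import permutations


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def overlap_graph(reads, min_len=3):
    """Map each ordered read pair (a, b) to its overlap length, if any."""
    graph = {}
    for a, b in permutations(reads, 2):
        n = suffix_prefix_overlap(a, b, min_len)
        if n:
            graph[(a, b)] = n
    return graph


# Toy reads, invented for illustration.
reads = ["ACGTAC", "GTACGG", "ACGGTT"]
graph = overlap_graph(reads, min_len=3)
for (a, b), n in graph.items():
    print(f"{a} -> {b} (overlap {n})")
```

The all-pairs comparison is quadratic in the number of reads, so it only suits toy inputs. An indexed representation is what makes this practical at whole-genome scale.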
SV Comparison
Baylor College of Medicine, against Illumina, PacBio, Array, Nextera and BioNano

Program       FDR      Sensitivity
CNVnator      80.46%   22.62%
BreakDancer   58.89%   42.39%
Delly         55.13%   31.18%
Crest         14.87%   35.29%
Pindel        31.81%   56.70%
SVStat         1.79%   16.36%
Tiresias      69.04%    7.79%
Spiral         3.03%   42%

English et al. (2015), updated
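The deck does not state how the FDR and sensitivity figures were computed. The sketch below uses the standard definitions, FDR = FP / (TP + FP) and sensitivity = TP / (TP + FN). The counts are hypothetical and are not taken from English et al.

```python
def false_discovery_rate(true_pos: int, false_pos: int) -> float:
    """Fraction of reported calls that are wrong: FP / (TP + FP)."""
    calls = true_pos + false_pos
    if calls == 0:
        raise ValueError("no calls were made")
    return false_pos / calls


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real variants that were found: TP / (TP + FN)."""
    truth = true_pos + false_neg
    if truth == 0:
        raise ValueError("truth set is empty")
    return true_pos / truth


# Hypothetical counts for a single caller, for illustration only.
tp, fp, fn = 90, 10, 30
print(f"FDR = {false_discovery_rate(tp, fp):.2%}")
print(f"Sensitivity = {sensitivity(tp, fn):.2%}")
```

A caller can score well on one measure and poorly on the other, which is why the table reports both.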
Fosmid/PacBio validated SVs
Validated in collaboration by
Malig, M, Eichler, EE et al.
Selected 15 high confidence
SVs not previously detected in
the 1000 Genomes Project
PacBio validated SVs: Deletions

Chr   Call size (bp)   Clones seq with PacBio   Validated by Miropeats   Validated by Dotplots   Call validated?
1     1026             2                        2                        2                       yes
1     6375             2                        2                        2                       yes
2     26838            2                        2                        2                       yes
3     4184             1                        1                        1                       yes
5     9507             2                        2                        2                       yes
7     3013             1                        1                        1                       yes
8     5157             2                        1                        1                       yes
9     2883             1                        1                        1                       yes
15    6051             2                        2                        2                       yes

Malig, M, Eichler, EE et al. (Manuscript in preparation)
PacBio validated SVs: Insertions

Chr   Call size (bp)   Clones seq with PacBio   Validated by Miropeats   Validated by Dotplots   Call validated?
1     1755             2                        2                        2                       yes
1     3865             2                        2                        2                       yes
8     2457             2                        2                        2                       yes
8     1508             2                        2                        2                       yes
13    2142             2                        2                        2                       yes
X     1548             2                        2                        2                       yes

Malig, M, Eichler, EE et al. (Manuscript in preparation)
PacBio validation dotplots
[Figure: dotplots for the Chromosome 1 3.8kb insertion and the Chromosome 1 6.4kb deletion]
[Figure: dotplot for the Chromosome 2 26.8kb deletion]
Malig, M, Eichler, EE et al. (Manuscript in preparation)
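Ashkenazi Jewish Trio
Validated by Noah Spies using his program SVViz
Calls shown: Chr 2 Deletion and Chr 8 Insertion
Chr 2 Deletion - Father
[Figure: read support for the Alternative and Reference alleles]
Chr 2 Deletion - Mother
[Figure: read support for the Alternative and Reference alleles]
Chr 2 Deletion - Offspring
[Figure: read support for the Alternative and Reference alleles]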
Chr 2 Deletion - VCF Identical
HG002 - Offspring
chr2 34695829 T <DEL> 100 PASS NS=1;DP=51;SVTYPE=DEL;END=34736567;SVLEN=-40730 DP:AD 51:21,30
HG003 - Father
chr2 34695829 T <DEL> 100 PASS NS=1;DP=55;SVTYPE=DEL;END=34736567;SVLEN=-40730; DP:AD 55:0,55
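To make the two records easier to compare, the sketch below parses the INFO and sample fields copied from the slide. It assumes the usual VCF convention that AD lists reference-supporting reads first and alternate-supporting reads second. The helper names are hypothetical and are not part of any Spiral tool.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string such as 'NS=1;DP=51' into a key/value dict."""
    fields = {}
    for item in info.strip(";").split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def alt_allele_fraction(format_keys: str, sample: str) -> float:
    """Fraction of reads supporting the alternate allele, taken from AD.

    Assumes AD is ordered 'ref,alt' (an assumption, following VCF convention).
    """
    values = dict(zip(format_keys.split(":"), sample.split(":")))
    ref_reads, alt_reads = (int(x) for x in values["AD"].split(","))
    return alt_reads / (ref_reads + alt_reads)


# INFO, FORMAT and sample columns as they appear on the slide.
records = {
    "HG002 (offspring)": ("NS=1;DP=51;SVTYPE=DEL;END=34736567;SVLEN=-40730", "DP:AD", "51:21,30"),
    "HG003 (father)": ("NS=1;DP=55;SVTYPE=DEL;END=34736567;SVLEN=-40730;", "DP:AD", "55:0,55"),
}

for name, (info, fmt, sample) in records.items():
    parsed = parse_info(info)
    fraction = alt_allele_fraction(fmt, sample)
    print(name, parsed["SVTYPE"], parsed["END"], parsed["SVLEN"], round(fraction, 2))
```

Under that reading of AD, 30 of the offspring's 51 reads support the deletion while all 55 of the father's do. That pattern would fit a heterozygous child of a homozygous father.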
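Chr 8 Insertion - Father
[Figure: read support for the Alternative and Reference alleles]
Chr 8 Insertion - Mother
[Figure: read support for the Alternative and Reference alleles]
Chr 8 Insertion - Offspring
[Figure: read support for the Alternative and Reference alleles]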
HG004
chr8 129739066 AATAAA 100 masking_present NS=1;DP=41;SVTYPE=INS;END=129739071;SVLEN=3404; DP:AD 41:28,13
Chr 8 Insertion - VCF Identical
GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCT
GAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGT
TTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAGGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAAAAAACCATGTGGACGTGGC
TTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTG
AGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAAT
AGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGC
TATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCTATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACA
TCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATG
CTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGGCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGC
ATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGT
ATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAG
AGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAAC
TTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGC
AAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTC
TTTAGAAAAGAAATTGCACGATGGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTAC
TTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACT
TTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAG
ATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTG
TAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCC
TATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCA
AAAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAG
CGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTG
ACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTG
ATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGG
GTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTAC
CACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAAACT
Mother
GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCT
GAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGT
TTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAAGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAACAAACCATGTGGACGTGGC
TTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTG
AGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAAT
AGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGC
TATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCCATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACA
TCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATG
CTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGTCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGC
ATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGT
ATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAG
AGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAA
CTTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATG
CAAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCT
CTTTAGAAAAGAAATTGTACGATCGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTA
CTTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAAC
TTTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCA
GATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCT
GTAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTC
CTATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGC
AAGAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCA
GCGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTT
GACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTT
GATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGG
GGTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTA
CCACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAACCT
Chr 8 Insertion - VCF Identical
Father HG003
chr8 129739066 AATAAA 100 masking_present NS=1;DP=47;SVTYPE=INS;END=129739071;SVLEN=3405 DP:AD 47:32,15
Chr 8 Insertion - VCF Identical
HG002
chr8 129739066 AATAAA 100 masking_present NS=1;DP=18;SVTYPE=INS;END=129739071;SVLEN=3405 DP:AD 18:0,18
GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCT
GAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGT
TTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAGGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAAAAAACCATGTGGACGTGGC
TTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTG
AGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAAT
AGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGC
TATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCTATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACA
TCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATG
CTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGTCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGC
ATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGT
ATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAG
AGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAA
CTTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATG
CAAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCT
CTTTAGAAAAGAAATTGCACGATGGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTA
CTTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAAC
TTTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCA
GATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCT
GTAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTC
CTATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGC
AAAAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCA
GCGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTT
GACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTT
GATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGG
GGTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTA
CCACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAACCT
Offspring
Chr 8 Insertion - VCF Identical
GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCT
GAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGT
TTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAGGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAAAAAACCATGTGGACGTGGC
TTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTG
AGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAAT
AGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGC
TATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCTATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACA
TCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATG
CTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGGCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGC
ATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGT
ATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAG
AGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAAC
TTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGC
AAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTC
TTTAGAAAAGAAATTGCACGATGGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTAC
TTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACT
TTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAG
ATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTG
TAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCC
TATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCA
AAAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAG
CGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTG
ACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTG
ATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGG
GTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTAC
CACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAAACT
Overlay - SNPs from both parents are
present in the Offspring
GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCT
GAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGT
TTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAAGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAACAAACCATGTGGACGTGGC
TTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTG
AGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAAT
AGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGC
TATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCCATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACA
TCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATG
CTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGTCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGC
ATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGT
ATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAG
AGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAAC
TTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGC
AAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTC
TTTAGAAAAGAAATTGTACGATCGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTAC
TTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACT
TTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAG
ATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTG
TAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCC
TATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCA
AGAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAG
CGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTG
ACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTG
ATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGG
GTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTAC
CACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAACCT
GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCT
GAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGT
TTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAGGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAAAAAACCATGTGGACGTGGC
TTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTG
AGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAAT
AGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGC
TATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCTATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACA
TCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATG
CTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGTCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGC
ATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGT
ATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAG
AGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAAC
TTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGC
AAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTC
TTTAGAAAAGAAATTGCACGATGGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTAC
TTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACT
TTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAG
ATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTG
TAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCC
TATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCA
AAAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAG
CGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTG
ACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTG
ATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGG
GTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTAC
CACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAACCT
What is GraphBWT?
Core Technology to Make Anchored Assembly Feasible
• Needed a graph-based way to represent the read data
• Fast search for variation from the reference, directly from the reads in a whole-genome dataset
• Small enough footprint to store a read overlap graph of a whole human genome in memory
GraphBWT
• Technology for storing all of the reads that comprise the variation graph of a whole human genome
• Very compact, so it fits into memory (1.5 bytes per base)
• In memory, allows extremely fast searches by subsequence
Resulting technologies that use GraphBWT
SpEC SV and Query
• SpEC: A lossless compression format that reduces BAM files to 50% of their original size and that can be analyzed with existing bioinformatics tools while compressed
• SpEC SV: SpEC that also includes a compact sequence index, known as a GraphBWT (3GB), which is a graph-based representation of genomic variation
• SpEC Query: an API that reads SpEC SV files to enable rapid queries of sequence data by location or by subsequence
Create a SpEC SV File
Spiral’s SpEC SV File Query Times

Samples, variant calls       SpEC Query using SpEC   SpEC Query using SpEC SV
1 sample, 1 variant          Milliseconds            Milliseconds
1 sample, 1M variants        10 Minutes              5 Minutes
1000 samples, 1 variant      10-20 Minutes           10-20 Minutes
1000 samples, 1M variants    4 Days                  2 Days
Variant types                SNPs and Indels         SNPs, Indels and SVs
GraphBWT Technical Details
• Constant-time traversal of the k-mer graph for k-mers of any size
• Subsequence search time is linear in the length of the query sequence
• Storage requirements grow linearly with the amount of novel sequence (i.e. variation)
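The slides do not describe GraphBWT's internals, so the following is not its implementation. It is a generic sketch of backward search over a Burrows-Wheeler transform of one string (an FM-index), meant only to show why a BWT-based index can answer subsequence queries in one step per query character. The class name and the naive suffix-array construction are illustrative assumptions.

```python
class FMIndex:
    """Toy FM-index over a single string (illustration only, not GraphBWT)."""

    def __init__(self, text: str):
        if "$" in text:
            raise ValueError("'$' is reserved as the end-of-text sentinel")
        self.text = text + "$"  # '$' sorts before A, C, G, T
        # Naive suffix array: fine for toy inputs, far too slow for genomes.
        suffix_array = sorted(range(len(self.text)), key=lambda i: self.text[i:])
        # BWT: the character preceding each sorted suffix (wraps at index 0).
        self.bwt = "".join(self.text[i - 1] for i in suffix_array)

        counts = {}
        for ch in self.text:
            counts[ch] = counts.get(ch, 0) + 1
        # first[c]: number of characters in the text strictly smaller than c.
        self.first, total = {}, 0
        for ch in sorted(counts):
            self.first[ch] = total
            total += counts[ch]
        # occ[c][i]: occurrences of c in bwt[:i].
        self.occ = {ch: [0] * (len(self.bwt) + 1) for ch in counts}
        for i, ch in enumerate(self.bwt):
            for c, table in self.occ.items():
                table[i + 1] = table[i] + (1 if c == ch else 0)

    def count(self, pattern: str) -> int:
        """Count occurrences of `pattern`, one table lookup pair per character."""
        lo, hi = 0, len(self.bwt)
        for ch in reversed(pattern):
            if ch not in self.first:
                return 0
            lo = self.first[ch] + self.occ[ch][lo]
            hi = self.first[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0
        return hi - lo


index = FMIndex("GATTACAGATTACA")
print(index.count("GATTACA"), index.count("TTA"), index.count("GGG"))
```

The loop in `count` runs once per pattern character regardless of text length, which is the behavior the second bullet describes. The full occurrence table used here is the simplest choice, not a compact one.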
Use Cases for SpEC SV
• Search for evidence of variation in read data
• Compare graphs between individuals for unique variation
• Compare combined graphs of two groups
• Store variation, for example a reference genome
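As a rough illustration of comparing individuals for unique variation, the sketch below uses plain k-mer sets in place of graphs. This is a simplifying assumption: it does not use the SpEC SV format or the SpEC Query API, and the reads and the value of `k` are invented.

```python
def kmers(reads, k):
    """Collect the set of all length-k substrings across a list of reads."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def unique_kmers(sample_reads, other_reads, k=4):
    """K-mers seen in `sample_reads` but never in `other_reads`."""
    return kmers(sample_reads, k) - kmers(other_reads, k)


# Two toy individuals whose reads differ at a single base.
reads_a = ["ACGTACGT"]
reads_b = ["ACGTTCGT"]

only_a = unique_kmers(reads_a, reads_b, k=4)
only_b = unique_kmers(reads_b, reads_a, k=4)
print(sorted(only_a))
print(sorted(only_b))
```

Comparing two groups follows the same pattern: take the union of k-mers within each group first, then take the set difference.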
Questions?
Niranjan Shekar - VP of Bioinformatics
niranjan@spiralgenetics.com

More Related Content

What's hot

Abrf 2017 hadfield j
Abrf 2017 hadfield jAbrf 2017 hadfield j
Abrf 2017 hadfield j
James Hadfield
 
CRISPR presentation extended Mouse Modeling
CRISPR presentation extended Mouse ModelingCRISPR presentation extended Mouse Modeling
CRISPR presentation extended Mouse Modeling
Tristan Kempston
 
Bioinformatics tools for NGS data analysis
Bioinformatics tools for NGS data analysisBioinformatics tools for NGS data analysis
Bioinformatics tools for NGS data analysis
Despoina Kalfakakou
 
Knowing Your NGS Upstream: Alignment and Variants
Knowing Your NGS Upstream: Alignment and VariantsKnowing Your NGS Upstream: Alignment and Variants
Knowing Your NGS Upstream: Alignment and Variants
Golden Helix Inc
 
NGS overview
NGS overviewNGS overview
NGS overview
AllSeq
 
Data Management for Quantitative Biology - Data sources (Next generation tech...
Data Management for Quantitative Biology - Data sources (Next generation tech...Data Management for Quantitative Biology - Data sources (Next generation tech...
Data Management for Quantitative Biology - Data sources (Next generation tech...
QBiC_Tue
 
Toward A Better Understanding Of Plant Genome Structure: Combining NGS, Optic...
Toward A Better Understanding Of Plant Genome Structure: Combining NGS, Optic...Toward A Better Understanding Of Plant Genome Structure: Combining NGS, Optic...
Toward A Better Understanding Of Plant Genome Structure: Combining NGS, Optic...
Fabio Caligaris
 
Genome Assembly 2018
Genome Assembly 2018Genome Assembly 2018
Genome Assembly 2018
Aureliano Bombarely
 
The Next, Next Generation of Sequencing - From Semiconductor to Single Molecule
The Next, Next Generation of Sequencing - From Semiconductor to Single MoleculeThe Next, Next Generation of Sequencing - From Semiconductor to Single Molecule
The Next, Next Generation of Sequencing - From Semiconductor to Single Molecule
Justin Johnson
 
Ngs intro_v6_public
 Ngs intro_v6_public Ngs intro_v6_public
Ngs intro_v6_public
François PAILLIER
 
Next-generation sequencing and quality control: An Introduction (2016)
Next-generation sequencing and quality control: An Introduction (2016)Next-generation sequencing and quality control: An Introduction (2016)
Next-generation sequencing and quality control: An Introduction (2016)
Sebastian Schmeier
 
GENASSIST™ CRISPR & rAAV Genome Editing Tools
GENASSIST™ CRISPR & rAAV Genome Editing ToolsGENASSIST™ CRISPR & rAAV Genome Editing Tools
GENASSIST™ CRISPR & rAAV Genome Editing Tools
Candy Smellie
 
NGx Sequencing 101-platforms
NGx Sequencing 101-platformsNGx Sequencing 101-platforms
NGx Sequencing 101-platforms
AllSeq
 
How to cluster and sequence an ngs library (james hadfield160416)
How to cluster and sequence an ngs library (james hadfield160416)How to cluster and sequence an ngs library (james hadfield160416)
How to cluster and sequence an ngs library (james hadfield160416)
James Hadfield
 
Ngs part i 2013
Ngs part i 2013Ngs part i 2013
Ngs part i 2013
Elsa von Licy
 
Aug2015 analysis team 04 10x genomics
Aug2015 analysis team 04 10x genomicsAug2015 analysis team 04 10x genomics
Aug2015 analysis team 04 10x genomics
GenomeInABottle
 
Data analysis pipelines for NGS applications
Data analysis pipelines for NGS applicationsData analysis pipelines for NGS applications
Data analysis pipelines for NGS applications
Vall d'Hebron Institute of Research (VHIR)
 
Genome Editing Comes of Age; CRISPR, rAAV and the new landscape of molecular ...
Genome Editing Comes of Age; CRISPR, rAAV and the new landscape of molecular ...Genome Editing Comes of Age; CRISPR, rAAV and the new landscape of molecular ...
Genome Editing Comes of Age; CRISPR, rAAV and the new landscape of molecular ...
Candy Smellie
 
Examining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencingExamining gene expression and methylation with next gen sequencing
Examining gene expression and methylation with next gen sequencing
Stephen Turner
 
New data from giab genomes pacbio ccs
New data from giab genomes   pacbio ccsNew data from giab genomes   pacbio ccs
New data from giab genomes pacbio ccs
GenomeInABottle
 

Aug2015 analysis team spiral genetics

  • 1. ANCHORED ASSEMBLY: Accurate Structural Variant Detection Using Short-Read Data. Bruestle, J.J. and Shekar, S.N.
  • 2. Methodology: Anchoring; Anchor Assemblies; Read Overlap Assembly; Remove Reference Reads; Read Correction (A* error correction). [Figure: read overlap assembly of reads R1-R9. Figure: K-mer Quality Score Distribution; axes: Total K-mer Quality Score, K-mer Count]
  • 3. SV Comparison. Baylor College of Medicine, against Illumina, PacBio, Array, Nextera and BioNano. Source: English et al. (2015), updated.
      Program       FDR      Sensitivity
      CNVnator      80.46%   22.62%
      BreakDancer   58.89%   42.39%
      Delly         55.13%   31.18%
      Crest         14.87%   35.29%
      Pindel        31.81%   56.70%
      SVStat         1.79%   16.36%
      Tiresias      69.04%    7.79%
      Spiral         3.03%   42%
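As a reminder of how the two columns in the comparison are conventionally defined, here is a minimal Python sketch that computes false discovery rate and sensitivity from counts of true positives, false positives, and false negatives. The counts are hypothetical and chosen only for illustration; they are not taken from English et al. (2015), and the exact matching rules used in that comparison are not stated on the slide.

```python
def false_discovery_rate(tp: int, fp: int) -> float:
    """FDR = FP / (TP + FP): the fraction of reported calls that are wrong."""
    calls = tp + fp
    return fp / calls if calls else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity = TP / (TP + FN): the fraction of true events that were called."""
    truth = tp + fn
    return tp / truth if truth else 0.0


# Hypothetical counts, for illustration only.
tp, fp, fn = 420, 13, 580
print(f"FDR = {false_discovery_rate(tp, fp):.2%}")
print(f"Sensitivity = {sensitivity(tp, fn):.2%}")
```

A caller with a low FDR and moderate sensitivity reports few calls that fail validation while still missing some true events, which is the trade-off the table is comparing.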
  • 4. Fosmid/PacBio validated SVs. Validated in collaboration with Malig, M, Eichler, EE et al. Selected 15 high-confidence SVs not previously detected in the 1000 Genomes Project.
  • 5. PacBio validated SVs: Deletions. Malig, M, Eichler, EE et al. (Manuscript in preparation)
      Chr   Call size (bp)   Clones seq with PacBio   Validated by Miropeats   Validated by dotplots   Call validated?
      1     1026             2                        2                        2                       yes
      1     6375             2                        2                        2                       yes
      2     26838            2                        2                        2                       yes
      3     4184             1                        1                        1                       yes
      5     9507             2                        2                        2                       yes
      7     3013             1                        1                        1                       yes
      8     5157             2                        1                        1                       yes
      9     2883             1                        1                        1                       yes
      15    6051             2                        2                        2                       yes
  • 6. PacBio validated SVs: Insertions. Malig, M, Eichler, EE et al. (Manuscript in preparation)
      Chr   Call size (bp)   Clones seq with PacBio   Validated by Miropeats   Validated by dotplots   Call validated?
      1     1755             2                        2                        2                       yes
      1     3865             2                        2                        2                       yes
      8     2457             2                        2                        2                       yes
      8     1508             2                        2                        2                       yes
      13    2142             2                        2                        2                       yes
      X     1548             2                        2                        2                       yes
  • 7. PacBio validation dotplots: Chromosome 1, 3.8 kb insertion. Malig, M, Eichler, EE et al. (Manuscript in preparation)
  • 8. PacBio validation dotplots: Chromosome 1, 6.4 kb deletion. Malig, M, Eichler, EE et al. (Manuscript in preparation)
  • 9. PacBio validation dotplots: Chromosome 2, 26.8 kb deletion. Malig, M, Eichler, EE et al. (Manuscript in preparation)
  • 10. Ashkenazi Jewish Trio. Validated by Noah Spies using his program SVViz. Examples: Chr 2 deletion, Chr 8 insertion.
  • 11. Chr 2 Deletion - Father [Figure panels: Alternative, Reference]
  • 12. Chr 2 Deletion - Mother [Figure panels: Alternative, Reference]
  • 13. Chr 2 Deletion - Offspring [Figure panels: Alternative, Reference]
  • 14. Chr 2 Deletion - VCF Identical
      HG002 - Offspring: chr2 34695829 T <DEL> 100 PASS NS=1;DP=51;SVTYPE=DEL;END=34736567;SVLEN=-40730 DP:AD 51:21,30
      HG003 - Father:    chr2 34695829 T <DEL> 100 PASS NS=1;DP=55;SVTYPE=DEL;END=34736567;SVLEN=-40730; DP:AD 55:0,55
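To make the "VCF identical" comparison concrete, the following Python sketch parses the two records as they are printed on the slide and checks that the breakpoints agree. It assumes the column layout shown on the slide (CHROM POS REF ALT QUAL FILTER INFO FORMAT SAMPLE, with no ID column); the function name `parse_slide_record` is hypothetical, and this is not a general-purpose VCF parser.

```python
def parse_slide_record(line: str) -> dict:
    """Parse one record laid out as on the slide:
    CHROM POS REF ALT QUAL FILTER INFO FORMAT SAMPLE (no ID column)."""
    chrom, pos, ref, alt, qual, flt, info, fmt, sample = line.split()
    # INFO is a ';'-separated list of KEY=VALUE pairs; skip empty trailing entries.
    info_map = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    # FORMAT names the ':'-separated fields of the sample column.
    sample_map = dict(zip(fmt.split(":"), sample.split(":")))
    ref_reads, alt_reads = (int(x) for x in sample_map["AD"].split(","))
    total = ref_reads + alt_reads
    return {
        "chrom": chrom,
        "pos": int(pos),
        "svtype": info_map["SVTYPE"],
        "end": int(info_map["END"]),
        "svlen": int(info_map["SVLEN"]),
        "ref_reads": ref_reads,
        "alt_reads": alt_reads,
        "alt_fraction": alt_reads / total if total else 0.0,
    }


offspring = parse_slide_record(
    "chr2 34695829 T <DEL> 100 PASS "
    "NS=1;DP=51;SVTYPE=DEL;END=34736567;SVLEN=-40730 DP:AD 51:21,30"
)
father = parse_slide_record(
    "chr2 34695829 T <DEL> 100 PASS "
    "NS=1;DP=55;SVTYPE=DEL;END=34736567;SVLEN=-40730; DP:AD 55:0,55"
)

# Same breakpoints in both samples; only the read support differs.
same_call = all(offspring[k] == father[k] for k in ("chrom", "pos", "end", "svlen"))
print(same_call, round(offspring["alt_fraction"], 2), father["alt_fraction"])
```

The allele depths differ between the two samples: 30 of 51 reads support the deletion in the offspring, while all 55 reads support it in the father, which would be consistent with a heterozygous and a homozygous call respectively.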
  • 15. Chr 8 Insertion - Father [Figure panels: Alternative, Reference]
  • 16. Chr 8 Insertion - Mother [Figure panels: Alternative, Reference]
  • 17. Chr 8 Insertion - Offspring [Figure panels: Alternative, Reference]
  • 18. HG0004 chr8 129739066 AATAAA 100 masking_present NS=1;DP=41;SVTYPE=INS; END=129739071;SVLEN=3404; DP:AD 41:28,13 Chr 2 Insertion - VCF IdenticalAA GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCT GAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGT TTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAGGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAAAAAACCATGTGGACGTGGC TTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTG AGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAAT AGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGC TATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCTATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACA TCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATG CTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGGCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGC ATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGT ATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAG AGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAAC TTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGC 
AAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTC TTTAGAAAAGAAATTGCACGATGGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTAC TTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACT TTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAG ATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTG TAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCC TATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCA AAAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAG CGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTG ACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTG ATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGG GTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTAC CACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAAACT Mother
  • 19. GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCT GAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGT TTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAAGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAACAAACCATGTGGACGTGGC TTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTG AGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAAT AGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGC TATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCCATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACA TCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATG CTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGTCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGC ATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGT ATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAG AGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAA CTTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATG CAAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCT 
CTTTAGAAAAGAAATTGTACGATCGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTA CTTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAAC TTTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCA GATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCT GTAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTC CTATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGC AAGAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCA GCGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTT GACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTT GATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGG GGTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTA CCACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAACCT Chr 2 Insertion - VCF IdenticalAA FatherHG003 chr8 129739066 AATAAA 100 masking_present NS=1;DP=47;SVTYPE=INS; END=129739071;SVLEN=3405;DP:AD 47:32,15
  • 20. Chr 2 Insertion - VCF IdenticalAA HG002 chr8 129739066 AATAAA 100 masking_present NS=1;DP=18;SVTYPE=INS; END=129739071;SVLEN=3405;DP:AD 18:0,18 GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCT GAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGT TTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAGGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAAAAAACCATGTGGACGTGGC TTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTG AGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAAT AGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGC TATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCTATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACA TCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATG CTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGTCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGC ATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGT ATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAG AGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAA CTTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATG 
CAAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCT CTTTAGAAAAGAAATTGCACGATGGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTA CTTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAAC TTTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCA GATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCT GTAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTC CTATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGC AAAAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCA GCGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTT GACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTT GATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGG GGTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTA CCACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAACCT Offspring
  • 21. Chr 2 Insertion - VCF IdenticalAA GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCT GAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGT TTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAGGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAAAAAACCATGTGGACGTGGC TTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTG AGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAAT AGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGC TATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCTATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACA TCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATG CTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGGCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGC ATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGT ATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAG AGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAAC TTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGC AAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTC 
TTTAGAAAAGAAATTGCACGATGGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTAC TTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACT TTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAG ATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTG TAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCC TATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCA AAAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAG CGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTG ACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTG ATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGG GTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTAC CACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAAACT Overlay - SNPs from both parents are present in the Offspring GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCT GAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGT 
TTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAAGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAACAAACCATGTGGACGTGGC TTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTG AGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAAT AGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGC TATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCCATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACA TCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATG CTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGTCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGC ATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGT ATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAG AGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAAC TTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGC AAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTC TTTAGAAAAGAAATTGTACGATCGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTAC TTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACT TTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAG 
ATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTG TAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCC TATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCA AGAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAG CGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTG ACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTG ATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGG GTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTAC CACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAACCT GAAAGTTGTGCACAATATAAAAATTATAATTTTATATTTCAAAACAAATTAAATATCTTAAAATTATAGAAGACATTAAAGAACTATATAAATCAAAGTTAGAAAAAAACCCAGATGTGAGTTGGGAAATCT GAAGAAAATTTAAAAGAGAAATTTAAAAATATTAATATAAAAAATGAAGACTAAACTGGAAAGATACACTAGAGTGAATAAACAAAATAGAAAATACTCAGATGTTTTGTTTTAACTTCCTATTGTATGAGT TTTGGAGACCAACATAAGATAATGACTTGCCTCTGGATATGAAGGTAAAAAAACAGACACAGGCCTATGTAGTGATTTCTTACAGAACAACACAGCAGAAAGCAAATCCCTAAAAAACCATGTGGACGTGGC TTTTACAGATGGTTGTCCAATCCCTGCATGCTATTGCTTGCTTATGGATGAGTGAAAGGAATAAAAATTTTAAGTTATAGCTACAGTTTCTCTACCTGTACATTCCAATACTGACCTTGCATGGTTTCTTTG AGGGCTAAGTATGACAAAAGGATGCAGTGATTTCGAACTTAGATTTTAAAAAACAAATATGACTCTTTTGAACTGTGTGAACATAGGCACATTGCTGGATCTGAGTAATTTCATGTGCTAAGAGGGAATAAT AGCATCAGCCTTAATGCTGCATTATTGCATTTAGCATTATTTCTTTCTGAAGATGAAAGAAGACAGACATCATTATATTTAACAAAGTGCCTGGCACATATAAAATAGTCAATAAATGTTATCTATCATTGC 
TATTATTACCTAATACTGCGCATAGTAAGCCCTGAACCTGTTCCTGGCATGTGGAGCTATGTACTATGTTCATGACATTAAACAAAGTAGTAGCTATATAATGAATATATAAATGTGACTTTTATTATTACA TCCCTGTAGTTTTGGCAAGTAGTTTACTAAAAGGAAGTTCCAATTTTGACTTAGCATGGAGTTTGTTTTTATCCTGGGCATGTTATCTACCCAGCTTACCTTATTCTTCCTTTCTTCGAAACAGGAATCATG CTTGTTAATTGACTGGGATGTTGCAAGGCCTTATATCTGAGAAGTATATCATCAAGGAATAGAGAAGATGAGTCTTAGAGAGGAAATGGAAGCCCCTGTCAATTAGGAGAGGCAAAGGCTCTATGATACAGC ATAGCCTAAGAATTTCGTTGATACAGAATTCTAAGTGTGAAACGAATGAATGGAGTGACCACTCCTCCCTACTAAAGAATCTTGTAAACACTAGTTTTAAAAGCACAAACGTATTATATCATATACCTATGT ATAATGTCATTTTGCTACTTTTCTCTTCCAACTTCTCAAATCTTTGAATGCAGGGGTTTTTGGAGTTATTCCCTGTGTATTATTTCGACTGATATGTAATAGCTGCTCAGCGAATGTGTGTTGCTAAATAAG AGATGGAGTACAGACAAGCTGAAATTGCACATTTATGTTGCCATTGTACTGCTCAAAAAAAAAAAAAATTAGAATTAGGGTTAAAGAGAGTGCTCAGGCCCTAGACTAGGATTTATTTGCTGTATAATAAAC TTTATGCAAACAATTAACCTCCCTGCTTCTCAATTTTCTCCTCTATAAAATTGGGTTATTACAAAATTCTTTGTAACATATTATGGAGTTCAATTAGGATAAGTTAAATATTGGAAATCTGAAGCATTATGC AAATATGAGGCATTATTATAATAATTATAACAATATTGTTTTTATTCTTAATTGCTACTCTTGAGTACTCTGTTGCTCTGCAGATATCTCTCTCCCTGCCTTCCCCAGGGTGTTGGCATACCAGGATGCCTC TTTAGAAAAGAAATTGCACGATGGACCTAAGGAAACAGAATTTTCCATCCTGGCATTTGCATAAAGGCCACACATGCATAGCCATATGCTGATTTAACCAACAGCTTTCACACTTATAATCGAGTTTGCTAC TTGTTCTGCGATATCTACTCTCCCTCTATTTCTTATTAATAGAACAAAATTTTAGTTGGGAATATGGCTACTGAGAATAAAGATTACCTTTCTCAGCTTCTTTGCGGCTAACTCTGATTATGTGTCTAAACT TTGGTGCATATTTTGGTAAACGGAAATTCTTAAAGGGAGATTCTCTACTTTCTTCCTGCTTGCTGGAATGCAAACATGATTGTTGGATTTGAGCAGCCATCTTATGCCAGGAGTTGGAAACCATGCAGGCAG ATCCTGGTCATTAGCACTACAGACCTCTATACCAAACTGGATTTCTGTGAGACTCCAGAAGAAAGTAAGCAGCACAAGGAGTTTCTTCATGTATTCTTCATTTCCCACACCCCATTATACGTGCTTTTGCTG TAATCTGGAATCAGTTGTACTAATCTACTGCACATACCTAGATTCTATTGATAGTCTATTCCAGGATTGATAACTTTGAGCCCAGATAACTTGCAGTAAGATTTATAACAAGATTTCAAAAATATTCTTTCC TATACACCAAATAGTTTTGGTTAGAGAAAACAAAACTTTTGGCATAGCAACTTCATTTGTAGGAAGTTACCTTCTTAAAATTGTTTATCTGTGGACAGCTATGCTGCTATTAGTAGGGAATGGTTTCAGGCA AAAGGTTACAGAAGGATGGAGAGGGCCTGGGCTTTGGGGTTCCAGGGGTATGGAAGTCAGCAGAGCTGAGAGTAGTTCCCAACAGCCAGAGTGTCCATGGATCAAGCCCTTTTGTGAAGCTGGAGGTACCAG 
CGCTGGTCCAGGATGCGCAGCTGTAAAGTTGTGAATATATGTATTTGGTCTTTTTCCTTGTTTGCTGGCCTACAACTCTTAAAATCCTTGGAATCTTCAAAGTGATGTGTCTTTTTGTATGCTAATGAGTTG ACTAATGGCTGGCAGCCTCTAGGTGGCTTCTGGATAAGAGCTGGTCACCAGGAAGACCAAGGCCAGATTAGAGGGTTGGGACATTCGGTCCTACTCCGCAACCACCATGGAGACAGTCTGAAGGTTAACTTG ATCACCAATGGCCAATAATTTCATCAATCATGCCAGTGTAATGAAGCCAGCATAAAAACTCAAAAGGACAGGGCTCAGAGAGTTCCATTAGCTGAACATTGGAGGTTCCCACAAGTGGCATGCCCGGAGGGG GTTATGGAAGCTTCACACCCTTTCCCCATACCTCACCCTGTGCATCTCTTCATCTGTATCTTCTGTAATATCCTTTATAATACGCCATTAAATATAAGGAAGTATTTCTCTGAGTTCTGTGAGCCACTCTAC CACATTAATCGAACCCCATGGGGAAGCTGAGTAAAGTTTCAAGTGGAGTAAAATTGCTGATACCGTGACCATCAGGTCAATGTTGCTGGAAGCACAGGTAAAACAACCT A
  • 23. Core Technology to Make Anchored Assembly Feasible. Requirements: a graph-based representation of the read data; fast search for variation from the reference directly in the reads of a whole-genome dataset; a footprint small enough to hold the read overlap graph of a whole human genome in memory.
  • 24. GraphBWT. A technology for storing all of the reads that make up the variation graph of a whole human genome. It is compact enough to fit in memory (1.5 bytes per base) and, once in memory, allows extremely fast searches by subsequence.
  • 26. SpEC SV and Query. SpEC: a lossless compression format that reduces BAM files to 50% of their original size and can be analyzed with existing bioinformatics tools while still compressed. SpEC SV: SpEC plus a compact sequence index, the GraphBWT (3 GB), which is a graph-based representation of genomic variation. SpEC Query: an API that reads SpEC SV files to enable rapid queries of sequence data by location or by subsequence.
  • 27. Create a SpEC SV File [Figure: Spiral's SpEC SV file]
  • 28. Query Times
      Samples, variant calls       SpEC Query using SpEC   SpEC Query using SpEC SV
      1 sample, 1 variant          Milliseconds            Milliseconds
      1 sample, 1M variants        10 minutes              5 minutes
      1000 samples, 1 variant      10-20 minutes           10-20 minutes
      1000 samples, 1M variants    4 days                  2 days
      Variant types                SNPs and indels         SNPs, indels and SVs
  • 29. GraphBWT Technical Details. Constant-time traversal of the k-mer graph for any k-mer size; subsequence search time is linear in the length of the query sequence; storage requirements grow linearly with the amount of novel sequence (i.e. variation).
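The slides do not describe GraphBWT's internals, so the following Python sketch is not Spiral's data structure. It is a textbook FM-index (Burrows-Wheeler transform with backward search) over a single toy string, included only to illustrate how a BWT-based index can answer a subsequence query in a number of steps proportional to the query length. The function names and the example sequence are assumptions made for illustration.

```python
from collections import Counter


def build_fm_index(text: str):
    """Build a toy FM-index (naive suffix sort; fine for short strings)."""
    text += "$"  # unique terminator, sorts before A/C/G/T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # first[c]: number of characters in text strictly smaller than c.
    counts = Counter(text)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i]: occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return first, occ, len(bwt)


def count_occurrences(index, pattern: str) -> int:
    """Backward search: one table lookup pair per pattern character."""
    first, occ, n = index
    lo, hi = 0, n  # half-open range of suffix-array rows
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("GATTACAGATTACA")
print(count_occurrences(index, "ATTA"))
print(count_occurrences(index, "TTT"))
```

Each loop iteration does a fixed amount of work regardless of the size of the indexed text, which is why the query cost depends on the pattern length rather than on the amount of sequence stored. The occurrence table here is uncompressed for clarity; practical indexes sample or compress it to keep the memory footprint small.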
  • 30. Use Cases for SpEC SV. Search for evidence of variation in read data; compare graphs between individuals for unique variation; compare combined graphs of two groups; store variation, for example a reference genome.
  • 31. Questions? Niranjan Shekar - VP of Bioinformatics niranjan@spiralgenetics.com