This is a web-based collaboration area for learning about the science and technology of the Broad. The emphasis is on introductory materials appropriate for new employees or those wanting to learn about areas other than their own. If you have corrections or additions please feel free to edit this page or e-mail zleber@broadinstitute.org. Thank you.


 

Genetics

The Cartoon Guide to Genetics: This 1991 book remains a great introduction to genetics and is a highly enjoyable read. See Zach Leber to borrow a copy.

Primer on Molecular Genetics: A 1992 PDF primer from the Human Genome Project.

 

Sequencing

https://apps.broadinstitute.org/twiki/pub/TechnicalTraining/WebHome/Sequencing_tech_2011.pdf - Niall Lennon's technology poster

Sanger Sequencing

This video by Broad member Chad Nusbaum describes the Sanger sequencing techniques that have since been superseded by the Broad's next-gen machines. The video is still well worth viewing because it explains the fundamental breakthrough of dye-terminator sequencing. It plays at 7CC in the DNAtrium.

Next-gen Sequencing

Next-generation sequencing uses very different sample preparation, chemistry, and imaging to produce much more data per run than the Sanger-type machines used for almost 30 years. The Broad started using next-generation sequencing in 2005. You can read an overview of the transition to next-gen or an in-depth review of the technology. Presentations from a 2010 Broad workshop on next-gen sequencing can be found here.

Illumina/Solexa

Illumina machines are the current workhorses of the Broad. This short whitepaper is a good introduction to their technology. You can also watch the Technology section of this video. Illumina bought Solexa in 2006, so the terms Illumina and Solexa are often used interchangeably.

Roche/454

454 machines were the first of the next-gen machines and can produce significantly longer reads than the Illumina machines (~400 base pairs (bp) vs. ~100 bp). The Broad uses them for projects where these longer reads are helpful, such as de novo assembly of new fungal and microbial genomes. 454 machines use PicoTiterPlates rather than flow cells. Their chemistry is described here and the process is more fully described by some of their videos.

 

ABI/SOLiD

SOLiD machines are the third type of next-gen machines used at the Broad. The most distinguishing feature of these systems is their use of color space.
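
The entry above does not spell out what color space means, so the following is only a minimal sketch of the commonly described two-base encoding: each pair of adjacent bases is reported as one of four colors (0 to 3), and the first base is kept so the sequence can be decoded again. The function names and the XOR formulation are illustrative assumptions, not vendor code.

```python
# Two-bit code per base; the color for two adjacent bases is the XOR of
# their codes (e.g. AA, CC, GG, TT -> 0 and AT, TA, CG, GC -> 3).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = {code: base for base, code in BASE_CODE.items()}


def encode_color_space(seq):
    """Return the first base followed by one color digit per adjacent base pair."""
    colors = [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(str(c) for c in colors)


def decode_color_space(encoded):
    """Rebuild the base sequence from the first base and the color digits."""
    bases = [encoded[0]]
    for digit in encoded[1:]:
        previous = BASE_CODE[bases[-1]]
        bases.append(CODE_BASE[previous ^ int(digit)])
    return "".join(bases)


print(encode_color_space("ATGGC"))   # A3103
print(decode_color_space("A3103"))   # ATGGC
```

Because every color depends on two neighboring bases, a single miscalled color changes every base decoded after it, which is why color-space reads are usually compared in color space rather than decoded first.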

 

Third-gen Sequencing

The Broad is investigating new machines being developed by Helicos Biosciences, Pacific Biosciences, Oxford Nanopore Technologies, and Complete Genomics. Sequencing's new race is a good introduction to these systems.

 

Glossary of Sequencing Terms

Alignment

Reconstructing the genome of a sample organism by matching its sequenced DNA fragments to a reference sequence. When there is no reference, the sequenced fragments must be aligned against one another and assembled into a new reference. This is called de novo sequencing (or de novo assembly) and is much more challenging. The Human Genome Project took 13 years to develop the first reference sequence for humans.
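
As a toy illustration of matching reads to a reference, the sketch below places each read wherever it matches exactly. Real aligners tolerate mismatches and indels and use indexed data structures; the function name and the sequences here are made up for illustration.

```python
def align_reads(reference, reads):
    """Map each read to the list of 0-based reference positions where it matches exactly."""
    placements = {}
    for read in reads:
        positions = []
        start = reference.find(read)
        while start != -1:
            positions.append(start)
            # Continue searching one base further along to catch overlapping hits.
            start = reference.find(read, start + 1)
        placements[read] = positions
    return placements


reference = "ACGTTAGCCGATAGCTA"          # hypothetical reference sequence
reads = ["GTTAG", "CGATA", "TAG", "TTTTT"]  # hypothetical reads

for read, positions in align_reads(reference, reads).items():
    print(read, positions)
```

A read that matches in more than one place (such as the short read TAG here) cannot be placed unambiguously, which is one reason longer reads and paired-end reads help with alignment.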

Cluster station

Lab instrument used to replicate fragments of DNA that have been inserted into a flow cell in order to amplify the signal produced during sequencing. The cBot is a new type of cluster station.

Codon

Triplet of base letters (e.g. ACG) that codes for a specific amino acid. A sequence of codons along a strand of DNA specifies a sequence of amino acids which together form a protein. Certain codons are called stop codons and serve to indicate the end of the sequence. There is some redundancy in the coding, as there are 64 possible triplets but only 20 standard amino acids.
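
The redundancy described above can be checked with a short sketch. It builds the standard genetic code from the conventional compact listing (bases in TCAG order, one-letter amino acid codes, * for stop) and translates a DNA string until the first stop codon. The helper name translate is an illustrative choice.

```python
from itertools import product

BASES = "TCAG"
# One-letter amino acid codes for the 64 codons in TTT, TTC, TTA, TTG, TCT, ... order;
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA string codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(len(CODON_TABLE))                       # 64 possible triplets
print(len(set(CODON_TABLE.values()) - {"*"}))  # 20 amino acids
print(CODON_TABLE["ACG"])                     # T (threonine)
print(translate("ATGGCCTAA"))                 # MA
```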

Exome

The fraction of the entire genome which contains the protein-coding genes. For humans this is about 1% of the entire genome. Some sequencing projects just focus on the exome as opposed to the whole genome.

Flow cell

Microscope-slide sized glass plate with internal channels into which DNA samples are injected, clustered, and then sequenced.

Hybrid selection

A Broad-developed technique to select certain parts of the whole genome for sequencing by creating template targets that match a reference genome at known areas of interest in order to isolate and replicate specific subsets of the sample DNA. RNA capture probes that target the exome are used as the "bait" which is thrown into the "pond" of sample DNA. The resulting hybrid-selected DNA targets are the "catch" which is then extracted, enriched, and sequenced.

Indel

An insertion or deletion of one or more bases at a particular spot (locus) on a chromosome (e.g. 4 Gs instead of 5).
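
The "4 Gs instead of 5" example can be sketched by comparing the lengths of runs of identical bases in a reference and a sample. This only handles indels inside such runs and assumes the two sequences are otherwise identical; the function names and sequences are illustrative.

```python
from itertools import groupby


def run_lengths(seq):
    """Collapse a sequence into (base, run length) pairs, e.g. GGG -> ("G", 3)."""
    return [(base, len(list(group))) for base, group in groupby(seq)]


def homopolymer_indels(reference, sample):
    """List the runs whose length differs between reference and sample."""
    ref_runs = run_lengths(reference)
    sample_runs = run_lengths(sample)
    if [b for b, _ in ref_runs] != [b for b, _ in sample_runs]:
        raise ValueError("sequences differ by more than run lengths")
    indels = []
    for (base, ref_len), (_, sample_len) in zip(ref_runs, sample_runs):
        if sample_len > ref_len:
            indels.append((base, ref_len, sample_len, "insertion"))
        elif sample_len < ref_len:
            indels.append((base, ref_len, sample_len, "deletion"))
    return indels


# Reference has 5 Gs, the sample only 4: a one-base deletion.
print(homopolymer_indels("ACGGGGGT", "ACGGGGT"))
```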

Jumping Library

A library constructed from shearing long fragments that have been circularized and joined with a marker molecule, allowing reads across the marker molecule to indicate a jump to another part of the chromosome that is thousands of bases away. Similar to a paired-end read.

Library construction (LC)

The creation of a collection of DNA fragments that represent all the chromosomal information of an organism. DNA is extracted from a tissue or blood sample, sheared into fragments with lengths appropriate for sequencing, enriched using PCR, normalized to the desired concentration, and denatured to produce single-stranded DNA (ssDNA).

Paired-end read

Sequencing the same strand of DNA from both ends so that the relative location of the two reads can be determined from the strand length. Even if the strand length is longer than the combined read lengths, knowing how far apart they are helps with alignment.
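
A small sketch of the idea, under simplifying assumptions: the fragment is given as a single-strand string, the first read is taken from its start, and the second read is reported as the reverse complement of its end (which is how the second read of a pair is typically presented). The names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read in its own 5'-to-3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment, read_length):
    """Return (read 1, read 2, number of unsequenced bases between the two reads)."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment[-read_length:])
    gap = len(fragment) - 2 * read_length
    return read1, read2, gap


fragment = "ACGTTAGCCGATAGCTA"  # hypothetical 17-base fragment
print(paired_end_reads(fragment, 5))  # ('ACGTT', 'TAGCT', 7)
```

Knowing that the two reads must land about one fragment length apart, and on opposite strands, lets an aligner reject placements that would otherwise look equally good.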

Polymerase chain reaction (PCR)

A technique used to amplify (copy) strands of DNA using complementary primers (oligonucleotides), DNA polymerase, and thermal cycling. PCR is used prior to sequencing to amplify the individual sheared strands, either on beads for the 454 and SOLiD processes (emulsion PCR) or in flow cells for the Illumina process (bridge PCR using a cluster station or cBot). qPCR, or quantitative PCR, uses the predictable exponential growth rate of PCR to calculate the initial concentration of DNA in a sample where the amount of starting material may be undetectable. This animation illustrates the process.
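
The exponential growth that qPCR relies on can be written down directly. The sketch below assumes an idealized reaction in which every cycle multiplies the copy number by (1 + efficiency), with efficiency 1.0 meaning perfect doubling; real reactions fall short of this and eventually plateau. The function and parameter names are illustrative.

```python
def copies_after_cycles(initial_copies, cycles, efficiency=1.0):
    """Copies present after a number of cycles, assuming constant per-cycle efficiency."""
    return initial_copies * (1 + efficiency) ** cycles


def initial_copies_from_threshold(threshold_copies, threshold_cycle, efficiency=1.0):
    """Back-calculate the starting copies from the cycle at which a threshold was reached."""
    return threshold_copies / (1 + efficiency) ** threshold_cycle


# Ten starting copies, thirty perfect doublings.
amplified = copies_after_cycles(10, 30)
print(amplified)

# Running the calculation backwards recovers the starting amount.
print(initial_copies_from_threshold(amplified, 30))
```

Each additional cycle needed to reach the detection threshold corresponds to roughly half as much starting material under the perfect-doubling assumption.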

Single-nucleotide polymorphism (SNP)

A change in one base letter. A SNP whose resulting codon produces the same amino acid is called a silent or synonymous mutation and generally has no impact. A SNP that codes for a different amino acid is a missense mutation, while one that causes a premature stop is a nonsense mutation. Both missense mutations and nonsense mutations can have a significant impact on an organism. A catalog of known SNPs is stored in a public database called dbSNP hosted by NCBI and NHGRI.
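
The three categories above can be sketched as a comparison of the amino acids coded by the original and the changed codon. The block rebuilds the standard genetic code so it runs on its own; classify_snp is an illustrative name, and changes to an existing stop codon, which the entry does not cover, are reported as "other".

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(ref_codon, alt_codon):
    """Classify a single-base change within one codon by its effect on the amino acid."""
    differences = sum(a != b for a, b in zip(ref_codon, alt_codon))
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("codons must be triplets differing in exactly one base")
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if ref_aa == "*":
        return "other"      # a stop codon was changed; not covered in the glossary entry
    if alt_aa == "*":
        return "nonsense"
    return "missense"


print(classify_snp("GAA", "GAG"))  # synonymous: both code for glutamic acid (E)
print(classify_snp("GAA", "GTA"))  # missense: glutamic acid (E) becomes valine (V)
print(classify_snp("GAA", "TAA"))  # nonsense: TAA is a stop codon
```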

Singleton

A variation that occurs in only one individual in a sample population. A doubleton occurs in two individuals.
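
Counting singletons and doubletons amounts to counting how many individuals carry each variant. The sketch below assumes the data is available as a mapping from individual to the set of variants observed in that individual; the sample and variant identifiers are made up.

```python
from collections import Counter


def carrier_counts(variants_by_individual):
    """Count, for each variant, how many individuals carry it."""
    return Counter(
        variant
        for variants in variants_by_individual.values()
        for variant in variants
    )


def variants_with_carriers(variants_by_individual, n):
    """Return the variants seen in exactly n individuals (n=1 singletons, n=2 doubletons)."""
    counts = carrier_counts(variants_by_individual)
    return sorted(variant for variant, count in counts.items() if count == n)


# Hypothetical sample population of four individuals.
population = {
    "sample1": {"varA", "varB"},
    "sample2": {"varB", "varC"},
    "sample3": {"varC"},
    "sample4": {"varC"},
}

print(variants_with_carriers(population, 1))  # ['varA']
print(variants_with_carriers(population, 2))  # ['varB']
```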

Notable Genomic Projects

 

  • The Human Genome Project (1990-2003)

A massive, U.S.-led international effort to determine the sequence of the entire human genome. A draft sequence was published in 2001, the project was declared complete in 2003, and the reference continues to be refined.

  • The International HapMap Project (2002-2009)

A collaboration of world scientists and private companies to develop a haplotype map of the human genome, the HapMap, which describes the common patterns of human DNA sequence variation. The HapMap is expected to be a key resource for researchers to use to find genes affecting health, disease, and responses to drugs and environmental factors.

  • The Cancer Genome Atlas (2005-present)

This NCI/NHGRI project (TCGA) aims to systematically explore the entire spectrum of genomic changes involved in human cancers and is currently studying more than 20 types of cancer.

  • The 1000 Genomes Project (2009-present)

A project to sequence the genomes of a large number of people from selected populations throughout the world in order to find the most common variants. Current plans call for the sequencing of about 2000 samples at 4X coverage.

  • Human Microbiome Project (2009-present)

This NIH project (HMP) is designed to characterize the multitude of microbes that live in the various environments of the human body. A major goal of the HMP is to look for correlations between changes in the microbiome and human health.
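
The "4X coverage" figure quoted for the 1000 Genomes Project above refers to the average number of times each base of the genome is read. A minimal sketch of the arithmetic follows, assuming a human genome of roughly 3 billion bases and 100 bp reads (the approximate Illumina read length mentioned earlier on this page); the function names are illustrative.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Average number of reads covering each base: total sequenced bases / genome size."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


HUMAN_GENOME_SIZE = 3_000_000_000  # assumed approximate size in bases
READ_LENGTH = 100                  # assumed read length in bp

print(reads_needed(4, READ_LENGTH, HUMAN_GENOME_SIZE))             # 120000000
print(mean_coverage(120_000_000, READ_LENGTH, HUMAN_GENOME_SIZE))  # 4.0
```

This is only an average: actual coverage varies along the genome, so at 4X some positions are read many times and others not at all.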

Agency Acronyms

 

ICGC - International Cancer Genome Consortium
NCBI - National Center for Biotechnology Information
NCI - National Cancer Institute
NHGRI - National Human Genome Research Institute
NHLBI - National Heart, Lung, and Blood Institute
NIAID - National Institute of Allergy and Infectious Diseases
NIH - National Institutes of Health
NIMH - National Institute of Mental Health
NLM - National Library of Medicine