It is seven in the morning and your alarm is chirping away, or maybe it’s the morning sun, or the screech of a parent pressed for time. Regardless of how you are roused from your slumber, you crawl out of bed and feel that familiar morning growl emanating from your hungry gut. Some might grab a snack and quickly rush out the door, others might linger over an elaborate breakfast, and some are content with a simple bowl of oatmeal. This act, often the first and most necessary of the day, is repeated by billions of humans around the globe and should serve to remind us all that we are nothing without food. However, implicit in this realization is another: that there can be no food without that life-giving stuff beneath our feet – the soil.

Soil silently performs a multitude of services critical for life-sustaining ecosystem functions. It controls the cycling of energy (carbon) and nutrients (nitrogen, phosphorus, potassium, sulphur etc.) that sustain the food and fibre we depend on for nourishment and materials. Soils filter our drinking water and, with careful management, can fix carbon dioxide and reduce other greenhouse gases to mitigate the effects of climate change. Soil is teeming with life – just one teaspoon of fertile soil can contain 9 billion microbes, more than the number of humans on this planet (Doran et al. 1999). These microbes are largely responsible for the biochemical transformations that provide the above ecosystem functions. In a sense, we can’t live without these microbes, and for this reason, understanding who they are and what they are doing can help us in nearly limitless ways, from increasing crop yields to finding novel enzymes to better understanding climate change.

The challenge, however, lies in how exactly to study and characterize who these microbes are and what they are doing. Most soil microbes are invisible to the naked eye and less than 1% of microbial life has been cultured in the lab (Torsvik and Øvreås, 2002). For many years soil scientists referred to the soil microbial community as a ‘black box,’ meaning that reliable measures of soil processes, such as soil nutrient concentrations, were easily obtained, but how the microbial community adapts to and influences these processes remained a mystery. The soil scientist’s toolbox was not equipped to measure who is out there, much less what they are doing. Today, that black box is beginning to be pried open, largely due to advances in human genetics that paved the way forward for microbiologists of all stripes and colours.

The driving force behind the current revolution in understanding soil microbial communities has its roots in the Human Genome Project. A genome is the total complement of DNA from a single organism; it contains all of the genes that produce all of the proteins that make an organism what it is (Clark et al. 2009). The Human Genome Project was an unprecedented investment in genetic research – it cost U.S. taxpayers $2.7 billion but enabled scientists to sequence the human genetic code in its entirety by 2003 (NHGRI 2010). A sequenced genome is an invaluable resource, especially for human health, as mutations within the genetic code that lead to disease can be better understood, potentially leading to alternative medical treatments. The technology used to sequence the human genome is known as Sanger sequencing, a method that became the rate-limiting step in sequencing the 3 billion base pairs in the human genome (Mardis 2011). The massive investment in the Human Genome Project, coupled with the drive to increase the speed of DNA sequencing, led to new developments in sequencing technology, known today as ‘next generation sequencing.’

Decoding soil microbial genomes is an important step in understanding the microbial ‘black box.’ From knowledge of the genetic code of a community of soil organisms comes inference of the capabilities of that community. For example, scientists may expect soil samples with a diverse complement of genes that code for enzymes involved in decomposition to more quickly decompose inputs of organic matter such as fallen leaves. The problem, though, is that DNA extracted from soil does not contain DNA from only one type of microbe; one estimate pegs the diversity of prokaryotes (unicellular microbes without a nucleus) at 52 000 unique species in one gram of soil (Roesch et al. 2007). The average genome size of prokaryotic cells is 2 million base pairs, which would require that 104 billion base pairs be sequenced for a single gram of soil (Gilbert and Dupont 2011). In contrast, the well-funded Human Genome Project took 13 years to complete with only 3 billion base pairs sequenced. It is precisely for this reason that next generation sequencing technology has allowed such advancements in the study of soil microbial communities.
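The arithmetic behind that comparison is easy to check. The short Python sketch below simply multiplies the figures quoted above; the variable and function names are made up for illustration, and it assumes that each species' genome only needs to be read once, which real sequencing projects do not get away with.

```python
# Back-of-envelope estimate using only the figures quoted in the text.
SPECIES_PER_GRAM = 52_000        # prokaryotic species in one gram of soil
AVG_GENOME_BP = 2_000_000        # average prokaryotic genome size, in base pairs
HUMAN_GENOME_BP = 3_000_000_000  # base pairs in the human genome


def soil_sequencing_burden(species=SPECIES_PER_GRAM, genome_bp=AVG_GENOME_BP):
    """Base pairs needed to read one genome from every species exactly once."""
    return species * genome_bp


total_bp = soil_sequencing_burden()
ratio = total_bp / HUMAN_GENOME_BP

print(f"{total_bp:,} base pairs per gram of soil")
print(f"roughly {ratio:.1f} times the human genome")
```

Under these assumptions, a single gram of soil holds roughly 35 human genomes' worth of sequence.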

The first high-throughput second generation sequencing platform was introduced in 2005 as the Roche 454 pyrosequencer (MacLean et al. 2009). Other platforms are available and in some ways are more widely used today (the Illumina platform, for example), but a simplified outline of the 454 sequencing method is given here:

DNA extracted from a soil sample is first ligated (attached) to universal adapters. The DNA-adapter complex is then immobilized onto reaction beads before sequencing begins. Each bead now contains a unique fragment of extracted DNA, and thousands of beads, each with a different DNA fragment, are loaded onto a plate with thousands of wells where the individual beads reside (MacLean et al. 2009, Mardis 2008). In this way, each well houses a bead that itself is carrying a unique piece of extracted DNA. The soil DNA attached to the beads is single stranded, meaning nucleotide bases (G, A, T and C) remain unpaired and will readily bind with their mate to form a base pair. Sequencing begins by flooding the plate with a single nucleotide. If a fragment in any well has an unpaired adenine (A) that is next in sequence, and thymine (T) is flooded into the plate, an A-T bond will form. The nucleotides that are flooded onto the plate are modified such that when bound to their mate, the reaction triggers the activity of luciferase, the same enzyme responsible for the light emitted from fireflies (Mardis 2008). The activity of luciferase emits light that is detected by a camera. The plate is then washed of nucleotides, a different nucleotide is introduced, light is emitted when the added nucleotide finds its mate and the camera snaps another picture. Every well that has a DNA fragment with an unpaired nucleotide that is the mate of the nucleotide being flooded will light up, and the light will be captured by the camera from multiple wells at the same time. In this way, thousands of DNA fragments can be simultaneously sequenced, driving down both the time and cost of sequencing. This is a powerful technique – had this technology been around when the human genome was being sequenced, it would have sequenced the entire human genome in 10 days or less (Nyrén 2007).
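The flood, flash and wash cycle described above can be mimicked in a few lines of Python. This is a toy sketch rather than a model of the real instrument: the well names, the templates and the T, A, C, G flow order are invented for illustration, strand direction is ignored, and it assumes one detail not covered above, namely that a run of identical bases is incorporated in a single flood and gives a proportionally brighter flash.

```python
# Which nucleotide pairs with which (the "mate" of each base).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def pyrosequence(templates, flow_order="TACG", max_cycles=50):
    """Flood every well with one nucleotide at a time and record what binds.

    templates: dict mapping a (hypothetical) well name to the single-stranded
    DNA fragment on its bead. Returns a dict mapping each well name to the
    complementary strand built up from the recorded flashes.
    """
    position = {well: 0 for well in templates}  # next unpaired base per well
    reads = {well: "" for well in templates}

    for _ in range(max_cycles):
        for nucleotide in flow_order:               # flood the whole plate
            for well, template in templates.items():
                flashes = 0
                # Bind for as long as the next unpaired base is the mate of
                # the flooded nucleotide (assumption: runs bind in one flood).
                while (position[well] < len(template)
                       and COMPLEMENT[template[position[well]]] == nucleotide):
                    position[well] += 1
                    flashes += 1
                # The camera sees `flashes` units of light from this well.
                reads[well] += nucleotide * flashes
            # The plate is washed here before the next nucleotide is added.
        if all(position[w] == len(t) for w, t in templates.items()):
            break                                   # every fragment is read
    return reads


wells = {"well_1": "AAGC", "well_2": "TGCA"}
reads = pyrosequence(wells)
print(reads)
```

Because every well is checked during the same flood, the two fragments are read side by side, which is the source of the speed-up described above.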

With advanced sequencing technologies in their toolkit, soil scientists (who might now call themselves soil microbiologists) can get down to the business of figuring out who is beneath our feet, and what they are doing. Easier said than done. The reality is that genomic DNA extracted from soils can come from hundreds of thousands of uncultured, never characterized, mystery microbes. The DNA extraction procedure produces millions of DNA fragments in a ‘shotgun’ approach such that bits and pieces of the entire soil microbial community are sequenced to produce a profile of all genes distributed throughout the community (Gilbert and Dupont, 2011). This method of study is in contrast to genomic studies where the genome of only one organism is considered. Handelsman et al. (1998), in their study of soil microbes from environmental DNA, coined the term ‘metagenomics’ to describe this new branch of genomic research.

Metagenomic studies of soil DNA produce vast quantities of discontinuous sequence data. Post-sequencing data processing involves stitching together the fragments of DNA into something that resembles partial microbial genes. This work is computationally difficult – it is as if you fired a shotgun close-range at a painting and then attempted to re-construct the painting by piecing together all the obliterated bits of canvas without knowing what the painting looked like to begin with. In fact, an entire field – bioinformatics – is devoted to developing algorithms for processing metagenomic sequence data. The first step in data processing is to attempt to stitch together (assemble) the sequence data to form longer DNA fragments (Wooley et al. 2010). Longer fragments are easier to characterize and these fragments are ‘binned’ to assign the DNA fragments to a known microbial species, usually by comparing sample DNA to reference databases using tools such as the Basic Local Alignment Search Tool (BLAST). One drawback is that most of the sequences in a metagenomic dataset will remain unassigned because most of the reference database is derived from well-characterized, cultured organisms (Simon and Daniel, 2011). Overall, processing sequence data is the last step when trying to piece together the structure and function of a soil microbial community. Although reference databases are incomplete, they do assign some collected fragments to known microbial species that have well characterized genes with known function. Less than 10 years ago this was a largely impossible or at least a prohibitively expensive task. Advances in sequencing technology, bioinformatics and reference databases will only improve our resolution over time.
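To make the two steps concrete, here is a deliberately tiny Python sketch of assembly followed by binning. It is not how production assemblers or BLAST work: the greedy overlap merge and the exact substring lookup are simple stand-ins chosen for readability, and the fragments, the reference sequence and the name `hypothetical_species_A` are invented for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the two fragments that share the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:                 # nothing left that overlaps
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)


def bin_contig(contig, references):
    """Name of the first reference containing the contig, else None.

    Exact matching is a stand-in for a real alignment search.
    """
    for name, sequence in references.items():
        if contig in sequence:
            return name
    return None


# Made-up reference database and shotgun fragments.
references = {"hypothetical_species_A": "ATGGCGTACGTTAGC"}
fragments = ["ATGGCGT", "GCGTACG", "ACGTTAGC", "CCCCAAAA"]

contigs = greedy_assemble(fragments)
bins = {contig: bin_contig(contig, references) for contig in contigs}
print(bins)
```

In this toy run, three of the fragments stitch together into one contig that matches the reference, while the fourth has no match in the database and stays unassigned, which is the fate of most sequences in a real soil dataset.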

With the advances afforded to the study of soil microbial communities through second generation sequencing technology, soil scientists have only just begun to pry open the microbial black box. Currently, research is largely focused on microbial decomposition of plant inputs into soil, first because of the search for novel enzymes for the biofuels industry, and second because microbial decomposition has important consequences for global CO2 emissions (Baldrian and López-Mondéjar 2014). The field is also rapidly changing, with emphasis increasingly on characterizing gene products (RNA, proteins) over the metagenome so as to capture the genes and proteins that are active in the soil under various environmental conditions. The rapid advances in sequencing technology over the last 10 years make this an exciting time for soil science research. These methods may one day reveal the secrets held within one of the last frontiers in modern science – the soil beneath our feet.


Baldrian, P., & López-Mondéjar, R. (2014). Microbial genomics, transcriptomics and proteomics: new discoveries in decomposition research using complementary methods. Applied Microbiology and Biotechnology, 1-7.

Clark, D. P., Dunlap, P. V., Madigan, M. T., & Martinko, J. M. (2009). Brock Biology of Microorganisms.

Doran, J. W., Jones, A. J., Arshad, M. A., & Gilley, J. E. (1998). Determinants of Soil Quality and Health. Soil Quality and Soil Erosion, 17.

Gilbert, J. A., & Dupont, C. L. (2011). Microbial metagenomics: beyond the genome. Annual Review of Marine Science, 3, 347-371.

Handelsman, J., Rondon, M. R., Brady, S. F., Clardy, J., & Goodman, R. M. (1998). Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products. Chemistry & Biology, 5(10), R245-R249.

MacLean, D., Jones, J. D., & Studholme, D. J. (2009). Application of ‘next-generation’ sequencing technologies to microbial genetics. Nature Reviews Microbiology, 7(4), 287-296.

Mardis, E. R. (2008). Next-generation DNA sequencing methods. Annu. Rev. Genomics Hum. Genet., 9, 387-402.

Mardis, E. R. (2011). A decade’s perspective on DNA sequencing technology. Nature, 470(7333), 198-203.

National Human Genome Research Institute. “The Human Genome Project Frequently Asked Questions.” (accessed Feb 12 2014).

Nyrén, P. (2007). The History of Pyrosequencing®. In Pyrosequencing® Protocols (pp. 1-13). Humana Press.

Roesch, L. F. W., Fulthorpe, R. R., Riva, A., Casella, G., Hadwin, A. K. M., Kent, A. D., Daroub, S. H., Camargo, F. A. O., Farmerie, W. G., & Triplett, E. W. (2007). Pyrosequencing enumerates and contrasts soil microbial diversity. The ISME Journal, 1(4), 283-290.

Simon, C., & Daniel, R. (2011). Metagenomic analyses: past and future trends. Applied and Environmental Microbiology, 77(4), 1153-1161.

Torsvik, V., & Øvreås, L. (2002). Microbial diversity and function in soil: from genes to ecosystems. Current Opinion in Microbiology, 5(3), 240-245.

Wooley, J. C., Godzik, A., & Friedberg, I. (2010). A primer on metagenomics. PLoS Computational Biology, 6(2), e1000667.