In one of the first uses of genome sequencing to trace the path of a foodborne illness outbreak, a team led by scientists from Harvard School of Public Health (HSPH) and the Broad Institute looked at the E. coli O104:H4 epidemic that hit Europe last year.

Their study was published this week in Proceedings of the National Academy of Sciences (PNAS).

The group sequenced and compared samples from seven people in France and four in Germany who were sickened in the outbreak, which killed more than 50 of the 4,000 people with confirmed infections. The source of the outbreak was traced to raw sprouts germinated from Egyptian fenugreek seeds.

In Germany, the contaminated sprouts came from an organic farm that supplied many restaurants and food service companies. In France, the sprouts were grown from seed sold by a garden retailer, and served at a community center event.

The researchers found that the E. coli O104 strains from both outbreaks appeared identical based on conventional molecular epidemiologic analysis. But using whole-genome sequencing and analysis, the researchers detected small but measurable differences among the isolates.

That led to two surprising findings: All the strains connected to the larger German outbreak were nearly identical, while the strains in France were more diverse; and the strains from Germany appeared to be a subset of the diversity seen in the French isolates.
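To make the two findings concrete, here is a minimal sketch of the kind of comparison involved, assuming each isolate has already been reduced to its alleles at a handful of variable sites. The isolate names, the five-site strings, and the helper functions are hypothetical illustrations, not the study's data or pipeline.

```python
from itertools import combinations

# Hypothetical toy data: each isolate is reduced to its alleles at five
# variable genome positions. These are NOT the study's sequences.
isolates = {
    "DE-1": "AAGCT",
    "DE-2": "AAGCT",
    "DE-3": "AAGCA",
    "FR-1": "AAGCT",
    "FR-2": "AAGCA",
    "FR-3": "GAGTT",
    "FR-4": "ACGCT",
}


def snp_distance(a, b):
    """Count the positions at which two aligned allele strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_distance(names):
    """Average SNP distance over every pair of the named isolates."""
    pairs = list(combinations(names, 2))
    total = sum(snp_distance(isolates[p], isolates[q]) for p, q in pairs)
    return total / len(pairs)


german = [name for name in isolates if name.startswith("DE-")]
french = [name for name in isolates if name.startswith("FR-")]

# Finding 1: compare within-country diversity.
german_diversity = mean_pairwise_distance(german)
french_diversity = mean_pairwise_distance(french)

# Finding 2: is every German SNP type also present among the French isolates?
german_types = {isolates[name] for name in german}
french_types = {isolates[name] for name in french}
german_is_subset = german_types <= french_types

print(f"German mean pairwise SNP distance: {german_diversity:.2f}")
print(f"French mean pairwise SNP distance: {french_diversity:.2f}")
print(f"German types are a subset of French types: {german_is_subset}")
```

In this toy data the German isolates differ from one another less than the French isolates do, and both German SNP types also occur in France, which mirrors the pattern the researchers describe.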

Co-author William Hanage, associate professor of epidemiology at HSPH, said in a news release that the findings suggest a possible “bottleneck” in the outbreak, such as disinfection procedures that killed most but not all of the bugs, or maybe passage through a single infected individual.

Another hypothesis is that there was uneven distribution of diversity in the original shipment of contaminated seeds from Egypt.
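The bottleneck idea can be illustrated with a back-of-the-envelope calculation. The sketch below assumes a population containing `k` equally common SNP types, from which `n` founder cells survive at random; both the equal-frequency assumption and the example numbers are illustrative choices, not figures from the study.

```python
def expected_distinct_types(k, n):
    """Expected number of distinct types among n founders.

    Assumes k equally common types and independent draws (a very large
    source population), so each type is missed with probability (1 - 1/k)**n.
    """
    if k <= 0 or n < 0:
        raise ValueError("k must be positive and n must be non-negative")
    return k * (1 - (1 - 1 / k) ** n)


# Hypothetical example: 10 SNP types in the contaminated seed.
for founders in (2, 5, 50):
    surviving = expected_distinct_types(10, founders)
    print(f"{founders:>2} founders -> about {surviving:.2f} distinct types expected")
```

Under these assumptions, a tight bottleneck of only a few surviving cells passes on a small fraction of the original diversity, while a large founding population keeps nearly all of it. That is the sense in which a bottleneck, or an unevenly contaminated shipment, could leave one outbreak looking far more uniform than the other.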

“A genome contains the record of a strain’s evolutionary history, so by looking at the differences between the genomes of multiple bacteria from an outbreak we can get really useful clues about what happened in the outbreak. In this way, tracking outbreaks is like detective work, and this approach will be a powerful tool in trying to understand future outbreaks,” said lead author Yonatan Grad, a research fellow in the Center for Communicable Disease Dynamics, Department of Epidemiology at HSPH and infectious disease physician at Brigham and Women’s Hospital in Boston.

The news release noted that as costs for genomic sequencing decline, these tools, combined with traditional epidemiological techniques, can provide greater insight into the emergence and spread of infectious diseases and will help guide preventive public health measures.

  • Dan Cohen

    Here is my expansion of the possibilities that, according to the authors, “genomic epidemiology” suggests from the SNP diversity:
    Scenario “A”: Seedlot UNIFORMLY contaminated with DIVERSE O104:H4 SNP-strains.
    Possibilities to explain the difference between France and Germany
    A.1 A stochastic event: passage through a single human, who transmits a restricted SNP substrain to cause the German outbreak.
    A.2 Partially successful disinfection allows only the 2 SNP subtypes to multiply in the German sprout facility.
    A.3 More sampling from patients in Germany will show greater SNP diversity.
    Scenario “B”: Seedlot HETEROGENEOUSLY contaminated with DIVERSE O104:H4 SNP-strains.
    B.1 Batch to France sampled a more DIVERSE SNP-strain population (possibly due to heavier contamination in that region of the seed lot). “Requires low probability event” that the higher diversity E. coli population was in the smaller shipment to France, not in the larger shipment to Germany.
    [Actually, this seems quite likely to me: if the assumption is that there was a hotspot in the lot that unfortunately went to France, and lower contamination meant a lower probability of multiple strains going to Germany, then this fits well with seedlot division and is not so improbable…]
    Scenario “C”: Seedlot was UNIFORMLY contaminated with RESTRICTED # of O104:H4 SNP-strains, followed by DIFFERENTIAL ACCUMULATION of diversity.
    C.1 Due to differences in sprouting conditions (German professional facility vs. French grammar school students, well water, length of time sprouting, etc.), IFF (if and only if) one or more of these conditions can be shown to increase SNP mutation rates (dramatically?).
    Scenario “D”, not in text but for completeness: Seedlot HETEROGENEOUSLY contaminated with RESTRICTED # of O104:H4 SNP-strains.
    The larger sample to Germany is more likely to be contaminated with just the 2 outbreak SNP-strains; this then requires both that an unfortunate second sampling of the seedlot goes to France AND a mechanism for accumulating diversity. So “D” does not really help.
    I think “A.1” is a problem until an explanation is made for a pre-outbreak illness due to the seedlot contamination.
    And I actually think that “B.1” is not improbable and could make a lot of sense. Hotspots are why seedlots for planting seed have to go through such careful sampling procedures for plant pathology testing for seedborne plant diseases.
    I hope this helps!
    Dan Cohen (from my side of an email discussion with one of the authors)