Researchers have developed a novel genome assembly tool that could spur the development of new treatments for tuberculosis and other bacterial infections.

The new tool, which has created an improved genome map of one tuberculosis strain, should do likewise for other strains and other types of bacteria, according to researchers whose findings appeared in Nature Communications.

Mycobacterium tuberculosis, the bacterium responsible for the disease tuberculosis, infects about a quarter of the world’s population and killed 1.6 million people in 2021, according to the World Health Organization. Current medical interventions are limited to a century-old vaccine that reduces infection risk by 20 percent and four to six months of strong antibiotics that sometimes prove ineffective.

“The key to beating this disease is to understand it, and the key to understanding it lies in its DNA,” said David Alland, the senior author of the study, who is chief of the Division of Infectious Diseases at Rutgers New Jersey Medical School and director of the school’s Public Health Research Institute. “We hope our new pipeline provides researchers around the world with the information they need to create faster, more effective treatments and, ideally, a fully effective vaccine.”

Scientists first sequenced the genome of one tuberculosis strain, H37Rv, in 1998, but until now they could never generate the sort of complete and accurate sequence that would maximize their chances of eradicating the disease.

The new pipeline, dubbed Bact-Builder, combines common open-source genome assembly programs into a novel, easy-to-use tool that is freely available on GitHub.

Scientists today typically sequence new bacterial genomes by cutting large pieces of DNA into small, quick-to-scan fragments and then using a reference sequence such as H37Rv to align all the resulting pieces of data properly. However, assembling genomes without a reference, as Bact-Builder does with data from MinION sequencers, allows researchers to identify genes present in clinical strains that may not be present in the reference.

The tuberculosis sequence created by Bact-Builder contains approximately 6,400 more pieces of information (base pairs) than the old reference and, more importantly, identifies new genes and gene fragments missing from the old reference.

“Just publishing a fully accurate genome for the H37Rv reference strain, which is used in hundreds of studies a year, should significantly help tuberculosis research,” Alland said.

Having an easy way to sequence all strains accurately is even more important, Alland said, “because strain comparison should answer many vital questions such as why some strains are more contagious than others. Why do some strains cause more serious disease? Why are some strains more difficult to cure? The answers to all these questions, which could help us devise better treatments and vaccines, are in the genetic code, but you need an accurate way to find them.”