On the cusp?

I started covering the field of human genomics in depth around 2008. It was a time when whole genome sequencing (WGS) still cost millions of dollars per genome, and genome editing methods were so costly and difficult that they were rarely feasible. Nonetheless, it was a very exciting time, and the promise and potential of personalized genomic medicine seemed just around the corner. But medical professionals remained largely untouched by the so-called genomics revolution, and most of the energy and activity was concentrated in basic, not translational, research. So, despite the excitement, the practical impact for patients and physicians was far more theoretical than real, with relatively few exceptions.

Fast forward about 15 years to the first days of 2024 and, while it might not be obvious in some ways, the situation has changed a great deal. Human genomics still has considerable unrealized potential in the clinic, no doubt, but research findings now have many more medical implications. Hugely expanded research capabilities (CRISPR, low-cost WGS, accurate long-read sequencing, larger and more diverse genome databases and so on) are beginning to translate to more widespread patient benefit. The millions of genetic variants of unknown significance (VUS) are being interrogated and, little by little, classified as pathogenic, benign or something in between. The consequences of structural variants (SVs), in which the DNA bases may be the same but segments are inserted, deleted, inverted or duplicated, are coming into clearer focus. The many challenges of gene therapy are being addressed through ever more powerful sequencing and genome editing methods (e.g., prime editing), as well as enhanced delivery options such as nanoparticle vehicles and safer, more efficient viral vectors. And so on.

I’ve said before that new capabilities in the laboratory portend a very exciting time for biomedical and clinical research. And while the translation of research discoveries to clinical applications still seems agonizingly slow, the field has, in fact, come a long way in a relatively short time. Some thoughts I had on the topic came into clearer focus when I attended the American Society of Human Genetics (ASHG) meeting in November 2023, where the talks showcased recent research progress in a variety of areas. More strikingly, the body of work also hinted at profound implications for medical care on the horizon.

Research into human genetics and genomics has grown in scope to the point where a comprehensive summary of progress would take encyclopedic coverage. The following are therefore just a few selected examples of the compelling and promising efforts underway; there are, of course, many, many more.

Editing the genome

Around the time I was making my first deep dives into the nascent field of human genomics, researchers were busy investigating an interesting bit of bacterial biology. It turns out that bacteria have a rudimentary form of pathogen defense, in their case to ward off repeat attacks by bacteriophages, the viruses that infect them. Stored in bacterial genomes, DNA sequences subsequently called clustered regularly interspaced short palindromic repeats, or CRISPR for short, provided bacteria with the ability to recognize and, when paired with a protein that cleaves DNA, slice up viral genomes to ward off infection. Few outside of microbiology paid much attention for a while, but over time an idea emerged: if CRISPR could cut specific DNA sequences in bacteria with high efficiency and accuracy, might it work in other kinds of cells too? Sure enough it did. In 2013, it was shown to provide a relatively quick and easy way to target and cleave whatever sites researchers chose in mammalian genomes, including human, and the CRISPR revolution had begun.
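To make the idea of choosing a target site a little more concrete, here is a minimal sketch in Python. It rests on assumptions that are not discussed above: the widely used Cas9 enzyme from Streptococcus pyogenes, a 20-base target, and the requirement that the target be followed immediately by a short "NGG" motif (the PAM). The function name and the toy sequence are made up for illustration, and real guide design also weighs the opposite strand, off-target matches and efficiency, none of which is modeled here.

```python
import re

def find_cas9_sites(sequence, guide_length=20):
    """List candidate (start, target, pam) sites on the given strand only.

    Assumes an SpCas9-style rule: a target of `guide_length` bases
    followed immediately by an NGG motif.
    """
    sequence = sequence.upper()
    # A lookahead lets overlapping candidates all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_length)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(sequence)]

# Hypothetical toy sequence, not taken from any real genome.
toy_sequence = "TTGACCTGAAGCTAGCTAGGATCCGATCGATTAGCTAGCTAGGCTA"
for start, target, pam in find_cas9_sites(toy_sequence):
    print(start, target, pam)
```

Each reported site is simply a place where a cut could in principle be directed; it says nothing about whether cutting there would be safe or useful.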

CRISPR has absolutely transformed biomedical research in the decade since. Engineering mouse genomes to study what happens when you knock out, introduce or disable a gene now takes just a few months, rather than years of careful breeding and husbandry. If a particular genetic variant or mutation is implicated in a human disease, it can be accurately engineered and studied in organisms such as mice or in human cell lines with relative ease. This has greatly accelerated discovery and made what was once thought impossible not just possible but routine for researchers around the world. At the same time, the clinical implications of CRISPR-based therapies cannot be overlooked. If a disease-causing genetic mutation can be edited and fixed in enough cells, will that permanently cure the disease?

The answer to that question has not been straightforward for medical applications. First, the original CRISPR/Cas9 protocols involved physically cleaving both DNA strands with Cas9 (the protein, one of a type known as endonucleases, that does the actual cutting), which is quite risky if it happens in the wrong place or if something goes awry in the cut/repair process. Second, even if everything goes 100% right, actually delivering the CRISPR tools (guide RNA and associated molecules) to enough cells in a person to make a difference is very challenging. Nonetheless, on December 8, 2023, Casgevy, a sickle-cell anemia therapy, became the first CRISPR-based gene-editing therapy to receive FDA approval. In effect, Casgevy sidesteps the above issues by being administered to a specific cell population outside of the body, which is then transfused back in. It also works by simply disabling, not correcting, a gene that inhibits the production of fetal hemoglobin later in development, allowing fetal hemoglobin to be produced once more and, hopefully, leading to the production of normal instead of sickle-shaped red blood cells.

Researchers are hard at work on the obvious next step, which is to safely edit and repair a gene in vivo (inside the body). Recent refinements to CRISPR/Cas9, such as a method known as prime editing, are making it possible to edit DNA without cutting both strands, which is safer and more reliable for clinical applications. The potential for the method is particularly exciting for rare genetic diseases for which there is no available cure. At The Jackson Laboratory, work is well underway to develop and apply prime editing-based therapies for rare neurological conditions: spinal muscular atrophy, Friedreich’s ataxia, Huntington’s disease and Rett syndrome. Supported by a substantial National Institutes of Health grant, the multi-institutional effort is led by JAX’s Cathleen Lutz, Ph.D., and one of the key collaborators is Harvard University’s David Liu, Ph.D., who created the prime editing method. The grant goes beyond preclinical laboratory research as well, the end goal being to advance at least one lead candidate therapy through a successful investigational new drug (IND) application. Achieving this goal would be a huge step toward providing clinicians with an outright cure for one or more previously untreatable diseases.

Reducing uncertainty in the genome

The human genome is largely the same in everyone—99+% similar in fact. And most of the differences are part of normal variation, the subtle differences that determine our physical makeup and individuality. We all have genome variants that are not part of standard variability and function, however. If they disrupt or alter an essential gene, they may manifest as disorders and diseases, so-called rare monogenic or Mendelian diseases. But there are many more that affect disease risk or change gene–gene interactions or have unknown impacts, and determining how all the small changes add up within an individual represents one of the most important aspects of current genetic research.

While stating it so directly makes it sound like a relatively straightforward task, even a small percentage of changes in a genome that contains roughly 3.2 billion base pairs adds up to a massive number. And as genome-wide association studies (GWAS) show over and over again, most of the genetic variants that affect disease risk are in the vast regions of the genome that don’t actually code for proteins but instead help to regulate when and how much of a protein is produced. Interrogating each one is a truly Herculean effort, but also a vital one as genome sequences become more and more relevant to clinical care and disease outcomes. Fortunately, researchers are now able to measure the functional consequences of thousands of VUS simultaneously. Using methods such as multiplex assays of variant effect (known as MAVEs), in which sequence variants are synthesized and introduced into yeast or human cells, which are then cultured and measured for effect, researchers can greatly broaden and accelerate VUS analyses.
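The "cultured and measured for effect" step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis any particular MAVE study uses: it supposes each variant is counted by sequencing before and after the cells are grown, and scores each variant by its log2 change in abundance relative to the unaltered (wild-type) sequence. The function name, the pseudocount and the counts are all hypothetical.

```python
import math

def mave_scores(pre_counts, post_counts, wild_type="WT", pseudocount=0.5):
    """Score each variant by log2 enrichment relative to wild type.

    Scores near 0 suggest the variant behaves like wild type in the assay;
    strongly negative scores suggest it was depleted during growth.
    The pseudocount (an assumption) avoids dividing by zero.
    """
    def log_ratio(variant):
        pre = pre_counts.get(variant, 0) + pseudocount
        post = post_counts.get(variant, 0) + pseudocount
        return math.log2(post / pre)

    baseline = log_ratio(wild_type)
    return {v: log_ratio(v) - baseline for v in pre_counts if v != wild_type}

# Made-up read counts for illustration only.
before = {"WT": 1000, "variant_1": 1000, "variant_2": 1000}
after = {"WT": 1000, "variant_1": 980, "variant_2": 12}
for name, score in mave_scores(before, after).items():
    print(name, round(score, 2))
```

Because every variant in the pool is scored from the same two sequencing runs, one experiment can cover thousands of variants at once. Turning a score into a clinical classification is a separate step that requires calibration against variants of known effect.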

A single protein-coding gene can have hundreds or thousands of VUS that may contribute to disease or somehow alter the protein product or both. Accurately classifying variants as pathogenic, potentially pathogenic or benign in genes that contribute prominently to disease risk (e.g., BRCA1 and TP53 for cancer) is therefore particularly important. And the problem becomes much larger in the noncoding regions where the effects are less direct but can also contribute significantly to disease. MAVEs enable investigating thousands of genetic variants in a single experiment, helping to address the backlog of VUS that has accumulated. When I had my genome sequenced and data provided in early 2016, my report included six variants of clinical significance, 35 with a possible clinical association and 2,776 VUS. I expect that if I get my data reanalyzed—I should probably do that, come to think of it—the number of VUS will be much lower, and MAVEs will have contributed a lot to that.

Sequencing the genome

Last summer, there was big news for human genome sequencing: the Y chromosome was fully sequenced for the first time. While nearly 10% of the genome is difficult to sequence using short-read methods, the Y chromosome has proven to be particularly opaque, and just over half of it is included in the current reference sequence. Short-read methods require breaking DNA into small (~250 base pair) fragments, sequencing them and then reassembling them, and the Y chromosome’s large regions of highly repetitive sequence make it impossible to reassemble those fragments with any accuracy at all. Using increasingly accurate long-read methods and analysis pipelines that can cover millions of base pairs at a time, the Telomere-to-Telomere (T2T) consortium finally published a complete Y chromosome sequence in Nature with all the gaps filled in. But that was only part of the story.
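Why repeats defeat short reads can be shown with a toy example. The Python sketch below is an illustration under simplifying assumptions (error-free reads, one read per start position, a made-up sequence), not a real assembly method. It builds a short "genome" with a block of repeated sequence in the middle and asks what fraction of reads of a given length occur only once, and so could be placed unambiguously.

```python
from collections import Counter

def unique_read_fraction(genome, read_length):
    """Fraction of reads (one per start position) whose sequence occurs exactly once."""
    reads = [genome[i:i + read_length]
             for i in range(len(genome) - read_length + 1)]
    counts = Counter(reads)
    return sum(1 for read in reads if counts[read] == 1) / len(reads)

# Hypothetical toy genome: unique flanks around a 10-copy repeat block.
toy_genome = "ACGTTGCA" + "GATTACA" * 10 + "TTGCCGAT"
for length in (5, 20, 80):
    print(length, round(unique_read_fraction(toy_genome, length), 2))
```

In this toy case, reads taken from inside the repeat block look identical to one another, so there is no way to tell where they belong or how many copies exist. Reads long enough to span the whole block and reach into the flanking sequence are all distinct, which is the advantage long-read methods bring at a vastly larger scale.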

A group led by JAX’s Charles Lee, Ph.D., FACMG, took it a step further and generated 43 complete human Y chromosome sequences obtained from a variety of individuals from around the world. Also published in Nature, their paper revealed a very important finding: a single Y chromosome reference sequence, such as that generated by T2T, is inadequate. In fact, it’s astonishing how different those 43 sequences are. First, they differ drastically in sheer size, varying between 45.2 million and 84.9 million base pairs. Also, their gene-rich regions contain large SVs called inversions, where the same nucleotide sequences are oriented in the opposite direction, at a much higher rate than anywhere else in the genome. There was also high gene copy number variation (where the number of copies of any given gene differs) in certain gene families. Given that the profound variation was previously unrecognized, the functional effects of it remain unexplored, but the researchers anticipate that it is likely to contribute directly to men’s health and disease.
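The two kinds of variation highlighted above can be pictured with short strings. The Python sketch below is a toy illustration with made-up sequences and hypothetical function names. It models an inversion as replacing a segment with its reverse complement, which is how a flipped segment reads on the original strand, and it uses a simple substring count as a crude stand-in for gene copy number.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Read the opposite strand in its own 5'-to-3' direction."""
    return seq.translate(COMPLEMENT)[::-1]

def invert_segment(seq, start, end):
    """Return seq with seq[start:end] flipped, as in an inversion."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]

def count_copies(seq, unit):
    """Count non-overlapping occurrences of unit (toy copy-number measure)."""
    return seq.count(unit)

# Made-up sequences for illustration only.
reference = "AAACCGTTT"
inverted = invert_segment(reference, 3, 6)
print(reference, inverted)

two_copies = "TT" + "GATTACA" * 2 + "CC"
four_copies = "TT" + "GATTACA" * 4 + "CC"
print(count_copies(two_copies, "GATTACA"), count_copies(four_copies, "GATTACA"))
```

Note that the inversion leaves the total length unchanged, while copy number differences change it. That is one reason copy number variation can contribute to chromosomes of very different sizes.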

Of course, the Y chromosome isn’t the only part of the genome that benefits significantly from accurate long-read sequencing. SVs of all kinds—insertions, deletions, duplications, as well as the inversions and copy number variation mentioned previously—are very difficult to characterize with short-read methods. So, despite the fact that they affect more base pairs in a genome than the single nucleotide polymorphisms (SNPs) that are readily detected, SVs’ contributions to function, evolution and disease have not been well characterized. Including them in WGS analyses and interpretation has important implications for clinical impact. For example, rare disease diagnostics have received a large boost from exome (protein coding region only) and short-read WGS, yielding diagnoses for roughly 25%–30% of patients who previously lacked a diagnosis. But that obviously leaves 70%–75% of patients without answers. Implementing clinical long-read sequencing has strong potential to further increase the number of accurate diagnoses, to the benefit of patients and their families.