New Comprehensive Map Ties Every Human Gene to Its Function

Researchers used the single-cell sequencing technology Perturb-seq to perturb every expressed gene in the human genome and link each one to its function in the cell.

Genetics research has progressed quickly over the past few decades. Just a few months ago, for example, scientists reported the first complete, gap-free human genome sequence. Now, researchers have made another breakthrough: the first comprehensive functional map of genes expressed in human cells.

The Human Genome Project was an ambitious effort to sequence every piece of human DNA. The project, which drew on collaborators from around the world, including MIT's Whitehead Institute for Biomedical Research, was completed in 2003. Nearly two decades later, MIT Professor Jonathan Weissman and colleagues have gone beyond the sequence to publish the first comprehensive functional map of genes expressed in human cells. The findings, published online in the journal Cell on June 9, 2022, link each gene to its function in the cell and are the result of years of collaboration on the single-cell sequencing technology Perturb-seq.

The data are available for other scientists to use. “It’s a big resource in the way the human genome is a big resource, in that you can go in and do discovery-based research,” says Weissman, who is also a Whitehead Institute member and a Howard Hughes Medical Institute scientist. “Rather than defining ahead of time what biology you’re going to be looking at, you have this map of the genotype-phenotype relationships and you can go in and screen the database without having to do any experiments.”

The screen allowed the researchers to investigate a wide range of biological questions. They used it to examine the cellular effects of genes with unknown functions, to study how mitochondria respond to stress, and to screen for genes that cause chromosomes to be lost or gained, a phenotype that has been difficult to study until now. “I think this dataset is going to enable all sorts of analyses that we haven’t even thought up yet by people who come from other parts of biology, and suddenly they just have this available to draw on,” says Tom Norman, a former postdoc in the Weissman Lab who is a co-senior author of the research.

Pioneering Perturb-seq

The project relies on the Perturb-seq technique, which allows researchers to track the effects of turning genes on or off in unprecedented detail. When the approach was first described in 2016 by a group of researchers led by Weissman and fellow MIT professor Aviv Regev, it could be applied only to a small number of genes, and at a high cost.

Joseph Replogle, an MD-PhD student in Weissman's group and co-first author of the current research, laid the groundwork for the large-scale Perturb-seq map. In collaboration with Norman, who now leads a lab at Memorial Sloan Kettering Cancer Center; Britt Adamson, an assistant professor in the Department of Molecular Biology at Princeton University; and a group at 10x Genomics, Replogle set out to create a new version of Perturb-seq that could be scaled up. In 2020, the researchers published a proof-of-concept study in the journal Nature Biotechnology.

The Perturb-seq approach uses CRISPR-Cas9 genome editing to introduce genetic changes into cells, followed by single-cell RNA sequencing to capture information about the RNAs that are expressed as a result of each change. Because RNAs affect all aspects of how cells behave, this technique can help decipher the many cellular effects of genetic alterations.

Since their first proof-of-concept publication, Weissman, Regev, and others have used this sequencing approach on smaller scales. In 2021, for example, the researchers used Perturb-seq to investigate how human and viral genes interact during an infection with the common herpesvirus HCMV.

In the latest work, Replogle and collaborators, including Reuben Saunders, a graduate student in Weissman's lab and co-first author of the paper, scaled up the approach to the entire genome. Using human blood cancer cell lines as well as noncancerous cells derived from the retina, the team ran Perturb-seq across more than 2.5 million cells and used the data to build a comprehensive map linking genotypes to phenotypes.

Delving into the data

After completing the screen, the researchers used their new dataset to investigate a few biological questions. “The advantage of Perturb-seq is it lets you get a big dataset in an unbiased way,” Norman says. “No one knows entirely what the limits are of what you can get out of that kind of dataset. Now, the question is, what do you actually do with it?”

The most straightforward use was to investigate genes with unknown functions. Because the screen also read out the phenotypes of many known genes, the researchers could compare unknown genes to known ones and look for similar transcriptional outcomes, which could suggest that the gene products work together as part of a larger complex.
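The comparison described above can be pictured with a small sketch. The code below is only an illustration, not the researchers' actual pipeline: it assumes each knockdown is summarized as a short list of expression changes, uses Pearson correlation as the similarity measure, and relies on made-up gene names and numbers.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def most_similar(query_profile, known_profiles):
    """Rank known genes by how closely their knockdown profile matches the query."""
    scores = {
        gene: pearson(query_profile, profile)
        for gene, profile in known_profiles.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: change in expression of four readout genes
# after knocking down each named gene. Names and values are invented.
known_profiles = {
    "KNOWN_A": [2.0, -1.0, 0.5, 0.0],
    "KNOWN_B": [-1.5, 2.0, 0.0, 1.0],
    "KNOWN_C": [0.0, 0.1, 2.0, -1.0],
}
unknown_profile = [1.9, -1.1, 0.4, 0.0]

ranking = most_similar(unknown_profile, known_profiles)
for gene, score in ranking:
    print(f"{gene}: {score:.3f}")
```

In this toy example the unknown gene's profile tracks KNOWN_A most closely, which is the kind of signal that could hint the two gene products act in the same complex.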

The mutation of one gene, C7orf26, stood out in particular. The researchers noticed that genes whose removal caused a similar phenotype were part of a protein complex called Integrator, which plays a role in creating small nuclear RNAs. The Integrator complex is made up of many smaller subunits (prior studies had identified 14 individual proteins), and the researchers were able to establish that C7orf26 is a 15th.

They also observed that the 15 subunits work together in smaller modules to carry out specific functions within the Integrator complex. “Absent this thousand-foot-high view of the situation, it was not so clear that these different modules were so functionally distinct,” Saunders says.

Another advantage of Perturb-seq is that, because the assay focuses on single cells, researchers can use the data to investigate more complex phenotypes that become muddied when the cells are analyzed together. “We often take all the cells where ‘gene X’ is knocked down and average them together to look at how they changed,” Weissman explains. “But sometimes when you knock down a gene, different cells that are losing that same gene behave differently, and that behavior may be missed by the average.”
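Weissman's point about averages can be shown with a few invented numbers. In the sketch below, which assumes hypothetical per-cell expression changes for a single readout gene, the knockdown cells split into two groups that shift in opposite directions. Their mean looks just like the control's, and only the cell-to-cell spread reveals the difference.

```python
from statistics import mean, pstdev

# Hypothetical per-cell expression changes for one readout gene.
# Control cells barely move; knockdown cells shift strongly up or down.
control_cells = [0.1, -0.1, 0.0, 0.1, -0.1, 0.0]
knockdown_cells = [2.0, -2.0, 1.9, -1.9, 2.1, -2.1]


def summarize(cells):
    """Return the population average and the cell-to-cell spread."""
    return {"mean": mean(cells), "spread": pstdev(cells)}


control_summary = summarize(control_cells)
knockdown_summary = summarize(knockdown_cells)

print("control:  ", control_summary)
print("knockdown:", knockdown_summary)
```

A bulk measurement would report only the mean and see no effect here, while a single-cell readout keeps the individual values that make the two populations distinguishable.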

The researchers found that a subset of genes whose removal led to different outcomes from cell to cell were responsible for chromosome segregation. Removing them caused cells to lose a chromosome or gain an extra one, a condition known as aneuploidy. “You couldn’t predict what the transcriptional response to losing this gene was because it depended on the secondary effect of what chromosome you gained or lost,” Weissman says.

“We realized we could then turn this around and create this composite phenotype looking for signatures of chromosomes being gained and lost. In this way, we’ve done the first genome-wide screen for factors that are required for the correct segregation of DNA.”
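One way to picture such a composite phenotype is sketched below. This is a simplified illustration under stated assumptions, not the method used in the study: it assumes that a gained or lost chromosome shifts the expression of most genes on that chromosome in the same direction, and the gene names, chromosome assignments, baseline values, and thresholds are all hypothetical.

```python
from statistics import mean

# Assumed cutoffs, chosen only for illustration.
GAIN_THRESHOLD = 1.3
LOSS_THRESHOLD = 0.7


def chromosome_ratios(cell_expression, baseline_expression, gene_to_chromosome):
    """Average expression per chromosome in one cell, relative to a control baseline."""
    per_chromosome = {}
    for gene, value in cell_expression.items():
        ratio = value / baseline_expression[gene]
        per_chromosome.setdefault(gene_to_chromosome[gene], []).append(ratio)
    return {chrom: mean(ratios) for chrom, ratios in per_chromosome.items()}


def call_copy_changes(ratios):
    """Label chromosomes whose average ratio suggests a gain or a loss."""
    calls = {}
    for chrom, ratio in ratios.items():
        if ratio >= GAIN_THRESHOLD:
            calls[chrom] = "gain"
        elif ratio <= LOSS_THRESHOLD:
            calls[chrom] = "loss"
    return calls


# Hypothetical toy data: six genes spread over three chromosomes.
gene_to_chromosome = {
    "g1": "chr1", "g2": "chr1",
    "g3": "chr2", "g4": "chr2",
    "g5": "chr3", "g6": "chr3",
}
baseline_expression = {gene: 10.0 for gene in gene_to_chromosome}
cell_expression = {
    "g1": 15.0, "g2": 14.5,  # chr1 genes are uniformly elevated
    "g3": 10.2, "g4": 9.8,   # chr2 genes sit near baseline
    "g5": 5.0, "g6": 5.5,    # chr3 genes are uniformly reduced
}

ratios = chromosome_ratios(cell_expression, baseline_expression, gene_to_chromosome)
calls = call_copy_changes(ratios)
print(calls)
```

Counting how many cells carrying a given knockdown receive any such call would give a per-gene score, which is the sense in which a chromosome-level signature can be turned into a screenable phenotype.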

“I think the aneuploidy study is the most interesting application of this data so far,” Norman adds. “It captures a phenotype that you can only get using a single-cell readout. You can’t go after it any other way.” 

The researchers also used their data to look at how mitochondria respond to stress. Mitochondria, which evolved from free-living bacteria, carry 13 genes in their own genomes. Within the nuclear DNA, around 1,000 genes are linked to mitochondrial function in some way. “People have been interested for a long time in how nuclear and mitochondrial DNA are coordinated and regulated in different cellular conditions, especially when a cell is stressed,” Replogle says.

The researchers found that when they perturbed different mitochondria-related genes, the nuclear genome responded in a similar way to many different genetic changes. The responses of the mitochondrial genome, on the other hand, were far more varied.

“There’s still an open question of why mitochondria still have their own DNA,” Replogle says. “A big-picture takeaway from our work is that one benefit of having a separate mitochondrial genome might be having localized or very specific genetic regulation in response to different stressors.”

“If you have one mitochondria that’s broken, and another one that is broken in a different way, those mitochondria could be responding differentially,” Weissman says.

In the future, the researchers intend to apply Perturb-seq to other types of cells besides the cancer cell line they started with. They also intend to keep exploring their gene function map and encourage others to do the same. “This really is the culmination of many years of work by the authors and other collaborators, and I’m really pleased to see it continue to succeed and expand,” Norman says.