News Release

New research further translates the language of the genome

Peer-Reviewed Publication

Wellcome Trust Sanger Institute

New research has uncovered more about the complexity of human gene regulation by identifying certain sequences of proteins called transcription factors that bind to DNA and regulate the expression of human genes.  

In a study published today (9 April) in Nature, researchers from the Wellcome Sanger Institute, the University of Cambridge and their collaborators explored how DNA-guided transcription factors interact with each other.

This research adds to the groundwork of understanding the complex language of the gene regulatory code, and how DNA sequence patterns located close to our genes influence human development and disease risk.

Each gene has a regulatory region that contains instructions on when and where the gene is expressed. This information is written in a code that is read by transcription factors, which bind to specific DNA sequences and either increase or decrease the gene's expression.

Previous research has explored the ‘language’ of the genome — the regulatory code that controls gene expression. It found that cooperation between multiple transcription factors is a key feature of transcription factor-DNA binding, with DNA actively facilitating interactions between various transcription factors.1 With the regulatory code being far more complex than the genetic code, which explains how DNA sequence determines the structure of proteins, researchers are aiming to understand the regulatory language in more detail, focusing on the ‘words’ and ‘grammar’ — such as the transcription factors — that influence when and where genes are expressed.

This deeper understanding is crucial for uncovering how cells develop into specific types, how organs form and where they are located in the body during embryonic development, and for understanding what goes wrong in disease.

The interactions between transcription factors guided by DNA are poorly understood. In a new study, researchers from the Sanger Institute and the University of Cambridge used two novel algorithms to analyse more than 58,000 pairs of transcription factors from human cells. Their aim was to identify how and where transcription factors interact with each other, and so to bolster their understanding of the genomic language.2

The researchers' results reveal new patterns and preferences in how certain transcription factors interact with each other – also known as ‘motifs’. In this study, the researchers estimate that they identified between 18 and 47 per cent of all human transcription factor pair motifs, greatly adding to their understanding of the regulatory code.

The team found that certain motifs they identified are present in developmental enhancers – DNA regulatory elements that activate transcription of a gene – that control important stages of development, such as the formation of fingers. For example, the research notes that certain sequences of transcription factor motifs, or ‘words’ in the language, influence whether or not someone develops polydactyly – too many fingers – or syndactyly – a fusion of fingers.

The findings also have implications for how scientists will use computational models – such as artificial intelligence – to predict protein structures in the future. Whilst these tools can predict the overall structure, they often cannot look into smaller details, such as how transcription factors interact with each other on DNA. These small interactions can have a big impact on human development, but computational models cannot always predict this. The researchers hope that future models will be able to incorporate the more minute transcription factor details to better predict protein structure and protein-DNA interactions.

This research marks a step forward in studying the smaller ‘words’ in the language of gene expression. By identifying small but key motifs in the genome, this research will help scientists understand and interpret the mechanisms influenced by transcription factors, particularly in the non-coding regions of the genome. These regions – which make up 99 per cent of the genome – do not code for proteins but still play a significant role in regulation of gene expression, and risk for development of disease.

Dr Ilya Sokolov, an author of the study at the Wellcome Sanger Institute, said: “By gaining a deeper understanding of how transcription factors interact when guided by DNA, we hope our research will shed light on the molecular basis of the regulatory code, particularly in the context of developmental disorders. These interactions are evolutionarily conserved across mammals and offer several advantages in development, from incorporating positional information to creating sharper gene expression responses. With advanced insights into the regulatory code, we are excited to help drive future research that will improve our understanding of human development and developmental disorders.”

Professor Jussi Taipale, senior author of the study and Group Leader at the Wellcome Sanger Institute, said: “The human genome’s regulatory code is very complex, far more complex than the genetic code, and this research into transcription factor interactions unlocks deeper insights into the ‘language’ of the genome. Not only does our study provide more information into patterns of human development but it paves the way for future work with computational models that can hopefully incorporate these new data to better understand gene regulation.”

 

ENDS

Contact details:
Susannah Young

Press Office
Wellcome Sanger Institute
Cambridge, CB10 1SA

07907391759
Email: press.office@sanger.ac.uk

 

Notes to Editors:

  1. Arttu Jolma, Yimeng Yin, Kazuhiro R. Nitta, Kashyap Dave, Alexander Popov, Minna Taipale, Martin Enge, Teemu Kivioja, Ekaterina Morgunova, Jussi Taipale. (2015) ‘DNA-dependent formation of transcription factor pairs alters their binding specificity.’ Nature. DOI: 10.1038/nature15518
  2. The researchers expressed a set of human transcription factors, enriched in proteins that are conserved in mammals, in Escherichia coli, combined them into a total of 58,754 transcription factor (TF) pairs and analysed their interactions by CAP-SELEX (consecutive affinity-purification systematic evolution of ligands by exponential enrichment). CAP-SELEX is a method that enables the discovery of TF-TF-DNA binding preferences.

 

Publication:
Zhiyuan Xie et al. (2025) ‘DNA-guided transcription factor interactions extend human gene regulatory code.’ Nature. DOI: 10.1038/s41586-025-08844-z

 

Funding:
The research was part funded by Wellcome and the University of Cambridge. A full list of funders and acknowledgements can be found in the publication.

Selected websites:

University of Cambridge

The University of Cambridge is one of the world’s top ten leading universities, with a rich history of radical thinking dating back to 1209. Its mission is to contribute to society through the pursuit of education, learning and research at the highest international levels of excellence.

The University comprises 31 autonomous Colleges and 150 departments, faculties and institutions. Its student body of 24,450 includes more than 9,000 international students from 147 countries. In 2020, 70.6% of its new undergraduate students were from state schools and 21.6% from economically disadvantaged areas.

Cambridge research spans almost every discipline, from science, technology, engineering and medicine through to the arts, humanities and social sciences, with multi-disciplinary teams working to address major global challenges. Its researchers provide academic leadership, develop strategic partnerships and collaborate with colleagues worldwide.

The University sits at the heart of the ‘Cambridge cluster’, in which more than 5,300 knowledge-intensive firms employ more than 67,000 people and generate £18 billion in turnover. Cambridge has the highest number of patent applications per 100,000 residents in the UK. www.cam.ac.uk

 

The Wellcome Sanger Institute

The Wellcome Sanger Institute is a world leader in genomics research. We apply and explore genomic technologies at scale to advance understanding of biology and improve health. Making discoveries not easily made elsewhere, our research delivers insights across health, disease, evolution and pathogen biology. We are open and collaborative; our data, results, tools, technologies and training are freely shared across the globe to advance science.

Funded by Wellcome, we have the freedom to think long-term and push the boundaries of genomics. We take on the challenges of applying our research to the real world, where we aim to bring benefit to people and society.

Find out more at www.sanger.ac.uk or follow us on Twitter, Instagram, Facebook, LinkedIn and on our Blog.

About Wellcome

Wellcome supports science to solve the urgent health challenges facing everyone. We support discovery research into life, health and wellbeing, and we’re taking on three worldwide health challenges: mental health, infectious disease and climate and health. https://wellcome.org/

 

 

