News Release

Largest ever DNA sequencing dataset on UK child development studies available

Peer-Reviewed Publication

Wellcome Trust Sanger Institute

Image: Genetic sequencing machine at the Wellcome Sanger Institute

Credit: David Levene / Wellcome Sanger Institute

The first resource containing high-resolution DNA sequencing data for over 37,000 children and parents collected over multiple decades from across the UK is now available to researchers worldwide.  

The data release is led by the Wellcome Sanger Institute, the Children of the 90s study (also known as ALSPAC), the Millennium Cohort Study (MCS), and Born in Bradford (BiB) [1], and supported by the Medical Research Council (MRC) and the Economic and Social Research Council (ESRC).

This work is supported by the ongoing efforts of Population Research UK, a UK-wide initiative led by teams at the University of Bristol and University College London, which aids longitudinal population studies by working to coordinate and connect the current research landscape.

Now available on the European Genome-phenome Archive (EGA), these high-quality genomic data can be used in combination with the existing longitudinal health and survey information provided by participating families. These combined data resources offer the scientific community the opportunity to gain valuable insights in areas ranging from population genetics to the social sciences.

For example, the data could be used to investigate the impact of genetic variation on neurodevelopmental conditions or childhood obesity, and how these are influenced by environmental factors.

Longitudinal research follows large numbers of participants over multiple years, repeatedly examining them at regular time points through, for example, blood tests, body measurements, and health questionnaires, to detect changes over time.

Previously, large DNA sequence datasets have typically focused on children with rare conditions or adult population cohorts. This new data release focuses on sequencing ‘birth cohorts’, which are population-based cohorts of people followed from birth through to adolescence or early adulthood.

To produce this latest data release, researchers at the Sanger Institute sequenced the protein-coding regions of all 20,000 genes in the human genome, a technique known as exome sequencing [2], in samples from 8,436 children and 3,215 parents from the Children of the 90s study, 7,667 children and 6,925 parents from the MCS, and 8,784 children and 2,875 parents from BiB.
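As a quick check on the figures above, the per-cohort counts can be tallied against the headline figure of "over 37,000" participants. This is only an illustrative sketch: the counts are copied from the release, while the dictionary layout and the helper name `cohort_totals` are assumptions made for this example and are not part of the data release.

```python
# Participant counts as quoted in the release, keyed by cohort.
# The structure and names here are illustrative assumptions.
cohorts = {
    "ALSPAC": {"children": 8436, "parents": 3215},
    "MCS": {"children": 7667, "parents": 6925},
    "BiB": {"children": 8784, "parents": 2875},
}


def cohort_totals(counts):
    """Return the number of sequenced participants per cohort."""
    return {name: c["children"] + c["parents"] for name, c in counts.items()}


totals = cohort_totals(cohorts)
total_children = sum(c["children"] for c in cohorts.values())
total_parents = sum(c["parents"] for c in cohorts.values())
grand_total = sum(totals.values())

for name, n in totals.items():
    print(f"{name}: {n:,} participants")
print(f"Children: {total_children:,}; parents: {total_parents:,}")
print(f"All cohorts: {grand_total:,} participants")
```

Under these counts the three cohorts sum to 37,902 participants (24,887 children and 13,015 parents), which agrees with the "over 37,000" figure quoted at the top of the release.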

These three UK longitudinal birth cohort studies are internationally recognised, and data from these cohorts have already been used to study the contribution of common genetic variants to phenotypes ranging from childhood obesity [3,4] to parental nurturing behaviours [5] and anxiety and depression [6].

For example, using Children of the 90s data, researchers found that a genetic variant in a gene called MC4R is associated with increased weight across childhood [4]. Studies like this could help design effective weight management interventions and change the way society views obesity [7]. That specific study used targeted DNA sequencing of the MC4R gene, whereas the new exome sequencing data reported here will allow similar investigations of other genes in the human genome. This will help drive more discoveries and research that could benefit human health.

The team has made the anonymised data as accessible as possible to approved researchers, including drafting a data note (available on Wellcome Open Research [8]) and other materials to help support its use by those who are less familiar with large-scale sequencing data.

In the coming months, this DNA sequence data resource will be expanded to encompass all participants in these cohorts as well as additional cohorts. The value of these data will be enhanced by harmonising the data across the different cohorts, providing a more powerful resource than could be achieved by one study in isolation.

Dr Carl Anderson, Interim Head of Human Genetics at the Wellcome Sanger Institute, said: “Longitudinal population studies from the UK have already had a huge impact on biomedical research worldwide. This significant addition of whole exome sequencing data will further transform our understanding of the development of complex traits and diseases across the life course.”

Dr Richard Evans, Interim Head of Population Health Sciences at the Medical Research Council, said: “The UK's cohorts and longitudinal population studies are an extraordinary national asset, made possible by the participation of a diverse range of people. The rich data and samples from these studies, when combined with whole exome sequencing, can unlock new research questions and insights into human society, development, health and ageing. MRC’s funding is part of our overall investment in understanding the drivers of disease to enable precision prevention and personalised treatments, and maximising existing infrastructure to ensure real value for money. This work aligns perfectly with a new exciting national resource that is supported by MRC and ESRC, Population Research UK, which is all about coordinating and leveraging UK cohorts.”

Professor Nicholas Timpson, Co-Director of Population Research UK and Principal Investigator of the Children of the 90s study at the University of Bristol, said: “The success of this initiative shows that coordination across cohort studies can be incredibly powerful and I’m excited to see the research that will come out of this fantastic new genetic data resource. We hope that this encourages other researchers to conduct long-term research studies, and solidifies their importance in UK and global research.”

Dr Hilary Martin, Group Leader at the Wellcome Sanger Institute, said: “This represents one of the largest sequenced datasets collected at birth from the general population, creating a unique resource for the research community with multiple uses. By combining the pre-existing health and lifestyle data from the initial studies with genomic information, we have started to get a more complete view of how small DNA changes can subtly influence certain complex traits across childhood and adolescence. In the future, we plan to continue to sequence samples in other longitudinal studies, to build an evolving and high-quality resource that can benefit researchers in multiple areas of the biological and social sciences.”

Professor Matthew Hurles, Director of the Wellcome Sanger Institute, said: “Great science is built on collaboration and this release would not have been possible without the engagement of the families themselves, the hard work of teams managing these longitudinal studies, sustained investment in these cohorts, especially from Wellcome and the Medical Research Council, the sequencing and data analysis power of the Wellcome Sanger Institute, and the support of Population Research UK. We aim to continue to build on this resource and provide high-quality, accessible genomic data for researchers worldwide. This initiative further exemplifies the vast potential of bringing together the UK’s life science assets including committed research participants, researchers, governmental and charitable funding agencies, and genomic and computational capabilities.”

ENDS

Notes to Editors:

The data are available to approved researchers worldwide via the European Genome-phenome Archive (EGA). To access the data, please visit https://ega-archive.org/. The EGA study and dataset accession numbers are:

  • ALSPAC (study: EGAS00001005273): dataset EGAD00001015371
  • MCS (study: EGAS00001007789): dataset EGAD00001015372
  • BiB (study: EGAS00001006978): dataset EGAD00001015370

 

  1. All three internationally-recognised long-term studies have followed the participants and their families for years, and include participants from across the UK. The ALSPAC study, based at the University of Bristol, has followed participants for over three decades. The MCS at University College London includes people born across England, Scotland, Wales and Northern Ireland and began in 2000. The BiB study focuses on those living in Bradford and started tracking the health and well-being of children and their parents in 2007.
  2. Exome sequencing, also known as whole exome sequencing, is a technique that sequences all the protein-coding regions of genes in a genome. In total, all of the protein-coding regions, known collectively as the exome, represent less than 2 per cent of the genome but contain around 85 per cent of known disease-related variants.
  3. Bond, T., Richmond, R.C., Karhunen, V. et al. (2022) Exploring the causal effect of maternal pregnancy adiposity on offspring adiposity: Mendelian randomisation using polygenic risk scores. BMC Med. DOI: 10.1186/s12916-021-02216-w
  4. Wade, K.H., Lam, B.Y.H., Melvin, A. et al. (2021) Loss-of-function mutations in the melanocortin 4 receptor in a UK birth cohort. Nat Med. DOI: 10.1038/s41591-021-01349-y
  5. Wertz, J., Moffitt, T.E., Arseneault, L. et al. (2023) Genetic associations with parental investment from conception to wealth inheritance in six cohorts. Nat Hum Behav. DOI: 10.1038/s41562-023-01618-5
  6. Dennison, C.A., Martin, J., Shakeshaft, A. et al. (2024) Stratifying early-onset emotional disorders: using genetics to assess persistence in young people of European and South Asian ancestry. J Child Psychol Psychiatry. DOI: 10.1111/jcpp.13862
  7. Body weight is not a choice. Bristol University article, available: https://www.bristol.ac.uk/alspac/participants/discoveries/body-weight/
  8. Corresponding data note available on Wellcome Open Research: https://wellcomeopenresearch.org/articles/9-390/v2

Funding: The sequencing of the data was funded by the Wellcome Sanger Institute, with further funding from the Medical Research Council and the Economic and Social Research Council.

Selected websites:

The Wellcome Sanger Institute

The Wellcome Sanger Institute is a world leader in genomics research. We apply and explore genomic technologies at scale to advance understanding of biology and improve health. Making discoveries not easily made elsewhere, our research delivers insights across health, disease, evolution and pathogen biology. We are open and collaborative; our data, results, tools, technologies and training are freely shared across the globe to advance science.

Funded by Wellcome, we have the freedom to think long-term and push the boundaries of genomics. We take on the challenges of applying our research to the real world, where we aim to bring benefit to people and society.

Find out more at www.sanger.ac.uk or follow us on Twitter, Instagram, Facebook, LinkedIn and on our Blog.

About Wellcome

Wellcome supports science to solve the urgent health challenges facing everyone. We support discovery research into life, health and wellbeing, and we’re taking on three worldwide health challenges: mental health, infectious disease and climate and health. https://wellcome.org/

About Children of the 90s

Based at the University of Bristol, Children of the 90s, also known as the Avon Longitudinal Study of Parents and Children (ALSPAC), is a long-term health-research project that enrolled more than 14,000 pregnant women in 1991 and 1992. It has been following the health and development of the parents and their children in detail ever since and is currently recruiting the children of the original children into the study. It receives core funding from the Medical Research Council, Wellcome and the University of Bristol. Find out more at www.childrenofthe90s.ac.uk.

About the Millennium Cohort Study

The Millennium Cohort Study (MCS) is following the lives of around 19,000 young people born across England, Scotland, Wales and Northern Ireland in 2000-02. Based at the UCL Centre for Longitudinal Studies, MCS is funded by the Economic and Social Research Council and a consortium of government departments. Visit www.cls.ucl.ac.uk/mcs.

About the Born in Bradford Study

Born in Bradford is an internationally-recognised research programme which aims to find out what keeps families healthy and happy. We use this information to work with the local authority, health, education and voluntary sector providers across Bradford district to develop, implement and evaluate ambitious programmes to improve population health. We have a vast ‘city of research’ infrastructure which includes detailed health and wellbeing information on Bradfordians enrolled in our three birth cohort studies and a connected routine dataset of health, social care and education data for over 700,000 citizens living in Bradford and Airedale. We host a range of initiatives to improve health working with the local authority, health, education, cultural and voluntary sector providers. You can find out more about our research programme at www.borninbradford.nhs.uk.

