Panel discussion: Getting principled: Supporting practical approaches to PPIE for research in the NGRL
Dr Doreen Tembo, Head of Public Involvement and Engagement - Health Data Research UK (HDR UK) and Strategic Lead for the Public Engagement in Data Research Initiative (PEDRI)
Dr Helen Bulbeck, Member of the Participant Panel at Genomics England and Director of Services and Policy at brainstrust
Eva Bensasson, Member of the Participant Panel, Genomics England
Professor Keyoumars Ashkan MBE, Lead for Functional and Oncological Neurosurgery, King's College Hospital
Panel discussion: Can genomics transform health outcomes for the adult population?
Chiamaka P Ojiako, Strategic Engagement Lead, Genomics England
Dr Jude Hayward, Clinical lead for primary care, Genomics England
Dr Katie Snape, Principal Clinician – adult study, Genomics England
Emma Walters, Member of the Participant Panel at Genomics England
Videha Sharma, Clinical Innovation Lead, University of Manchester and Co-founder, Fava Health
Dr Richard Scott joined Genomics England in 2015 and has been CEO since 2023. He is also an Honorary Consultant and Honorary Associate Professor in Clinical Genetics at Great Ormond Street Hospital for Children and the UCL Institute of Child Health, where his practice focuses on diagnosing children with rare multisystem disorders.
Richard trained in medicine at Cambridge University and University College London. He specialised in Paediatrics and subsequently Clinical Genetics in London and completed his PhD on childhood cancer syndromes at the Institute of Cancer Research.
Through his clinical practice and in his role at Genomics England he is passionate about harnessing the power of new genomic technologies for the benefit of patients in mainstream healthcare.
Emily has been a professional genomics trainer for thirteen years. She is currently working at Genomics England, delivering training and developing documentation and online tutorials for researchers working with the Genomics England Research Environment.
She was previously the Ensembl Outreach Project Leader at the European Bioinformatics Institute, leading a small team that trained researchers to use the Ensembl website and APIs. She completed her PhD in molecular biology at the University of Edinburgh and also has experience in scientific outreach, education and communication.
Isidro Cortés-Ciriano leads the Cancer Genomics group at the EMBL-EBI and is Associate Faculty at the Sanger Institute. His laboratory focuses on developing computational methods for early cancer detection and for studying the mechanisms underpinning cancer evolution using whole-genome sequencing. Recent work by his group includes the discovery of novel mechanisms of cancer development, such as Loss-Translocation-Amplification chromothripsis in osteosarcoma, and the development of novel methods for the analysis of single-cell and long-read sequencing data, such as SComatic and SAVANA. His laboratory is leading the application of long-read sequencing technologies for tumour profiling and liquid biopsy analysis through various national studies, including CRUK-funded initiatives such as the Stratified Medicine Paediatrics (SMPEDS) programme and the Cancer Grand Challenges team SAMBAI. Before joining EMBL-EBI, Isidro trained as a postdoctoral fellow at Harvard Medical School, and received his PhD from the Pasteur Institute.
Professor Richard Houlston, FMedSci FRS, is Professor in Cancer Genomics and Head of Genetics and Epidemiology at the Institute of Cancer Research. A medical graduate of Imperial College, he has over 30 years of experience in cancer genetics research and in translating discoveries to benefit patient care. He and his research group have been active in the 100,000 Genomes Project since its inception. Professor Houlston will now lead our Pan-Cancer and Molecular Oncology community to decipher genomic variation in over 200 distinct cancer types.
Alex is a lecturer in Computational Genomics at the University of Edinburgh. Her group is interested in applying updated evolutionary tools and methods to improve variant interpretation for individuals with rare disorders and male infertility. Alex is also a member of the Participant Panel at Genomics England, having joined the panel in 2024. In her spare time, Alex enjoys board games, evolution and cats.
Cells acquire diverse identities by expressing genes in a context-specific manner across tissues, developmental stages, and cell types. While many genetic variants have limited phenotypic impact, variants that disrupt transcriptional specificity, that is, when and where genes are transcribed, may represent an important genetic vulnerability that contributes to disease. However, determining how variants alter specificity across contexts remains challenging because comprehensive multi-tissue profiling is infeasible for every individual. Here, we deploy sequence-based machine learning models to predict how DNA variants change RNA-sequencing coverage across tissue-, stage-, and cell type-resolved contexts, enabling genome-wide prioritization of variants that perturb transcriptional specificity at multiple regulatory resolutions. Using experimentally measured activity of human enhancers carrying specificity-disrupting variants across mouse embryonic tissues, we show that embryo-trained models recapitulate in vivo effects, including polydactyly-associated variants. In paired genotype-transcriptome datasets spanning healthy adult tissues and cell types, variants associated with altered transcriptional specificity are depleted, consistent with negative selection, and sequence-based models capture these shifts. Applying this framework to rare disease genomes reveals an enrichment of specificity-disrupting variants affecting clinically relevant genes in undiagnosed patients. Together, our results establish transcriptional specificity as a regulatory axis of disease risk and provide a scalable approach to identify specificity-altering variants genome-wide across biological contexts.
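The abstract does not say how a change in transcriptional specificity is scored from the model's predicted coverage. As an illustration only, one simple approach is to compute a specificity index over the predicted coverage across contexts for the reference and alternate sequences and take the difference. The sketch below assumes the tissue-specificity index tau (Yanai et al., 2005) as that index; tau is usually computed on log-transformed expression, and the coverage values in the usage line are hypothetical.

```python
import numpy as np

def tau(profile):
    """Tissue-specificity index tau for one gene across contexts.

    0 means uniform expression across contexts; 1 means expression in a single context.
    """
    x = np.asarray(profile, dtype=float)
    if len(x) < 2:
        raise ValueError("tau needs at least two contexts")
    if x.max() <= 0:
        return 0.0
    x_hat = x / x.max()  # scale so the highest-expressing context is 1
    return float(np.sum(1.0 - x_hat) / (len(x) - 1))

def specificity_shift(ref_coverage, alt_coverage):
    """Change in tau between predicted coverage for the alternate and reference alleles.

    Negative values mean the variant broadens expression; positive values mean it narrows it.
    """
    return tau(alt_coverage) - tau(ref_coverage)

# Hypothetical predicted coverage in four contexts (e.g. tissues) for one gene.
print(specificity_shift([10, 1, 1, 1], [10, 8, 1, 1]))
```

A variant that raises coverage in a second context, as in the usage line, lowers tau and would be flagged as broadening specificity.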
Authors: Miquel Anglada-Girotto1*, Jonathan Frazer1,✣,*, Mafalda Dias1,✣,*
✣these authors contributed equally.
*corresponding author(s): miquel.anglada@crg.eu, jonathan.frazer@crg.eu, mafalda.dias@crg.eu
Previous analysis in the Genomics England 100,000 Genomes Project showed that pathogenic HTT CAG expansions, which cause Huntington’s disease (HD), are two- to three-fold more frequent than expected from clinical prevalence estimates. Building on this, we model repeat length as a continuous variable in UK Biobank to quantify penetrance at pathogenic lengths and test whether variation within the normal and intermediate range is associated with differences in brain volume and neuropsychiatric risk.
Analyses were performed in 474,446 UK Biobank participants, including 30,052 with intermediate repeats (27-35), 873 with reduced-penetrance repeats (36-39), and 155 with pathogenic repeats (≥40); 48,378 individuals had structural MRI. Brain volumes were modelled as a continuous function of repeat length within the normal/intermediate range using regression models, and depression onset was analysed using Cox models. Pathogenic-repeat carriers were analysed separately for diagnostic penetrance.
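The abstract describes brain volume "modelled as a continuous function of repeat length within the normal/intermediate range using regression models" without giving the model form or covariates. A minimal sketch, assuming ordinary least squares with optional covariates and restricting to repeats ≤35 (the upper bound of the intermediate range stated above), might look like this; the function name and covariate handling are illustrative assumptions.

```python
import numpy as np

def repeat_length_slope(repeats, volumes, covariates=None, max_repeat=35):
    """Per-repeat change in brain volume within the normal/intermediate range.

    Fits volume ~ intercept + repeats (+ covariates) by ordinary least squares,
    using only individuals with repeat length <= max_repeat.
    """
    repeats = np.asarray(repeats, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    keep = repeats <= max_repeat
    columns = [np.ones(keep.sum()), repeats[keep]]
    if covariates is not None:
        cov = np.asarray(covariates, dtype=float)
        if cov.ndim == 1:
            cov = cov[:, None]  # single covariate -> one column
        columns.extend(cov[keep].T)  # one design-matrix column per covariate
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, volumes[keep], rcond=None)
    return float(beta[1])  # coefficient on repeat length
```

A negative slope corresponds to the reported finding that longer repeats within the normal/intermediate range go with smaller volumes.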
Among carriers with 40-41 repeats, 62% were diagnosed with HD by age 84 years, lower than the predicted penetrance (>95%). Among those with ≥40 repeats and MRI data, 78% (7/9) showed imaging features consistent with early-stage disease. Within the normal and intermediate range, longer repeat length was associated with smaller volumes across subcortical and global brain measures. In reduced-penetrance/pathogenic allele carriers, some structures followed the linear trend seen in the normal/intermediate range, whereas others showed more marked volume loss. Clinically, intermediate allele carriers had an increased risk of depression onset compared with normal allele carriers.
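The abstract does not state how the proportion diagnosed "by age 84 years" was estimated. Because many carriers are still undiagnosed at last follow-up, a standard approach for age-dependent penetrance is the Kaplan-Meier estimator on age at diagnosis with censoring; the sketch below shows that technique under this assumption and is not necessarily the authors' method.

```python
import numpy as np

def km_cumulative_incidence(ages, diagnosed, at_age):
    """Kaplan-Meier estimate of the probability of diagnosis by `at_age`.

    ages: age at diagnosis (if diagnosed) or at last follow-up (if censored).
    diagnosed: 1 if diagnosed, 0 if censored.
    """
    ages = np.asarray(ages, dtype=float)
    diagnosed = np.asarray(diagnosed, dtype=int)
    survival = 1.0  # probability of remaining undiagnosed
    for t in np.unique(ages[diagnosed == 1]):  # distinct diagnosis ages, ascending
        if t > at_age:
            break
        at_risk = np.sum(ages >= t)
        events = np.sum((ages == t) & (diagnosed == 1))
        survival *= 1.0 - events / at_risk
    return 1.0 - survival
```

Treating censored carriers as undiagnosed, rather than censoring them, would bias penetrance downwards, which is why a survival estimator is the usual choice here.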
These findings support a continuum model in which repeat length shapes brain structure and neuropsychiatric vulnerability population-wide, challenging dichotomous models of pathogenicity.
Authors: Harriet Cullen (1), Chris Clarkson (2,3), Henrique Nascimento (4), Matteo Zanovello (3), Anupriya Dalmia (4), Michael Simpson (1), Sarah Tabrizi (5), Arianna Tucci (2,3).
Phenotypic data underpin genomic diagnosis and rare disease research, yet most datasets rely primarily on clinician-reported information. Parent-reported data offer an accessible and scalable way to collect phenotypic information and reflect the lived expertise of families caring for children with rare genetic conditions. However, how these perspectives complement clinical phenotyping remains incompletely understood.
In the GenROC cross-syndrome cohort, we analysed parallel parent-reported web questionnaires and clinician-reported Human Phenotype Ontology (HPO) proformas from 477 children with monogenic neurodevelopmental disorders to compare the contribution of parent and clinician perspectives.
Parents contributed a similar quantity of phenotypic terms to clinicians but described different aspects of their child’s health. Parent reports more frequently captured everyday and lived-experience features—particularly respiratory, gastrointestinal and dental phenotypes—while clinicians provided greater detail in specialised domains such as neurological findings and imaging descriptors. Overall similarity between datasets was modest (mean similarity score 0.38), highlighting that each source captures distinct perspectives on the same condition. For example, seizures were reported by parents alone in 27% of cases, suggesting some clinically relevant phenotypes may be under-recognised in traditional datasets.
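The abstract reports a mean similarity score of 0.38 but does not name the measure; HPO comparisons often use ontology-aware semantic similarity, which credits related terms. As a simpler, assumed stand-in, the sketch below uses the Jaccard index on exact HPO term IDs, which ignores the ontology hierarchy and would therefore tend to give lower scores. The example terms are HP:0001250 (Seizure), HP:0001252 (Hypotonia) and HP:0001263 (Global developmental delay).

```python
def jaccard_similarity(parent_terms, clinician_terms):
    """Overlap between two sets of HPO term IDs (1 = identical, 0 = disjoint)."""
    a, b = set(parent_terms), set(clinician_terms)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def mean_similarity(pairs):
    """Mean Jaccard similarity over (parent_terms, clinician_terms) pairs, one per child."""
    scores = [jaccard_similarity(p, c) for p, c in pairs]
    return sum(scores) / len(scores)

# Hypothetical child: seizures reported only by the parent, developmental delay only by the clinician.
parent = {"HP:0001250", "HP:0001252"}
clinician = {"HP:0001252", "HP:0001263"}
print(jaccard_similarity(parent, clinician))
```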
Integrating these perspectives is already informing gene-specific insights: our single-gene sub-analyses so far in CASK, TUBA1A and PUF60 have identified clinically relevant phenotypes that are not well recognised in the current literature.
These findings highlight the value of partnering with families in genomic research to build richer phenotypic datasets that better capture the full spectrum of rare genetic disease.
Authors: Karen J. Low (1,2), Huw Day (3), GenROC Consortium (4), Mevmi L.K. Thanthilla (2), Charlotte Davis (2), Amber Knapp-Wilson (2), Hannah Compton (2), Lauren Cairns (5), Helen V Firth (6,7), Caroline Wright (8)
Observing the same de novo mutation (‘recurrent’ DNM) in multiple probands with developmental disorders (DD) is often considered strong evidence of pathogenicity and has been used to establish gene–disease relationships. However, the mutational target for DD is large (~1,000 dominant genes), and phenotypes are often non-specific. As cohort sizes grow, recurrent DNMs are increasingly likely to arise by chance, particularly at highly mutable sites or where mutations confer a selective advantage in the male germline.
In a cohort of 169,460 trio exomes and genomes from children with DD, we identified 6,915 DNMs observed in more than one proband. The most recurrent variant, p.R203W in PACS1, occurred in 89 probands. To distinguish pathogenic recurrence from chance, we estimated expected mutation rates at each site using sequence-based mutability models combined with sperm NanoSeq data, which captures mutation frequencies in the male germline. This framework highlights variants with higher-than-expected recurrence.
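The core idea of comparing observed recurrence with what mutation rate alone predicts can be sketched with a Poisson model. This is a simplified illustration, not the authors' framework: it assumes a per-site, per-haploid-genome mutation rate for the specific base change (so each child contributes two gametes), and it ignores the sperm NanoSeq and selection components described above. The function names are hypothetical.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly so tiny p-values keep precision."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    total = 0.0
    i = k
    log_pmf = -lam + k * math.log(lam) - math.lgamma(k + 1)
    while True:
        pmf = math.exp(log_pmf)
        total += pmf
        # stop once past the mode and further terms no longer change the sum
        if i > lam and pmf <= total * 1e-16:
            break
        i += 1
        log_pmf += math.log(lam) - math.log(i)
    return min(total, 1.0)

def recurrence_p_value(observed, site_mutation_rate, n_trios):
    """Probability of `observed` or more identical DNMs at one site by chance.

    Assumes site_mutation_rate is per haploid genome per generation,
    so each trio contributes two transmitted gametes.
    """
    expected = 2 * n_trios * site_mutation_rate
    return poisson_upper_tail(observed, expected)
```

Because thousands of recurrent sites are tested, these p-values would need multiple-testing correction, and a site with inflated mutability (as for FAM222B p.R300H) raises the expected count and weakens the evidence from recurrence.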
FAM222B p.R300H was observed in 11 probands but is unlikely to be pathogenic due to high baseline mutability and evidence of sperm selection, illustrating the limitations of relying on recurrence alone. Overall, we identified 353 recurrent non-synonymous DNMs (n≥3) occurring more often than expected (dN/dS q<0.1), including 18 in genes not previously associated with DD. Additionally, 66 variants are absent from or inconsistently classified in ClinVar, highlighting opportunities to improve variant interpretation.
We are extending this analysis to over 180,000 trios, incorporating data from the Genomic Medicine Service and the 100,000 Genomes Project.
Authors: Katrina Andrews1,2, Matthew Neville1, Joanna Kaplanis1, Erwan Delage1, Daniel Jaramillo-Calle1, Robert Kueffner3, Vinnie Ustach3, Zhancheng Zhang3, Michelle Morrow3, Juliet Hampstead4, Christian Gilissen4, Raheleh Rahbari1, Helen Firth1,4, Sarah Lindsay1, Matthew Hurles1
High-grade ovarian carcinoma (HGOC) has multilayered genomic complexity from diverse mutational processes. Clinical assays for homologous recombination deficiency identify isolated mutational signatures but cannot capture the continuum of possible DNA damage repair (DDR) choices or consistently guide PARP inhibitor (PARPi) use. We proposed that integrative machine learning using biologically-informed DDR genomic markers could identify latent factors with improved prediction of benefit from platinum and PARPi therapy.
We established a national healthcare outcome study of 466 HGOC patients, integrating tumour-normal deep whole-genome sequencing (WGS) with real-world treatment and outcomes and extracted integrated genomic features (IGF). We applied feature engineering before probabilistic multi-view latent factor modelling (MOFA2) across substitutions, indels, structural variants and templated insertions (TINS). Cox models predicting overall survival (OS) and progression-free survival (PFS) were adjusted for clinical characteristics and molecular covariates. An emulated clinical trial of PARPi was used to assess the predictive power of IGFs.
MOFA2 recovered seven IGFs spanning DSB repair, chromosomal instability and oxidative/replication stress. IGF2 (microhomology deletions, TINS and interstitial deletions; polymerase theta (POLQ)-mediated end-joining, BRCA2-type) marked strong platinum sensitivity and longer survival (OS HR 0.81 per 1 SD) but reduced PARPi benefit. Second-line PARPi benefit was confined to IGF2-low tumours (interaction HR 2.33, P=0.04). IGF5 (tandem duplications/blunt deletions; BRCA1-type) predicted PARPi benefit.
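The hazard ratios above are reported per 1 SD of a feature. Under the log-linear proportional-hazards form of a Cox model, effects compound multiplicatively, so the HR for a k-SD difference is the per-SD HR raised to the power k. The short sketch below only restates that arithmetic for the reported OS HR of 0.81; it is not an additional result.

```python
import math

def hazard_ratio_for_shift(hr_per_sd, n_sd):
    """HR for an n_sd change in a standardised covariate in a Cox model.

    log-hazard is linear in the covariate, so HR(k SD) = HR(1 SD) ** k.
    """
    return hr_per_sd ** n_sd

def log_hr_coefficient(hr_per_sd):
    """Cox coefficient beta (per SD) corresponding to a reported HR."""
    return math.log(hr_per_sd)

# A tumour 2 SD above the mean on IGF2, relative to one at the mean.
print(hazard_ratio_for_shift(0.81, 2))
```

For the reported value this gives roughly 0.66, that is, about a one-third lower hazard of death at 2 SD above the mean.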
These mechanism-specific, WGS-derived phenotypes are now being implemented in clinical reporting, providing a biological basis for biomarker-stratified trials of PARP inhibitors and POLQ inhibitors in ovarian cancer and in other tumours with DNA-repair defects.
Authors: Ionut-Gabriel Funingana1, John Ambrose2, Luca Porcu1, Philip Smith1, Bradley Thomas1, Zeyu Gao3, Ines Prata Machado3, Patrick Tarpey4, Mireia Crispin-Ortuzar3, Marc Tischkowitz5, Florian M. Markowetz1, Alona Sosinsky2, James D. Brenton1
Pharmacogenetic (PGx) variants can influence how individuals respond to treatment and are associated with the risk of adverse drug reactions (ADRs). To understand the potential impact of these variants in the 100K Genomes Project, we analysed the frequency and clinical impact of 243 PGx-variants across 76 gene-drug pairs in 76,805 individuals. CPIC guidelines were used to derive each individual participant’s phenotype (e.g. poor/rapid metabolizer) from their genotypes, and to estimate the proportion of individuals needing treatment adjustments. We used NHS-England primary care prescribing data to extrapolate the number of prescriptions in which a change would be advised. A phenome-wide association study of 101 ADR-related ICD10 codes was performed.
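For CYP2D6, CPIC derives metaboliser phenotype from an activity score: each allele is assigned a function value, the values are summed across the diplotype (multiplied by copy number for duplications), and the total is mapped to a phenotype. The sketch below illustrates that scheme for a small, hand-picked subset of alleles; the allele values and thresholds reflect the standardised CPIC scheme as we understand it and should be checked against the current CPIC/PharmVar tables before any real use.

```python
# Illustrative subset of CYP2D6 allele activity values (verify against current CPIC tables).
CYP2D6_ACTIVITY = {
    "*1": 1.0, "*2": 1.0,                  # normal function
    "*10": 0.25,                           # decreased function
    "*17": 0.5, "*29": 0.5, "*41": 0.5,    # decreased function
    "*3": 0.0, "*4": 0.0, "*5": 0.0,       # no function (*5 is a gene deletion)
}

def cyp2d6_phenotype(allele_a, allele_b, copies_a=1, copies_b=1):
    """Return (activity score, metaboliser phenotype) for a CYP2D6 diplotype."""
    score = (CYP2D6_ACTIVITY[allele_a] * copies_a
             + CYP2D6_ACTIVITY[allele_b] * copies_b)
    if score == 0:
        return score, "poor metabolizer"
    if score < 1.25:
        return score, "intermediate metabolizer"
    if score <= 2.25:
        return score, "normal metabolizer"
    return score, "ultrarapid metabolizer"

print(cyp2d6_phenotype("*1", "*4"))
```

Under this scheme a *1/*4 carrier scores 1.0 and is classed as an intermediate metabolizer.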
Nearly all participants (99.3%) carry at least one actionable variant, and 77.6% carry between two and five actionable variants. Some variants showed significant differences across genetically inferred ancestries. For example, 88% of individuals with African ancestry carry an IFNL3 genotype linked to a potentially unfavourable response to peginterferon alfa therapy (PEG-IFN2a, PEG-IFN2b). Similarly, more than 50% of participants with East Asian ancestry would require an alternative to the commonly prescribed antiplatelet drug clopidogrel. CYP2D6/CYP2C19 PGx-variants, involved in the metabolism of antidepressants, were observed in 61.5% of the cohort. If these individuals were exposed to the relevant medications, this would translate into more than one million prescription changes annually to reduce side effects and optimise treatment. We found CYP2D6 alleles *17 and *29 significantly associated with essential tremor and dental abrasion, respectively.
Our results demonstrate the clinical value of PGx-implementation and the importance of ancestry-inclusive research.
Authors: Claudia P Cabrera*, Ivone US Leong*, Valentina Cipriani*, Yoonsu Cho, Alex Stuckey, Sonali Sanghvi, Dorota Pasko, Christopher A Odhams, Greg S Elgar, Richard M Turner, Emma Magavern, Georgia Chan, Adam Giess, Susan Walker, Rebecca E Foulger, Eleanor M Williams, Louise C Daugherty, Antonio Rueda-Martin, Olivia Niblock, Alexandra Pickard, Lauren Marks, Sarah EA Leigh, Matthew J Welland, Marta Bleda, Catherine Snow, Sandi Deans, Nirupa Murugaesu, Richard H Scott, Michael R Barnes, Matthew A Brown, Loukas Moutsianas, Augusto Rendon, Sue Hill, Alona Sosinsky, Sir Mark J Caulfield, Ellen M McDonagh
Background:
Noncoding regulatory elements are weakly conserved at the base-pair level compared to coding sequence. Structural variants (SVs), affecting multiple bases, are more likely to disrupt these elements than single nucleotide variants, but their contribution to rare disorders remains incompletely assessed. Here, we analyse SVs in 125,730 participants from the UK National Genome Research Library.
Methods:
Biallelic deletions were identified genome-wide in 58,022 rare disease cases and 67,708 controls. SVs were annotated with ENCODE candidate cis-regulatory elements (cCREs), gene features, and gene-disease associations from OMIM/PanelApp. Chi-squared tests assessed SV feature enrichment relative to intergenic SVs without annotation.
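Each enrichment test above compares how often an SV class is seen in cases versus controls against a reference class (intergenic SVs without annotation). A minimal sketch of the statistics reported in the results, assuming a simple 2x2 table of counts with no zero cells: the odds ratio, a Woolf (log-scale) 95% confidence interval, and a Pearson chi-squared test without continuity correction. The cell layout and counts are hypothetical; the abstract does not describe the exact table.

```python
import math

def odds_ratio_2x2(a, b, c, d):
    """Odds ratio, Woolf 95% CI and Pearson chi-squared (1 df) for a 2x2 table.

    a: cases with the feature        b: controls with the feature
    c: cases in the reference class  d: controls in the reference class
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return or_, (lo, hi), chi2, p

print(odds_ratio_2x2(20, 10, 100, 100))
```

Some libraries apply a Yates continuity correction to 2x2 tables by default, which gives slightly larger p-values than this uncorrected version.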
Results:
Across the cohort, 5.2% of the genome was covered by a biallelic deletion in at least one individual. Deletions truncating known disease genes were enriched in cases (OR=3.19, 95%CI=2.46–4.12, P=7.3×10⁻²⁰). A modest enrichment was observed for deletions overlapping cCREs (OR=1.02, 95%CI=1.004–1.026, P=0.007) and those overlapping disease gene promoters (OR=1.65, 95%CI=1.19–2.29, P=3.4×10⁻³). Of 150 participants with a rare SV (AF<0.001) overlapping a disease gene promoter, 37 (26 genes) were putative diagnoses after phenotype review (51.35% recurrent in ≥2 probands). RNA sequencing in three cases suggested reduced transcript expression, including isoform-specific effects in PLEC and NHS.
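A figure such as "5.2% of the genome was covered by a biallelic deletion in at least one individual" can be obtained by pooling deletion intervals from all individuals, merging overlaps so that shared deletions are counted once, and dividing by genome length. A minimal sketch, assuming half-open (chrom, start, end) intervals; which genome length the authors used as denominator (for example, total or non-gap length) is not stated.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of the genome covered by the union of half-open [start, end) intervals.

    intervals: (chrom, start, end) tuples pooled across all individuals.
    """
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    covered = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlapping or adjacent: extend the current block
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered / genome_length

print(covered_fraction([("chr1", 0, 100), ("chr1", 50, 150), ("chr2", 0, 10)], 1000))
```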
Conclusion:
Genomic deletions overlapping cCREs are enriched in rare disease cases. Promoter deletions are a recurring, underappreciated contributor to rare disease burden.
Authors: Anthony EF McGuigan, Hyung Chul Kim, Tanishi Moitra, Jenny C. Taylor, Nicola Whiffin
Background
Polycystic ovary syndrome (PCOS) is a common endocrine disorder associated with substantial reproductive, metabolic, cardiovascular, and mental health morbidity, yet it remains frequently underdiagnosed.
Methods
We developed a transformer-based survival model using EHR data from women aged 16-50 years with no recorded PCOS at baseline in CPRD Aurum (UK). External validation was performed in the All of Us Research Program (USA). Model performance was assessed using discrimination, calibration, and decision-curve analyses. Clinical validity was evaluated using explainability analyses, early-identification analyses, and a matched cohort analysis.
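Discrimination for survival models is usually summarised with a concordance index. The abstract does not say which variant was used; time-dependent versions are common for deep survival models. The sketch below implements plain Harrell's C for right-censored data as an illustration: a pair is comparable when the earlier time is an observed diagnosis, and concordant when that woman also received the higher predicted risk. Ties in follow-up time are skipped for simplicity, and at least one comparable pair is assumed.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times: follow-up time; events: 1 if diagnosed, 0 if censored;
    risk_scores: higher means higher predicted risk. Ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # only an observed event can anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.827 and 0.762 should be read.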
Findings
In the UK internal validation cohort (2,046,104 women; 8,448 PCOS diagnoses), the model demonstrated strong discrimination (C-index 0.827). Calibration was good, and decision-curve analysis showed net benefit across clinically relevant thresholds. External validation in the USA (79,252 women; 1,183 diagnoses) showed moderate discrimination (C-index 0.762). Explainability analyses highlighted established PCOS-related features, including menstrual irregularities, hyperandrogenic manifestations and treatments. High-risk patients experienced a higher long-term incidence of PCOS-related morbidity across mental health, metabolic, cardiovascular, and reproductive outcomes compared with matched controls. Using a threshold capturing approximately 60% of eventual diagnoses, the mean lead time before clinical diagnosis was 5.41 years in the UK and 6.36 years in the USA.
Interpretation
A transformer-based survival model using routinely collected EHR data can predict a future clinical diagnosis of PCOS several years before routine recognition and identify individuals at elevated risk of PCOS-related multimorbidity. Prospective evaluation is needed to determine whether earlier identification can reduce diagnostic delay and improve long-term outcomes.
Authors: Nouman Ahmed, Kazem Rahimi, Shishir Rao
Background: Colorectal cancer (CRC) is the third most common cancer worldwide, is increasingly diagnosed in younger people, and is increasingly linked to the gut microbiome. The 100,000 Genomes Project has advanced our understanding of the genome and microbiome in CRC. We have established a pathway to integrate in situ transcriptomics.
Methods: In a pilot study, 55 CRC tissue microarray (TMA) cores from the 100,000 Genomes Project were H&E-stained and scanned for AI-based detection of tumour-infiltrating lymphocytes (TILs) (HeteroGenius). TMA sections were analysed using CosMx (single-cell spatial transcriptomics, 6,000-gene assay) and GeoMx (regional whole-transcriptome assay). Same-slide imaging enabled integrated digital pathology and single-cell spatial transcriptome alignment, followed by analysis. We compared the Klintrup-Mäkinen score of peritumoural inflammation, Stroma AReactive Invasion Front Areas (SARIFA) scoring, bacterial load, alpha diversity and selected CRC-associated taxa.
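The abstract compares alpha diversity across cores without naming the metric. One common choice, assumed here for illustration, is the Shannon index H = -Σ pᵢ ln pᵢ over taxon relative abundances; observed richness or the Simpson index are frequent alternatives. The counts in the usage line are hypothetical.

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical per-core read counts for four taxa.
print(shannon_diversity([120, 30, 30, 20]))
```

H is 0 when a single taxon dominates completely and reaches ln(k) when k taxa are equally abundant.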
Results: Distinct spatial and transcriptional signatures were observed for MMR status, inflammation and SARIFA status. Higher TILs were associated with lower Fusobacterium and overall bacterial load. SARIFA-negative cases trended towards lower bacterial load, Fusobacterium and Parvimonas, and higher Akkermansia.
Discussion: Generating spatial transcriptomic data from tumour TMAs for multimodal analysis alongside extended pathology and AI phenotyping is feasible, making it possible to create a publicly available multiplatform tumour profiling resource from routinely available tissue. We aim to scale this across the full Genomics England colorectal cancer cohort to further elucidate the CRC-associated microbiome and immune response, and ultimately to benefit patients and their families.
Authors: Cartlidge, C.R.1; McCargow, E.2; Wood, H.1; Chen, C.2; Laye, J.1; Magee, D.R.3; Hemmings, G.1; Legrini, A.4; Doncheva, Y.4; Latifi, G.4; Kennedy Dietrich, C.4; McNickle, L.4; McKenzie, M.4; Hatthakarnkul, P.4; Edwards, J.4; Jamieson, N.B.4; Quirke, P.1
Background: CHARGE syndrome is a rare multisystem developmental disorder characterised by variable ocular, cardiac, craniofacial, auditory, and neurodevelopmental features. Most molecularly confirmed cases are associated with heterozygous pathogenic variants in CHD7, which encodes an ATP-dependent chromatin remodelling protein. However, a substantial minority of clinically suspected cases remain genetically unresolved, and the contribution of non-coding CHD7 variation is incompletely defined.
Methods: We screened the 100,000 Genomes Project using Human Phenotype Ontology terms aligned to the Hale et al. (2016) CHARGE diagnostic framework. Thirty-seven participants (21 male, 16 female) met phenotype-based inclusion criteria. We interrogated both coding and non-coding variation in CHD7 and prioritised variants using ACMG classification criteria and in silico functional annotations. Latent class analysis characterised phenotypic substructure across the cohort. In CHD7-negative cases, additional candidate genes were prioritised based on phenotypic fit and evidence for oligogenic inheritance.
Results: Twenty-nine of 37 individuals (78.4%) harboured at least one prioritised CHD7 variant. Across the cohort, we identified 176 CHD7-region variants, including four previously uncharacterised non-coding variants of interest. Latent class analysis resolved three phenotypic clusters, consistent with marked clinical heterogeneity among individuals with a CHARGE-like presentation. In participants without an obvious CHD7-based explanation, we prioritised 28 additional candidate genes for further investigation.
Conclusions: Phenotype-driven screening within the 100,000 Genomes Project gave a high diagnostic yield for candidate CHD7 variation and illuminated both potential non-coding CHD7 contributions and additional candidate genes in genetically unresolved cases. These findings broaden the spectrum of genomic variation relevant to CHARGE syndrome and provide a foundation for functional validation and genotype–phenotype correlation.
Authors: Nirmal Vadgama, Jamal Nasir, Lucie Quillson