Data and cohort
We currently have data available from our baseline health questionnaire, participant geographies, linked NHS England health records, genotyping array, and clinic measurements.
Our Future Health data files
For more information about our data, download:
- Our Future Health Release 14 data dictionary (XLSX, 128KB)
- Our Future Health Release 14 coding file (XLSX, 15.3MB)
- Genotyping array CPRA (“Chrom:Pos:Ref:Alt”) variant list (CSV, 13.5MB)
- Imputed genotype data CPRA (“Chrom:Pos:Ref:Alt”) variant list (ZIP, 581MB; the unzipped contents are 3.3GB)
Or view detailed information about all of our data on our documentation hub.
Go to the documentation hub
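The variant lists above identify each variant with a CPRA ("Chrom:Pos:Ref:Alt") string. As a minimal sketch of how such an identifier could be split into its four fields, the Python below assumes the fields are separated by colons and that the position is a non-negative integer. The `Variant` class, the `parse_cpra` function and the example identifier are hypothetical and are not taken from the released files, so check the data dictionary for the exact chromosome naming and allele conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One variant described by chromosome, position, reference and alternate allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_cpra(identifier: str) -> Variant:
    """Split a 'Chrom:Pos:Ref:Alt' string into its four fields."""
    parts = identifier.strip().split(":")
    if len(parts) != 4:
        raise ValueError(
            f"expected 4 colon-separated fields, got {len(parts)}: {identifier!r}"
        )
    chrom, pos, ref, alt = parts
    if not pos.isdigit():
        raise ValueError(f"position is not a non-negative integer: {pos!r}")
    return Variant(chrom=chrom, pos=int(pos), ref=ref, alt=alt)


# Hypothetical identifier for illustration only.
example = parse_cpra("1:12345:A:G")
print(example.chrom, example.pos, example.ref, example.alt)
```

Keeping the chromosome as a string avoids losing labels such as X or Y, while the position is converted to an integer so that variants can be sorted or range-filtered.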
The Our Future Health cohort
The Our Future Health programme is open to all adults (18 years and older) living in the UK. The data that we’ve gathered so far includes:
- 2,021,810 participants who have consented to take part and completed our baseline health questionnaire
- a subset of 1,983,038 participants with geographical data
- a subset of 1,690,704 participants successfully linked to an NHS number (of which 1,666,336 participants have at least one secondary care, dispensed medication, or death record)
- a subset of 1,518,202 participants with clinic measurements data, which includes body size and circulatory function (of which 1,159,273 participants have Point-of-Care Testing (POCT) lipid profile data)
- a subset of 755,000 participants who have been successfully genotyped and have imputed genetic data
Go to the programme design and recruitment documentation
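To put some of the subset sizes listed above in context, the following sketch expresses them as percentages. It uses only counts quoted on this page. The `share` helper and the choice of denominators are illustrative assumptions: top-level subsets are divided by the number of consented participants, and nested subsets by their parent subset.

```python
# Participant counts as quoted in the cohort summary.
TOTAL_CONSENTED = 2_021_810

subsets = {
    "geographical data": 1_983_038,
    "linked to an NHS number": 1_690_704,
    "clinic measurements": 1_518_202,
}


def share(count: int, denominator: int) -> float:
    """Return count as a percentage of denominator, rounded to one decimal place."""
    return round(100 * count / denominator, 1)


for name, count in subsets.items():
    print(f"{name}: {share(count, TOTAL_CONSENTED)}% of consented participants")

# Nested subsets use their parent subset as the denominator.
linked_with_records = share(1_666_336, 1_690_704)
clinic_with_poct = share(1_159_273, 1_518_202)
print(f"at least one linked record: {linked_with_records}% of linked participants")
print(f"POCT lipid profile: {clinic_with_poct}% of participants with clinic measurements")
```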
Our aim is to build a data set that reflects the UK population. The following tables show the current composition of our cohort.

| Participant age | Cohort percentage |
|---|---|
| 18 to 39 | 24.7% |
| 40 to 59 | 37.3% |
| 60 to 79 | 35.8% |
| 80+ | 2.2% |

| Participant sex registered at birth | Cohort percentage |
|---|---|
| Female | 57.2% |
| Male | 42.7% |
| Other | <0.1% |

| Participant ethnicity | Cohort percentage |
|---|---|
| Asian | 5.5% |
| Black | 1.6% |
| White | 89.6% |
| Mixed | 1.8% |
| Other | 1.4% |

View the Characteristics of Our Future Health participants
The available data
The current data available includes:
- Participant data - which contains registration, consent and baseline demographic information collected across all consented participants
- Participant geographies data - which contains geographic information derived from participants’ self-reported address at the time of registration to the Our Future Health programme
- Questionnaire data - which contains self-reported health information, details about participants' household, socioeconomic status, work and education history and family history
- Genotype array data – which contains single nucleotide polymorphism (SNP) data extracted from blood and made available in two different file formats
- Imputed genetic data - which increases the number of variants from around 700k in the genotype array data to 159 million. It is available for a subset of participants, all of whom are also in the genotype array data set
- Genetic ancestry data - which contains genetic ancestry assignments inferred for a subset of participants. The same participants are also in the genotype array and imputed genetic data sets
- Linked health records data – which contains health records provided by the National Health Service (NHS) in England and registrations of death from the Office for National Statistics (ONS)
- Clinic measurement data – which contains height, body weight, waist circumference, blood pressure, heart rate and heart rhythm, plus new Point-of-Care Testing (POCT) lipid profile data
Data sets are stored and accessed in the Our Future Health Trusted Research Environment (TRE).
Data for participants who have fully withdrawn from Our Future Health are not included, because those data are routinely deleted after a participant requests to withdraw. Participants who have fully withdrawn from the programme since the last data release are therefore not included in the current data release.
Participant data
The participant data set includes self-reported demographic information about:
- ethnicity
- gender and sex
- month and year of birth
It also includes information relating to registration and consent:
- month and year of registration with Our Future Health
- month and year of consent to take part in the Our Future Health research programme
This data set is gathered at various times, such as during participant registration and as part of the baseline health questionnaire.
Go to the participant data documentation
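Because the participant data set holds month and year of birth rather than a full date of birth, age can only be approximated. The sketch below shows one way to do this and to assign the age bands used in the cohort composition table. The function names, the reference date and the convention of treating every birthday as falling on the 1st of the month are assumptions made for illustration. Under that convention the result can overstate true age by one year for a reference date that falls in the birth month.

```python
from datetime import date


def approximate_age(birth_year: int, birth_month: int, on: date) -> int:
    """Age in completed years, assuming the birthday is the 1st of the birth month."""
    age = on.year - birth_year
    if on.month < birth_month:
        # The birthday has not yet occurred in the reference year.
        age -= 1
    return age


def age_band(age: int) -> str:
    """Map an age in years to the bands used in the cohort composition table."""
    if age < 18:
        raise ValueError("the programme is open to adults aged 18 and older")
    if age <= 39:
        return "18 to 39"
    if age <= 59:
        return "40 to 59"
    if age <= 79:
        return "60 to 79"
    return "80+"


# Hypothetical participant born in June 1985, with an arbitrary reference date.
reference_date = date(2026, 6, 2)
age = approximate_age(1985, 6, reference_date)
print(age, age_band(age))
```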
Participant geographies data
The participant geographies data currently consists of four data sets:
- Country and region
- Lower Super Output Area (England and Wales)
- Middle Super Output Area (England and Wales)
- Intermediate Zones (Scotland)
Our country data includes:
- England
- Scotland
- Wales
- Northern Ireland
All data sets outlined above are linked to each participant’s self-reported address collected during their registration for the Our Future Health programme.
The English regions, together with the other devolved nations, comprise a large proportion of political and/or geographical divisions of the UK.
Additional small area geography data has been made available in Release 13 based on 2021 census data for England and Wales from the Office for National Statistics, and 2022 census data for Scotland from the Scottish Government.
England and Wales are made up of 35,672 Lower Super Output Areas and 7,264 Middle Super Output Areas. Scotland is made up of 1,344 Intermediate Zones.
Only registration address is used. No subsequent address changes or participant relocations are reflected.
Go to the participant geographies documentation
Baseline health questionnaire data
There are 2 versions of the baseline health questionnaire. Version 1 of the questionnaire contains 202 questions, and the current version (version 2) contains 286 questions. Version 2 went live in November 2022. Not all participants see every question. Some questions are presented selectively, depending on participant responses.
The questions are grouped into five sections:
- about you and your household – for example, age, sex, height, weight, ethnicity and living situation
- work and education – for example, income, employment history and highest educational attainment
- lifestyle – for example, socialising, screen use and alcohol intake
- family health history – for example, siblings' and parents' health
- personal health history – for example, health check-ups and screenings, diagnoses, medications and any current symptoms
Go to the baseline health questionnaire data documentation
Prevalence tables
Every six months, we produce prevalence tables across a range of outcome variables using data from our latest release.
Below are the latest prevalence tables:
- Self-reported disease prevalence P14 (XLSX, 37KB)
- Self-reported medication prevalence P14 (XLSX, 37KB)
Genetic data
The available genetic data sets are listed below. The same participants are represented across all of them:
Genotype array data
The first release of our genotype array data was made available in the Our Future Health TRE on 12 December 2023.
The latest release consists of 686,416 genetic variants across 755,000 participants who have also completed the baseline health questionnaire (417,918 female, 337,082 male).
This data set is available in two common file formats for 25 chromosomes, accompanied by sample-level metadata with 40 derived principal components to aid quality control (QC). Genetic kinship data has also been made available.
Genotype array data:
- Variant Call Format Files (VCF) and associated files
- Binary GEN file format (BGEN) and associated files
Metadata:
- Sample level QC file
- Genetic kinship data file
Go to the genotype array data documentation
Imputed genotype data
Our imputed genotype data was first made available in the Our Future Health TRE on 12 December 2025.
The latest release consists of 159,587,100 genetic variants across the same 755,000 participants with genotype array data. These participants have also completed the baseline health questionnaire (417,918 female, 337,082 male).
This data set is available in two common file formats for 23 chromosomes, in addition to sample-level and variant-level metadata for QC purposes.
Imputed genotype data:
- Variant Call Format Files (VCF) and associated files
- Binary GEN file format (BGEN) and associated files
Metadata:
- Sample level QC file
- Variant summary statistics file
Go to the imputed genotype data documentation
Genetic ancestry data
The genetic ancestry data was first made available in the Our Future Health TRE on 28 May 2026.
The latest release has data available for 755,000 participants who also have genotype array and imputed genotype data.
Go to the genetic ancestry data documentation
Linked health records data
The linked health records data sets include:
Primary care
- Medicines Dispensed in Primary Care - a monthly dataset that lists all medicines dispensed in community pharmacies, dispensing doctors, and appliance contractors in England, including drug name, quantity, and cost details.
Secondary care
- Accident and Emergency (HES A&E) – attendances recorded at major A&E departments, single specialty A&E departments, walk-in centres, and minor injury units at NHS hospitals in England
- Emergency Care (ECDS) - information on why people attend emergency departments and the treatment they receive at hospitals in England.
- Admitted Patient Care (HES APC) – episodes of care where a participant is admitted at NHS hospitals in England
- Outpatient (HES OP) – outpatient appointments at NHS hospitals in England
Cancer
Cancer Registration, which comprises three tables:
- Cancer Registry Treatment - event-level data with information on treatments received for a given tumour
- Cancer Registry Patient Tumour - tumour-level data with information on each tumour, including site and diagnosis
- Cancer Registry Pre-1995 - tumour-level data pre-1995.
Cancer Pathway - information on the services a participant accessed in their journey from diagnosis to treatment
Death
- ONS Death Registration – death registration (date of death, registrar designation) and mortality data (cause of death, age at death) for England and Wales
A Linked Participants table is also available alongside any approved linked health record data. This confirms all participants who have been successfully linked to an NHS number regardless of whether recorded data are present within a given release.

| Data set | Number of participants |
|---|---|
| Medicines dispensed in Primary Care | 1,572,477 |
| NHS Accident and Emergency (HES A&E; up to April 2020) | 1,175,047 |
| Emergency Care Data Set (ECDS; April 2020 onwards) | 999,837 |
| NHS Admitted Patient Care (HES APC) | 1,378,156 |
| NHS Outpatient (HES OP) | 1,596,222 |
| NHS Cancer Registry Patient Tumour | 192,600 |
| NHS Cancer Registry Treatment | 161,223 |
| NHS Cancer Registry Pre-1995 | 10,008 |
| NHS Cancer Pathway | 119,226 |
| ONS Death Registration | 5,579 |

Go to the linked health records data documentation
Clinic measurements data
The first release of our clinic measurements data was made available in the Our Future Health TRE on 12 December 2024.
This data set consists of baseline physical health measurements for over 1 million participants, taken at their clinic appointment by trained staff. Measurements include:
- height
- weight
- waist circumference
- heart rate and rhythm
- blood pressure
Go to the clinic measurements data documentation
Point-of-Care Testing (POCT) Lipid Profile data
This data release includes the first release of the Point-of-Care Testing (POCT) Lipid Profile data, which is provided as a separate table alongside the Clinic Measurements dataset. All participants in the POCT dataset are also present in the Clinic Measurements dataset. POCT data are present for over 1 million participants.
Collection of these data has ended: the test was removed from the appointment process on 23 December 2024. The dataset therefore represents a historical collection of lipid measurements obtained during clinic appointments.
The POCT Lipid Profile data currently includes measurements of blood lipids, which reflect cholesterol balance and cardiovascular risk. It includes variables for the following:
- total cholesterol (TC): overall cholesterol concentration in blood
- high-density lipoprotein cholesterol (HDL): cholesterol fraction involved in reverse cholesterol transport; higher levels are protective
- triglycerides (TG): circulating fats used for energy; elevated levels are associated with increased cardiovascular and metabolic risk
- low-density lipoprotein cholesterol (LDL): cholesterol fraction that transports cholesterol to tissues; higher levels increase cardiovascular risk
- non-HDL cholesterol: represents all atherogenic cholesterol
- TC:HDL ratio: reflects balance between atherogenic and protective lipoproteins, with higher values indicating higher risk
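Two of the variables listed above can be expressed in terms of the others using their standard definitions: non-HDL cholesterol is total cholesterol minus HDL cholesterol, and the TC:HDL ratio is total cholesterol divided by HDL cholesterol. The sketch below illustrates that arithmetic only. The function names and example values are hypothetical, the values are assumed to share one concentration unit such as mmol/L, and this page does not state whether the released variables were derived this way or reported directly by the testing device.

```python
def non_hdl_cholesterol(total_cholesterol: float, hdl: float) -> float:
    """Non-HDL cholesterol: total cholesterol minus HDL, in the same unit as the inputs."""
    return total_cholesterol - hdl


def tc_hdl_ratio(total_cholesterol: float, hdl: float) -> float:
    """Unitless ratio of total cholesterol to HDL cholesterol."""
    if hdl <= 0:
        raise ValueError("HDL must be positive to compute a ratio")
    return total_cholesterol / hdl


# Hypothetical measurements, assumed to be in mmol/L.
tc, hdl = 5.0, 1.25
print(non_hdl_cholesterol(tc, hdl))
print(tc_hdl_ratio(tc, hdl))
```

Recomputing these two values from TC and HDL is one possible consistency check when working with the table, provided the documentation confirms how the released variables were produced.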
Go to the POCT lipid profile data documentation
Our data releases
We release new data into our TRE every six months as our cohort grows.
For more information on our latest release and what has changed since previous releases, view our data release documentation for Release 14.
Stay up to date
Would you like to stay up to date with our work? Sign up for updates and tell us what you'd like to know.
Protecting the data
We de-identify all participant data before it’s available for use. All researchers will need to become registered researchers at Our Future Health and have an approved research study before they're given access to the data for research purposes.
As a registered researcher at Our Future Health:
- you must access the data for your research study in accordance with an approved study application
- you must have completed information governance training that covers UK GDPR within the past 12 months
- your organisation must sign our resource terms and conditions
Become a registered researcher
Once you've created an account with us, you can apply to become a registered researcher and access the data.
Create an account
Updated: 02 June 2026