Structured Datasets

12M Records

ALL

Demographic

Demographic data including gender and year of birth.


44M Records

From 2000

Accident and Emergency Department Attendance

AE attendance data including attendance date (up to month), calculated age on admission, triage category and discharge information.


84M Records

From 1997

Inpatient Admission, Transfer and Discharge

Inpatient episode transaction data including admission and discharge date (up to month), calculated age on admission, admission source, admission specialty and discharge information.


523M Records

From 2000

Outpatient Appointment

Outpatient appointment and attendance data including appointment date (up to month), calculated age on date of appointment, attended specialty and appointment type indicating first attendance or follow-up attendance.


102M Records

ALL

Diagnosis

Diagnosis progress data including patient’s diagnosis, diagnosis status and diagnosis date (up to month).


39M Records

ALL

Procedure

Procedure progress data including procedure and date of procedure (up to month).


1,170M Records

From 2000

Medication

Dispensed prescription data including dispensed drug item with corresponding British National Formulary (BNF) code, prescription period and prescribed dosage.


6M Records

From 1 Oct 2009

Immunization

Immunization data of Hospital Authority including injection date (up to month) and vaccine injected.


438M Records

ALL

Family Medicine

Patient disease data including date of the patient disease result (up to month) with ICPC-2 code.


743K Records

From 2002

Obstetrics

Obstetrics data including date of baby delivery (up to year), weight of the baby at birth, parity and maturity.


2,538M Records

From 2000

Laboratory Tests and Results

Laboratory result data including chemical pathology, hematology & immunology, and microbiology & virology, with reference date of the laboratory result (up to month).


131M Records

From 1 Apr 1999

Radiology Examinations

Radiology examination result including registration date of the radiology examination (up to month), examination details and calculated age of the patient on the day of radiology appointment.
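Several of the dataset descriptions above give dates "up to month". As a rough illustration of what that precision means for an analyst, the sketch below reduces a full calendar date to a year-month value. The function name `truncate_to_month` and the `YYYY-MM` output format are assumptions made for this example; the catalogue does not state how the released dates are actually encoded.

```python
from datetime import date


def truncate_to_month(full_date: date) -> str:
    """Drop the day component, keeping only year and month.

    Hypothetical helper: the YYYY-MM string format is an assumption,
    not the documented encoding of the released data.
    """
    return full_date.strftime("%Y-%m")


# Two attendances in the same month become indistinguishable by date.
first = truncate_to_month(date(2015, 3, 2))
second = truncate_to_month(date(2015, 3, 28))
same_month = first == second
```

With month-level precision, intervals between events can only be estimated to within about a month, which matters when computing quantities such as length of stay or time to follow-up.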

Unstructured Datasets

130M Records

From 3 Feb 1994

Clinical Note/Summary

Clinical and discharge notes including record creation date (up to month) and text content of the clinical note.


65M Records

From 2000

Laboratory Result

Text content of the laboratory result.


23M Records

From 1 Apr 1999

Radiology Report

Text content of the examination report.


Project-based

Radiology Image

Image data in DICOM format. Extraction time varies according to the required volume, and the data will be provided in phases throughout the project period.

Data Products

Structured data of 14 chronic diseases:

320K+ Patients

Chronic Heart Failure (CHF)

A progressive condition where the heart’s ability to pump blood efficiently is impaired, leading to inadequate circulation. Symptoms include shortness of breath, fatigue, exercise intolerance, and edema (swelling in the feet, ankles, or abdomen). Common causes include coronary artery disease, history of myocardial infarction, or hypertension. CHF predominantly affects individuals aged above 65.


730K+ Patients

Chronic Kidney Disease (CKD) Stage 3A, 3B, 4 & 5

It is defined by glomerular filtration rate (GFR) <60 mL/min/1.73m² or structural/functional kidney abnormalities persisting ≥3 months. Diagnostic markers include albuminuria, urinary sediment irregularities, or tubular electrolyte disorders.

Stage 3A: GFR 45–59 mL/min/1.73m²

Stage 3B: GFR 30–44 mL/min/1.73m²

Stage 4: GFR 15–29 mL/min/1.73m²

Stage 5 (Kidney Failure): GFR <15 mL/min/1.73m²
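The stage bands above can be expressed as a simple lookup. The following is a minimal sketch, not the cohort logic used to build the data product: the function name `ckd_stage` is hypothetical, fractional values are assumed to fall into half-open intervals (for example, a GFR of 44.5 is treated as Stage 3B), and the requirement that the abnormality persists for at least 3 months is not checked.

```python
from typing import Optional


def ckd_stage(gfr: float) -> Optional[str]:
    """Map a GFR value (mL/min/1.73m²) to a CKD stage label.

    Returns None when GFR is 60 or above, i.e. outside the
    Stage 3A to Stage 5 bands covered by this data product.
    Assumption: band edges are treated as half-open intervals.
    """
    if gfr < 0:
        raise ValueError("GFR cannot be negative")
    if gfr < 15:
        return "5"
    if gfr < 30:
        return "4"
    if gfr < 45:
        return "3B"
    if gfr < 60:
        return "3A"
    return None


# Example: stage labels for a handful of GFR readings.
readings = [72.0, 52.0, 38.0, 21.0, 9.0]
stages = [ckd_stage(value) for value in readings]
```

A single reading is not sufficient for a diagnosis; in practice the persistence criterion would need at least two qualifying results separated in time.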


180K+ Patients

Chronic Obstructive Pulmonary Disease (COPD)

A preventable, progressive respiratory disorder characterized by persistent airflow limitation due to chronic inflammation of airways and lung tissue, typically triggered by exposure to harmful particles/gases (e.g., smoking). Exacerbations and comorbidities (e.g., cardiovascular disease) worsen disease severity.


560K+ Patients

Coronary Heart Disease (CHD)

Also termed coronary artery disease (CAD). It is caused by atherosclerotic narrowing/blockage of coronary arteries, reducing blood flow to the heart. Common symptoms are angina, dyspnea on exertion, and myocardial infarction. Risk factors include age, smoking, hypertension, hyperlipidemia, diabetes, obesity, and family history of early-onset CHD.


270K+ Patients

Dementia

A syndrome characterized by progressive cognitive decline, including memory loss, impaired reasoning, personality changes, and difficulty performing daily tasks. Common etiologies include Alzheimer’s disease, cerebrovascular injury, or neurodegenerative disorders, with symptoms typically progressing in severity over time.


850K+ Patients

Diabetes Mellitus (DM)

A metabolic disorder characterized by chronic hyperglycemia due to defects in insulin secretion, insulin action, or both. This condition disrupts carbohydrate, lipid, and protein metabolism. The subtypes include Type 1 (autoimmune) and Type 2 (insulin resistance).


250K+ Patients

Glaucoma

A group of eye disorders involving increased intraocular pressure, damaging the optic nerve and leading to vision loss. This condition is caused by impaired drainage of aqueous humor, which compresses the retina and optic nerve.


310K+ Patients

Hepatitis B Carriers

Individuals with persistent hepatitis B virus (HBV) infection, defined as HBsAg-positive or HBV DNA-positive for more than 6 months. Carriers face elevated risks of cirrhosis and hepatocellular carcinoma (HCC), independent of liver function status.


160K+ Patients

Hip Fracture (as a proxy for Osteoporosis)

A break in the proximal femur (upper thigh bone), often caused by trauma or osteoporosis. This injury is common in elderly populations and associated with reduced mobility.


1,290K+ Patients

Hyperlipidemia (HLD)

A group of metabolic disorders characterized by elevated blood lipoprotein levels, including high cholesterol (LDL) and/or triglycerides. This condition represents a major risk factor for atherosclerosis and cardiovascular disease.


2,100K+ Patients

Hypertension (HT)

Within hypertension classifications, primary hypertension (accounting for 95% of cases) refers to persistently elevated blood pressure without identifiable cause. Key risk factors include age, obesity, salt sensitivity, and genetic predisposition.


70K+ Patients

Parkinsonism

A clinical syndrome defined by bradykinesia plus ≥1 of the following: resting tremor, rigidity, or postural instability. Causes include Parkinson’s disease, drug-induced dopamine blockade, vascular lesions, or neurodegenerative disorders.
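The "bradykinesia plus ≥1 of" rule above can be read as a simple boolean check. The sketch below only illustrates that reading; the function and parameter names are hypothetical, and it is not the selection logic used to assemble this patient cohort.

```python
def meets_parkinsonism_definition(
    bradykinesia: bool,
    resting_tremor: bool,
    rigidity: bool,
    postural_instability: bool,
) -> bool:
    """Return True when bradykinesia is present together with
    at least one of the three additional features."""
    additional_features = (resting_tremor, rigidity, postural_instability)
    return bradykinesia and any(additional_features)


# Tremor alone does not satisfy the definition; bradykinesia is required.
tremor_only = meets_parkinsonism_definition(False, True, False, False)
brady_and_rigidity = meets_parkinsonism_definition(True, False, True, False)
```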


220K+ Patients

Stroke

Also known as cerebrovascular accident (CVA), a stroke occurs when there is sudden cerebral ischemia (loss of blood flow to the brain) due to thrombosis, embolism, or hemorrhage. Common symptoms include hemiparesis, aphasia, and altered consciousness. Major risk factors include hypertension, atrial fibrillation, diabetes, smoking, and age.


200K+ Patients

Depression

A mood disorder characterized by persistent sadness, anhedonia, fatigue, sleep disturbances, and suicidal ideation, affecting 15–25% of cancer patients. Treatable with therapy and/or pharmacotherapy.


Structured data of 11 cancers (Prevalence provided by Hong Kong Cancer Registry):

80K+ Patients

Colorectal Cancer

A malignancy arising in the colon or rectum, often from adenomatous polyps.


60K+ Patients

Breast (Female) Cancer

Most commonly ductal or lobular carcinoma, originating in milk ducts or glands. Invasive forms spread beyond the primary site.


80K+ Patients

Lung Cancer

Classified as small cell (aggressive, linked to smoking) or non-small cell (e.g., adenocarcinoma, squamous cell).


30K+ Patients

Prostate Cancer

Adenocarcinoma of the prostate gland, primarily affecting older males.


30K+ Patients

Liver Cancer

Classified into primary (hepatocellular carcinoma) or secondary (metastatic spread from other organs).


10K+ Patients

Nasopharynx Cancer

A malignancy in the nasopharyngeal epithelium, linked to Epstein-Barr virus in endemic regions.


20K+ Patients

Stomach Cancer

Also known as gastric cancer, this refers to adenocarcinoma of the gastric mucosa.


10K+ Patients

Corpus Cancer

Malignant tumors of the uterine corpus (e.g., endometrial carcinoma, leiomyosarcoma).


9K+ Patients

Ovary Cancer

Includes epithelial carcinomas (most common) and germ cell tumors. Often diagnosed at advanced stages.


8K+ Patients

Cervix Cancer

A malignancy of the cervix, frequently associated with HPV infection.


10K+ Patients

Non-Hodgkin Lymphoma

A diverse group of lymphocyte cancers (B-cell or T-cell origin), distinct from Hodgkin lymphoma.