1. College of Computing and Informatics
Department of Statistics
Biostatistics and Epidemiology
Chapter 1: Principles and Methods of Epidemiology
By: Dugo G. (MSc.)
Email: dugojgadisa@gmail.com or
Dugo.Gadisa@haramaya.edu.et
2. Introduction to Epidemiology and Biostatistics
What is the difference between the two?
Biostatistics is the application of statistical methods
in biology, medicine, public health, and other fields
of study.
Epidemiology is the study of patterns of health and
illness and associated factors at the population level
(disease distribution, prevalence, mechanisms of
prevention, etc.)
3. Introduction to Epidemiology
Definitions of Epidemiology
It is a study of the distribution of a disease or a
physiological condition in human populations and the
factors that influence this distribution.
It is a study of the distribution and determinants of health-
related states and events in populations and the application
of this study to control health problems.
4. What is Epidemiology?
In general, Epidemiology can be
defined as the study of determinants,
distribution, and frequency of
disease.
5. Uses of Epidemiology
a. Community diagnosis; i.e., what are the major health problems
occurring in a community
b. Establishing the history of a disease in a population; e.g.,
identifying the periodicity of an infectious disease
c. Describing the natural history of disease in the individual; e.g.,
the natural history of cancer, from its pathological onset (inception)
through its clinical stages to resolution (recovery or death)
6. Uses of Epidemiology
d. Describing the clinical picture of the disease; i.e., who gets the
disease, who dies from the disease, and what the outcome of the
disease is
e. Estimating risk; e.g., what factors increase the risk of heart
disease, automobile accidents, and violence
f. Identifying syndromes and precursors; e.g., the relationship of
high blood pressure to stroke, kidney disease, and heart disease
Syndromes: a group of signs and symptoms that consistently occur
together and characterize a particular abnormality or condition.
Precursors: a substance, cell, or cellular component from which
another substance, cell, or cellular component is formed
7. Uses of Epidemiology
g. Evaluating prevention/intervention programs; e.g., vaccine
and clinical trials
h. Investigating epidemics/diseases of unknown etiology.
Etiology encompasses understanding why a particular
condition or disease occurs.
8. Some Epidemiologic Concepts
Catchment area:
The geographical area from which the people attending a
particular health facility come.
Catchment population:
The people attending a particular health facility.
Population at risk: all people at risk of developing a disease or
having a health problem, as well as those who are currently suffering
from it. It is vital to know this population.
9. Some Epidemiologic Concepts
Incidence: the number of new cases, or events
occurring over a defined period of time, commonly one
year.
Prevalence: the total number of existing cases, episodes or
events occurring at one point in time, commonly on a
particular day.
11. Some Epidemiologic Concepts
Case: A person who is identified as having a particular characteristic
such as a disease, behavior, or condition. Cases may be divided into
possible, probable, and definite, depending on how well specific
criteria are satisfied
Controls: refer to a specific group of individuals who serve as a
comparison for the group of people with a particular disease (known
as cases).
12. Some Epidemiologic Concepts
Epidemic: the occurrence in a community or region of cases of an
illness or other similar event clearly in excess of what is normally
expected. The characteristics of the illness, the area and the season
all have to be taken into account.
Epidemic incidence curve: a graph that plots cases of the disease
by the time of onset of the illness. It is an essential part of the
analysis, because it can indicate the nature of the outbreak and the
probable source.
13. Mortality Rate
Crude Death Rate (CDR):
$$\text{CDR} = \frac{\text{Total no. of deaths reported during a given time interval}}{\text{Estimated mid-interval population}} \times 1000$$
The crude death rate was 5.995 per 1,000 in Ethiopia in 2023.
The crude death rate in Addis Ababa was approximately 6.29 per
1000 (data of 2020).
14. Age-specific Mortality Rate
$$\text{Age-specific mortality rate} = \frac{\text{No. of deaths in a specific age group during a given time}}{\text{Estimated mid-interval population of the specific age group}} \times 1000$$
One example of an age-specific mortality rate is the Infant
Mortality Rate.
15. Sex-Specific Mortality Rate
$$\text{Sex-specific mortality rate} = \frac{\text{No. of deaths in a specific sex during a given time}}{\text{Estimated mid-interval population of the same sex}} \times 1000$$
Example: The average total population of “Town A” in 2019 was
6000 (3500 female & 2500 male). In the same year, 300 people died
(100 female and 200 male). Calculate the Crude death rate and
mortality rate for females.
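The Town A exercise can be checked with a short Python sketch. The helper name `rate_per` and its `multiplier` parameter are illustrative choices, not part of the slides; the figures are the ones given in the example.

```python
def rate_per(events, population, multiplier=1000):
    """Return the number of events per `multiplier` people (hypothetical helper)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * multiplier

# Town A, 2019 (figures from the example)
total_pop, female_pop = 6000, 3500
total_deaths, female_deaths = 300, 100

cdr = rate_per(total_deaths, total_pop)          # crude death rate
female_mr = rate_per(female_deaths, female_pop)  # female-specific mortality rate

print(f"Crude death rate: {cdr:.1f} per 1000")
print(f"Female mortality rate: {female_mr:.1f} per 1000")
```

Under these figures the crude death rate works out to 50 per 1000 and the female mortality rate to about 28.6 per 1000.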
16. Case Fatality Rate
Case Fatality Rate (CFR):
$$\text{CFR} = \frac{\text{No. of deaths from a specific disease during a given time}}{\text{No. of cases of that disease during the same time}} \times 100$$
Example: In 1996, there were 1000 tuberculosis patients in
one region. Out of the 1000 patients, 100 died in the same
year. Calculate the case fatality rate of tuberculosis.
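A minimal sketch of the tuberculosis calculation, assuming the CFR is expressed as a percentage as in the formula. The function name `case_fatality_rate` is hypothetical.

```python
def case_fatality_rate(deaths, cases):
    """Percentage of cases of a disease that died during the same period."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    if deaths > cases:
        raise ValueError("deaths cannot exceed cases")
    return deaths / cases * 100

# Tuberculosis example: 1000 patients, 100 deaths in the same year
cfr_tb = case_fatality_rate(deaths=100, cases=1000)
print(f"CFR of tuberculosis: {cfr_tb:.1f}%")
```

For these numbers the case fatality rate is 10%.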
17. Neonatal Mortality Rate
$$\text{Neonatal Mortality Rate} = \frac{\text{No. of deaths under 28 days of age reported during a given time}}{\text{No. of live births reported during the same time}} \times 1000$$
Example: In 2010, there were a total of 5000 live births in “Zone B”.
Two hundred of them died before 28 days after birth. Calculate the
Neonatal Mortality Rate (NMR).
Answer:
$$\text{NMR} = \frac{200}{5000} \times 1000 = 40 \text{ per 1000 live births}$$
That means that out of every 1000 live births in 2010, 40 died before
reaching 28 days of age.
19. Under-Five Mortality Rate
$$\text{Under-five mortality rate} = \frac{\text{No. of deaths of 0-4 years of age during a given time}}{\text{Average mid-year population of the same age at the same time}} \times 1000$$
NB: The numerator says 0-4 years. 0-4 years in this formula means children from
birth to less than five years of age i.e. the upper age limit is not 4.
Example: In 1996, the total number of children under 5 years of age was 10,000 in
“Zone C”. In the same year, 200 children under five years of age died. Calculate the
under-five mortality rate (U5MR).
$$\text{U5MR} = \frac{200}{10000} \times 1000 = 20 \text{ per 1000 under-five children}$$
20. Maternal Mortality Rate
$$\text{Maternal Mortality Rate} = \frac{\text{No. of pregnancy-associated deaths of mothers in a given time}}{\text{No. of live births in the same time}} \times 100{,}000$$
Maternal Mortality Rate reflects the standards of all aspects of maternal
care (antenatal, delivery, and postnatal).
The Maternal Mortality Rate in Ethiopia in 2020 is estimated to be between
267 and 412 per 100,000 live births.
That means that for every 100,000 live births, around 340 mothers (the
midpoint of this range) die due to pregnancy-related causes.
21. Scope of Epidemiology
At the beginning, its scope was limited to understanding epidemics.
Now it is the basis for advancing our understanding of all kinds of
diseases, including:
Nutritional deficiencies
Infectious and non-infectious diseases
Injuries and accidents
Mental disorders
Maternal and Child Health
Cancer
Occupational Health
Environmental Health
Health behaviors
22. Measuring Disease Frequency
It has several components:
Classifying and categorizing disease
Defining the period of time of risk of disease
Deciding what constitutes a case of disease in a study
Obtaining permission to study people
Making measurements of disease frequency
Finding a source for ascertaining the cases
Relating cases to population and time at risk
Defining the population at risk of disease
23. The Basic Triad of Descriptive Epidemiology
The three essential characteristics of disease we look for in
descriptive epidemiology:
TIME
PLACE
PERSON
24. Time
There are three major kinds of changes in disease occurrence over
time.
1. SECULAR TRENDS (slow change): This refers to gradual
changes over a long period of time, such as years or decades.
E.g. AIDS, cancer.
2. PERIODIC OR CYCLIC CHANGES: This refers to recurrent
alterations in the frequency of diseases.
Cycles may be annual or have some other periodicity. E.g. measles,
malaria, meningitis.
3. SPORADIC: This refers to occurrence at irregular and unpredictable intervals.
E.g. influenza, allergies.
25. Time (Cont’d…)
A secular trend can be due to one or more of the following factors:
1. Change in diagnostic technique
2. Change in accuracy of enumerating population at risk.
3. Change in age distribution of the people.
4. Change of survival from disease.
5. Change in actual incidence of the disease.
26. Time (Cont’d…)
Changing or Stable
Seasonal Variation
Clustered (Epidemic) or evenly distributed (Endemic)
Point source or Propagated
27. Place
The frequency of disease is different in different places.
Natural barriers: environmental or climatic conditions, such as
temperature, humidity, rainfall, altitude, mineral content of soil, or
water supply
Political boundaries: Intended for planning and allocation of
resources
Urban-rural differences in disease occurrence: comparisons in terms of
migration, style of living, and differential environmental exposures
are also helpful
30. Person (Who)
Young vs Old
Female vs Male
Rich vs Poor
Illiterate vs Educated
Place (Where)
Lowland vs Highland
Urban vs Rural
Time (When)
Day/night variation
Seasonal variation
Long term
31. Disease Occurrence
Dynamics of Disease Transmission
Interaction of agents and environmental factors with human
hosts
Distribution of severity of diseases
Modes of disease transmission
Level of disease in community when transmission stops
32. The Basic Triad of Analytic Epidemiology
The three phenomena assessed in Analytic Epidemiology
are:
Host
Agent
Environment
33. The Basic Triad of Analytic Epidemiology
Host: In epidemiology, the host is usually a human who gets sick
but can also be an animal that acts as a carrier of disease but may or
may not present illness.
Agent: Agents in the epidemiologic triangle include bacteria, viruses,
fungi, protozoa, et cetera.
Environment: The environment represents the favorable
conditions for an agent to cause a health event. Environmental
factors include physical features like geology or climate, biological
factors like the presence of disease-transmitting insects, and
socioeconomic factors like crowding, sanitation, and access to
health services.
35. When Calculating DF (disease frequency)
The numerator (number of cases/episodes)
The denominator (total population at risk)
Factor (e.g. 100, 1000, 10000)
Time period (dates, weeks, months, or years)
We use the following rates to determine DF:
Incidence rates
Prevalence rates
36. Incidence and Prevalence rates
Decide what you are counting:
episodes/cases, people, attendances, or something else?
What does the service count when filling in monthly statistics? E.g., with
diarrhea or malaria, people can get repeated attacks in one month and
attend your service each time.
In that case one sick person has suffered several separate episodes in
one year and attended your service several times.
37. Incidence and Prevalence rates
Incidence: count episodes/cases.
Prevalence: for chronic conditions/diseases, count the total
number of sick people.
To study the use of health services, information on both new attendance
and repeat attendance is required.
39. Example 1: In September 1995 there were 200 new cases of
relapsing fever in “Kebele X”. The average total population of
“Kebele X” was 4000. Calculate the incidence rate of relapsing fever
in “Kebele X” in September 1995.
Answer: 200 / 4000 × 1000 = 50 new cases per 1000 population
Example 2: In 2009, an estimated 5,600,000 people in South Africa were
infected with HIV, out of a total population of 53 million. What
is the prevalence of HIV in South Africa?
Answer: 5,600,000 / 53,000,000 × 100 ≈ 10.6%
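Both examples can be worked through with a short Python sketch. The function names and the default multipliers (per 1000 for incidence, per 100 for prevalence) are assumptions made for illustration.

```python
def incidence_rate(new_cases, population_at_risk, multiplier=1000):
    """New cases per `multiplier` people at risk over the period."""
    return new_cases / population_at_risk * multiplier

def prevalence(existing_cases, population, multiplier=100):
    """Existing cases per `multiplier` people at a point in time."""
    return existing_cases / population * multiplier

# Example 1: relapsing fever in Kebele X, September 1995
kebele_ir = incidence_rate(new_cases=200, population_at_risk=4000)

# Example 2: HIV in South Africa, 2009
hiv_prev = prevalence(existing_cases=5_600_000, population=53_000_000)

print(f"Incidence rate: {kebele_ir:.0f} new cases per 1000")
print(f"HIV prevalence: {hiv_prev:.1f}%")
```

This gives 50 new cases per 1000 for Kebele X and a prevalence of about 10.6% for South Africa.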
40. Comparing Incidence and Prevalence
Incidence
New cases or events over a
period of time
Useful to study factors
causing risks
Prevalence
All cases at a point/interval
of time
Useful for measuring the size
of the problem and planning
41. Relationship of Incidence to Prevalence
Prevalence depends on both the incidence rate and the duration of
disease.
Because prevalence is affected by factors such as migration and
duration, incidence is preferred for studying etiology.
Prevalence = Incidence × Duration
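To see how this relationship behaves, here is a minimal sketch with made-up numbers. The incidence and duration values are hypothetical and not from the slides. The relation is an approximation that holds when the disease is in a steady state and prevalence is low.

```python
def approx_prevalence(incidence_per_year, mean_duration_years):
    """Approximate prevalence (as a proportion) from incidence and mean duration."""
    return incidence_per_year * mean_duration_years

incidence = 2 / 1000  # hypothetical: 2 new cases per 1000 people per year

acute = approx_prevalence(incidence, mean_duration_years=0.5)   # short illness
chronic = approx_prevalence(incidence, mean_duration_years=5)   # long illness

print(f"Acute disease:   {acute * 1000:.0f} per 1000")
print(f"Chronic disease: {chronic * 1000:.0f} per 1000")
```

With the same incidence, the longer-lasting disease ends up with ten times the prevalence, which is why prevalence alone is a poor guide to causes.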
43. Attack Rate
Example: Consider the outbreak of cholera in country Y in
March 2016. There were 490 people with cholera, and the population at
risk was 18,600. What is the attack rate (AR)?
Answer:
$$AR = \frac{490}{18{,}600} \times 100\% = 2.6\%$$
44. Relative Risk (RR) or Risk Ratio
Defined as the incidence of disease in the exposed group divided by
the incidence of disease in the non-exposed group.
Exposure    Disease: Yes    Disease: No    Total
Yes         a               b              a+b
No          c               d              c+d
Total       a+c             b+d            N
45. Relative Risk (RR) or Risk Ratio
Where $p_1 = a/(a+b)$ and $p_2 = c/(c+d)$.
A point estimate of the risk ratio is given by:
$$RR = \frac{p_1}{p_2}$$
46. Relative Risk (RR) or Risk Ratio
Example
Age at first birth    Breast cancer: Yes    Breast cancer: No    Total
≥25 years             31                    1597                 1628
<25 years             65                    4475                 4540
Total                 96                    6072                 6168
47. Relative Risk (RR) or Risk Ratio
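$$p_1 = \frac{a}{a+b} = \frac{31}{1628} = 0.0190$$
$$p_2 = \frac{c}{c+d} = \frac{65}{4540} = 0.0143$$
$$RR = \frac{p_1}{p_2} = \frac{0.0190}{0.0143} = 1.33$$
Women who give birth for the first time at age 25 or older are about 33%
more likely to develop breast cancer than women who first give birth
before age 25.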
48. Relative Risk (RR) or Risk Ratio
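To obtain a CI for the RR, first construct a CI for ln(RR):
$$\ln(RR) \pm z_{1-\alpha/2} \sqrt{\frac{b}{a n_1} + \frac{d}{c n_2}}$$
where $n_1 = a + b$, $n_2 = c + d$, and ln is the natural logarithm.
Exponentiating the two limits gives the CI for the RR itself.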
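As a sketch of how the point estimate and interval fit together, the following Python applies the formulas to the breast cancer table (a = 31, b = 1597, c = 65, d = 4475). The function name `risk_ratio_ci` is hypothetical, and the critical value z = 1.96 for a 95% interval is hard-coded as an assumption.

```python
import math

def risk_ratio_ci(a, b, c, d, z=1.96):
    """Risk ratio and its CI from a 2x2 table (rows: exposed, unexposed)."""
    n1, n2 = a + b, c + d
    rr = (a / n1) / (c / n2)
    # Standard error of ln(RR)
    se_log_rr = math.sqrt(b / (a * n1) + d / (c * n2))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

rr, rr_lower, rr_upper = risk_ratio_ci(a=31, b=1597, c=65, d=4475)
print(f"RR = {rr:.2f}, 95% CI = ({rr_lower:.2f}, {rr_upper:.2f})")
```

Computed without intermediate rounding, the RR is about 1.33 with a 95% CI of roughly (0.87, 2.03). Because the interval includes 1, these data alone do not show a statistically significant association at the 5% level.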
49. The Odds Ratio
The odds ratio (OR) is the odds in favor of disease for the exposed
group divided by the odds in favor of disease for the unexposed group.
The odds in favor of disease are $p/(1-p)$, where $p$ is the probability of
disease.
51. The Odds Ratio
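The odds ratio is defined as:
$$OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \frac{p_1/q_1}{p_2/q_2}$$
It is estimated by:
$$OR = \frac{p_1/q_1}{p_2/q_2} = \frac{\frac{a}{a+b} \big/ \frac{b}{a+b}}{\frac{c}{c+d} \big/ \frac{d}{c+d}} = \frac{ad}{bc}$$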
52. The Odds Ratio
Example: in the study of the risk factors for invasive cervical
cancer, the following data were collected (case-control)
Outcome      Smoker    Nonsmoker    Total
Cancer       108       117          225
No Cancer    163       268          431
Total        271       385          656
53. The Odds Ratio
The odds ratio is estimated by:
$$OR = \frac{108 \times 268}{117 \times 163} = 1.52$$
The odds of smoking among women with cancer are 1.52 times
the odds of smoking among women without cancer.
54. The Odds Ratio
A CI for ln(OR) can be constructed as:
$$\ln(OR) \pm z_{1-\alpha/2} \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$
55. The Odds Ratio
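Exponentiating the lower and upper confidence limits for the
natural log of the OR gives a CI for the OR itself:
$$\left( e^{\ln(OR) - z_{1-\alpha/2}\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}},\; e^{\ln(OR) + z_{1-\alpha/2}\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}} \right)$$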
56. The Odds Ratio
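For the cervical cancer data, the standard error of ln(OR) is
$$\sqrt{\frac{1}{108} + \frac{1}{117} + \frac{1}{163} + \frac{1}{268}} = 0.166$$
Therefore, a 95% CI for ln(OR) is
$$\ln(1.52) \pm 1.96(0.166)$$
or
(0.093, 0.744)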
57. The Odds Ratio
A 95% CI for the OR itself is
$$\left( e^{0.093},\; e^{0.744} \right)$$
or
(1.10, 2.10)
This interval does not contain the value 1.
We conclude that the odds of developing cervical cancer
are significantly higher for smokers than for nonsmokers.
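The whole cervical cancer calculation can be reproduced in a few lines of Python. This is a sketch only: the function name `odds_ratio_ci` is hypothetical, the cell labels follow the a, b, c, d layout used for the odds ratio, and z = 1.96 is hard-coded as the assumed 95% critical value.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc and its CI, computed on the log scale."""
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper

# Cervical cancer data: a=108, b=117 (cancer); c=163, d=268 (no cancer)
or_hat, or_lower, or_upper = odds_ratio_ci(a=108, b=117, c=163, d=268)
print(f"OR = {or_hat:.2f}, 95% CI = ({or_lower:.2f}, {or_upper:.2f})")
```

The result is an OR of about 1.52 with a 95% CI of roughly (1.10, 2.10). Working on the log scale and exponentiating at the end keeps the interval limits positive.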
58. Quiz (5%)
Consider a study of 22,071 people in total, of whom
11,037 were assigned to the aspirin group and the
rest to a placebo group. If 104 people in the aspirin
group had a myocardial infarction and there were
293 myocardial infarction cases in total, find the odds
ratio and interpret the result.
59. Bias
Describes error that arises from the design or execution of
the study.
It’s undesirable
It can’t be adjusted
Useful to consider in any study
Essential to consider in critical appraisal
60. Bias
It’s a systematic error introduced to the study
design.
Two major forms
Selection Bias: refers to any error that arises in the
process of identifying the study subjects.
Information Bias: includes any systematic error in
the measurements on either exposure or outcome
variable.
61. Selection Bias
Selection bias occurs when identification of subjects for
inclusion into a study depends on the interest of the data
collector or investigator.
If selection of cases and controls (e.g., in a case-control
study) is based on different criteria, then bias can occur.
Selection bias can occur in many circumstances,
but there are two major known forms.
62. Types of Selection Bias
Response Bias:
Those who agree to be in a study may be in some way different
from those who refuse to participate.
Volunteers may be different from those who are listed.
Berksonian Bias:
Bias that is introduced due to differences in criteria/probabilities of
admission to the hospital for those with the disease and those
without the disease.
Admission criteria of the hospital
63. Information Bias
In analytical studies usually one factor is known and another is
measured.
E.g. in case control studies, the “outcome” is known and the
“exposure” is measured.
E.g. in cohort studies, the exposure is known and the outcome is
measured.
64. Information Bias
Error in the measurements/information obtained in the study could
be:
Error due to participants
Error due to “observers”
Differential (Non-random)
Non-differential (Random)
(i.e., does it influence the exposure and the outcome equally?)
65. Types of Information Bias
1. Interviewer Bias: an interviewer’s knowledge of the exposure and
outcome may influence the structure of questions and the manner
of presentation which may influence the response.
2. Recall Bias: those with a particular outcome or exposure may
remember events more clearly or amplify their memories.
3. Observer Bias: Observers may have preconceived expectations of
what they should find in an examination.
4. Loss to follow-up: those who are lost to follow-up or who
withdraw from the study may be different from those who are
followed for the entire study.
66. Types of Information Bias
5. Hawthorne effect: an effect first documented at the Hawthorne
manufacturing plant; people act differently if they know they are
being watched.
6. Surveillance Bias: The group with the known exposure or outcome
may be followed more closely or longer than the comparison
group.
7. Misclassification Bias: Errors are made in classifying either the
disease or exposure status.
67. Confounding Variable
The word came from Latin, “confundere” meaning “to
mix up”.
The measured effect of an exposure is distorted because
of the association of the exposure with another factor
(confounder) that influences the outcome.
[Diagram: Exposure and Outcome, with the Confounder linked to both]
68. Confounding
A problem resulting from the fact that one feature of study
subjects has not been separated from the second feature and has
thus been confounded with it producing a spurious result.
The spuriousness arises from the effect of the first
feature being mistakenly attributed to the second feature.
Confounding can produce either a type I or a type II error, but we
usually focus on the type I errors.
69. Confounding
At the simplest level, confounding can be thought of as a
confusion of effects.
The apparent effect of the exposure of interest is distorted
because the effect of an extraneous third factor is mixed
with the actual effect.
70. Difference from Bias…
Bias creates an association that is not true; however, confounding
describes an association that is true, but potentially misleading.
A key principle of confounding is that a confounder must be
associated with both the independent and dependent variables
(i.e., with the exposure and the disease).
Association of the confounder with just one of the two variables
is not enough to produce a spurious result.
71. Effect of a confounder
Could be large
May produce an over or underestimate of the true effect
May change the apparent direction of the effect
72. Controls for confounding
Controls for confounding may be built into the design or analysis
stages of the study
Design stage
Randomization
Restriction
Matching (on the basis of the potential confounding variables;
especially, age and gender)
Cases and controls can be individually matched for one or more
variables, or they can be group matched.
Matching is more expensive and requires specific analytic
techniques
74. Matching
One approach to deal with potential confounders is by matching.
Matching is a technique used to evaluate the effect of a treatment by
comparing the treated and the non-treated units in a study. It selects
subjects so that the distribution of potential confounders is similar in
both groups.
For example, if we are assessing the effect of opium on total mortality and
sex is a potential confounder, one can match a male opium user to a male
opium non-user and a female opium user to a female opium non-user. This
way users and non-users will be exactly the same for sex, and thus sex could
not confound the association.
By extension, one can match for more than one variable, such as by age and
sex.
For example, a 56-year-old male opium user can be matched to a
56-year-old male non-user.
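The idea of individual matching can be sketched in Python as follows. This is only an illustration of exact one-to-one matching on sex and age: the function name `exact_match`, the record layout, and all the records themselves are hypothetical.

```python
from collections import defaultdict

def exact_match(users, non_users, keys=("sex", "age")):
    """Pair each user with one not-yet-used non-user sharing the same key values."""
    pool = defaultdict(list)
    for person in non_users:
        pool[tuple(person[k] for k in keys)].append(person)

    pairs = []
    for person in users:
        candidates = pool[tuple(person[k] for k in keys)]
        if candidates:  # users without an exact match are left unpaired
            pairs.append((person, candidates.pop()))
    return pairs

# Hypothetical records, for illustration only
users = [
    {"id": "U1", "sex": "M", "age": 56},
    {"id": "U2", "sex": "F", "age": 40},
]
non_users = [
    {"id": "N1", "sex": "F", "age": 40},
    {"id": "N2", "sex": "M", "age": 56},
    {"id": "N3", "sex": "M", "age": 30},
]

pairs = exact_match(users, non_users)
for user, control in pairs:
    print(user["id"], "matched to", control["id"])
```

Here the 56-year-old male user is paired with the 56-year-old male non-user, and the 30-year-old non-user stays unmatched. Real studies often match age within a range rather than exactly, which this sketch does not attempt.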