Principles and Methods of Epidemiologic Study

College of Computing and Informatics
Department of Statistics
Biostatistics and Epidemiology
Chapter 1: Principles and Methods of Epidemiology
By: Dugo G. (MSc.)
Email: dugojgadisa@gmail.com or
Dugo.Gadisa@haramaya.edu.et
1

Introduction to Epidemiology and Biostatistics
What is the difference between the two?
Biostatistics is the application of statistical methods
in biology, medicine, public health, and other fields
of study.
Epidemiology is the study of patterns of health and
illness and associated factors at the population level
(disease distribution, prevalence, mechanisms of
prevention, etc.)
2

Introduction to Epidemiology
Definitions of Epidemiology
It is a study of the distribution of a disease or a
physiological condition in human populations and the
factors that influence this distribution.
It is a study of the distribution and determinants of health-
related states and events in populations and the application
of this study to control health problems.
3

What is Epidemiology?
In general, Epidemiology can be
defined as the study of determinants,
distribution, and frequency of
disease.
4

Uses of Epidemiology
a. Community diagnosis; i.e., what are the major health problems
occurring in a community
b. Establishing the history of a disease in a population; e.g.,
identifying the periodicity of an infectious disease
c. Describing the natural history of disease in the individual; e.g.,
natural history of Cancer in the individual (from its pathological
onset (inception) to resolution (recovery or death), clinical stages)
5

d. Describing the clinical picture of the disease; i.e., who gets the
disease, who dies from the disease, and what the outcome of the
disease is
e. Estimating risk; e.g., what factors increase the risk of heart
disease, automobile accidents, and violence
f. Identifying syndromes and precursors; e.g., the relationship of
high blood pressure to stroke, kidney disease, and heart disease
Syndromes: a group of signs and symptoms that consistently occur
together and characterize a particular abnormality or condition.
Precursors: a substance, cell, or cellular component from which
another substance, cell, or cellular component is formed
6

g. Evaluating prevention/intervention programs; e.g., vaccine
and clinical trials
h. Investigating epidemics/diseases of unknown etiology.
Etiology encompasses understanding why a particular
condition or disease occurs.
7

Some Epidemiologic Concepts
Catchment area:
The geographical area from which the people attending a
particular health facility come.
Catchment population:
The people attending a particular health facility.
Population at risk: it is vital to know all the people at risk of
developing a disease or having a health problem, as well
as those who are currently suffering from it.
8

Incidence: the number of new cases, or events
occurring over a defined period of time, commonly one
year.
Prevalence: the total number of existing cases, episodes or
events occurring at one point in time, commonly on a
particular day.
9

10

Case: A person who is identified as having a particular characteristic
such as a disease, behavior, or condition. Cases may be divided into
possible, probable, and definite, depending on how well specific
criteria are satisfied
Controls: refer to a specific group of individuals who serve as a
comparison for the group of people with a particular disease (known
as cases).
11

Epidemic: the occurrence in a community or region of cases of an
illness or other similar event clearly in excess of what is normally
expected. The characteristics of the illness, the area and the season
all have to be taken into account.
Epidemic incidence curve: a graph that plots cases of the disease
by the time of onset of the illness. It is an essential part of the
analysis, because the graph can indicate the nature of the outbreak
and the probable source.
12

Mortality Rate
Crude Death Rate (CDR):
\[ \text{CDR} = \frac{\text{Total no. of deaths reported during a given time interval}}{\text{Estimated mid-interval population}} \times 1000 \]
The crude death rate was 5.995 per 1,000 in Ethiopia in 2023.
The crude death rate in Addis Ababa was approximately 6.29 per
1000 (data of 2020).
13

Age-specific Mortality Rate
Age-specific mortality rate:
\[ \text{ASMR} = \frac{\text{No. of deaths in a specific age group during a given time}}{\text{Estimated mid-interval population of the specific age group}} \times 1000 \]
One example of an age-specific mortality rate is the Infant
Mortality Rate.
14

Sex-Specific Mortality Rate
Sex-specific mortality rate:
\[ \text{SSMR} = \frac{\text{No. of deaths in a specific sex during a given time}}{\text{Estimated mid-interval population of the same sex}} \times 1000 \]
Example: The average total population of “Town A” in 2019 was
6000 (3500 female & 2500 male). In the same year, 300 people died
(100 female and 200 male). Calculate the Crude death rate and
mortality rate for females.
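The Town A example can be checked with a few lines of Python. This is a minimal sketch: the helper name `rate_per_k` is an assumption introduced here for illustration, and it simply applies the crude and sex-specific formulas above to the figures given in the example.

```python
def rate_per_k(events, population, k=1000):
    """Return the number of events per k population (hypothetical helper)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return k * events / population

# Town A, 2019: figures taken from the example
total_pop, female_pop, male_pop = 6000, 3500, 2500
total_deaths, female_deaths, male_deaths = 300, 100, 200

cdr = rate_per_k(total_deaths, total_pop)             # crude death rate
female_rate = rate_per_k(female_deaths, female_pop)   # female-specific rate
male_rate = rate_per_k(male_deaths, male_pop)         # male-specific rate

print(f"CDR: {cdr:.1f} per 1000")
print(f"Female mortality rate: {female_rate:.1f} per 1000 females")
print(f"Male mortality rate: {male_rate:.1f} per 1000 males")
```

The crude rate works out to 50 per 1000, and the female rate to about 28.6 per 1000 females. The male rate is included only for comparison; the example does not ask for it.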
15

Case Fatality Rate
Case Fatality Rate (CFR):
\[ \text{CFR} = \frac{\text{No. of deaths from a specific disease during a given time}}{\text{No. of cases of that disease during the same time}} \times 100 \]
Example: In 1996, there were 1000 tuberculosis patients in
one region. Out of the 1000 patients, 100 died in the same
year. Calculate the case fatality rate of tuberculosis.
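The tuberculosis example follows directly from the CFR formula. The sketch below is illustrative only; the function name `case_fatality_rate` is an assumption, not something defined in the slides.

```python
def case_fatality_rate(deaths, cases):
    """Percentage of cases of a disease that died from it in the same period."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    if deaths > cases:
        raise ValueError("deaths cannot exceed cases")
    return 100 * deaths / cases

# Figures from the example: 1000 tuberculosis patients, 100 deaths in 1996
cfr = case_fatality_rate(deaths=100, cases=1000)
print(f"CFR: {cfr:.1f}%")
```

With these figures the case fatality rate is 10%, i.e. 10 of every 100 tuberculosis patients died within the year.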
16

Neonatal Mortality Rate
Neonatal Mortality Rate:
\[ \text{NMR} = \frac{\text{No. of deaths under 28 days of age reported during a given time}}{\text{No. of live births reported during the same time}} \times 1000 \]
Example: In 2010, there were a total of 5000 live births in “Zone B”.
Two hundred of them died before 28 days after birth. Calculate the
Neonatal Mortality Rate (NMR).
Answer:
\[ \text{NMR} = \frac{200}{5000} \times 1000 = 40 \text{ per 1000 live births} \]
That means out of 1000 live births in 2010, 40 of them died before 28
days after birth.
17

Infant Mortality Rate (IMR)
Infant Mortality Rate (IMR):
\[ \text{IMR} = \frac{\text{No. of deaths under 1 year of age during a given time}}{\text{No. of live births reported during the same time}} \times 1000 \]
18

Under-Five Mortality Rate
Under-five mortality rate:
\[ \text{U5MR} = \frac{\text{No. of deaths of children 0-4 years of age during a given time}}{\text{Average mid-year population of the same age at the same time}} \times 1000 \]
NB: The numerator says 0-4 years. In this formula, 0-4 years means children from
birth to less than five years of age, i.e. the upper age limit is not 4.
Example: In 1996, the total number of children under 5 years of age was 10,000 in
“Zone C”. In the same year, 200 children under five years of age died. Calculate the
under-five mortality rate (U5MR).
\[ \text{U5MR} = \frac{200}{10{,}000} \times 1000 = 20 \text{ per 1000 children under five} \]
19

Maternal Mortality Rate
Maternal Mortality Rate:
\[ \text{MMR} = \frac{\text{No. of pregnancy-associated deaths of mothers in a given time}}{\text{No. of live births in the same time}} \times 100{,}000 \]
The Maternal Mortality Rate reflects the standards of all aspects of maternal
care (antenatal, delivery, and postnatal).
The Maternal Mortality Rate in Ethiopia was estimated to be between 267 and
412 per 100,000 live births in 2020.
That means that for every 100,000 live births, around 340 mothers died on
average due to pregnancy-related causes.
20

Scope of Epidemiology
At the beginning, its scope was limited to understanding epidemics.
Now it is the basis for advancing our understanding of all kinds of
diseases, including:
Nutritional deficiencies
Infectious and non-infectious diseases
Injuries and accidents
Mental disorders
Maternal and Child Health
Cancer
Occupational Health
Environmental Health
Health behaviors
21

Measuring Disease Frequency
It has several components:
Classifying and categorizing disease
Defining the period of time of risk of disease
Deciding what constitutes a case of disease in a study
Obtaining permission to study people
Making measurements of disease frequency
Finding a source for ascertaining the cases
Relating cases to population and time at risk
Defining the population at risk of disease
22

The Basic Triad of Descriptive Epidemiology
The three essential characteristics of disease we look for in
descriptive epidemiology:
TIME
PLACE
PERSON
23

Time
There are three major kinds of changes in disease occurrence over
time.
1. SECULAR TRENDS (slow change): gradual changes over a long
period of time, such as years or decades.
E.g. AIDS, cancer.
2. PERIODIC OR CYCLIC CHANGES: recurrent alterations in the
frequency of diseases.
Cycles may be annual or have some other periodicity. E.g. measles,
malaria, meningitis.
3. SPORADIC CHANGES: occurrence at irregular and unpredictable intervals.
E.g. influenza, allergies.
24

Time (Cont’d…)
A secular trend can be due to one or more of the following factors:
1. Change in diagnostic technique
2. Change in accuracy of enumerating population at risk.
3. Change in age distribution of the people.
4. Change of survival from disease.
5. Change in actual incidence of the disease.
25

Time (Cont’d…)
Changing or Stable
Seasonal Variation
Clustered (Epidemic) or evenly distributed (Endemic)
Point source or Propagated
26

Place
The frequency of disease is different in different places.
Natural barriers: environmental or climatic conditions, such as
temperature, humidity, rainfall, altitude, mineral content of soil, or
water supply
Political boundaries: Intended for planning and allocation of
resources
Urban-rural differences in disease occurrence: comparisons in terms
of migration, style of living, and differential environmental
exposures are also helpful
27

Place (Cont’d…)
Geographically restricted or widespread (Pandemic)
Relation to food or water supply
Multiple clusters or one
28

Person
Age
Socio-economic status
Gender
Ethnicity
Behavior
29

Person (Who)
Young vs Old
Female vs male
Rich vs Poor
Illiterate vs educated
Place (Where)
Lowland vs Highland
Urban vs Rural
Time (When)
Day/night variation
Seasonal variation
Long term
30

Disease Occurrence
Dynamics of Disease Transmission
Interaction of agents and environmental factors with human
hosts
Distribution of severity of diseases
Modes of disease transmission
Level of disease in community when transmission stops
31

The Basic Triad of Analytic Epidemiology
The three phenomena assessed in Analytic Epidemiology
are:
Host
Agent
Environment
32

The Basic Triad of Analytic Epidemiology
Host: In epidemiology, the host is usually a human who gets sick,
but it can also be an animal that carries the disease and may or
may not show illness.
Agent: Epidemiologic triangle agents include Bacteria, Viruses,
Fungi, Protozoa, et cetera.
Environment: The environment represents the favorable
conditions for an agent to cause a health event. Environmental
factors include physical features like geology or climate, biological
factors like the presence of disease-transmitting insects, and
socioeconomic factors like crowding, sanitation, and access to
health services.
33

Measuring Disease Frequency
Incidence
Prevalence
Defined time period
Population at risk
34

When Calculating Disease Frequency (DF)
The numerator (number of cases/episodes)
The denominator (total population at risk)
Factor (e.g. 100, 1000, 10000)
Time period (days, weeks, months, or years)
We use rates to determine DF:
Incidence rates
Prevalence rates
35

Incidence and Prevalence rates
Decide what you are counting:
episodes/cases, people, or attendances?
Be clear about what your service counts when filling in monthly
statistics, e.g. for diarrhea or malaria.
People may get repeated attacks in one month and attend your service each time.
This is one sick person who has suffered several separate
episodes in one year and attended your service several times.
36

Incidence and Prevalence rates
Incidence: count episodes/cases.
Prevalence: for chronic conditions/diseases, count the total
number of sick people.
To study the use of health services, information on both new attendances
and repeat attendances is required.
37

When Calculating:
Incidence rate:
\[ IR = \frac{\text{No. of new cases of a disease over a period of time}}{\text{Total population during the given period of time}} \times K \]
Prevalence rate:
\[ PR = \frac{\text{All persons with a specific condition at one point in time}}{\text{Total population}} \times K \]
38

Example 1: In September 1995 there were 200 new cases of
relapsing fever in “Kebele X”. The average total population of
“Kebele X” was 4000. Calculate the incidence rate of relapsing fever
in “Kebele X” in September 1995.
Answer: IR = 200/4000 × 1000 = 50 new cases per 1000 population.
Example 2: An estimated 5,600,000 people in South Africa were
infected with HIV in 2009, out of a total population of 53 million. What
is the prevalence of HIV in South Africa?
Answer: PR = 5,600,000/53,000,000 × 100 ≈ 10.6%.
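Both examples use the same arithmetic: cases divided by population, times a factor K. The following Python sketch applies it to the figures given in the two examples; the function name `frequency_rate` is an assumption made for illustration.

```python
def frequency_rate(cases, population, k):
    """Cases per k population; covers both incidence and prevalence rates."""
    if population <= 0:
        raise ValueError("population must be positive")
    return k * cases / population

# Example 1: new cases of relapsing fever in Kebele X, September 1995
incidence = frequency_rate(cases=200, population=4000, k=1000)

# Example 2: people living with HIV in South Africa, 2009
prevalence = frequency_rate(cases=5_600_000, population=53_000_000, k=100)

print(f"Incidence: {incidence:.0f} new cases per 1000")
print(f"Prevalence: {prevalence:.1f}%")
```

The only difference between the two calls is what is counted in the numerator: new cases over a period for incidence, all existing cases at a point in time for prevalence.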
39

Comparing Incidence and Prevalence
Incidence
New cases or events over a
period of time
Useful to study factors
causing risks
Prevalence
All cases at a point/interval
of time
Useful for measuring the size
of the problem and planning
40

Relationship of Incidence to Prevalence
Prevalence depends on both the incidence rate and the duration of
disease.
Because prevalence is affected by factors such as migration and
duration, incidence is preferred for studying etiology.
Prevalence ≈ Incidence × Duration (when incidence and duration are
stable and prevalence is low)
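To see why duration matters, the sketch below compares two illnesses with the same incidence but different durations. All numbers are hypothetical and invented for illustration, and the relation is treated as an approximation that assumes a stable incidence and duration and a low prevalence.

```python
def steady_state_prevalence(incidence_rate, mean_duration):
    """Approximate prevalence as incidence x duration.

    Both arguments must use the same time unit (here, years).
    """
    return incidence_rate * mean_duration

# Hypothetical values, invented for illustration only
incidence = 10 / 1000  # 10 new cases per 1000 people per year
short_illness = steady_state_prevalence(incidence, mean_duration=0.1)  # about 5 weeks
long_illness = steady_state_prevalence(incidence, mean_duration=5.0)   # 5 years

print(f"Short illness: {short_illness * 1000:.0f} per 1000")
print(f"Long illness:  {long_illness * 1000:.0f} per 1000")
```

With identical incidence, the long-lasting condition accumulates far more existing cases, which is why prevalence alone can be misleading when studying causes.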
41

Relationship between Incidence, Prevalence and Disease
Duration
42
[Figure: incidence adds new cases to the pool of prevalent cases; death, cure, and loss to follow-up remove cases from it.]

Attack Rate
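The attack rate (AR) is the proportion of the population at risk that
develops the disease during an outbreak.
Example: Consider the outbreak of cholera in country Y in
March 2016. There were 490 people with cholera, and the population at
risk was 18,600. What is the AR?
Answer:
\[ AR = \frac{490}{18{,}600} \times 100\% = 2.6\% \]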
43

Relative Risk (RR) or Risk Ratio
Defined as the incidence of disease in the exposed group divided by
the corresponding incidence in the non-exposed group.
44
Exposure   Disease: Yes   Disease: No   Total
Yes        a              b             a+b
No         c              d             c+d
Total      a+c            b+d           N

Where \( p_1 = a/(a+b) \) and \( p_2 = c/(c+d) \).
A point estimate of the risk ratio is given by:
\[ RR = \frac{p_1}{p_2} \]
45

Example
46
Age at first birth   Breast cancer: Yes   Breast cancer: No   Total
≥25 years            31                   1597                1628
<25 years            65                   4475                4540
Total                96                   6072                6168

\[ p_1 = a/(a+b) = 31/1628 = 0.0190 \]
\[ p_2 = c/(c+d) = 65/4540 = 0.0143 \]
\[ RR = \frac{p_1}{p_2} = \frac{0.0190}{0.0143} \approx 1.33 \]
Women who first give birth at age 25 or older are about 33% more
likely to develop breast cancer than women who first give birth
before age 25.
47

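To obtain a CI for the RR, first compute a CI for ln(RR):
\[ \ln(RR) \pm z_{1-\alpha/2} \sqrt{\frac{b}{a\,n_1} + \frac{d}{c\,n_2}} \]
where \( n_1 = a + b \), \( n_2 = c + d \), and ln is the natural
logarithm. Exponentiating the two limits gives the CI for the RR itself.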
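As a rough check of the risk ratio and its interval, the formulas can be applied to the breast cancer table from the earlier example. This is a sketch under stated assumptions: the function name `risk_ratio_ci` is hypothetical, z = 1.96 is assumed for a 95% interval, and the limits are exponentiated back to the RR scale.

```python
import math

def risk_ratio_ci(a, b, c, d, z=1.96):
    """Risk ratio with a confidence interval built on the log scale.

    a, b: exposed with / without disease
    c, d: unexposed with / without disease
    z=1.96 corresponds to a 95% interval (assumed default).
    """
    n1, n2 = a + b, c + d
    rr = (a / n1) / (c / n2)
    se_log = math.sqrt(b / (a * n1) + d / (c * n2))
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Breast cancer example: first birth at >=25 years (exposed) vs <25 years
rr, lower, upper = risk_ratio_ci(a=31, b=1597, c=65, d=4475)
print(f"RR = {rr:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

Computed without intermediate rounding, the RR is about 1.33. The interval runs from below 1 to roughly 2, so on these data alone the association would not be judged statistically significant at the 5% level.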
48

The Odds Ratio
The odds ratio (OR) is the odds in favor of disease for the exposed
group divided by the odds in favor of disease for the unexposed
group.
The odds in favor of disease are \( p/(1-p) \), where p is the
probability of disease.
49

The Odds Ratio
\[ \text{Odds} = \frac{\Pr(\text{event occurs})}{\Pr(\text{event does not occur})} = \frac{p}{1-p} \]
\[ OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} \]
50

The Odds Ratio
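The odds ratio is defined as:
\[ OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \frac{p_1/q_1}{p_2/q_2} \]
It is estimated by:
\[ OR = \frac{p_1/q_1}{p_2/q_2} = \frac{\frac{a}{a+b} \big/ \frac{b}{a+b}}{\frac{c}{c+d} \big/ \frac{d}{c+d}} = \frac{ad}{bc} \]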
51

The Odds Ratio
Example: in the study of the risk factors for invasive cervical
cancer, the following data were collected (case-control)
52
Outcome     Smoker   Nonsmoker   Total
Cancer      108      117         225
No Cancer   163      268         431
Total       271      385         656

The Odds Ratio
The odds ratio is estimated by:
\[ OR = \frac{108 \times 268}{117 \times 163} = 1.52 \]
Women with cancer have odds of smoking that are 1.52
times the odds of those without cancer.
53

The Odds Ratio
A CI can be constructed for ln(OR) as:
\[ \ln(OR) \pm z_{1-\alpha/2} \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} \]
54

The Odds Ratio
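Exponentiating the upper and lower confidence limits for the
natural log of the OR gives a CI for the OR itself:
\[ \left( e^{\ln(OR) - z_{1-\alpha/2}\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}},\ e^{\ln(OR) + z_{1-\alpha/2}\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}} \right) \]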
55

The Odds Ratio
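For the cervical cancer data, the standard error of ln(OR) is
\[ \sqrt{\frac{1}{108} + \frac{1}{117} + \frac{1}{163} + \frac{1}{268}} \approx 0.166 \]
Therefore, a 95% CI for ln(OR) is
\[ \ln(1.52) \pm 1.96(0.166) \]
or
(0.093, 0.744)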
56

The Odds Ratio
A 95% CI for the OR itself is
\[ (e^{0.093},\ e^{0.744}) \]
or
(1.10, 2.10)
This interval does not contain the value 1.
We conclude that the odds of developing cervical cancer
are significantly higher for smokers than for nonsmokers.
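The cervical cancer calculation can be reproduced end to end in a few lines. This is a minimal sketch: the function name `odds_ratio_ci` is hypothetical, z = 1.96 is assumed for a 95% interval, and the cell labels follow the exposure-by-disease layout used for the 2×2 table earlier (a = smokers with cancer, b = smokers without, c = nonsmokers with cancer, d = nonsmokers without).

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio (ad/bc) with a confidence interval built on the log scale."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Cervical cancer case-control data from the example
odds_ratio, lower, upper = odds_ratio_ci(a=108, b=163, c=117, d=268)
print(f"OR = {odds_ratio:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

Computed without intermediate rounding, the interval is approximately (1.10, 2.10). Its lower limit lies above 1, which is the basis for calling the association statistically significant.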
57

Quiz (5%)
Consider a study of 22,071 people in total,
of whom 11,037 were assigned to the aspirin
group and the rest were assigned to a placebo
group. If 104 people in the aspirin group had
a myocardial infarction, and there were
293 myocardial infarction cases in total, find the odds
ratio and interpret the result.
58

Bias
Bias describes error that arises from the design or execution of
the study.
It’s undesirable
It can’t be adjusted
Useful to consider in any study
Essential to consider in critical appraisal
59

Bias
It’s a systematic error introduced to the study
design.
Two major forms
Selection Bias: refers to any error that arises in the
process of identifying the study subjects.
Information Bias: includes any systematic error in
the measurements on either exposure or outcome
variable.
60

Selection Bias
Selection bias occurs when identification of subjects for
inclusion into a study depends on the interest of the data
collector or investigator.
If the selection of cases and controls (e.g. in a case-control
study) is based on different criteria, then bias can occur.
Selection bias can occur in many circumstances,
but there are two major known forms.
61

Types of Selection Bias
Response Bias:
Those who agree to be in a study may be in some way different
from those who refuse to participate.
Volunteers may be different from those who are listed.
Berksonian Bias:
Bias that is introduced due to differences in criteria/probabilities of
admission to the hospital for those with the disease and those
without the disease.
Admission criteria of the hospital
62

Information Bias
In analytical studies usually one factor is known and another is
measured.
E.g. in case control studies, the “outcome” is known and the
“exposure” is measured.
E.g. in cohort studies, the exposure is known and the outcome is
measured.
63

Information Bias
Error in the measurements/information obtained in the study could
be:
Error due to participants
Error due to “observers”
Differential (non-random)
Non-differential (random)
• (i.e. does the error affect the exposure and the outcome equally?)
64

Types of Information Bias
1. Interviewer Bias: an interviewer’s knowledge of the exposure and
outcome may influence the structure of questions and the manner
of presentation which may influence the response.
2. Recall Bias: those with a particular outcome or exposure may
remember events more clearly or amplify their memories.
3. Observer Bias: Observers may have preconceived expectations of
what they should find in an examination.
4. Loss to follow-up: those who are lost to follow-up or who
withdraw from the study may be different from those who are
followed for the entire study.
65

Types of Information Bias
5. Hawthorne effect: an effect first documented at the Hawthorne
manufacturing plant; people act differently if they know they are
being watched.
6. Surveillance Bias: The group with the known exposure or outcome
may be followed more closely or longer than the comparison
group.
7. Misclassification Bias: Errors are made in classifying either the
disease or exposure status.
66

Confounding Variable
The word came from Latin, “confundere” meaning “to
mix up”.
The measured effect of an exposure is distorted because
of the association of the exposure with another factor
(confounder) that influences the outcome.
67
[Figure: the confounder is associated with both the exposure and the outcome.]

Confounding
A problem resulting from the fact that one feature of the study
subjects has not been separated from a second feature and has
thus been confounded with it, producing a spurious result.
The spuriousness arises from the effect of the first
feature being mistakenly attributed to the second feature.
Confounding can produce either a type I or a type II error, but we
usually focus on type I errors.
68

Confounding
At the simplest level, confounding can be thought of as a
confusion of effects.
The apparent effect of the exposure of interest is distorted
because the effect of an extraneous third factor is mixed
with the actual effect.
69

Difference from Bias…
Bias creates an association that is not true; confounding, however,
describes an association that is true but potentially misleading.
A key principle of confounding is that a confounder must be
associated with both the independent and dependent variables
(i.e. with the exposure and the disease).
Association of the confounder with just one of the two variables
is not enough to produce a spurious result.
70

Effect of a confounder
Could be large
May produce an over or underestimate of the true effect
May change the apparent direction of the effect
71

Controls for confounding
Controls for confounding may be built into the design or analysis
stages of the study
Design stage
Randomization
Restriction
Matching (on the basis of the potential confounding variables;
especially, age and gender)
Cases and controls can be individually matched for one or more
variables, or they can be group matched.
Matching is more expensive and requires specific analytic
techniques
72

Control Confounding: Analysis Stage
Stratification
Multivariate Analysis: Multiple Linear Regression
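Stratification can be illustrated with a small numerical sketch. The data below are hypothetical and invented so that the exposure has no effect within either stratum. The pooled estimate uses the Mantel-Haenszel odds ratio, a common stratified estimator that the slides do not name; it is swapped in here only as an example of a stratified analysis.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for one 2x2 table."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata: sum(a*d/n) / sum(b*c/n)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical strata of a confounder, as (a, b, c, d):
# a, b = exposed with / without disease; c, d = unexposed with / without disease
strata = [
    (40, 10, 8, 2),    # confounder present: 80% diseased in both groups
    (2, 8, 10, 40),    # confounder absent: 20% diseased in both groups
]

# Crude table: add the strata together cell by cell
crude_cells = [sum(cells) for cells in zip(*strata)]
crude_or = odds_ratio(*crude_cells)
adjusted_or = mantel_haenszel_or(strata)

print(f"Crude OR: {crude_or:.2f}")
print(f"Stratum ORs: {[odds_ratio(*s) for s in strata]}")
print(f"Mantel-Haenszel OR: {adjusted_or:.2f}")
```

In this constructed example the crude odds ratio is well above 1 even though each stratum shows no association, because the confounder is linked to both exposure and disease. The stratified estimate removes that distortion.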
73

Matching
One approach to deal with potential confounders is matching.
Matching is a statistical technique that is used to evaluate the effect of a
treatment by comparing the treated and the non-treated units in a study. It is a
technique that selects subjects so that the distribution of potential
confounders is similar in both groups.
For example, if we are assessing the effect of opium on total mortality and
sex is a potential confounder, one can match a male opium user to a male
opium non-user and a female opium user to a female opium non-user. This
way users and non-users will be exactly the same for sex, and thus sex could
not confound the association.
By extension, one can match for more than one variable, such as age and
sex.
For example, a 56-year-old male opium user can be matched to a 56-year-old
male non-user.
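Individual matching on age and sex can be sketched as follows. The records and the function name `match_pairs` are hypothetical, invented purely for illustration; a real study would also need rules for near matches (for example, age within a few years) and for choosing among several candidates.

```python
from collections import defaultdict

def match_pairs(users, non_users, keys=("sex", "age")):
    """Pair each user with one unused non-user sharing the same values for `keys`."""
    pool = defaultdict(list)
    for person in non_users:
        pool[tuple(person[k] for k in keys)].append(person)

    pairs, unmatched = [], []
    for person in users:
        candidates = pool[tuple(person[k] for k in keys)]
        if candidates:
            pairs.append((person, candidates.pop()))  # each non-user is used once
        else:
            unmatched.append(person)
    return pairs, unmatched

# Hypothetical records, invented for illustration only
users = [
    {"id": "U1", "sex": "M", "age": 56},
    {"id": "U2", "sex": "F", "age": 43},
    {"id": "U3", "sex": "M", "age": 61},
]
non_users = [
    {"id": "N1", "sex": "M", "age": 56},
    {"id": "N2", "sex": "F", "age": 43},
    {"id": "N3", "sex": "F", "age": 56},
]

pairs, unmatched = match_pairs(users, non_users)
for user, control in pairs:
    print(user["id"], "matched to", control["id"])
print("Unmatched users:", [p["id"] for p in unmatched])
```

Here the 56-year-old male user is paired with the 56-year-old male non-user, while the 61-year-old male user finds no exact match. This shows one practical cost of matching: subjects without a suitable partner drop out of the comparison.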
74
