Types of Evaluation Designs

Three broad types of evaluation designs (randomized experiments, quasi-experiments and non-experiments) address what would have happened in the absence of the health program (the “counterfactual”) in different ways. Addressing the counterfactual is a requirement for demonstrating that the health program caused changes in outcomes or impacts.

Randomized experiments, also called experimental designs, are the most rigorous type of evaluation design and are often referred to as the “gold standard.”

Pre-Test/Post-Test with Random Assignment to Intervention or Comparison Groups. In randomized experiments, study subjects (or groups) are randomly assigned to a group that receives the health program intervention (study or treatment group) or a comparison group that does not receive the intervention (control or non-treatment group). Data for each group are collected before and after the intervention. At the end of the experiment, differences between the intervention and comparison groups can be attributed directly to the effect of the intervention, provided the sample is large enough. Post-test only designs can also be experimental, provided the groups are randomly assigned before the intervention begins.

Randomization makes the intervention and comparison groups equivalent, on average, with respect to all factors (measured and unmeasured) other than whether they received the intervention. In other words, the comparison group serves as the “counterfactual” of what would have happened in the absence of the program, which is a key requirement in determining whether a program caused a particular health outcome.
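The pre-test/post-test design with random assignment can be illustrated with a minimal sketch. The function names, the subject identifiers and the outcome scores below are hypothetical and exist only for illustration; a real evaluation would also need a significance test and an adequate sample size.

```python
import random

def randomly_assign(subject_ids, seed=0):
    """Shuffle the subjects and split them into two groups of (near) equal size."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (intervention, comparison)

def mean(values):
    return sum(values) / len(values)

def pre_post_effect(pre, post, intervention, comparison):
    """Mean pre-to-post change in the intervention group minus the
    mean pre-to-post change in the comparison group."""
    change_intervention = mean([post[s] - pre[s] for s in intervention])
    change_comparison = mean([post[s] - pre[s] for s in comparison])
    return change_intervention - change_comparison

# Hypothetical outcome scores keyed by subject id (made-up numbers).
pre_scores = {1: 10, 2: 12, 3: 11, 4: 9}
post_scores = {1: 15, 2: 17, 3: 12, 4: 10}

intervention_group, comparison_group = randomly_assign(pre_scores.keys(), seed=1)
estimated_effect = pre_post_effect(pre_scores, post_scores,
                                   intervention_group, comparison_group)
```

Subtracting the comparison group's change removes the change that would have occurred anyway, which is how the comparison group stands in for the counterfactual.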

Although considered the “gold standard,” randomized experiments often are not feasible in real-world scenarios for several reasons:

Practical difficulties arise in randomly assigning subjects to the intervention and comparison groups, and it may be unethical to offer the intervention to one group but not to another group.
Spillover effects can result in the comparison group being exposed to the intervention.
High rates of dropouts in the intervention or comparison groups can bias the results.
Randomized studies are often expensive to implement, which may limit the feasibility of this design for many health programs.

When randomization of subjects or groups is neither practical nor feasible, a quasi-experimental design can approximate the randomized experiment. Quasi-experimental designs use an intervention and comparison group, but assignment to the groups is nonrandom.

Pre-Test/Post-Test with Non-Random Assignment to Intervention or Comparison Groups. As with randomized experiments, for a pre-test/post-test quasi-experimental design, data are collected before and after the intervention. However, assigning subjects to the intervention and comparison groups is non-random. Thus, evaluators cannot assume equivalence between the two groups. Instead, they must assess the differences at baseline and account for any demographic or behavioral differences in the analysis.
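One way to assess baseline differences is a standardized mean difference for each characteristic. This is a sketch of one possible check, not a method prescribed by the text; the function name and the sample values are hypothetical.

```python
import statistics

def standardized_mean_difference(intervention_values, comparison_values):
    """Difference in group means divided by the pooled standard deviation.

    Values near 0 suggest the groups are similar on this characteristic
    at baseline. Assumes each group has at least two values and that the
    values are not all identical (otherwise the pooled SD is zero).
    """
    mean_diff = (statistics.mean(intervention_values)
                 - statistics.mean(comparison_values))
    pooled_sd = ((statistics.variance(intervention_values)
                  + statistics.variance(comparison_values)) / 2) ** 0.5
    return mean_diff / pooled_sd

# Hypothetical baseline values for one characteristic in each group.
baseline_intervention = [2, 4, 6]
baseline_comparison = [1, 3, 5]
smd = standardized_mean_difference(baseline_intervention, baseline_comparison)
```

Characteristics with a large standardized difference are candidates to adjust for in the analysis.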

Comparison groups in the quasi-experimental design can be identified through matching, a process of identifying individuals who are similar to the participants in the intervention group on all relevant characteristics, such as age, sex, religion and other factors associated with program exposure.
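Matching can be sketched as follows, assuming exact matching on sex and nearest-neighbor matching on age without replacement. The function name, the field names and the records are hypothetical; real evaluations typically match on more characteristics and may use other matching techniques.

```python
def match_participants(participants, pool, exact_keys=("sex",), nearest_key="age"):
    """Pair each participant with the most similar unused non-participant.

    Candidates must agree on every field in exact_keys; among those, the
    one closest on nearest_key is chosen. Each candidate is used at most once.
    """
    available = list(pool)
    pairs = []
    for person in participants:
        eligible = [c for c in available
                    if all(c[k] == person[k] for k in exact_keys)]
        if not eligible:
            continue  # no acceptable match; this participant is left unmatched
        best = min(eligible,
                   key=lambda c: abs(c[nearest_key] - person[nearest_key]))
        available.remove(best)
        pairs.append((person["id"], best["id"]))
    return pairs

# Hypothetical records (made-up data).
participants = [
    {"id": "P1", "age": 30, "sex": "F"},
    {"id": "P2", "age": 45, "sex": "M"},
]
non_participants = [
    {"id": "C1", "age": 44, "sex": "M"},
    {"id": "C2", "age": 29, "sex": "F"},
    {"id": "C3", "age": 31, "sex": "M"},
]
matched_pairs = match_participants(participants, non_participants)
```

Participants without an acceptable match are dropped from the matched sample in this sketch, which is itself a design choice that can affect who the results apply to.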

Post-Test Only with Non-Random Assignment. In some cases, data are not collected before the intervention. Instead, data are collected only after the program has ended, among participants who received the intervention and among non-participants, making for a weaker design. Matching participants and non-participants with similar characteristics and accounting for any relevant differences are especially important in the post-test only design to isolate effects of the intervention.

The non-experimental design includes an intervention group only and lacks a comparison or control group, making it the weakest study design. Without a comparison group, it is difficult for evaluators to determine what would have happened in the absence of the intervention. Evaluators choose to use non-experimental designs when there are resource constraints, when they are unable to form an appropriate comparison group, or when a program covers the entire population and thus there is no comparison group, such as with a mass media campaign.

In non-experimental study designs, evaluators must have a clear conceptual understanding of how the intervention was intended to influence the health outcomes of interest. Thus, the program team needs to develop a robust framework during the program planning phase.

There are four commonly used types of non-experimental designs:

In pre-test/post-test designs, evaluators survey the intervention group before and after the intervention. While evaluators may observe changes in outcome indicators among the intervention participants, they cannot attribute all these changes to the intervention alone using this design because there is no comparison group.
Time-series designs look for changes over time to determine trends. Evaluators observe the intervention group multiple times before and after the intervention and analyze trends before and after.
The longitudinal study is another type of time-series design. Evaluators take repeated measures of the same variables from the same people. A panel design is a special type of longitudinal design in which evaluators track a smaller group of people at multiple points in time and record their experiences in great detail.
In a post-test only design, evaluators observe the intervention group at one point in time after the intervention, focusing particularly on comparing responses of sub-groups based on such characteristics as age, sex, ethnicity, education or level of exposure to the intervention. This is the weakest approach.

Ways to strengthen the non-experimental design:

Measure participants’ level of exposure to the program. If people with greater exposure to the program showed greater change in the outcomes, it strengthens the argument that the program led to changes. However, because the non-experimental design lacks a comparison group, changes in outcomes could still be due to selection bias. That is, the changes could reflect pre-existing differences between people who were exposed to the program and people who were not.
Collect data from the same participants over time using a panel or longitudinal design. These individuals serve as their own controls—characteristics of an individual observed earlier can be controlled for when analyzing changes in the outcomes.
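
Both strategies can be combined in a simple analysis: with panel data on the same participants, the pre-to-post change can be averaged within each level of program exposure. The sketch below assumes exposure is recorded as an ordered level (0 = none, 2 = high); the function name and all numbers are hypothetical.

```python
from collections import defaultdict

def mean_change_by_exposure(records):
    """Average pre-to-post change in the outcome for each exposure level.

    records: iterable of (exposure_level, pre_score, post_score) tuples,
    one per participant measured at both points in time.
    """
    changes = defaultdict(list)
    for level, pre, post in records:
        changes[level].append(post - pre)
    return {level: sum(vals) / len(vals)
            for level, vals in sorted(changes.items())}

# Hypothetical panel data: (exposure level, pre-test score, post-test score).
panel = [
    (0, 10, 11), (0, 12, 12),
    (1, 10, 13), (1, 11, 13),
    (2, 9, 14), (2, 10, 16),
]
change_by_level = mean_change_by_exposure(panel)
```

A pattern in which the average change rises with exposure is consistent with a program effect, but as noted above it does not rule out selection bias.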