How evaluators adapt their evaluations to the developmental context: Evaluation design

What evaluation design is best? A debate has raged through the field of evaluation over what constitutes credible evidence,[1] with some arguing that RCTs are the “gold standard” and others questioning the RCT’s superiority.

This debate is somewhat meaningless once we recognize that evaluation design should be chosen based on the evaluation questions. Evaluations examining outcomes or impact are perhaps best served by an experimental (i.e., RCT) or quasi-experimental design, whereas evaluations assessing program needs or fidelity of implementation are better served by a descriptive (e.g., case study, observational) or correlational (e.g., cohort study, cross-sectional study) design.

In the context of youth programs, however, longitudinal designs may be particularly important. Longitudinal designs are critical for measuring and understanding development over time. They are especially valuable when knowledge of long-term effects is needed, since those effects may not manifest until the end of the program or after services have ended.

In my study, evaluators did not change their evaluation designs based on the age of participants. I asked evaluators to rank their choice of evaluation design, and the majority chose quasi-experimental (37%), descriptive/correlational (23%), or experimental (15%) as their primary choice. Few evaluators chose a case study (8%) or ethnographic (4%) design. A further 13% of evaluators chose to write in another design, with the majority indicating a mixed methods design.

I also asked evaluators how many waves of survey or interview data collection they would conduct across the three years of the evaluation. Of those who responded to the survey questions, 69% said they would do a baseline and multiple follow-up surveys, 28% said they would do a baseline and one follow-up, and only 3% said they would do only a baseline or post-test survey. Of those who responded to the interview questions, 93% said they would do multiple sets of interviews or focus groups, and only 7% said they would do only one set. These choices likely did not differ across conditions because the simulated evaluation spanned three years, which lends itself to multiple waves of data collection.

Be sure to check out the previous post on how the evaluation approach differed across age conditions. Stay tuned for more results from my study in terms of the evaluation methods, as well as a discussion explaining these results and next steps!

[1]  Donaldson, S. I., Christie, C. A., & Mark, M. M. (Eds.) (2009). What counts as credible evidence in applied research and evaluation practice? Los Angeles, CA: Sage.
