Jennifer Hamilton is a Doctor Who fan and an applied statistician who loves to talk about evaluation methodology. She is fun at parties. Dr. Hamilton has been working in the field of evaluation since 1996, and has conducted evaluations with federal government programs; state, county, and city agencies and organizations; and foundations and nonprofits.
You can email Jennifer Hamilton at firstname.lastname@example.org
My company (the Rockville Institute) was hired to conduct a five-year evaluation of a national program that helps schools in high-poverty neighborhoods improve the health of students and staff. Schools monitor their progress using the Centers for Disease Control and Prevention's (CDC) School Health Index.
The program had recently developed an on-line model of support to supplement its traditional on-site support model, but wasn't sure whether to take it to scale. They wanted to base that decision on evaluation results. Therefore, we proposed a rigorous randomized controlled trial (RCT) comparing these two types of support.
The problem was that, two years in, the program's revenue was shrinking and they had to start using the on-line support model, because it was more cost-effective. They could not wait for the results of the evaluation to make their decision. In short, the program did not need us anymore.
We knew their decision was made, but we hoped that the study results could still be useful to other programs. We needed to make some changes so that it would be relevant to a broader audience. We had two groups – less and more intensive support. If we could expand this by adding a no-support arm and an even more intensive arm, the results could be relevant to all kinds of programs. So we developed a continuum of support intensity: no support (new arm), low support (the on-line model), moderate support (the on-site model), and a new high-intensity model of on-site support (new arm).
But where were we going to find these extra schools?
We knew that schools implementing the program were only a small portion of the universe of schools completing the CDC instrument. The CDC could therefore provide outcome data for matched schools not participating in the program.
We also knew that another study of the program was being conducted and was using the same outcome measure as ours. In that study, support was provided in person by program managers with lower caseloads and more time on-site than the moderate-support group (M2) from our original design. The other research group could therefore provide outcome data for matched schools receiving a more intensive version of support.
But What About the Methodology?
The question is how to add these new groups while retaining the rigor of the original design. While our original schools were randomized into groups, the new schools can only be matched to the randomized pairs. So we are mixing a quasi-experimental design (QED) into an RCT. What does this mean, practically speaking? Well, we have to think about all the possible comparisons.
The original L1/M2 comparison is unchanged and maintains the highest level of internal validity – because both groups of schools were randomly assigned. All of the other possible contrasts still have reasonable internal validity, although to a slightly lesser extent – because they now involve matched schools instead of randomly assigned schools.
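To make the matching step concrete, here is a minimal, purely illustrative sketch of one common approach: pairing each non-randomized school with its nearest randomized counterpart on standardized baseline covariates. The school IDs, the covariates (enrollment, percent free/reduced-price lunch), and the numbers are all made up for illustration – this is not the study's actual matching procedure or data.

```python
# Illustrative sketch only: nearest-neighbor matching of non-randomized
# schools to randomized schools on z-scored baseline covariates.

def standardize(rows):
    """Z-score each covariate column across all schools."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]

def match_schools(randomized, new_schools):
    """Pair each new (non-randomized) school with the closest randomized
    school in standardized covariate space (Euclidean distance).
    Returns a list of (new_id, matched_randomized_id) pairs."""
    z = standardize([*randomized.values(), *new_schools.values()])
    z_rand = dict(zip(randomized, z[:len(randomized)]))
    z_new = dict(zip(new_schools, z[len(randomized):]))
    pairs = []
    for nid, nv in z_new.items():
        best = min(z_rand,
                   key=lambda rid: sum((a - b) ** 2
                                       for a, b in zip(z_rand[rid], nv)))
        pairs.append((nid, best))
    return pairs

# Covariates per school: [enrollment, pct_free_lunch] -- made-up numbers.
randomized = {"R1": [450, 0.72], "R2": [900, 0.55]}
new_schools = {"N1": [470, 0.70], "N2": [880, 0.58]}
print(match_schools(randomized, new_schools))
# → [('N1', 'R1'), ('N2', 'R2')]
```

In practice a matching procedure would use many more covariates (and often propensity scores, calipers, and balance checks), but the core idea is the same: each added school inherits a comparison position from the randomized school it most closely resembles at baseline.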
Implications for Evaluators
This study illustrates a common danger of longitudinal designs - they just take too long in the policy world, where programs are typically in flux. But the funder supported efforts to expand the focus beyond the specific program to questions with broader applicability. The result was a hybrid design that still maintained sufficient rigor to respond to broad policy questions. Flexibility in the evaluation can still save an RCT, and this mixed QED-RCT design can help!