How evaluators adapt their evaluations to the developmental context: Evaluation approach

As mentioned previously, developmentally appropriate evaluation requires a culturally appropriate evaluation in the context of youth programs. This means including youth, or at minimum those with knowledge and experience working with youth, in the evaluation.

In my study, I asked practicing evaluators to describe how involved a wide range of stakeholders would be, including school administrators, teachers, parents, program staff, program designers, district personnel, funders, developmental consultants, math consultants, and the tutors and tutees of the program. In particular, I was interested in the levels of involvement of the consultants, the tutors, and the tutees across the evaluators randomly assigned to the child, adolescent, or adult conditions.

Comments Requested: College Access Journal Publication

Dr. Nazanin Zargarpour and I were accepted to present the attached paper at the American Educational Research Association’s (AERA) 2017 conference. We are very interested in publishing the paper and would love feedback from interested readers to help move it forward.

Click here to download the paper: Zargarpour & Wanzer (2017). From college access to success. AERA Paper

Developmental Appropriateness as Cultural Competence in Evaluation

Children and adults differ in more than simply age; they differ in culture as well. This recognition can be hard for evaluators: because we have all passed through childhood, it is easy to believe we have the same or greater knowledge of children’s culture than they do. Furthermore, our “spatial proximity to children may lead us to believe that we are closer to them than we really are—only different in that (adults claim) children are still growing up (‘developing’) and are often wrong (‘lack understanding’).”

How Can Evaluation Avoid Lemons?

I recently stumbled across a blog post by Dr. Simine Vazire, an associate professor of psychology at UC Davis, that discussed an economics article by Akerlof, “The market for ‘lemons’: Quality uncertainty and the market mechanism.” Here’s what she wrote:

In this article, Akerlof employs the used car market to illustrate how a lack of transparency (which he calls “information asymmetry”) destroys markets.  when a seller knows a lot more about a product than buyers do, there is little incentive for the seller to sell good products, because she can pass off shoddy products as good ones, and buyers can’t tell the difference.  the buyer eventually figures out that he can’t tell the difference between good and bad products (“quality uncertainty”), but that the average product is shoddy (because the cars fall apart soon after they’re sold). therefore, buyers come to lose trust in the entire market, refuse to buy any products, and the market falls apart. (Vazire, 2017, “looking under the hood”)

This is very similar to the replication crisis in psychology, and I worry that evaluation may run into many of the same issues. Scriven (2015) expressed this worry as well. He writes, “Also depressing was the discovery that the great classic disciplines, although they thought they had a quality control system, in fact the procedure that everyone immediately put forward as performing that function–peer review–turned out to have been hardly ever studied for simple but essential virtues like reliability and validity…” (p. 18).

Past Its Expiration Date: When Longitudinal Research Loses Relevance [GUEST POST]

Jennifer Hamilton is a Dr. Who fan and an applied statistician who loves to talk about evaluation methodology. She is fun at parties. Dr. Hamilton has been working in the field of evaluation since 1996, and has conducted evaluations with federal government programs; state, county, and city agencies and organizations; and foundations and nonprofits.


My company (the Rockville Institute) was hired to conduct a five-year evaluation of a national program that helps schools in high-poverty neighborhoods improve the health of students and staff. Schools monitor their progress using the Centers for Disease Control and Prevention’s (CDC) School Health Index.

The program had recently developed an on-line model of support, to supplement their traditional on-site support model, but wasn’t sure if they should take it to scale. They wanted to base their decision on evaluation results. Therefore, we proposed a rigorous randomized study comparing these two different types of support.

The problem was, two years in, the program’s revenue was shrinking and they had to start using the on-line support program, because it was more cost effective. They could not wait for the results of the evaluation to make their decision. In short, the program did not need us anymore.

We knew their decision was made, but we hoped that the study results could still be useful to other programs. We needed to make some changes so that the study would be relevant to a broader audience. We had two groups: less intensive and more intensive support. If we could expand this by adding a no-support arm and an even more intensive arm, the results could be relevant to all kinds of programs. So we developed a continuum of support intensity: no support (new arm), low support (on-line model), moderate support (on-site support), and a new high-intensity model of on-site support (new arm).

But where were we going to find these extra schools?   

We knew that schools implementing the program were only a small portion of the universe of schools completing the CDC instrument. The CDC could therefore provide outcome data for matched schools not participating in the program.


We also knew that another study of the program was being conducted and was using the same outcome measure as ours. In that study, support was provided in person by program managers with lower caseloads and more time on-site than the moderate support group (M2) from the original design. The other research group could therefore provide outcome data for matched schools receiving a more intensive version of support.

But What About the Methodology?

The question is how to add these new groups while retaining the rigor of the original design. While our original schools were randomized into groups, the new schools can only be matched to the randomized pairs. So we are mixing a quasi-experimental design (QED) into an RCT. What does this mean, practically speaking? Well, we have to think about all the possible comparisons.

The original L1/M2 comparison is unchanged and maintains the highest level of internal validity, because both groups of schools were randomly assigned. All of the other possible contrasts are still internally valid, although to a slightly lesser extent, because they now involve matched schools instead of randomly assigned schools.
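To make "all the possible comparisons" concrete, here is a minimal sketch that enumerates every pairwise contrast among the four arms and labels each one by design. It is an illustration only, not the analysis code from the study. The labels `L1` and `M2` come from the original design; `N0` and `H3` are hypothetical labels I am assuming for the two added arms, and the function name is likewise made up for this example.

```python
from itertools import combinations

# How schools entered each arm. "L1" and "M2" are the labels used in the
# post; "N0" and "H3" are hypothetical labels for the two added arms.
ARMS = {
    "N0": "matched",     # no support (new arm)
    "L1": "randomized",  # low support (on-line model)
    "M2": "randomized",  # moderate support (on-site model)
    "H3": "matched",     # high-intensity on-site support (new arm)
}


def classify_contrasts(arms):
    """Label each pairwise contrast as RCT or QED.

    A contrast counts as RCT only when both arms were randomly assigned;
    any contrast involving a matched arm is treated as quasi-experimental.
    """
    contrasts = {}
    for a, b in combinations(arms, 2):
        both_randomized = arms[a] == "randomized" and arms[b] == "randomized"
        contrasts[(a, b)] = "RCT" if both_randomized else "QED"
    return contrasts


if __name__ == "__main__":
    for (a, b), design in classify_contrasts(ARMS).items():
        print(f"{a} vs {b}: {design}")
```

Four arms give six contrasts. Under these assumptions only L1 vs M2 is a fully randomized comparison, and the other five rely on matching.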

Implications for Evaluators

This study illustrates a common danger of longitudinal designs: they simply take too long in the policy world, where programs are typically in flux. But the funder supported efforts to expand the focus beyond the specific program to one that would have broader applicability. This resulted in a hybrid design that still maintained sufficient rigor to respond to broad policy questions. Flexibility in the evaluation can still save an RCT, and this mixed QED-RCT design can help!

What is the developmental context? and why is it important to evaluation?

In a paper presented at the 2009 annual meeting of the American Evaluation Association, Tiffany Berry, Susan Menkes, and Katherine Bono discussed how evaluators could improve their practice through the developmental context. They argue that evaluators have spent years discussing how the program context (e.g., age of program, accessibility, size of program, timeline, political nature) and evaluation context (e.g., stakeholder involvement, method proclivity, measurement tools, purpose, use of results) affect the practice of evaluation. However, there has been little discussion on how the participants of a program, and particularly the age of participants, also affect the practice of evaluation. Thus, they describe what they call the Developmental Context and the three core developmental facets that define the developmental context.

Developmentally Appropriate Evaluations

In evaluation, one thing is clear: context matters. Many evaluators have described how the context of the program (e.g., age of program, type of program, feasibility) and the context of the evaluation (e.g., resources, stakeholder involvement, measurement tools) affect evaluation designs, methods, practices, and measures. However, evaluators have only begun to examine how the developmental context also affects how evaluations are designed and conducted. Specifically, how should the age of participants affect evaluations?

Importance of Measuring Participants’ Reasons for Being in the Program

This blog post was originally posted on AEA365 and was written with Tiffany Berry, a research associate professor at Claremont Graduate University. 

Today we are going to discuss why you should measure participants’ motivation for joining or continuing to attend a program.

Sometimes, randomization in our impact evaluations is not possible. When this happens, there are issues of self-selection bias that can complicate interpretations of results. To help identify and reduce these biases, we have begun to measure why youth initially join programs and why they continue participating. The reason participants join a program is a simple yet powerful indicator that can partially account for self-selection biases while also explaining differences in student outcomes.

Embedding Continuous Quality Improvement Throughout Organizations

This blog post was originally posted on AEA365 and was written with Tiffany Berry, a research associate professor at Claremont Graduate University. 

Today we are going to discuss the importance of embedding quality throughout an organization by discussing our work in promoting continuous quality improvement (CQI) in afterschool programs.

CQI systems involve iterative and ongoing cycles of goal setting about offering quality programming, using effective training practices to support staff learning and development, frequent program monitoring including site observations and follow-up coaching for staff, and analyzing data to identify strengths and address weaknesses in program implementation. While CQI within an organization is challenging, we have begun to engage staff in conversations about CQI.