Why aren’t evaluators adapting their evaluations to the developmental context?

Overall, my study found that evaluators are less likely to be participatory—both in the overall evaluation process and in data collection methods—when the program beneficiaries are children than when they are adults. Why is this the case?

One possibility is that the evaluators in my study were not well-versed in working with youth. However, half of the evaluators belonged to the Youth Focused Evaluation TIG or the PreK-12 Educational Evaluation TIG, indicating they had some experience working with youth programs. Neither membership in these TIGs nor self-reported developmental knowledge was closely related to their evaluation practices.

Another possibility is that some other evaluator characteristic, such as education level, evaluation role (i.e., internal or external), or years of experience as an evaluator, relates to developmentally appropriate practice. Again, evaluation practices differed little across these characteristics.

Thus, the questions remain: which evaluators are more likely to engage in developmentally appropriate practice, and what are the barriers to it?

Some previous research suggests that experienced evaluators, even those experienced in working with youth, may need help in conducting developmentally appropriate evaluations. In a content analysis of youth program evaluations, Silvana Bialosiewicz (2013) found that few evaluations reported developmentally appropriate practices. That study was the impetus for the current study. A follow-up study based on interviews with youth evaluators identified many barriers to high-quality youth program evaluation practice (Bialosiewicz, 2015), including the cost and time required and clients' misconceptions about good evaluation practice. Overall, these findings suggest that evaluators may need more training in developmentally appropriate practice or better resources for conducting developmentally appropriate youth program evaluations.

Next Steps

As with most research, I'm left with many more questions about developmentally appropriate evaluations than I was able to answer. I believe the results of the study point to a need for more research on youth participatory evaluation. However, I'm particularly interested in survey techniques with children and adolescents. I often see misunderstandings about survey methodology in general, and these are exacerbated when surveying children and adolescents. I am hoping to present at AEA 2017 on best practices in surveying children to help remedy this issue, but I would also like to study this topic further.

How evaluators adapt their evaluations to the developmental context: Evaluation methods

Knowledge about children is best obtained directly from youth using interviews, focus groups, and surveys. This contrasts sharply with the methods commonly used in the past, observation and ethnography, which researchers relied on primarily because they did not believe youth could provide reliable and valid data.[1]

In my study, I examined whether evaluators collected data about beneficiaries directly (i.e., interviews, focus groups, surveys) or indirectly (i.e., case studies, observations, archival data). If evaluators indicated they would collect data directly from participants, I also asked them questions about their survey or interview-specific practices.

Overall, evaluators were more likely to collect data indirectly from beneficiaries when they were children or adolescents than when they were adults. With tutees, evaluators in the child condition were less likely to use surveys or focus groups and more likely to conduct observations. Interestingly, evaluators in the child condition were also more likely to survey and conduct focus groups with tutors, as well as to collect archival data (as a reminder, the tutors in this condition are adolescents).

The following are some of the interesting differences (or lack thereof) in survey- and interview-specific methodologies. Evaluators in the child condition were…

  • more likely to have program staff administer the survey and use oral administration and less likely to use online administration.
  • less likely to have the evaluation team conduct interviews and more likely to use interview specialists.
  • more likely to have shorter interviews, fewer participants in focus groups, and focus groups comprised of participants of similar ages.
  • equally likely to use 2-4 (36%), 5-7 (63%), 8-10 (2%), or 11+ (0%) response options in the survey.[2]
  • equally likely to test for internal consistency (62%), test-retest reliability (42%), face validity (70%), criterion validity (32%), construct validity (35%), use factor analysis techniques (52%), or test for moderators (35%).
  • equally likely to use unstructured (0%), semi-structured (92%), or structured (8%) interviews.[3]

[1] Punch, S. (2002). Research with children: The same or different from research with adults? Childhood, 9(3), 321–341.

[2] The lack of differences is likely due to a floor effect in the number of response options typically used. A future study could avoid this floor effect by examining each number of response options from 2 to 8 individually instead of clustering them into categories.

[3] Evaluators overwhelmingly preferred semi-structured interviews regardless of the age of participants.

How evaluators adapt their evaluations to the developmental context: Evaluation design

What evaluation design is best? The debate over what constitutes credible evidence[1] has raged through the field of evaluation, with some arguing for RCTs as the "gold standard" and others questioning the superiority of the RCT.

Much of this debate loses its force once we recognize that the evaluation design should be chosen based on the evaluation questions. Evaluations of outcomes or impact are perhaps best served by an experimental (i.e., RCT) or quasi-experimental design, whereas evaluations of program needs and fidelity of implementation are better served by a descriptive (e.g., case study, observational) or correlational (e.g., cohort study, cross-sectional study) design.

In the context of youth programs, however, longitudinal designs may be particularly important. Longitudinal designs are critical for measuring and understanding development over time. They are especially critical when we need to know about long-term effects, which may not manifest until the end of the program or until after services have ended.

In my study, evaluators did not change their evaluation designs based on the age of participants. I asked evaluators to rank their choices of evaluation design, and most chose quasi-experimental (37%), descriptive/correlational (23%), or experimental (15%) as their primary choice. Few evaluators chose a case study (8%) or ethnographic (4%) design. A further 13% of evaluators wrote in another design, most often a mixed methods design.

I also asked evaluators how many waves of survey or interview data collection they would conduct across the three years of the evaluation. Of those who responded to the survey questions, 69% said they would do a baseline and multiple follow-up surveys, 28% said they would do a baseline and one follow-up, and only 3% said they would do only a baseline or only a post-test survey. Of those who responded to the interview questions, 93% said they would do multiple sets of interviews or focus groups and only 7% said they would do only one set. However, the absence of age differences here likely reflects the three-year length of the simulated evaluation.

Be sure to check out the previous post on how the evaluation approach differed across age conditions. Stay tuned for more results from my study in terms of the evaluation methods, as well as a discussion explaining these results and next steps!

[1] Donaldson, S. I., Christie, C. A., & Mark, M. M. (Eds.). (2009). What counts as credible evidence in applied research and evaluation practice? Los Angeles, CA: Sage.

How evaluators adapt their evaluations to the developmental context: Evaluation approach

As mentioned previously, in the context of youth programs, developmentally appropriate evaluation requires culturally appropriate evaluation. This means including youth, or at minimum those with knowledge and experience working with youth, in the evaluation.

In my study, I asked practicing evaluators to describe how involved each of a wide range of stakeholders would be, including school administrators, teachers, parents, program staff, program designers, district personnel, funders, developmental consultants, math consultants, and the tutors and tutees of the program. In particular, I was interested in the levels of involvement of the consultants, the tutors, and the tutees across evaluators randomly assigned to the child, adolescent, or adult conditions.

Overall, evaluators were less likely to involve tutees in the child condition than the adolescent condition, and evaluators in both conditions were less likely to involve tutees than evaluators in the adult condition. Evaluators were also less likely to involve tutors in the child condition (as a reminder, the tutors in this condition are adolescents) than evaluators in the adult condition. There were no differences in use of consultants across the conditions.

One could argue that some evaluators have the knowledge and expertise required to conduct a culturally appropriate youth program evaluation. Thus, I also examined the extent to which their knowledge and expertise moderated the differences. Evaluators in the Youth Focused Evaluation TIG (a TIG focused on youth participatory evaluation) were more likely to involve beneficiaries than members of other TIGs, and members of the PreK-12 TIG were the least likely to involve beneficiaries. Furthermore, evaluators with more self-reported developmental expertise were less likely to involve beneficiaries.

These results suggest that evaluators are less likely to involve program beneficiaries when they are children and adolescents than when they are adults. Evaluators were exposed to the exact same program, with the only difference being the age of beneficiaries.

Stay tuned for more results from my study in terms of the evaluation design and evaluation methods, as well as a discussion explaining these results and next steps!

Comments Requested: College Access Journal Publication

Dr. Nazanin Zargarpour and I were accepted to present the attached paper at the American Educational Research Association's (AERA) 2017 conference. We are very interested in publishing this paper and would love feedback from interested readers to help move it forward.

Click here to download the paper: Zargarpour & Wanzer (2017). From college access to success. AERA Paper

Developmental Appropriateness as Cultural Competence in Evaluation

Children and adults differ in more than age; they differ in culture as well.[1] This recognition can be hard for evaluators: because we have all passed through childhood, it is easy to believe we know children's culture as well as, or better than, children do. Furthermore, our "spatial proximity to children may lead us to believe that we are closer to them than we really are—only different in that (adults claim) children are still growing up ('developing') and are often wrong ('lack understanding')."[2]

This points to a need for cultural competence, which the American Evaluation Association (AEA) describes as "critical for the profession and for the greater good of society."[3] Culturally competent evaluation practice includes:

  • Acknowledging the complexity of cultural identity
  • Recognizing the dynamics of status and power (e.g., the differential power between adults and children)
  • Recognizing and eliminating bias in language
  • Employing culturally (i.e., developmentally) appropriate methods

In particular, culturally competent evaluations require the inclusion of cultural expertise on the evaluation team.[4] In the case of youth programs, this means including developmental expertise. That can mean involving developmental experts (e.g., psychologists, developmental scientists), but evaluators should also strive to include youth themselves.

Youth participatory approaches can reduce the harmful power imbalances between adult evaluators and youth participants,[5] are more ethical for youth,[6] and offer many benefits for children and adolescents, including knowledge about the evaluation process and improvements in self-esteem, decision-making, and problem-solving skills.[7]

However, youth participation can occur at a range of levels.[8] At the lowest level, participants are simply included as a data source, and that data collection can itself be direct (e.g., surveys, interviews) or indirect (e.g., observations, archival data). Further up the participatory ladder, youth give input on the evaluation process. At the highest level of youth participation, youth actually lead the evaluation, much as they would in a traditional empowerment evaluation.

Including youth, or at least adult developmental experts, can help an evaluation meet the first two practices listed above. However, evaluators still must make sure the evaluation design and methods are culturally, and therefore developmentally, appropriate. The next post will discuss how evaluators can promote cultural competence across the evaluation process in the context of youth programs.

How Can Evaluation Avoid Lemons?

I recently stumbled across a blog post by Dr. Simine Vazire, an associate professor in psychology at UC Davis, that discussed an economics article by Akerlof, "The market for 'lemons': Quality uncertainty and the market mechanism." Here's what she wrote:

In this article, Akerlof employs the used car market to illustrate how a lack of transparency (which he calls “information asymmetry”) destroys markets.  when a seller knows a lot more about a product than buyers do, there is little incentive for the seller to sell good products, because she can pass off shoddy products as good ones, and buyers can’t tell the difference.  the buyer eventually figures out that he can’t tell the difference between good and bad products (“quality uncertainty”), but that the average product is shoddy (because the cars fall apart soon after they’re sold). therefore, buyers come to lose trust in the entire market, refuse to buy any products, and the market falls apart. (Vazire, 2017, “looking under the hood”)

This closely resembles the replication crisis in psychology, and I worry that evaluation may face many of the same issues. Scriven (2015) expressed this worry as well.[1] He writes, "Also depressing was the discovery that the great classic disciplines, although they thought they had a quality control system, in fact the procedure that everyone immediately put forward as performing that function–peer review–turned out to have been hardly ever studied for simple but essential virtues like reliability and validity…" (p. 18).

What peer review system does evaluation have? Scriven put forth meta-evaluation, but the practice is rarely, if ever, done. Scriven says:

This is because our real world practice is largely in the role of consultant, and consultants' work does not normally undergo peer review. We need to tighten up the trashy way peer review is done in other disciplines and use serious meta-evaluation to fill the gap in our own emerging discipline with respect to that job that we say (and can prove) that peer review ought to be done in the other disciplines. (p. 19)

Scriven goes on to argue that evaluation has a duty to study how evaluation is conducted in other disciplines, which would make evaluation the "alpha discipline." But before evaluation can be considered the alpha discipline, we must do a "full analysis of the pragmatics" of evaluation, meaning we need to define evaluation more clearly so that it is considered a key methodology of social science

“that must be mastered in order to do all applied (and some pure) social science. In that way, good evaluation research designs will be the exemplar for much of social science, instead of social science treating personnel or program evaluation as something they can do with their current resources, albeit conceding that there are some specialists in these sub-areas.” (p. 20)

Scriven's solution of serious, public meta-evaluation aligns with the solution Akerlof promotes: transparency. I would further argue that stronger regulation is needed to ensure this transparency, and one way to achieve it is through professionalization. Unfortunately, professionalization has met serious resistance within the American Evaluation Association, but the work of our colleagues up north (the Canadian Evaluation Society) and the work on standards and competencies within AEA are steps forward. I think professionalization will help evaluators more clearly define who evaluators are and what evaluation is, so that the field can move forward as the alpha discipline Scriven describes.


Past Its Expiration Date: When Longitudinal Research Loses Relevance [GUEST POST]

Jennifer Hamilton is a Doctor Who fan and an applied statistician who loves to talk about evaluation methodology. She is fun at parties. Dr. Hamilton has been working in the field of evaluation since 1996 and has conducted evaluations with federal government programs; state, county, and city agencies and organizations; and foundations and nonprofits.

You can email Jennifer Hamilton at jenniferannehamilton@yahoo.com

My company (the Rockville Institute) was hired to conduct a five-year evaluation of a national program that helps schools in high-poverty neighborhoods improve the health of students and staff. Schools monitor their progress using the Centers for Disease Control and Prevention's (CDC) School Health Index.

The program had recently developed an on-line model of support, to supplement their traditional on-site support model, but wasn’t sure if they should take it to scale. They wanted to base their decision on evaluation results. Therefore, we proposed a rigorous randomized study comparing these two different types of support.

The problem was, two years in, the program’s revenue was shrinking and they had to start using the on-line support program, because it was more cost effective. They could not wait for the results of the evaluation to make their decision. In short, the program did not need us anymore.

We knew their decision was made, but we hoped that the study results could still be useful to other programs. To make the study relevant to a broader audience, we needed to make some changes. We had two groups: less and more intensive support. If we could expand this by adding a no-support arm and an even more intensive arm, the results could be relevant to all kinds of programs. So we developed a continuum of support intensity: no support (new arm), low support (on-line model), moderate support (on-site model), and a new, high-intensity model of on-site support (new arm).

But where were we going to find these extra schools?   

We knew that schools implementing the program were only a small portion of the universe of schools completing the CDC instrument. The CDC could therefore provide outcome data for matched schools not participating in the program.


We also knew that another study of the program was being conducted using the same outcome as ours. In that study, support was provided in person by program managers with lower caseloads and more time on site than in the moderate support group (M2) from the original design. The other research group could therefore provide outcome data for matched schools receiving a more intensive version of support.

But What About the Methodology?

The question is how to add these new groups while retaining the rigor of the original design. While our original schools were randomized into groups, the new schools can only be matched to the randomized pairs. So we are mixing a quasi-experimental design (QED) into an RCT. What does this mean, practically speaking? Well, we have to think about all the possible comparisons.

The original L1/M2 comparison is unchanged and maintains the highest level of internal validity, because both groups of schools were randomly assigned. All of the other possible contrasts are still internally valid, although to a slightly lesser extent, because they involve matched schools instead of randomly assigned schools.
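To make "all the possible comparisons" concrete, here is a minimal Python sketch that enumerates the pairwise contrasts across the four support arms and tags each one with the design behind it. It rests on a few assumptions. L1 and M2 are the study's own labels. N0 (no support) and H3 (high-intensity support) are hypothetical labels for the two new arms. The "randomized"/"matched" tags simply restate how each arm's schools were assigned, as described above.

```python
from itertools import combinations

# Support-intensity arms, from least to most intensive.
# L1 and M2 are the study's labels; N0 and H3 are hypothetical labels for the new arms.
ARMS = {
    "N0": "matched",     # no support: comparison schools matched using CDC data
    "L1": "randomized",  # low support: on-line model
    "M2": "randomized",  # moderate support: on-site model
    "H3": "matched",     # high support: matched schools from the other study
}


def classify_contrasts(arms):
    """Map each pairwise contrast to the design that supports it.

    A contrast counts as an RCT comparison only when BOTH arms were
    randomly assigned; any contrast involving a matched arm is a QED.
    """
    contrasts = {}
    for a, b in combinations(arms, 2):
        both_randomized = arms[a] == "randomized" and arms[b] == "randomized"
        contrasts[(a, b)] = "RCT" if both_randomized else "QED"
    return contrasts


for (a, b), design in classify_contrasts(ARMS).items():
    print(f"{a} vs {b}: {design}")
```

Under these assumptions, only one of the six contrasts (L1 vs M2) rests purely on random assignment. The other five rest at least partly on matching, which is why they carry somewhat less rigor than the original comparison.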

Implications for Evaluators

This study illustrates a common danger of longitudinal designs: they can simply take too long for the policy world, where programs are typically in flux. But the funder supported efforts to broaden the focus beyond the specific program to one with wider applicability. This resulted in a hybrid design that still maintained sufficient rigor to respond to broad policy questions. Flexibility in the evaluation can still save an RCT, and this mixed QED-RCT design can help!

What is the developmental context, and why is it important to evaluation?

In a paper presented at the 2009 annual meeting of the American Evaluation Association,[1] Tiffany Berry, Susan Menkes, and Katherine Bono discussed how evaluators could improve their practice by attending to the developmental context. They argue that evaluators have spent years discussing how the program context (e.g., age of program, accessibility, size of program, timeline, political nature) and the evaluation context (e.g., stakeholder involvement, method proclivity, measurement tools, purpose, use of results) affect the practice of evaluation. However, there has been little discussion of how the participants of a program, and particularly the age of participants, also affect the practice of evaluation.[2] Thus, they describe what they call the Developmental Context and the three core developmental facets that define it.

1. Principles of Development

The first component of the developmental context involves knowledge of principles and theories of development. These principles and theories explain how the environment, the individual, and the interaction between the two shape development over time. Many broadly accepted theories of such development exist. Two of my personal favorites, both of which emphasize the interaction between the environment and the individual, are Bronfenbrenner's bioecological systems theory (Bronfenbrenner & Morris, 2006) and Lerner's (2006) developmental systems theory.

Relevance of Principles of Development to Evaluation

People are a product of their individual attributes and the contexts they live in over time. Thus, when conducting evaluations, it is important to examine participant, program, and other contextual characteristics in tandem. A systems perspective on evaluation can be useful for achieving this. This type of approach also helps answer the "For whom does this program work?" question in evaluation.

2. Developmental Domains

Developmental domains refer to cognitive, socioemotional, physical, and other domains of development. For instance, the cognitive domain refers to intellectual or mental development, such as thinking, memory, reasoning, problem-solving, language, and perception. The socioemotional domain refers to relationship skills, social awareness, self-management, self-awareness, and responsible decision-making.[3] The physical domain refers to development of body structure, including sensory/motor development and coordination of perception and movement.

Relevance of Developmental Domains to Evaluation

Developmental domains seem primarily to affect which methods are appropriate for participants. For example, knowledge of participants' cognitive stage can help evaluators match the reading level of a paper survey to participants' abilities. Knowledge of participants' socioemotional stage can help determine whether focus groups or interviews are better suited to them. Also, knowledge of participants' physical stage can help determine whether computer surveys, which require fine motor skills to use a mouse and keyboard, are appropriate.

3. Age of Participants

The age of participants is perhaps the most salient facet to evaluators. We typically group young participants into a variety of categories (e.g., infants, toddlers, young children, older children, adolescents, teenagers, young adults, youth), but these categories often overlap and are not clearly defined in the literature. For example, does "youth" include children as well as adolescents?

Relevance of Age of Participants to Evaluation

While age is typically used to determine whether a data collection method is developmentally appropriate, the issue becomes complicated when considering children and adolescents from diverse populations (e.g., low-income, cultural and ethnic minorities, those with mental, emotional, or physical challenges).[4] Disadvantaged youth may not be at the same developmental stages as their more advantaged counterparts.[5] While age is a simple factor to consider when designing and conducting evaluations, consideration of age alone may not be sufficient to ensure a developmentally appropriate evaluation.

Developmentally Appropriate Evaluations

In evaluation, one thing is clear: context matters. Many evaluators have described how the context of the program (e.g., age of program, type of program, feasibility) and the context of the evaluation (e.g., resources, stakeholder involvement, measurement tools) affect evaluation designs, methods, practices, and measures. However, evaluators have only begun to examine how the developmental context also affects how evaluators design and conduct evaluations. Specifically, how should the age of participants affect evaluations?