Research on Evaluation: It Takes a Village (The Solutions)

Our first post lamented the poor response rates in research on evaluation. There are many reasons for these low rates, but there are also many things we can do to improve them and, in turn, the state of research on evaluation.

How can evaluators improve response rates?

Coryn et al. (2016) suggest that evaluators find research on evaluation important. However, the response rates to these projects suggest otherwise. As with any area of opportunity, there are often several components that influence success. Yes, evaluators should care more about propelling our field forward, but changing that without amending our own practices as researchers seems unlikely. Therefore, we believe the importance of participation must be built, and to do so we need to focus on what evaluators see as valuable research. Researchers must also take care to carry out research with sound methodologies. Some recommendations for improving response rates as evaluators include:

  1. Conducting research that is relevant to the field of evaluation while maintaining a high standard of rigor. You can increase the likelihood of this by...
    1. Piloting your study (grad students and colleagues are great for this!)
    2. Asking for feedback from a critical friend
    3. Having evaluation practice guide or inform the research questions
  2. Reducing the cognitive load on participants by making our surveys shorter and easier to complete. You can do this by tying every survey question to your research questions. It’s fun to have lots of data, but it is even better to have meaningful data (i.e., stop asking unnecessary questions).
  3. Applying Dillman’s Tailored Design Method. This includes things like:
    1. Increasing the benefits of participation, such as by asking for help from participants or providing incentives for participation
    2. Decreasing the costs of participation, such as by ensuring no requests are personal or sensitive in nature and that it is convenient for participants to respond

What can the AEA Research Request Task Force do?

The AEA Research Request Task Force is also a crucial component of this process, acting not only as a gatekeeper to the listserv but also as quality and relevance control. Currently, a random sample of roughly 1,000-2,000 evaluators is drawn for every research request. If we could increase response rates, we could draw smaller samples and decrease the load on the AEA membership; for example, at a 20% response rate a sample of 1,000 yields about 200 responses, whereas at a 40% response rate those same 200 responses would require only 500 invitations. Some recommendations for new policies for the task force include:

  1. Policies that would satisfy Dillman’s Tailored Design Method, including allowing:
    1. Personalized contact (e.g., providing names to researchers)
    2. Repeated contact to participants
    3. Contact via postal mail or telephone
  2. Sending out survey requests themselves to improve the perceived legitimacy of those requests and reduce confidentiality concerns
  3. Adopting more stringent rigor and relevance standards to decrease the likelihood that participating evaluators get frustrated with the surveys that are sent out and subsequently opt out of future research

Conclusions

We believe evaluators should care more about research on evaluation, and that it should be more visible in the field so that practitioners know about it and how it can improve their practice. It is also our responsibility to improve our field by being good research participants. So if you ever receive a request to participate in a research on evaluation study, please do so. You are helping our field of evaluation.

This post was written in collaboration with Deven Wisner. By day, Deven manages human capital and business analytics for Global Registration Services, Inc., a legal services company in Tucson, AZ. By night, Deven is a consultant focused on data- and research-based decision making. He is a proud member of the American Evaluation Association and Board Member of the Arizona Evaluation Network.

Research on Evaluation: It Takes a Village (The Problem)

Response rates from evaluators are poor. Despite research suggesting that AEA members consider research on evaluation important, response rates for research on evaluation studies are often only between 10% and 30%.

As evaluators ourselves, we understand how busy we can be. However, we believe evaluators should spend more time contributing to these studies. They can be thought of as evaluations of our field: What are our current practices? How should we train evaluators? What can we improve? How do our evaluations lead to social betterment? These are just some of the broad questions these studies aim to answer. They can also help inform AEA efforts on the evaluation guiding principles and evaluator competencies.

Why are we seeing poor response rates?

  1. Response rates in general are poor. Across the world, response rates are declining. We are not unique in this regard. This phenomenon is happening in telephone, mail, and internet surveys alike.
  2. Poorly constructed surveys. Unfortunately, some of this issue probably lies with researchers themselves. They develop surveys that are too long or too confusing, so evaluators drop out of the study early. For instance, Dana’s thesis had a 27% response rate, but only 59% of participating evaluators finished the entire survey, which took a median of 27 minutes to complete. A more succinct survey would likely have improved both response and completion rates.
  3. Evaluation anxiety. We often think about evaluation anxiety in our clients, but these research on evaluation studies flip the focus to ourselves. It may be anxiety-provoking for evaluators to introspect—or let other evaluators inspect—their own practices. As an example, participants in Deven’s research on UFE were asked to describe their approach to evaluation after selecting which “known” approaches they apply. Some participants explained that they did not know the formal name for their approach, or that they just chose the one that sounded right. This could have been anxiety-provoking for participants and reduced their likelihood of participating in or completing the study.
  4. Apathy. Perhaps evaluators just do not care about research on evaluation. Many evaluators “fall into” evaluation rather than joining the field intentionally. They may not have the research background to care enough about “research karma.”
  5. Inability to fully use Dillman’s principles. If you know anything about survey design, you know about the survey guru Don Dillman and his Tailored Design Method for survey development. Some of the methods he recommends for increasing response rates are personalizing surveys (e.g., using first and last names), using multiple forms of communication (e.g., sending out a postcard as well as an email with the survey), and making repeated contact (e.g., an introductory email, the main survey email, and multiple follow-ups). However, these methods cannot be used with AEA members. The research request task force does not provide names or mailing addresses to those who request a sample of evaluators, and it limits contact with members to no more than three notifications over no more than a 30-day period. This makes the Tailored Design Method difficult to implement.

Our next post will discuss what can be done by evaluators and the AEA research task force to improve response rates.


Visualizing Statistical Significance – and Effect Sizes!

Ann Emery recently posted an awesome blog post on visualizing statistical significance. Starting with a table of statistics with lots of numbers and asterisks (*), she ended up with this lovely version:

Here’s what our final makeover looked like. We decided to focus on the big-picture findings. So, we used empty squares to represent variables that weren’t statistically significant and filled-in squares to represent variables that were. We used p ≤ .05 as our cutoff here; anything with .05 or lower got filled in and anything above .05 remained empty.
Ann K. Emery: http://annkemery.com/statistically-significant-p-values/

I responded saying I loved it, but that I’m often asked to also show effect sizes, and I mentioned a couple of ideas for how she could show them visually. Bogdan Miku explored this with confidence intervals and effect sizes in another blog post, but I wanted to share how I thought about doing it using a table I often use with one of my clients.

Original version, revised a la Emery

With this client, we often want to answer the questions, “For whom?” or “In what cases?” In this case, we were exploring for whom and in what cases after-school program quality is rated higher. We explored this by school type (public and charter), gender (female and male), grade level, and students’ reasons for joining the program, an important variable we often explore in our evaluations.

Often, my PI asks me, “But which of these are meaningful differences?” In asking this, she is asking about the effect sizes of these differences. I would then often add a narrative explaining these differences, but we all know that our clients don’t necessarily read the fine print in our reports! So with that, here is my version using Emery’s method shown above. Note that I had to pull the text explaining the direction of the differences up into the table header. This also let me remove the table note that I’d always add explaining what P, C, F, M, Int, and Ext meant.

Visualizing Effect Sizes

My first step in visualizing the effect sizes was calculating them. Given these are comparisons across either two or four groups, I decided to use eta-squared (you can find an awesome Excel spreadsheet by Daniel Lakens for calculating effect sizes here). This could easily be transformed into Cohen’s d if I preferred, but for the purposes of this exercise I left it at eta-squared.
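If you prefer code to a spreadsheet, here is a minimal sketch of the eta-squared calculation in Python. The function and the quality ratings are my own illustration (they come from neither Lakens’s spreadsheet nor my actual data), and the Cohen’s d conversion shown assumes a two-group comparison with roughly equal group sizes.

```python
# A minimal sketch of computing eta-squared for a one-way group comparison.
# The ratings below are made-up illustration values, not study data.
import numpy as np

def eta_squared(*groups):
    """Eta-squared = SS_between / SS_total across any number of groups."""
    scores = np.concatenate(groups)
    grand_mean = scores.mean()
    ss_total = ((scores - grand_mean) ** 2).sum()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

public = np.array([3.8, 4.1, 3.9, 4.3, 4.0])   # hypothetical quality ratings
charter = np.array([3.4, 3.6, 3.5, 3.9, 3.7])
eta2 = eta_squared(public, charter)
# For two groups of roughly equal size, eta-squared converts to Cohen's d:
d = 2 * np.sqrt(eta2 / (1 - eta2))
print(round(eta2, 3), round(d, 2))
```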

My first attempt, and what I originally suggested to Ann, was to simply vary the sizes of the boxes:

I did not like this. It was hard to see the differences well, and I could only imagine how trying to explain the sizes in a table note would go. So I scrapped that and ultimately ended up with the following:

Rather than varying the size of the boxes, I varied the number of boxes. This is much easier to see visually. You can clearly tell that what best explained ratings of quality was students’ reason for joining, but there’s also a nice effect of school type on quality ratings. Gender and grade level, while statistically significant, are not very meaningfully different.
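To make that encoding concrete, here is a hypothetical sketch of one way the number-of-boxes rule could work. The cutoffs are Cohen’s conventional eta-squared benchmarks (.01 small, .06 medium, .14 large), used purely for illustration; they are not necessarily the cutoffs behind the table above.

```python
# Hypothetical mapping from an effect size to a row of squares (illustration only).
def boxes(eta_sq, significant, cutoffs=(0.01, 0.06, 0.14)):
    """Empty square if non-significant; otherwise one to three filled squares."""
    if not significant:
        return "□"                                        # not statistically significant
    n_filled = max(sum(eta_sq >= c for c in cutoffs), 1)  # benchmarks met (at least one box)
    return " ".join("■" for _ in range(n_filled))

print(boxes(0.18, significant=True))    # large effect    -> ■ ■ ■
print(boxes(0.02, significant=True))    # small effect    -> ■
print(boxes(0.02, significant=False))   # not significant -> □
```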

Reflections

Personally, I’m really stoked that I finally got around to doing this. I’m sad that I didn’t do it for the final report I finished a couple of months ago, but I look forward to using it in subsequent reports. The only issue I have is that the direction often flips around in the tables, so the comparison in the header (e.g., public > charter) doesn’t always hold for every row below. Sometimes public > charter, but sometimes charter > public. I’m not entirely sure how I’d reconcile that. Perhaps I could use a different color to show that the relationship is reversed. What are your thoughts?

Confessions of a QUANT

I have a confession to make. I am a QUANT.

By a QUANT, I mean that I am good at quantitative methods and, because I’m good at them, I tend to gravitate toward them. I simply prefer doing quantitative studies. I conduct mostly mixed methods studies, but I tend to leave the qualitative work to the experts. I don’t think I would say I’m a post-positivist, and I like to say I’m a pragmatist (I am constantly reflecting on whether this is true for me), but it is clear to me that I lean more toward quantitative methods.

I confess to this because being a QUANT used to make me feel guilty. I have always understood and respected the power of qualitative work, but because I am much better at quantitative work I have never really worked at improving my qualitative skills. I would go so far as to say that I am a very poor qualitative evaluator. It’s a lot easier to hire a qualitative expert than to try to do it myself.

I am not sure being a QUANT is entirely my fault. My graduate school requires a full year of statistics in our first year (intermediate statistics, ANOVA, regression, and categorical data analysis) and has offered numerous statistics courses beyond that, which I ate up and even TA’d for (e.g., multivariate statistics, factor analysis, SEM, MLM, IRT). On the other hand, it offers only one qualitative class (which has been offered just three times during my time at the school, and which I never had the opportunity to take) and a new mixed methods class (which I did take, but which did not teach qualitative methods).

Despite being a QUANT, I am going to try to improve my qualitative skills while I am still in graduate school. My dissertation is going to be a sequential explanatory mixed methods design, with a whopping forty interviews at the second stage. I am leading a project that is heavily qualitative and has required me to learn better coding strategies and how to calculate interrater reliability. I’m doing more of the qualitative work in my evaluations rather than leaving it to the experts.

It has pained me to be so unknowledgeable about a topic I should know more about, which is why I am committing to reading Michael Quinn Patton’s Qualitative Research and Evaluation Methods (4th ed.) while my fellow students take the qualitative class at school. I’m two chapters in, and we’re already having great discussions about it. Stay tuned for a post in a month or two with my reflections on this endeavor.

Does This Logic Model Make My Program Look Good?

Over the past several years, data visualization has taken the evaluation community by storm. Today, there are dozens of blogs and online resources to help evaluators hop on the #dataviz train and communicate findings more effectively. The start of a new year is the perfect time to adopt new data visualization trends and apply them to your practice. However, before you jump on the bandwagon, it is worth testing assumptions about what works and what does not. That’s why we at the Claremont Evaluation Center decided to study the effectiveness of data visualization principles applied to logic models.


Dana presents at Eval17: Surveying children, using vignettes to train staff, and more!

I am really looking forward to meeting you all at the annual AEA conference, Eval17! I wanted to share with you the details of my various presentations and hope you can make it to any of the ones that pique your interest!

Can evaluators be the bridge in the research-practice gap?

Researchers and practitioners agree that there is a gap between research (or theory) and practice. While the reasons for this gap are plentiful, they boil down to researchers and practitioners comprising two communities (Caplan, 1979) that have different languages, values, reward systems, and priorities. The two communities try to bridge the gap through a variety of methods, including producer-push models (e.g., knowledge transfer, knowledge translation, dissemination, applied research, interdisciplinary scholarship), user-pull models (e.g., evidence-based practice, practitioner inquiry, action research), and exchange models (e.g., research-practice partnerships and collaboratives, knowledge brokers, intermediaries). However, these methods typically focus on researchers or practitioners and do not consider other scholars who could fill this role.

Evaluation is Not Applied Research

What is the difference between evaluation and research, especially applied research? For some, they are one and the same: evaluation and research use the same methods, write the same types of reports, and come to the same conclusions. Evaluation is often described as applied research. For instance, here are some recent quotes describing what evaluation is: “Evaluation is applied research that aims to assess the worth of a service” (Barker, Pistrang, & Elliott, 2016); “Program evaluation is applied research that asks practical questions and is performed in real-life situations” (Hackbarth & Gall, 2005); and the current editor of the American Journal of Evaluation saying that “evaluation is applied research” (Rallis, 2014). This is confusing for introductory evaluation students, particularly those coming from a research background or studying evaluation at a research institution.

Why aren’t evaluators adapting their evaluations to the developmental context?

Overall, my study found that evaluators are less likely to be participatory—both in the overall evaluation process and in data collection methods—when the program beneficiaries are children than when they are adults. Why is this the case?