Research on Evaluation: It Takes a Village (The Solutions)

Our first post lamented the poor response rates in research on evaluation. There are many reasons for these poor response rates, but there are also many things that we can do to improve response rates and subsequently improve the state of research on evaluation.

How can evaluators improve response rates?

Coryn et al. (2016) suggest that evaluators find research on evaluation important. However, the response rates to these projects would suggest otherwise. As with any area of opportunity, there are often several components that influence success. Yes, evaluators should naturally care more about propelling our field forward, but changing that without amending our practices as researchers seems unlikely. Therefore, we believe that the importance of participation must be built, and to do so we need to focus on what evaluators see as valuable research. Researchers must also take care to carry out research with sound methodologies. Some recommendations for improving response rates as evaluators include:

  1. Conduct research that is relevant to the field of evaluation while maintaining a high standard of rigor. You can increase the likelihood of this by...
    1. Piloting your study (grad students and colleagues are great for this!)
    2. Asking for feedback from a critical friend
    3. Having evaluation practice guide or inform the research questions
  2. Reduce the cognitive load on participants by making our surveys shorter and easier to complete. You can do this by tying your questions to your research questions. It’s fun to have lots of data but it is even better to have meaningful data (i.e. stop asking unnecessary questions).
  3. Apply Dillman’s Tailored Design method. This includes things like:
    1. Increasing the benefits of participation, such as by asking for help from participants or providing incentives for participation
    2. Decreasing the costs of participation, such as by ensuring no requests are personal or sensitive in nature and that it is convenient for participants to respond

What can the AEA Research Request Task Force do?

The AEA Research Request Task Force is also a crucial component of this process, acting not only as a gatekeeper to the listserv but also as quality and relevance control. Currently, each research request is usually sent to a sample of 1,000-2,000 evaluators. If we could increase the response rate, we could decrease the size of the random sample and the load on the AEA membership. Some recommendations for new policies for the task force include:

  1. Policies that would satisfy Dillman’s Tailored Design Method, including allowing:
    1. Personalized contact (e.g., providing names to researchers)
    2. Repeated contact to participants
    3. Contact via postal mail or telephone
  2. Consider sending out survey requests themselves to improve the legitimacy of survey requests and reduce confidentiality concerns
  3. Have more stringent rigor and relevancy standards to decrease the likelihood that participating evaluators get frustrated over the surveys that are sent out and subsequently opt out of future research
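
To make the sample-size argument above concrete, here is a minimal sketch of the arithmetic. The target of 300 responses and the function name are our own assumptions for illustration, not figures or tools from the task force.

```python
def invitations_needed(target_responses: int, response_rate_pct: int) -> int:
    """Evaluators to invite so that the expected number of responses
    reaches target_responses, given a response rate in whole percent."""
    if not 0 < response_rate_pct <= 100:
        raise ValueError("response_rate_pct must be between 1 and 100")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (target_responses * 100 + response_rate_pct - 1) // response_rate_pct


# Hypothetical target of 300 responses (an assumption, not an AEA figure).
for pct in (10, 20, 30, 50):
    print(f"{pct}% response rate: invite {invitations_needed(300, pct)} evaluators")
```

Under these assumptions, tripling the response rate cuts the number of evaluators who need to be contacted to a third, which is the reduced load on the membership described above.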


We believe that evaluators should care more about the importance of research on evaluation and that it should be more visible in the field so that practitioners know about it and how it can improve their practices. However, it is also our responsibility to improve our field by being good research participants. So if you ever receive a request to participate in a research on evaluation study, please do so. You are helping our field of evaluation.

This post was written in collaboration with Deven Wisner. By day, Deven manages human capital and business analytics for Global Registration Services, Inc., a legal services company in Tucson, AZ. By night, Deven is a consultant focused on data- and research-based decision making. He is a proud member of the American Evaluation Association and Board Member of the Arizona Evaluation Network.

Research on Evaluation: It Takes a Village (The Problem)

Response rates from evaluators are poor. Despite research suggesting that AEA members consider research on evaluation important, response rates for research on evaluation studies are often only between 10% and 30%.

As evaluators ourselves, we understand how busy we can be. However, we believe that evaluators should spend more time contributing to these studies. These studies can be thought of as evaluations of our field. What are our current practices? How should we train evaluators? What can we improve? How do our evaluations lead to social betterment? These are just some of the broad questions these studies aim to answer. These studies can also help inform AEA efforts on the evaluation guiding principles and evaluator competencies.

Why are we seeing poor response rates?

  1. Response rates in general are poor. Across the world, response rates are declining. We are not unique in this regard. This phenomenon is happening in telephone, mailing, and internet surveys alike.
  2. Poorly constructed surveys. Unfortunately, some of the problem probably lies with researchers themselves. They develop surveys that are too long or too confusing, so evaluators drop out of the study early. For instance, Dana’s thesis had a 27% response rate, but only 59% of participating evaluators finished the entire survey, which took participants a median of 27 minutes to complete. A more succinct survey would likely have improved both response and completion rates.
  3. Evaluation anxiety. We often think about evaluation anxiety in our clients, but these research on evaluation studies flip the focus to ourselves. It may be anxiety-provoking for evaluators to introspect on their own practices, or to let other evaluators inspect them. As an example, participants in Deven’s research on UFE were asked to describe their approach to evaluation after selecting which “known” approaches they apply. Some participants explained that they did not know the formal name for their approach, or they just chose the one that sounded right. This could have been anxiety-provoking for participants and reduced their likelihood of participating in or completing the study.
  4. Apathy. Perhaps evaluators just do not care about research on evaluation. Many evaluators “fall into” evaluation rather than joining the field intentionally. They may not have the research background to care enough about “research karma.”
  5. Inability to fully use Dillman’s principles. If you know anything about survey design, you know about the survey guru Don Dillman and his Tailored Design Method for survey development. Some of the methods he recommends for increasing response rates are personalizing surveys (e.g., using first and last names), using multiple forms of communication (e.g., sending out a postcard as well as an email with the survey), and making repeated contact (e.g., an introductory email, the main survey email, and multiple follow-ups). However, these methods cannot be used with AEA members. The research request task force does not provide names or mailing addresses to those who request a sample of evaluators, and it limits contact with members to no more than 3 notifications over no more than a 30-day period. This makes the Tailored Design Method difficult to implement.
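
The response and completion figures from Dana’s thesis in item 2 compound, which a short sketch can show. The 27% and 59% rates come from the post; the 1,000 invitations and the function name are assumptions chosen only to make the numbers easy to read.

```python
def survey_funnel(invited: int, response_rate: float, completion_rate: float) -> dict:
    """Walk a sample through the two drop-off points: responding, then finishing."""
    responded = round(invited * response_rate)
    completed = round(responded * completion_rate)
    return {
        "invited": invited,
        "responded": responded,
        "completed": completed,
        "overall_completion": completed / invited,
    }


# 27% and 59% are from the post; 1,000 invitations is a hypothetical sample size.
funnel = survey_funnel(invited=1000, response_rate=0.27, completion_rate=0.59)
print(funnel)
```

With these inputs, roughly 16 of every 100 sampled evaluators would finish the whole survey, so completion matters as much as the headline response rate.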

Our next post will discuss what can be done by evaluators and the AEA research task force to improve response rates.

This post was written in collaboration with Deven Wisner.

Visualizing Statistical Significance – and Effect Sizes!

Ann Emery recently posted an awesome blog post on visualizing statistical significance. Starting with a table of statistics with lots of numbers and asterisks (*), she ended up with this lovely version:

Here’s what our final makeover looked like. We decided to focus on the big-picture findings. So, we used empty squares to represent variables that weren’t statistically significant and filled-in squares to represent variables that were. We used p ≤ .05 as our cutoff here; anything with .05 or lower got filled in and anything above .05 remained empty.
Ann K. Emery:

I responded saying I loved it, but I’m often asked to also show effect sizes and mentioned a couple of ideas I had for how she could also show effect sizes visually. Bogdan Miku explored this with confidence intervals and effect sizes in another blog post, but I wanted to share how I thought about doing it using a table I often use with one of my clients.

Original version, revised a la Emery

With these clients, we often want to answer the questions, “For whom?” or “In what cases?” In this case, we explored for whom and in what cases after-school program quality is rated higher. We explored this by school type (public and charter), gender (female and male), grade level, and students’ reasons for joining the program, an important variable we often explore in our evaluations.

Often, my PI asks me, “But which of these are meaningful differences?” By asking this question, she is asking about the effect sizes of these differences. I would often then add a narrative explaining these differences, but we all know that our clients don’t necessarily read the fine print in our reports! So with that, here is my version using Emery’s method shown above. Note that I had to pull the text explaining where the differences were up above as part of the table header. This also helps me remove the table note that I’d always add explaining what P, C, F, M, Int, and Ext meant.

Visualizing Effect Sizes

My first step in visualizing the effect sizes was figuring out the exact effect sizes. Given that these are differences among either 2 or 4 groups, I decided to use eta-squared (you can find an awesome Excel spreadsheet to calculate effect sizes by Daniel Lakens here). This could easily be transformed into Cohen’s d if I preferred, but for the purpose of this exercise I just left it at eta-squared.
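
If you would rather script this than use a spreadsheet, here is a minimal sketch of eta-squared for a one-way comparison (between-group sum of squares divided by total sum of squares). The function names and the toy ratings are assumptions for illustration, not the calculation from the Lakens spreadsheet. The conversion to Cohen’s d is an approximation that only applies to two groups of roughly equal size.

```python
import math


def eta_squared(groups):
    """Eta-squared for a one-way design: SS_between / SS_total."""
    scores = [x for group in groups for x in group]
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    return ss_between / ss_total


def approx_cohens_d(eta_sq):
    """Approximate Cohen's d for two equal-sized groups: 2 * sqrt(eta^2 / (1 - eta^2))."""
    return 2 * math.sqrt(eta_sq / (1 - eta_sq))


# Toy quality ratings for two hypothetical groups (not data from the report).
group_a = [3.1, 3.4, 3.8, 4.0]
group_b = [3.6, 3.9, 4.2, 4.5]
eta_sq = eta_squared([group_a, group_b])
print(round(eta_sq, 3), round(approx_cohens_d(eta_sq), 2))
```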

My first attempt, and what I originally suggested to Ann, was to simply vary the sizes of the boxes:

Ultimately, I did not like this. It was hard to see the differences well and I could only imagine how trying to explain the sizes in a table note would go. So I scrapped that and ultimately ended up with the following:

Rather than the size of the boxes, I varied the number of boxes. This is much easier to see visually. You can clearly tell that the thing that explained ratings of quality was students’ reason for joining, but there’s also a nice effect of school type on quality ratings. Gender and grade level, while statistically significant, are not very meaningfully different.
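
For anyone who wants to generate the boxes automatically, here is a rough sketch of one possible rule. The cutoffs are an assumption on my part: they follow Cohen’s commonly cited benchmarks for eta-squared (.06 for medium, .14 for large) rather than anything fixed in the table above, and the example values are made up.

```python
def effect_size_boxes(eta_sq, p_value, alpha=0.05, cutoffs=(0.06, 0.14)):
    """Return one empty box if not significant, else 1 to 3 filled boxes.

    cutoffs are assumed medium and large benchmarks for eta-squared;
    swap in whatever thresholds make sense for your own context.
    """
    if p_value > alpha:
        return "□"
    # One box for any significant effect, plus one per cutoff reached.
    n_boxes = 1 + sum(eta_sq >= cutoff for cutoff in cutoffs)
    return "■" * n_boxes


# Made-up rows: (label, eta-squared, p-value).
rows = [
    ("Reason for joining", 0.20, 0.001),
    ("School type", 0.08, 0.010),
    ("Gender", 0.005, 0.030),
    ("Some other variable", 0.002, 0.400),
]
for label, eta_sq, p_value in rows:
    print(f"{label:<22}{effect_size_boxes(eta_sq, p_value)}")
```

Keeping the significance check separate from the box count means the empty square still carries the same meaning as in Emery’s original version.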


Personally, I’m really stoked that I finally got around to doing this. I’m sad that I didn’t do this for the final report that I finished a couple of months ago, but I look forward to using it in subsequent reports. The only issue I have is that the direction of the difference often flips within a table, so the header in the second row (e.g., public > charter) doesn’t hold for every row below. Sometimes public > charter but sometimes charter > public. I’m not entirely sure how I’d reconcile that. Perhaps I could use a different color to show that the relationship is reversed. What are your thoughts?

Confessions of a QUANT

I have a confession to make. I am a QUANT.

By a QUANT, I mean that I am good at quantitative methods and, because I’m good at them, I tend to gravitate to them. I simply prefer doing quantitative studies. I conduct mostly mixed methods studies, but I tend to leave the qualitative work to the experts. I don’t think I would call myself a post-positivist, and I like to say I’m a pragmatist (I am constantly reflecting on whether this is true for me), but it is clear to me that I lean more towards quantitative methods.

I confess to this because being a QUANT used to make me feel guilty. I have always understood and respected the power of qualitative work, but because I am much better at quantitative work I have never really worked at improving my skills in qualitative work. I would go so far as to say that I am a very poor qualitative evaluator. It’s a lot easier to hire a qualitative expert than try to do it myself.

I am not sure being a QUANT is entirely my fault. My graduate school requires a full year of statistics our first year (intermediate stats, ANOVA, regression, and categorical data analysis) and has offered numerous extra statistics courses beyond that which I ate up and even TA’d for (e.g., multivariate statistics, factor analysis, SEM, MLM, IRT). On the other hand, they only have one qualitative class (which has only been offered three times during my time at the school and I never had the opportunity to take it) and a new mixed methods class (which I did take, but the class did not teach qualitative methods).

Despite being a QUANT, I am going to try to improve my qualitative skills while I am still in graduate school. My dissertation is going to be a sequential explanatory mixed methods design, with a whopping forty interviews at the second stage. I am leading a project that is heavily qualitative and has required me to learn better coding strategies and how to calculate interrater reliability. I’m doing more of the qualitative work in my evaluations rather than leaving it to the experts.

It’s pained me to be so unknowledgeable about a topic that I should know more about, which is why I am committing to read Michael Quinn Patton’s book on Qualitative Research and Evaluation Methods (4th Ed) while my fellow students take the qualitative class in school. I’m two chapters in and we’re having great discussions on it. Stay tuned for a post in a month or two for my reflections on this endeavor.

Does This Logic Model Make My Program Look Good?

Over the past several years, data visualization has taken the evaluation community by storm. Today, there are dozens of blogs and online resources to help evaluators hop on the #dataviz train and communicate findings more effectively. The start of a new year is the perfect time to adopt new data visualization trends and apply them to your practice. However, before you jump on the bandwagon, it is worth testing assumptions about what works and what does not. That’s why we at the Claremont Evaluation Center decided to study the effectiveness of data visualization principles applied to logic models.


2017 in Review — and Looking Forward to 2018

Image credit: Vladimir Kudinov

This year, instead of yearly goals which are too long-term and fluffy to really mean anything or actually be accomplished, I focused on quarterly goals. This was long-term enough that I didn’t feel rushed to do something in a month but short-term enough that they were nearly all accomplished. For the first three quarters, I accomplished each goal I set out to do. The fourth quarter I was way too ambitious, but I still managed to get a lot done over the year:

  • Finished my thesis and submitted it for publication. It was rejected, and I was asked to resubmit once I revised per their recommendations. I never got around to that and instead posted a pre-print on Thesis Commons.
  • Submitted a second paper to a journal. It was rejected, and I have since revised and submitted to another journal. It’s awaiting peer review right now!
  • Submitted a third paper to a journal. Unfortunately, after a three-month wait, the editor informed us it didn’t fit their journal. I’m a bit bitter about that one, but my collaborator and I are slowly revising it to submit to another journal. It required a complete overhaul, so it is taking us a long time, but it’s nearly there!
  • Presented two sessions at AEA! One was on using vignettes to teach staff about program quality and conducting observations. The other was on how to survey children effectively. The second one especially was a big success and I’m very proud of it. I hope to turn it into a white paper someday! I also chaired my first two sessions ever and hope to continue that in subsequent conferences. You can view the slides and/or resources of these sessions (and all my other presentations, evaluations, and research publications) here.
  • Presented at AERA and was accepted to present at next AERA. I’m really excited about what I will present at AERA18 as it is a beautiful SEM model that tests the Farrington model of non-cognitive factors and how they relate to academic performance.
  • Found a consulting job working remotely as a statistics (and eventually as an evaluation) consultant. Funny enough, I spent weeks figuring out a plan to email a bunch of people asking to work for them remotely for an internship requirement at our school. Right when I was about to do it, my advisor said, “Dana, one of my clients needs a stats consultant and you’d be perfect for them!” Never underestimate the power of networking!
  • Was a guest on a podcast! Actually, two podcasts. I was on Eval Cafe with Carolyn Camman and Brian Hoessler where we talked about Twitter for evaluators. Then a couple of days later I was asked by my good friend Ryan Budds to be on his Trivia with Budds podcast where my husband and I competed head-to-head on the topics of psychology and It’s Always Sunny in Philadelphia.
  • Finished a bunch of evaluation reports. I’m continually working to refine my data visualization and report formatting skills. It’s getting better. This year I really focused on white space. In the past, I found myself sacrificing white space to get under 25 pages. This year, I’ve really tried to break out reports into sections as data comes in that allows me to answer evaluation questions. This helps a lot, and I’m proud that on one of our projects we were able to chunk out the evaluation reports more explicitly in the proposal. I’m really excited to one day be the PI and not just the project manager!
  • Figured out my dissertation topic. I wrote about some of the basic topics incorporated into my dissertation previously: the differences between evaluation and research and why evaluators might be better equipped to work with practitioners than researchers. The topic has shifted slightly, but it essentially will operationalize partnership relationships and examine the effect those have on practitioner use of evidence. We often say how much relationships matter, but what do they look like? How can we teach budding evaluators to have high-quality relationships with their clients? And do they really matter in promoting evidence use, or is it more the involvement of practitioners in the process? I’m particularly excited because we are going to be applying for a research grant to study this, in which case I’ll get paid to do my dissertation!

Looking forward to 2018

Many of my goals for the first quarter of 2018 are remnants of Q4 goals that I didn’t really finish.

  • Finish and submit publications. I have about four papers in the works, all with collaborators. They are taking a long time to finish (partly because everyone is equally busy and these are low in priority) but I am confident that we can get them completed. Two are remnants from last year’s goals but two others are papers that have been really fast-paced thanks to a highly motivated PI leading the projects.
  • Finish my PhD portfolio and become a doctoral candidate. This means finalizing my review paper (i.e., the first chapter of my dissertation), finishing my internship hours, and taking orals. Then I’ll be ABD (all but dissertation) and considered a doctoral candidate instead of a doctoral student! My dissertation proposal will probably be signed off relatively quickly as well, especially if I get this research grant *fingers crossed*
  • Post blog posts monthly. I feel like I have really slacked on blog posts this year, particularly towards the end of the year (although, funny enough I look back and I have 23 blog posts from 2017…). Still, I should have taken the advice I’ve heard multiple times and had six months of content before getting started. This would have reduced the stress of thinking I needed to get another blog post out! My new goal is to post a new blog post on the first Monday of each month. With this post, I am 1/12 towards my goal!

Most of all, I look forward to networking more on Twitter with my fellow evaluators, continuing to refine my evaluation practice, and working towards graduation!

Dana presents at Eval17: Surveying children, using vignettes to train staff, and more!

I am really looking forward to meeting you all at the annual AEA conference, Eval17! I wanted to share with you the details of my various presentations and hope you can make it to any of the ones that pique your interest!

Can evaluators be the bridge in the research-practice gap?

Researchers and practitioners agree that there is a gap between research (or theory) and practice. While the reasons for this gap are plentiful, they boil down to researchers and practitioners comprising two communities (Caplan, 1979) that have different languages, values, reward systems, and priorities. The two communities try to bridge the gap through a variety of methods including producer-push models (e.g., knowledge transfer, knowledge translation, dissemination, applied research, interdisciplinary scholarship), user-pull models (e.g., evidence-based practice, practitioner inquiry, action research), and exchange models (e.g., research-practice partnerships and collaboratives, knowledge brokers, intermediaries). However, these methods typically focus on researchers or practitioners and do not consider other scholars that could fill this role.

Evaluation is Not Applied Research

What is the difference between evaluation and research, especially applied research? For some, they are one and the same. Evaluation and research use the same methods, write the same types of reports, and come to the same conclusions. Evaluation is often described as applied research. For instance, here are some recent quotes describing what evaluation is: “Evaluation is applied research that aims to assess the worth of a service” (Barker, Pistrang, & Elliott, 2016); “Program evaluation is applied research that asks practical questions and is performed in real-life situations” (Hackbarth & Gall, 2005); and, from the current editor of the American Journal of Evaluation, “evaluation is applied research” (Rallis, 2014). This is confusing for introductory evaluation students, particularly those coming from a research background or studying evaluation at a research institution.

How evaluators adapt their evaluations to the developmental context: Evaluation design

What evaluation design is best? A debate over what constitutes credible evidence has raged through the field of evaluation, with some arguing for RCTs as the “gold standard” and others questioning the superiority of the RCT.

This debate is somewhat meaningless when we understand that the evaluation design is chosen based on the evaluation questions. Evaluations examining outcomes or impact are perhaps best served by an experimental (i.e., RCT) or quasi-experimental design, whereas evaluations examining the needs of the program and fidelity of implementation are better served by a descriptive (e.g., case study, observational) or correlational (e.g., cohort study, cross-sectional study) design.