Can evaluators be the bridge in the research-practice gap?

Researchers and practitioners agree that there is a gap between research (or theory) and practice. While the reasons for this gap are plentiful, they boil down to researchers and practitioners comprising two communities (Caplan, 1979) that have different languages, values, reward systems, and priorities. The two communities try to bridge the gap through a variety of methods, including producer-push models (e.g., knowledge transfer, knowledge translation, dissemination, applied research, interdisciplinary scholarship), user-pull models (e.g., evidence-based practice, practitioner inquiry, action research), and exchange models (e.g., research-practice partnerships and collaboratives, knowledge brokers, intermediaries). However, these methods typically focus on researchers or practitioners and do not consider other scholars who could fill this role.

As I will argue in the review paper for my dissertation, evaluators are in a prime position to bridge the gap between researchers and practitioners. Evaluation has been considered a transdiscipline in that it is an essential tool in all other academic disciplines (Scriven, 2008). Evaluators use social science (and other) research methodology and often have a specific area of content expertise, enabling them to bridge the gap to researchers. Furthermore, producing a useful evaluation often requires a close relationship with practitioners, creating evaluations that communicate in their language, speak to their values and priorities, and meet their needs; this enables evaluators to bridge the gap to practitioners as well. Evaluators can use these similarities with both groups to span the research-practice gap as knowledge brokers or intermediaries (see figure).

However, while evaluators may be able to bridge to researchers and to practitioners individually, they may not be working to bridge the gap between researchers and practitioners. In a field that still debates the paradigm wars (e.g., the “gold standard” evaluation, qualitative versus quantitative data), the role of evaluators (e.g., as advocates for programs), core competencies for evaluators, and the professionalization of the evaluation field, it is unclear to what extent evaluators see bridging the research-practice gap as part of their role and, if so, to what extent they are actually working to bridge this gap and how they are doing so.

Stay tuned as I continue blogging about the review paper for my dissertation (i.e., the first chapter of my dissertation). I would sincerely appreciate any and all comments and criticism you may have. It will only strengthen my research and hopefully aid in my ultimate goal of informing the field of evaluation and improving evaluation practice.

Evaluation is Not Applied Research

What is the difference between evaluation and research, especially applied research? For some, they are one and the same: evaluation and research use the same methods, write the same types of reports, and come to the same conclusions. Evaluation is often described as applied research. For instance, here are some recent quotes describing what evaluation is: “Evaluation is applied research that aims to assess the worth of a service” (Barker, Pistrang, & Elliott, 2016); “Program evaluation is applied research that asks practical questions and is performed in real-life situations” (Hackbarth & Gall, 2005); and, from the current editor of the American Journal of Evaluation, “evaluation is applied research” (Rallis, 2014). This is confusing for introductory evaluation students, particularly those coming from a research background or studying evaluation at a research institution.

Others claim the distinction between evaluation and (applied) research is too hard to define. I do not disagree with this point. The boundaries between evaluation and research are fuzzy in many regards. Take, for instance, evaluation methodology. Our designs and methods are largely derived from social science methodology. However, as Mathison (2008) notes in her article on the distinctions between evaluation and research, evaluation has gone much further in the types of designs and methods it uses, such as the most significant change technique, photovoice, cluster evaluation, evaluability assessment, and the success case method. Scriven and Davidson have begun discussing evaluation-specific methodology (i.e., methods distinct to evaluation), including needs and values assessment, merit determination methods (e.g., rubrics), importance weighting methodologies, evaluative synthesis methodologies, and value-for-money analysis (Davidson, 2013). These methods show that, while we certainly incorporate social science methodology, we are more than that: we have unique methods of our own.

This is no better illustrated than by the hourglass analogy provided by John LaVelle: the differences between research and evaluation are clear at the beginning and end of each process, but in the middle (methods and analysis) they are quite similar. Still, evaluation differs from research in a multitude of ways. The following table should be interpreted with a word of caution: it suggests clear delineations between research and evaluation, but as Mathison notes, many of the distinctions offered (e.g., evaluation particularizes while research generalizes) are not “singularly true for either evaluation or research” (2008, p. 189).

Area of difference | Research | Evaluation
Purpose | Seek to generate new knowledge to inform the research base | Seek to generate knowledge for a particular program or client
Who decides | Researchers | Stakeholders
What questions are asked | Researchers formulate their own hypotheses | Evaluators answer questions that the program is concerned with
Value judgments | Research is value neutral | Evaluators provide a value judgment
Action setting | Basic research takes place in controlled environments | Evaluation takes place in an action setting where few things can be controlled
Utility | Research emphasizes the “production of knowledge and leaves its use to the natural processes of dissemination and application” (Weiss, 1997) | Evaluation is concerned with use from the beginning
Publication | Basic research is published in journals | Evaluation is rarely published; typically only stakeholders can view the reports

I want to conclude by saying that if we are to call ourselves a transdiscipline or an alpha discipline, as Scriven would argue we are, then we should work hard to differentiate ourselves from other disciplines, particularly basic and applied research. This may be difficult, particularly between applied research and evaluation, but we need to make these differences as explicit as possible, partly to help incoming evaluators understand the differences (see EvalTalk, where this question has come up repeatedly since 1998; Mathison, 2008) and partly to separate ourselves from research (and research from evaluation).

Dealing with my first journal article rejection

It was my first journal article submission (OK, second; my first, another article, was desk rejected). This article was based on my thesis, which I’d been working on for two years. I’d originally written it up for journal publication, so once both of my readers signed off on it, I sent it to the primary journal in the field (the American Journal of Evaluation) and waited.

And waited.

Lesson learned #1: Anticipate waiting for a looong time.

The wait was just shy of three months, but it still felt much longer than I wanted. I was checking the manuscript’s status often, watching it go from received, to searching for reviewers, to waiting for reviewers’ scores, to finally the anticipated “awaiting final decision.”

And then the crushing blow: manuscript rejection. It didn’t help that the decision arrived promptly at 7 a.m., right as I woke up, so I saw it while checking my morning email in bed.

The rejection was not nearly as bad as I anticipated (if I had been anticipating a rejection, that is). I remember just being slightly disappointed and sad. But then I started reading the reviewers’ comments.

Lesson learned #2: Perhaps wait to read reviewers’ comments.

I remember starting to get bitter, angry, and upset as I went through some of them. The following are some of the comments and initial reactions that went through my head, along with my more rational thoughts now that it’s been a few days:

The first comment from the first reviewer was this: “Of most concern, is the true response rate of 16% (279/1758).” My initial reaction was anger that they did not realize how difficult it is to get a good response rate from AEA evaluators, who made up my sample. How could they ask me to get a better response rate?! Later, I realized that there was more to the comment: “I believe reasonable efforts to increase the sample size should be made, and if not feasible, a justification of the sample being representative of the underlying population.” Fair and reasonable!

The first two reviewers commented on my grammatical errors. For instance, one noted that “there are many incomplete sentences and other minor errors throughout the manuscript.” Then I got to the third reviewer, who stated there were simply a few minor grammatical mistakes. Initially, I was upset; my grammar is impeccable and I don’t think I’ve written an incomplete sentence in quite a long time! However, I realized what was probably more likely the case: my sentences were wordy and did not flow as well as they could. While I have no idea which sentences they were referring to, perhaps sending the manuscript to a copy editor could improve it the second time around.

Lesson learned #3: Have a support system.

I was able to immediately tell my husband, a good friend, and my advisor about the rejection and get comforting words as well as words of wisdom. My tweeple were incredibly supportive too! My initial sadness and anger disappeared by the end of the day thanks to the support that everyone around me provided.

Lesson Learned #4: It’s OK to be upset about it.

I did not get any work done the day the rejection came. I went to the gym, came home, and thought to myself, “If my thesis—that I worked harder on than any other project in my life—was rejected, how can I possibly work on any of the other papers in the works? How can I work on my DISSERTATION when my thesis wasn’t even good enough?!” But in the end, allowing myself to wallow for a day helped speed up the recovery process. By the end of the day, my mood had lightened, I read for my dissertation, and the next day I was back to normal.

Easy Excel Template for Renaming & Relabeling Multiple Variables in SPSS

Often, I am dealing with a Qualtrics dataset with over a hundred variables. Unfortunately, Qualtrics does not do so well in renaming variables on the site, so I typically just leave the defaults and rename them later in SPSS. Using Excel, I have been able to make that process very quick! This has been really helpful for me and many of my colleagues, so I hope you find it useful as well!

1. Download the Excel template

You can download the Excel template for renaming and relabeling here. I’ve kept some of the default variables as they are typically exported from Qualtrics.

2. Get the list of variable names you need to rename

I recommend opening your dataset in SPSS, copying the list of variable names in the Variable View (not the Data View), and pasting them into the first column of the Excel template, below the header row.

3. Rename your variables in Excel

In the second column, “New Variable Name,” write the new name you would like for each variable. Note that you cannot give the same name to multiple variables, variable names cannot start with a number, and they are limited to certain characters. You can read more about valid variable names here.
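If you are curious how the template turns these columns into syntax (or want to build your own version), a single helper-column formula along these lines will do it. This is just a sketch: it assumes the original variable names sit in column A and the new names in column B, with data starting in row 2; the column letters in the downloaded template may differ.

(assumes original names in column A, new names in column B, starting in row 2)
="RENAME VARIABLES "&A2&"="&B2&"."

Filling that formula down the column produces one RENAME VARIABLES statement per variable, matching the syntax shown in step 5.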

4. (Optional) Add new variable labels in Excel

The default variable labels are generally not that informative or are written oddly, so I usually choose to relabel them. You can do that by copy-pasting the original variable labels into column C, “Original Variable Label,” and writing the new labels in column D, “New Variable Label.” Note that you should not have any apostrophes in your labels or they will break the syntax!
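A similar helper-column formula can build the labeling syntax. Again, this is only a sketch and assumes the new variable names are in column B and the new labels in column D, starting in row 2; the actual template may use different columns.

(assumes new names in column B, new labels in column D, starting in row 2)
="VARIABLE LABELS "&B2&" '"&D2&"'."

Each row then produces a complete, standalone VARIABLE LABELS statement. The syntax shown in step 5 instead stacks all of the labels under a single VARIABLE LABELS command; either form runs fine in SPSS.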

5. Paste the syntax into SPSS

Open a new Syntax window in SPSS and copy-paste the generated syntax columns (everything except row 1) into that syntax window.

For renaming the variables, your syntax should look like this:

RENAME VARIABLES V1=ResponseID.
RENAME VARIABLES V2=ResponseSet.
RENAME VARIABLES V3=Name.
RENAME VARIABLES V4=ExternalDataReference.
RENAME VARIABLES V5=Email.
RENAME VARIABLES V6=IPAddress.
RENAME VARIABLES V7=Status.
RENAME VARIABLES V8=StartDate.
RENAME VARIABLES V9=EndDate.
RENAME VARIABLES V10=Finished.
RENAME VARIABLES LocationLatitude=Latitude.
RENAME VARIABLES LocationLongitude=Longitude.
RENAME VARIABLES LocationAccuracy=Accuracy.
EXECUTE.

For relabeling the variables, your syntax should look like this:

VARIABLE LABELS
ResponseID 'ID of Respondent'
ResponseSet 'Response Set'
Name 'Name'
ExternalDataReference 'ExternalDataReference'
Email 'Email'
IPAddress 'IPAddress'
Status 'Status'
StartDate 'StartDate'
EndDate 'EndDate'
Finished 'Finished'
Latitude 'LocationLatitude'
Longitude 'LocationLongitude'
Accuracy 'LocationAccuracy'.

Note the period at the end of the last line.

How I Keep up with Reading and the Latest Research

As a lifelong learner, I love reading new journal articles and white papers. However, as I begin my transition to the “real world” from graduate school, I’ve found it harder and harder to stay up on the latest research.

Lately, despite being swamped with both school and work, I’ve still managed to keep up my reading routine. Here are some of my tips to do so:

Get on Twitter

I’ve found Twitter to be one of the best ways to find new literature to read. People I follow—who are relevant to my work—post and retweet interesting articles that I often find relevant. I’ve found new people to follow, new organizations doing great work, and articles in other fields that I would never have discovered without Twitter. Sometimes these posts are not the latest breaking work but rather popular pieces that I would not have found without a very specific search.

Save Everything to Your To-Do List

When I’m skimming through my Twitter feed, I usually only have a few minutes before I need to tackle my next task. I’m also usually on my phone and can’t read the tiny text! So, during those skimming moments, I save any relevant posts to my Todoist to read later. News articles I usually read the same day, later in the evening when I’m done working and am relaxing in front of the TV. Journal articles or longer pieces I save to a monthly “To Read” folder in my Dropbox.

Subscribe to Journal Updates

For every journal that I typically read, I subscribe to email updates to learn about the latest papers they publish. I let these subscription emails accumulate until the end of the week and then go through them in a batch. This also lets me use my university library website to get access to articles behind paywalls.

Have a “To Read” Folder 

Every month, I create a new folder in my “To Read” folder, named for the current month and year (e.g., “7.17”). For that entire month, any longer reads that I would prefer to do on my computer or tablet go into that folder. The figure shows my typical set-up for my “To Read” folder in my Dropbox. In the main folder, I have three main sub-folders: (a) monthly folders for the current month(s) of articles; (b) a set of folders of previous months’ articles that I still need to add to Zotero, my reference management software; and (c) a set of folders of previous months’ articles that I have already added to Zotero. I could probably delete that last set, but I like having the articles around, and I am paying for 1TB of Dropbox storage, which is more than enough for me!

Stay Caught Up on Reading

This is perhaps the trickiest part, and one I lapsed on recently. I found myself with four months’ worth of accumulated articles, probably totaling over 100! I’ve since put myself on a daily reading regimen: one of my tasks every day is simply to “Read Articles.” I have almost caught up and am currently reading June’s articles while collecting July’s. In some ways I like the strategy of reading last month’s articles; it gives me a second chance to decide whether an article is truly worth reading. If it’s not, I’m OK with skipping it and just putting it in Zotero. I may not have read it fully, but the text is searchable and, if it was interesting and relevant enough for me to download, it may become interesting and relevant again in the future.

I also find this a very useful strategy in the literature review phase of research and evaluation projects. Currently, I am working on my dissertation, so I have both “Read Articles” and “Read Dissertation” as to-do list tasks. I do both in the evening or in my spare time, and I prioritize the dissertation since I would like to graduate in a timely fashion. I used this approach successfully earlier when I had tons of reading to do while revamping two other papers I am working on. Now I just need to apply these principles to my writing practice, too!

Why aren’t evaluators adapting their evaluations to the developmental context?

Overall, my study found that evaluators are less likely to be participatory—both in the overall evaluation process and in data collection methods—when the program beneficiaries are children than when they are adults. Why is this the case?

One possibility is that the evaluators in my study were not well versed in working with youth. However, half of the evaluators were members of the Youth Focused Evaluation TIG or the PreK-12 Educational Evaluation TIG, indicating they had some experience working with youth programs. Moreover, membership in these TIGs and self-reported developmental knowledge were largely unrelated to their evaluation practices.

Another possibility is that some other evaluator characteristic, such as education level, evaluation role (i.e., internal or external), or years of experience as an evaluator, could relate to developmentally appropriate practice. Again, evaluation practices differed little across these evaluator characteristics.

Thus, the questions remain: which evaluators are more likely to have developmentally appropriate practice and what are the barriers to developmentally appropriate practice?

Some previous research suggests that experienced evaluators, even those experienced in working with youth, may need help in conducting developmentally appropriate evaluations. In a content analysis of youth program evaluations, Silvana Bialosiewicz (2013) found that few evaluations reported developmentally appropriate practices. That study was the impetus for the current study. A follow-up study, based on interviews with youth evaluators, found many barriers to high-quality youth program evaluation practice (Bialosiewicz, 2015). These barriers included the cost and time required and clients’ misconceptions about good evaluation practice. Overall, this suggests that evaluators may need more training in developmentally appropriate practice or better resources for conducting developmentally appropriate youth program evaluations.

Next Steps

As with most research, I’m left with many more questions about developmentally appropriate evaluations than I was able to answer. I believe the results of the study point to a need for more research on youth participatory evaluation. However, I’m particularly interested in survey techniques with children and adolescents. I often see misunderstandings about survey methodology in general, and these are exacerbated when surveying children and adolescents. I am hoping to present at AEA 2017 on best practices in surveying children to help remedy this issue, but I would also like to study this topic further.

How evaluators adapt their evaluations to the developmental context: Evaluation methods

Knowledge about children is best obtained directly from youth using interviews, focus groups, and surveys. This is in stark contrast to the previously common methods of observation and ethnography, which were used primarily because researchers did not believe youth could provide reliable and valid data.[1]

In my study, I examined whether evaluators collected data about beneficiaries directly (i.e., interviews, focus groups, surveys) or indirectly (i.e., case studies, observations, archival data). If evaluators indicated they would collect data directly from participants, I also asked them questions about their survey or interview-specific practices.

Overall, evaluators were more likely to collect data indirectly from beneficiaries when they were children and adolescents than when they were adults. For the tutees, evaluators were less likely to survey the children or conduct focus groups with them and more likely to conduct observations. Interestingly, evaluators in the child condition were also more likely to survey and conduct focus groups with the tutors (as a reminder, the tutors in this condition are adolescents), as well as to collect archival data.

The following are some of the interesting differences (or lack thereof) in the survey- and interview-specific methodologies. Evaluators in the child condition were…

  • more likely to have program staff administer the survey and use oral administration and less likely to use online administration.
  • less likely to have the evaluation team conduct interviews and more likely to use interview specialists.
  • more likely to have shorter interviews, fewer participants in focus groups, and focus groups comprised of participants of similar ages.
  • equally likely to use 2-4 (36%), 5-7 (63%), 8-10 (2%), or 11+ (0%) response options in the survey.[2]
  • equally likely to test for internal consistency (62%), test-retest reliability (42%), face validity (70%), criterion validity (32%), construct validity (35%), use factor analysis techniques (52%), or test for moderators (35%).
  • equally likely to use unstructured (0%), semi-structured (92%), or structured (8%) interviews.[3]

[1] Punch, S. (2002). Research with children: The same or different from research with adults? Childhood, 9(3), 321–341.

[2] There were likely no differences due to a floor effect in the number of response options typically used. A future study could examine each number of response options between 2 and 8 individually, rather than clustering them into categories, to avoid this floor effect.

[3] Evaluators overwhelmingly preferred semi-structured interviews regardless of the age of participants.

How evaluators adapt their evaluations to the developmental context: Evaluation design

What evaluation design is best? This debate about what constitutes credible evidence[1] has raged through the field of evaluation, with some arguing for RCTs as the “gold standard” and others questioning the superiority of the RCT.

This debate is somewhat meaningless when we understand that the evaluation design is chosen based on the evaluation questions. Evaluations seeking to assess outcomes or impact are perhaps best served by an experimental (i.e., RCT) or quasi-experimental design, whereas evaluations assessing the needs of the program or fidelity of implementation are better served by a descriptive (e.g., case study, observational) or correlational (e.g., cohort study, cross-sectional study) design.

In the context of youth programs, however, longitudinal designs may be particularly important. Longitudinal designs are critical for measuring and understanding development over time. They are especially critical when knowledge of long-term effects, which may not manifest until the end of the program or after services have ended, is needed.

In my study, evaluators did not change their evaluation designs based on the age of participants. I asked evaluators to rank their choice of evaluation design, and the majority chose quasi-experimental (37%), descriptive/correlational (23%), or experimental (15%) as their primary choice. Few evaluators chose a case study (8%) or ethnographic (4%) design. A further 13% of evaluators chose to write in another design, with the majority indicating a mixed methods design.

I also asked evaluators how many waves of survey or interview data collection they would conduct across the three years of the evaluation. Of those who responded to the survey questions, 69% said they would do a baseline and multiple follow-up surveys, 28% said they would do a baseline and one follow-up, and only 3% said they would do only a baseline or post-test survey. Of those who responded to the interview questions, 93% said they would do multiple sets of interviews or focus groups and only 7% said they would do only one set. The three-year length of the simulated evaluation likely explains why there were no differences across age conditions here.

Be sure to check out the previous post on how the evaluation approach differed across age conditions. Stay tuned for more results from my study in terms of the evaluation methods, as well as a discussion explaining these results and next steps!

[1]  Donaldson, S. I., Christie, C. A., & Mark, M. M. (Eds.) (2009). What counts as credible evidence in applied research and evaluation practice? Los Angeles, CA: Sage.

How evaluators adapt their evaluations to the developmental context: Evaluation approach

As mentioned previously, in the context of youth programs a developmentally appropriate evaluation requires a culturally appropriate evaluation. This means including youth, or at minimum those with knowledge of and experience working with youth, in the evaluation.

In my study, I asked practicing evaluators to describe the levels of involvement there would be across a wide range of stakeholders including school administrators, teachers, parents, program staff, program designers, district personnel, funders, developmental consultants, math consultants, and the tutors and tutees of the program. In particular, I was interested in the levels of involvement of the consultants, the tutors, and the tutees across the evaluators randomly assigned to the child, adolescent, or adult conditions.

Overall, evaluators were less likely to involve tutees in the child condition than the adolescent condition, and evaluators in both conditions were less likely to involve tutees than evaluators in the adult condition. Evaluators were also less likely to involve tutors in the child condition (as a reminder, the tutors in this condition are adolescents) than evaluators in the adult condition. There were no differences in use of consultants across the conditions.

One could argue that some evaluators have the knowledge and expertise required to conduct a culturally appropriate youth program evaluation. Thus, I also examined the extent to which evaluators’ knowledge and expertise moderated these differences. Evaluators in the Youth Focused Evaluation TIG (a TIG focused on youth participatory evaluation) were more likely to involve beneficiaries than non-youth evaluation TIG members, while members of the PreK-12 TIG were the least likely to involve beneficiaries. Furthermore, evaluators with more self-reported developmental expertise were less likely to involve beneficiaries.

These results suggest that evaluators are less likely to involve the beneficiaries of a program when they are children and adolescents than when they are adults. Evaluators were exposed to the exact same program, with the only difference being the age of the beneficiaries.

Stay tuned for more results from my study in terms of the evaluation design and evaluation methods, as well as a discussion explaining these results and next steps!

Can evaluators remain unbiased?

Bias is prejudice in favor of something, usually in a way considered unfair.

People have been talking about bias in evaluation—and research—since the beginning. The idea is that if evaluators are favorable toward a program, or perhaps want to receive future contracts from that program, they are more likely to adjust their evaluations in ways that lead to findings that are invalid, unreliable, and lacking in credibility.

Understandably, some people have thought this kind of bias should be avoided at all costs and propose maximizing the distance between the evaluator and the program to achieve this goal. Methods to maximize that distance include non-participatory approaches (e.g., goal-free evaluation), independent funding (rather than the program funding the evaluation), designs that minimize threats to internal validity (a la Campbell, including randomized controlled trials), and peer review and meta-evaluation.

However, some of these approaches to evaluation—in which contact with the program and its stakeholders is minimized to avoid contaminating the evaluator—alienate many evaluators, particularly those who report being in an internal role. Scriven and Campbell, two of the major proponents of minimizing bias, may not share the pragmatic or constructivist epistemologies that many evaluators hold. Thus, they argue for controlling for bias rather than acknowledging and recognizing it, as a constructivist would.

Furthermore, this approach to evaluation alienates our stakeholders. Proponents of collaborative, empowerment, and participatory evaluations argue, and have found much evidence for, the benefits of such an approach for stakeholders, the program, and the evaluation. These benefits include giving participants ownership over the evaluation and results; building capacity to understand, use, and conduct evaluations; and improving program performance, learning, and growth.

Minimizing or controlling for bias may alienate evaluators and stakeholders who want a more participatory approach. 

There is one form of bias that some seem to consider the worst of all: advocacy. As soon as evaluators put on their advocacy hat, they are no longer value neutral toward the program. Rakesh Mohan wrote a wonderful piece on Sheila Robinson’s website titled “Why are evaluators so tentative about the advocacy aspect of our profession?”[1] In it, he argues that “it is the fear of politics that makes many evaluators tentative about advocacy.”

He further argues in his related AJE article[2] that advocacy while maintaining independence is a difficult and risky endeavor. Credibility is important in our profession, particularly outside of Canada, where there is no credentialing or professionalization system. As such, “the loss of credibility could adversely affect the evaluator’s professional reputation among peers and could negatively affect his or her livelihood. It is the fear of losing one’s credibility that keeps many evaluators away from engaging in advocacy activities” (Mohan, 2014, p. 400).

Many are fearful of losing credibility if they are viewed as biased in any regard.

I think this loss of credibility—in the eyes of peer evaluators, stakeholders, and the outside community—is what most people think of, and fear, when they think of bias. And I am not saying it is wrong to fear it or to avoid it. However, I think we need to balance credibility with participatory (or empowerment or collaborative) evaluations, whereby we advocate for our programs when it is ethical to do so. We are often working with impoverished or disenfranchised communities or with programs that have immense political implications. Through advocacy—or even just by maintaining closeness to these communities in our evaluations—we can help raise their voices in this highly political world.

[1] Rakesh’s blog and article are both focused primarily on advocating for evaluation, not necessarily for the programs. However, I feel his arguments are relevant to both cases.

[2] Mohan, R. (2014). Evaluator advocacy: It is all in a day’s work. American Journal of Evaluation, 35(3), 397-403.