It seems to me that, while we can proceed to design complex generalizability studies with variance components everywhere, studies that look more scientific than just about anything else in medical education, in the end none of this constitutes critical evidence for theory testing and decision making. We are guilty of institutionalizing confirmation bias: every study we conduct, regardless of the findings, can be interpreted either as supporting the validity of the measure or as noncontributory. As a result, we never accumulate sufficient evidence to "falsify the hypothesis" and to claim that an instrument is not valid and should be abandoned.
Lack of replication
When was the last time you read an article in a medical education journal describing a replication study? I do my best to keep up with the latest developments in our field, and I cannot recall a single such article, aside from the occasional survey revalidation piece. Given that replication is widely regarded as a critical feature of good science, its near absence in medical education is disconcerting.
Why should medical education researchers care about replication? In theory, new knowledge is considered credible (and valid) only after the studies that produced it have been reproduced, independently, by more than one researcher. Otherwise, the chances of finding a false positive are simply too high and we risk filling the medical education literature with claims that may or may not be true, yet are generally accepted as gospel.
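The arithmetic behind this concern can be sketched in a few lines of Python. This is a toy calculation, assuming a conventional significance threshold of .05 and fully independent studies (conditions real research rarely meets exactly), not a model of any particular literature:

```python
# Toy calculation: how independent replication shrinks the chance that a
# false positive survives. Assumes a nominal alpha of .05 and fully
# independent studies testing a true null hypothesis (illustrative only).

def replication_false_positive_rate(alpha: float, n_studies: int) -> float:
    """Chance that n independent studies of a true null ALL reach p < alpha."""
    return alpha ** n_studies

for n in (1, 2, 3):
    rate = replication_false_positive_rate(0.05, n)
    print(f"{n} independent confirmatory stud(ies): {rate:.6f}")
```

Under these assumptions, a single study misleads 5% of the time on a true null, but requiring one independent replication drops that to 0.25%, which is why unreplicated claims deserve caution.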
Questionable research practices
Questionable research practices are prevalent in the social sciences, and medical education is not immune to these problems. Although data fabrication constitutes the extreme end of a continuum, there is evidence that other questionable practices are rampant. Examples of such practices include reporting only results that align with one's hypotheses ("cherry picking"), relaxing statistical significance thresholds to fit results, using 1-sided t tests but failing to mention this in the research report, and wrongly rounding P values upward or downward to fit with a hypothesis (eg, reporting P = .04 when the actual P value is .049).
Another popular yet questionable practice is fishing, which refers to mining data for statistically significant findings that do not stem from prespecified hypotheses. Fishing increases type 1 error rates and artificially inflates statistical significance. Indeed, it is especially problematic to restructure an entire study around findings from a fishing expedition, because such findings are more likely to be a product of chance than the result of actual differences in the population. Although findings based on fishing expeditions and other questionable practices generally work to the advantage of the researcher (ie, they improve the chances of reaching a statistically significant result and getting published), they ultimately hurt rather than advance knowledge.
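The inflation of type 1 error from fishing can be quantified with a standard familywise error calculation. The sketch below assumes independent outcome measures and a nominal alpha of .05; real outcomes are usually correlated, so the exact numbers are illustrative rather than definitive:

```python
# Familywise type 1 error when "fishing" across m outcome measures,
# each tested at alpha = .05 under a true null hypothesis.
# Assumes independent tests (a simplification; real outcomes correlate).

def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent tests."""
    return 1 - (1 - alpha) ** m

for m in (1, 5, 20):
    print(f"{m:2d} tests -> {familywise_error_rate(0.05, m):.2f}")
```

With 20 unplanned comparisons, the chance of at least one spurious "significant" finding rises to roughly 64%, which is why an unhypothesized result from a fishing expedition is so likely to be noise.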
Questionable research programme
Questionable research practices (QRPs) often are not the result of an unethical researcher looking to make his or her scholarly mark on the field. Instead, like most complex social phenomena, irresponsible research conduct arises from the multifactorial effects of both personal factors (e.g., knowledge, beliefs, attitudes) and environmental or contextual factors (e.g., social norms, power dynamics, institutional policies). Thus, there is a need for both individual and systemic approaches to fostering RCR [responsible conduct of research]. Discussions about and efforts to improve RCR have been increasingly promoted in biomedicine and psychology. However, on the basis of our experiences as medical education researchers, journal editors, and faculty members in graduate programs, RCR efforts in medical education have been less robust, often focusing solely on the individual, as opposed to focusing on the individual functioning in a complex learning and health care environment.
The defensibility is not scientific or based on science; it is based on faith.