Aug 28, 2017

Miscalibrated and Overconfident

In an editorial titled Calibrating how doctors think and seek information to minimise errors in diagnosis, Meyer and Singh describe how miscalibration and over/underconfidence can sabotage the diagnostic process,

Within the confines of decision-making uncertainty, one concept we described previously [12] which we believe could have an important role to play is lack of calibration, that is, when physicians’ confidence in the accuracy of their decisions is not properly aligned with their actual accuracy [12]. We posit that information-gathering failures often relate to miscalibration of physicians’ thinking processes and that this miscalibration could be a major reason for diagnostic errors and adverse patient outcomes.


Miscalibration can manifest as either overconfidence, where confidence is higher than it should be given performance, or as underconfidence, where confidence is lower than it should be given performance. In the realm of diagnostic error, overconfidence has been touted as the more insidious problem [14]. However, both forms of miscalibration can adversely affect information gathering and thus diagnostic decision making. For example, when underconfident, physicians might collect too much data and over investigate through unnecessary tests or referrals, leading to increased healthcare costs.

When physicians are miscalibrated about their decision-making processes in the overconfident direction, they lack awareness about the need to (1) continue seeking information or (2) obtain the help they need to ultimately make correct diagnoses for patients [12]. Both of these situations are associated with diagnostic errors. Thus, when overconfident physicians arrive at (incorrect) diagnoses, they might ‘stop short’ or fall prey to premature closure in the search for information or for explanations to patients’ health problems. Specifically, when overconfident, physicians might curtail questioning about patients’ histories or symptoms, they might stop seeking additional information from patients’ health records and they might order fewer tests in a failure to properly disconfirm or examine competing hypotheses. In a previous study, we found that even as additional diagnostic information became available (eg, new test results), physicians tended to stay overconfident and not change their diagnoses, suggesting that they may not use the additional information to improve their calibration or diagnostic accuracy [12].

Miscalibration as overconfidence may also be a precursor to failure to seek additional diagnostic help from text resources, search engines, computerised diagnostic decision support and reaching out to other physicians through either informal consultations or referrals [12]. Despite the availability of these resources, the potential for these resources is often not achieved, partly because of workflow issues [15] or perceptions of looking incompetent in front of patients [16].

Daniel Kahneman reminds us of how much society values overconfidence over uncertainty.

Overconfidence has been one of the most debated aspects of judgment and decision making between Kahneman and Gigerenzer over the years. In an article from 1996 in response to Gigerenzer, Kahneman and Tversky write,

In the calibration paradigm, subjects answer multiple-choice questions and state their probability, or confidence, that they have selected the correct answer to each question. The subjects in these experiments are normally instructed to use the probability scale so that their stated confidence will match their expected accuracy. Nevertheless, these studies often report that confidence exceeds accuracy. For example, when subjects express 90% confidence, they may be correct only about 75% of the time (for reviews, see Keren, 1991; Lichtenstein, Fischhoff, & Phillips, 1982; Yates, 1990). Overconfidence is prevalent but not universal: It is generally eliminated and even reversed for very easy items. This phenomenon, called the difficulty effect, is an expected consequence of the definition of overconfidence as the difference between mean confidence and overall accuracy.

Consistent with his agnostic normative stance, Gigerenzer argues that overconfidence should not be viewed as a bias because judgments of confidence are meaningless to a frequentist. This argument overlooks the fact that in most experiments the subjects were explicitly instructed to match their stated confidence to their expected accuracy. The presence of overconfidence therefore indicates that the subjects committed at least one of the following errors: (a) overly optimistic expectation or (b) a failure to use the scale as instructed. Proper use of the probability scale is important because this scale is commonly used for communication. A patient who is informed by his surgeon that she is 99% confident in his complete recovery may be justifiably upset to learn that when the surgeon expresses that level of confidence, she is actually correct only 75% of the time. Furthermore, we suggest that both surgeon and patient are likely to agree that such a calibration failure is undesirable, rather than dismiss the discrepancy between confidence and accuracy on the ground that "to compare the two means comparing apples and oranges" (Gigerenzer, 1991, p. 88).

Gigerenzer's descriptive argument consists of two points. First, he attributes overconfidence to a biased selection of items from a domain and predicts that overconfidence will vanish when items are randomly selected from a natural reference class. Second, he argues that overconfidence disappears when people assess relative frequency rather than subjective probability.


Subjective judgments of probability are important because action is often based on beliefs regarding single events. The decisions of whether or not to buy a particular stock, undergo a medical operation, or go to court depend on the degree to which the decision maker believes that the stock will go up, the operation will be successful, or the court will decide in her favor. Such events cannot be generally treated as a random sample from some reference population, and their judged probability cannot be reduced to a frequency count. Studies of frequency estimates are unlikely to illuminate the processes that underlie such judgments. The view that "both single-case and frequency judgments are explained by learned frequencies (probability cues), albeit by frequencies that relate to different reference classes" (Gigerenzer, 1991, p. 106) appears far too restrictive for a general treatment of judgment under uncertainty. First, this treatment does not apply to events that are unique for the individual and therefore excludes some of the most important evidential and decision problems in people's lives. Second, it ignores the role of similarity, analogy, association, and causality. There is far more to inductive reasoning and judgment under uncertainty than the retrieval of learned frequencies.
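
As a rough illustration of the measure Kahneman and Tversky describe above (overconfidence as the difference between mean confidence and overall accuracy), here is a minimal Python sketch. The function names and the data are hypothetical, made up to mirror their "90% confident, 75% correct" example; none of it comes from the studies themselves.

```python
from collections import defaultdict


def overconfidence(confidences, correct):
    """Mean stated confidence minus overall accuracy.

    Positive values indicate overconfidence, negative values underconfidence.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need two non-empty sequences of equal length")
    mean_confidence = sum(confidences) / len(confidences)
    accuracy = sum(correct) / len(correct)
    return mean_confidence - accuracy


def calibration_table(confidences, correct):
    """Map each stated confidence level to the observed fraction correct."""
    outcomes = defaultdict(list)
    for level, is_correct in zip(confidences, correct):
        outcomes[level].append(is_correct)
    return {level: sum(hits) / len(hits) for level, hits in sorted(outcomes.items())}


# Hypothetical data: 20 answers, each given with 90% confidence, 15 of them correct.
confidences = [0.9] * 20
correct = [1] * 15 + [0] * 5

score = overconfidence(confidences, correct)
table = calibration_table(confidences, correct)

print(round(score, 2))  # 0.15
print(table)            # {0.9: 0.75}
```

A perfectly calibrated judge would have a score near zero, and every confidence level in the table would map to roughly the same fraction correct.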

In their 1999 article Unskilled and Unaware of it..., Kruger and Dunning write about overconfidence, miscalibration, and how their results relate to Gigerenzer's work,

The finding that people systematically overestimate their ability and performance calls to mind other work on calibration in which people make a prediction and estimate the likelihood that the prediction will prove correct. Consistently, the confidence with which people make their predictions far exceeds their accuracy rates (e.g., Dunning, Griffin, Milojkovic, & Ross, 1990; Vallone, Griffin, Lin, & Ross, 1990; Lichtenstein, Fischhoff, & Phillips, 1982).

Our data both complement and extend this work. In particular, work on overconfidence has shown that people are more miscalibrated when they face difficult tasks, ones for which they fail to possess the requisite knowledge, than they are for easy tasks, ones for which they do possess that knowledge (Lichtenstein & Fischhoff, 1977). Our work replicates this point not by looking at properties of the task but at properties of the person. Whether the task is difficult because of the nature of the task or because the person is unskilled, the end result is a large degree of overconfidence.

Our data also provide an empirical rebuttal to a critique that has been leveled at past work on overconfidence. Gigerenzer (1991) and his colleagues (Gigerenzer, Hoffrage, & Kleinbölting, 1991) have argued that the types of probability estimates used in traditional overconfidence work—namely, those concerning the occurrence of single events—are fundamentally flawed. According to the critique, probabilities do not apply to single events but only to multiple ones. As a consequence, if people make probability estimates in more appropriate contexts (such as by estimating the total number of test items answered correctly), "cognitive illusions" such as overconfidence disappear. Our results call this critique into question. Across the three studies in which we have relevant data, participants consistently overestimated the number of items they had answered correctly, Z = 4.94, p < .0001.
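
The frequency-format measure in this passage, where each participant estimates how many items they answered correctly, can be sketched in the same spirit. This is only an illustration under stated assumptions: the function name and the numbers are hypothetical, and the sketch does not reproduce Kruger and Dunning's data or their statistical test.

```python
def mean_overestimate(estimated_correct, actual_correct):
    """Average of (estimated number correct - actual number correct).

    Each position in the two lists is one participant. A positive result
    means participants, on average, overestimated their own scores.
    """
    if len(estimated_correct) != len(actual_correct) or not estimated_correct:
        raise ValueError("need two non-empty sequences of equal length")
    differences = [e - a for e, a in zip(estimated_correct, actual_correct)]
    return sum(differences) / len(differences)


# Hypothetical data: four participants on a 20-item test.
estimated = [14, 12, 16, 10]  # how many items each participant thinks they got right
actual = [10, 11, 12, 9]      # how many they actually got right

gap = mean_overestimate(estimated, actual)
print(gap)  # 2.5
```

If the frequency format removed overconfidence, as the critique predicts, this gap would hover around zero.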

These findings are interesting, and they underscore the importance of understanding cognitive bias, probability, and uncertainty in decision making.
