Experts All the Way Down

Mini Teaser: Whether it's global warming, racism or deficit spending, beware of the experts you're listening to. They know far less than they claim.

by Philip E. Tetlock

David H. Freedman, Wrong: Why Experts* Keep Failing Us—And How to Know When Not to Trust Them (New York: Little, Brown and Co., 2010), 304 pp., $25.99.

Kathryn Schulz, Being Wrong: Adventures in the Margin of Error (New York: Ecco, 2010), 416 pp., $26.99.

Charles Seife, Proofiness: The Dark Arts of Mathematical Deception (New York: Viking Adult, 2010), 304 pp., $25.95.

CHAFE THOUGH we might at our helplessness, even the most abrasive cynics cannot avoid relying on various species of experts—economists or oncologists or climatologists or . . .—in forming opinions. The world is just too complex to decode on our own. And as the world has grown ever more complicated, there has been a corresponding surge in demand for metaexpertise—for those who promise to guide us through the intellectual labyrinth, distinguishing the snake-oil salesmen from the fact profferers. Of course, just because we want something does not mean that it exists. But it is a good bet that people will pop up claiming they can deliver the answers gift wrapped tomorrow morning.

 

THANK GOODNESS, it would seem then, that there are three new books by accomplished journalists David Freedman, Kathryn Schulz and Charles Seife to help us laymen figure out who is engaging in trickery-by-data-distortion and who is not, so we can all better decode whom and what to trust.

I am often pigeonholed as an “expert on experts” because I have a long-standing interest in the (rather large) gaps between the confidence of political pundits and their forecasting accuracy. In reviewing these books by experts on experts, one could say I have morphed into an even-higher life form: an expert on “experts on experts.”

Before ascending further on the Great Ladder of Being (all it would take is for a blogger to review this review—and for me to reply—to move up to Rung Number Five), I had better stop the infinite regress. Why not start by asking: How much traction can we get from heeding these authors’ advice in coping with that recurring challenge of modern existence—separating the wheat from the chaff?

Each author offers what appear to be sensible, sometimes deeply insightful, guidelines. Seife warns us in Proofiness about how easily we can be seduced by an assortment of specious statistical arguments—and about the hazards of Potemkin numbers that give weak arguments an aura of credibility. He notes especially that we are never more vulnerable to this sort of “proofiness” than when false claims of precision cut along convenient ideological lines. In essence, when experts’ sureties affirm our own beliefs (Democratic, Republican or otherwise), we are quick to take them on faith.

But we should be wary of almost all “facts” delivered to us (whether they jibe with our vision of reality or not), for as David Freedman documents in Wrong, experts know a lot less than they claim—and this is, as Marxists were fond of saying, no accident. There are such powerful and perverse institutional incentives for experts to overclaim the validity of their data and their conclusions that we should not be shocked that many ambitious scientists succumb to the I-have-figured-out-all-the-answers temptation (indeed, the surprising thing is perhaps that so many resist the siren calls of media acclaim).

And Schulz explores in her own subtly seductive way the experience of “Being Wrong”; why it is so hard to admit error but how, with the help of brilliant philosophers and clever perceptual illusions, we might learn to stop taking ourselves so damned seriously and embrace William James’s liberating insight: “Our errors are surely not such awfully solemn things. In a world where we are so certain to incur them in spite of all our caution, a certain lightness of heart seems healthier than this excessive nervousness on their behalf.”

It is hard to argue with writers as attractively open-minded as this threesome—and I caught only a few errors, none consequential. And certainly if we all internalized the authors’ collective wisdom we would be quite a bit better off. If we mastered the rudiments of statistical reasoning, recognized that experts are all too human and cultivated the fine art of self-interrogation, we’d reduce our exposure to some expensive mistakes.

On the personal side of the ledger, we would pay less money for dubious financial-services advice (and shift into low-transaction-cost, diversified mutual funds); we would be less easily swept up in panics over food additives or vaccinations; and we would be less likely to gamble our lives on alternative medical treatments. Richer and healthier are not bad returns on an investment of less than $100 in books.

On the public-policy side of the ledger, we would be quicker to sense when we are being tricked by politicians who make their arguments in nominal dollars rather than inflation-adjusted ones, who imply that correlation means causation, and who insinuate that they have policy formulas that allow us to escape the tedious trade-offs that genuine domestic- and foreign-policy experts know all too well—those between consumption and investment, equality and efficiency, deterrence and reassurance.
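To make the first of those tricks concrete, here is a minimal sketch in Python of the nominal-versus-real-dollars arithmetic. The spending and price-index figures are invented for illustration, not drawn from any actual budget.

```python
# Invented figures, for illustration only: spending that "doubles"
# over two decades, deflated by a hypothetical CPI series.
nominal_1990 = 100.0   # billions of dollars (hypothetical)
nominal_2010 = 200.0   # billions of dollars (hypothetical)
cpi_1990 = 130.7       # hypothetical price-index values
cpi_2010 = 218.1

# Express 2010 spending in 1990 dollars before comparing.
real_2010 = nominal_2010 * (cpi_1990 / cpi_2010)

print(f"Nominal growth: {nominal_2010 / nominal_1990 - 1:.0%}")  # 100%
print(f"Real growth:    {real_2010 / nominal_1990 - 1:.0%}")     # ~20%
```

On these made-up numbers, a politician could truthfully boast that spending “doubled” while purchasing power rose by only about a fifth.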

And if all that were not reward enough, we could dodge a lot of silly watercooler arguments. For instance, Seife’s discussion of “electile dysfunction” shows how meaningless it was to get hot and bothered over who really won the Bush-Gore—or the Franken-Coleman—election. Both were, despite the media brouhaha, to put it simply, ties. Many have a hard time grasping this truth because we have not deeply internalized the elementary statistical concept of measurement error. A result can be off a little here or off a little there; a definitive result is an impossibility, no matter how many chads we count. Indeed, there are some situations in which we can do no better than tossing a coin—an option that our highest courts could have safely considered if most citizens were as statistically savvy as Charles Seife.
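Seife’s claim about ties is easy to check with a toy simulation. A minimal sketch follows, with assumed figures loosely scaled to a Franken-Coleman-size race; the 1 percent per-ballot error rate is an assumption for illustration, not a measured quantity.

```python
import random

random.seed(42)

# Assumed, illustrative parameters (not the certified figures):
# roughly 2.9 million ballots, a 312-vote "true" lead for candidate A,
# and a 1 percent chance that any given ballot is misread in a count.
N_BALLOTS = 2_900_000
TRUE_MARGIN = 312
ERROR_RATE = 0.01

def one_recount():
    """Simulate one count: every misread ballot swings the observed
    margin by two votes, in a random direction."""
    n_errors = int(N_BALLOTS * ERROR_RATE)
    swings_to_a = sum(random.random() < 0.5 for _ in range(n_errors))
    return TRUE_MARGIN + 2 * swings_to_a - 2 * (n_errors - swings_to_a)

margins = [one_recount() for _ in range(200)]
loser_share = sum(m < 0 for m in margins) / len(margins)
print(f"Share of counts 'won' by the true loser: {loser_share:.0%}")
# Under these assumptions, on the order of one count in five crowns
# the wrong winner: the margin is smaller than the counting noise.
```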

So, there are tangible senses in which we would be better off if we were all “smarter”—and “smarter” about what it means to be smart (it is not just processing speed and crunching power—it is a capacity for constructive self-criticism). But how feasible is it for the statistical layman? Can we truly become functioning educated skeptics armed with the sage wisdom of these authors? I see no better way to get at the answer than to ground ourselves in a real-world experiment. Can we decipher the validity of a given expert “finding”?

 

IT IS essential to choose tough test cases for this process—for judging metaexperts—cases in which, most importantly, we don’t already know the outcome and thus don’t know which experts proved to be on the right and wrong sides of history. We know in principle but in practice frequently forget how easy it is to tut-tut about expert stupidity from the lofty perch of hindsight. Many in my faculty-club bar still cannot fathom how those fools in the Bush administration managed to underconnect the dots prior to 9/11 and then overconnect the dots for Iraq’s WMD. Obviously, the barfly psychologists tell us, the politicians must have fallen prey to “inattentional blindness” prior to 9/11 (sometimes we are so focused on one task, we miss the eight-hundred-pound gorilla strolling through the crowd) and then fallen prey to “randumbness” prior to March 2003 (sometimes we see patterns that we wish or expect to see, not what is actually there).

Any fool can find fault ex post. But who can spot the faults—and fix them at acceptable cost—ex ante?

Tough test cases for metaexperts should have other properties as well. They should carry high policy stakes that bring into conflict heavyweight thinkers and researchers who make bold claims and hold prestigious posts in research universities and think tanks. At the risk of sounding intolerably elitist, it is too easy to discredit dubiously credentialed proponents of conspiracy theories. We need test cases with experts who, even if you’re ideologically inclined to blow them off, are hard to ignore. Here are four examples:

Are Paul Krugman and Joseph Stiglitz right (circa summer 2010) that we are teetering on the edge of the third great depression of modern economic times—and that it would be a monumental error to start reining in deficit spending? Or are the conservative and libertarian critics of Keynesian stimulus spending, such as Gary Becker, correct about negative multipliers and the need to focus on reincentivizing the private sector?

Who is closer to being right (circa summer 2010) about the future of Sino-American relations: James Fallows (there are lots of marital stressors but divorce is far from inevitable) or Niall Ferguson (get ready for a nasty divorce, probably sooner rather than later)?

On which sect of climate scientists should we place our multitrillion-dollar global-warming bets: The pessimists who see looming catastrophe or the optimists who see weak trends and much uncertainty about causation? Those willing to consider geoengineering solutions or those who see the risks of unintended consequences as too great?

Which psychologists and social scientists have staked out the more accurate position in the great American racism debate: The optimists who believe the survey evidence that prejudice has hit all-time lows or the pessimists who declare that surveys fail to measure unconscious forms of racism that make it extremely difficult for African Americans to be treated fairly in job markets?

 

LET’S TEST our metaexpertise by focusing on the last debate—a slightly lower-profile argument than the others but one in which the political-philosophical stakes could hardly be higher. The winners will shape the policies that employers must adopt to guarantee equality of opportunity in their workplaces. If unconscious biases are as potent and pervasive as some experts claim in law reviews, journal articles and court testimony, society may need to resort to more draconian measures to achieve equal-employment opportunity—in particular, numerical goals and quotas. In short, this debate pivots on whether the classic distinction between equality of opportunity and equality of results is sustainable.

In one corner of this debate, we find the race pessimists who include an array of prominent psychologists, social scientists and law professors (too numerous to enumerate) who make three sets of bold claims:

1. They have invented a device—they call it the Implicit Association Test (IAT)—that allows them to measure not only prejudices that people balk at acknowledging but also prejudices that they are flat-out unaware of having. The experts compare their new technique for unconscious mind reading to such revolutionary scientific breakthroughs as the development of the telescope. They declare the IAT to be a 100 percent–pure measure of prejudice. And they imply that the scientific debate is essentially over.

2. They have discovered that, although most people in early-twenty-first-century America claim to be unprejudiced on a conscious level, the IAT reveals that on an unconscious level this is very much not the case. Whereas only 10 or 15 percent of Americans endorse explicitly anti–African American sentiments, 70 or 80 percent register as biased against African Americans on an unconscious level.

3. And some even testify under oath as expert witnesses that unconscious biases will insinuate themselves into employment decisions whenever managers have “excessive discretion” in deciding whom to hire or promote—and that the only way to check such distortions is to hold managers accountable for achieving numerical goals (quotas in all but name) for the advancement of African Americans.

In the other corner, we find the relative optimists who warn of a precipitous rush to judgment about unconscious bias and of the dangers of discounting the extraordinary progress the country has made in the almost half century since the Civil Rights Act of 1964. They suspect that it does not help race relations to label the vast majority of the white population as antiblack on the basis of flimsy evidence. And they see themselves as defending scientific rigor and integrity against an onslaught of hyped-up claims (of course their detractors see them as reactionary apologists for residual racism).

The stage has now been set for the acid test: What help can our authors give us—and indeed federal judges and regulators—in becoming better appraisers of expertise? How do we figure out whether the unconscious-bias proponents have indeed made a revolutionary discovery that is now beyond reasonable scientific doubt, or whether we are looking at yet another faddish claim to fame, or whether the truth lurks somewhere in between?

 

OUR STATISTICALLY wary author of Proofiness, Charles Seife, is the perfect starting point, for he is ever concerned that experts distort their findings to mislead Joe Q. Public. He tells us to follow the logical trail from the measurement process that generates the numbers to the policy conclusions drawn from them. We need to look out especially for data cherry-picking that disguises weaknesses to make them look like strengths.

So, on to the IAT. Everyone should now be curious, ideally equally curious, about where the numerical estimates of unconscious bias come from. And luckily the answer can be found at the IAT website, which has been visited by millions of test takers—and which has over recent years issued varying but consistently high estimates of how much unconscious antiblack prejudice lurks in the American population.

Zero bias in IAT research has a precise meaning: it means recognizing flattering words linked to black faces just as fast as to white faces—and unflattering words linked to black faces just as fast as to white faces (if it helps to be evenhanded, think of the test as aimed at unconscious anti-Americanism—and measuring how fast you recognize flattering words linked to the face of Osama bin Laden versus George W. Bush). One can qualify as biased if one responds as little as one-tenth of a second faster to combinations of white-good and black-bad than to combinations of black-good and white-bad.

But a good Seife-ian worries about long, ultraspecific numbers for fuzzy concepts—and wants more than the expert’s word that a 166-millisecond response differential on a stimulus-recognition task (i.e., the IAT) qualifies one as highly racially biased. A good Seife-ian recognizes that every real-world number comes with a unit attached to it. And, in a thus-far-mysterious fashion, the experts have transformed a unit of time into a unit of prejudice. The smart question is: How do you know that zero bias on the test means zero bias in behavior?
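For the curious, here is that transformation in miniature, with simulated reaction times. The published IAT scoring algorithm has additional steps (error penalties, latency trimming, its own standardization rules), so treat this as a sketch of the core arithmetic only, not the real procedure.

```python
import random
import statistics

random.seed(1)

# Simulated reaction times in milliseconds for one hypothetical test
# taker. "Congruent" blocks pair white+good / black+bad; "incongruent"
# blocks pair black+good / white+bad. The 166 ms gap is built in.
congruent = [random.gauss(700, 120) for _ in range(40)]
incongruent = [random.gauss(866, 120) for _ in range(40)]

diff_ms = statistics.mean(incongruent) - statistics.mean(congruent)

# The scoring convention standardizes the gap by the test taker's
# own variability, yielding a Cohen's-d-like "D score."
pooled_sd = statistics.stdev(congruent + incongruent)
d_score = diff_ms / pooled_sd

print(f"Response differential: {diff_ms:.0f} ms")
print(f"Standardized D score:  {d_score:.2f}")
# The contested inferential leap happens after this arithmetic: from
# "slower on incongruent pairings" to "biased in behavior."
```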

The tacit assumption underlying the test is that people scoring zero are most likely to judge others on their merits—not on prejudice. But is that assumption justified? Can we safely glide from a 70 percent “fail” rate on the unconscious-prejudice test to the conclusion that 70 percent of the population is predisposed to discriminate against African Americans whenever they think they can get away with it?

In fact, when we crank up the microscope to the next level and explore how the actual propensity to discriminate against a group correlates with where one is located along the IAT scoring continuum, we find a messier picture than the test’s proponents imply. In some studies, the correlations run in the predicted direction (although they are not large). But in others, although the vast majority of subjects score as antiblack on the IAT, there is still, on average, a pro-black behavioral bias across a wide range of IAT scores, including at the zero point. In yet other studies, the more antiblack one’s IAT score, the more pro-black one’s behavior.

To clean up messes of this sort in complex research literatures, experts often rely on a technique known as meta-analysis, which statistically summarizes and integrates the conflicting results. Suffice it to say, the meta-analyses have sparked only more controversy—but that has not stopped many experts from announcing to the world, and testifying under oath, that the debate about the IAT’s efficacy is effectively over.
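For readers who have never seen one, here is what the mechanics of a meta-analysis look like: a minimal fixed-effect sketch using the standard Fisher-z, inverse-variance method. The study results are invented for illustration and are not the actual IAT literature.

```python
import math

# Invented results, for illustration only: (correlation between IAT
# score and discriminatory behavior, sample size) from five
# hypothetical studies that point in different directions.
studies = [(0.24, 80), (0.05, 150), (-0.10, 60), (0.30, 45), (0.02, 200)]

# Fisher z-transform each correlation; weight by inverse variance
# (for a correlation's z, the variance is 1 / (n - 3)).
weighted_sum = sum(math.atanh(r) * (n - 3) for r, n in studies)
total_weight = sum(n - 3 for r, n in studies)
z_mean = weighted_sum / total_weight
r_pooled = math.tanh(z_mean)

se = 1 / math.sqrt(total_weight)
ci_low = math.tanh(z_mean - 1.96 * se)
ci_high = math.tanh(z_mean + 1.96 * se)

print(f"Pooled correlation: {r_pooled:.2f} "
      f"(95% CI: {ci_low:.2f} to {ci_high:.2f})")
# A small pooled r with an interval straddling zero: exactly the kind
# of result both camps can spin, which is why the meta-analyses
# sparked more controversy rather than settling it.
```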

 

AND YET even with all this data sifting—all this statistical skepticism—a further danger exists still: that ideological-bias problem. Common sense tells us (and if that is not enough, scientific studies reinforce the point) that liberals are likelier to resonate to pro-IAT arguments that emphasize the pervasiveness and potency of unconscious biases—and to see in these arguments compelling reasons for ramping up regulatory pressure on companies to promote African Americans. By contrast, conservatives are likelier to roll their eyes at liberal academics up to their usual mischief—and deplore pseudoscience that falsely labels as racist many fair-minded Americans and sets the policy stage for quotas.

If we are serious about de-biasing ourselves, we need to master the art of “turnabout thought experiments.” Confronted by controversial evidence, liberals and conservatives alike should learn, as a matter of course, to imagine reversing roles: Suppose that the same researchers used exactly the same standards of evidence not for identifying covert antiblack prejudice but instead for identifying covert anti-Americanism among American Muslims—the sort of national-security application likely to appeal to conservatives but make liberals nervous. And the unconscious-anti-Americanism researchers claim to find plenty of it among American Muslims. Again, common sense and science dictate that it is now the turn of the ACLU liberals to roll their eyes—and deplore reactionary pseudoscience that falsely impugns the patriotism of American Muslims.

The point of this thought experiment is to ensure that liberals as well as conservatives will be motivated skeptics, thereby checking the temptation to give a free pass to slippery science as long as only the other side’s ox is being gored.

In short, we have now gone as deep as even a devout Seife-ian would be willing to go (without being paid handsomely to continue). We have tried hard to rein in our own capacity for self-deception, the omnipresent temptation to believe what we want to be true. We have probed the meaning of zero bias in psychological measurement. And the fact is, we remain confronted with a big psychological barrier. Most citizens will economize on mental effort and accept numbers on faith if they resonate as true and come from a trusted expert.

 

DAVID FREEDMAN, however, encourages a more ruthless brand of skepticism—and some tempting shortcuts for people who don’t have the time for a PhD-level tutorial in psychometrics. It is not just that data can be manipulated. We must worry about the very incentives “experts” have for fudging their results. He builds on the rather sound premise that a disturbingly large percentage of this purportedly professional advice is flawed—and that there are systematic reasons why many expert communities go off track. All too often, scientific journals, grant agencies and tenure committees put a premium on surprising (“counterintuitive”) findings that we discover on sober reflection are difficult to replicate. Which result is more likely to excite the pulse of liberal-leaning academic reviewers: to learn about previously hidden unconscious causes of our behavior that threaten to derail progress on civil rights; or to learn that representative-sample surveys over the last several decades have been getting it roughly right and that overt forms of prejudice that people are willing to express have been in steep decline—although significant pockets of prejudice remain?

To ask is to answer. Here comes that ideological bias again. And Freedman advises us that, when we see such incentives, we should be on the lookout for further telltale clues. He is especially wary of claims that are dramatic (claiming to have invented the psychological equivalent of the telescope qualifies—and so too does claiming that American civil-rights law needs to be rewritten to accommodate the new telescope’s discoveries); of claims that are a tad too clear-cut (devoid of qualifications about when propositions do and do not hold); claims that are doubt free (portraying findings as beyond reasonable doubt and one’s measure as 100 percent pure); claims that are universal (implying that one is tapping into powerful unconscious forces that, hitherto unbeknownst to us, drive all human behavior); claims that are palatable (claims likely to appeal to one’s favorite ideological constituencies); claims that receive a lot of positive media attention (the IAT has been widely covered in the mass media and millions have visited the website); and claims that carry actionable implications (claims about what employers now need to do to guarantee true equality of opportunity in workplaces).

Using Freedman’s criteria, then, this is not a close call. When it comes to the IAT, virtually all of the warning lights are flashing. Whatever may be the merits of the underlying science in the peer-reviewed literature, in the public forum, the ratio of pseudoexpertise to genuine expertise is distressingly high.

 

IF WE are doomed to get it wrong so much of the time, whether because of our own statistical deficiencies or the manipulation of the experts (or our own biases), what are we to do? Kathryn Schulz to the rescue . . . ? Certainly, she adopts a kinder, gentler approach than either Seife or Freedman. She is no scold and she does not go for checklists (she is not easily reduced to PowerPoint). But she asks a lot of her readers: to back off from their own belief systems and look at themselves with a remarkable degree of philosophical detachment. Her forgiving premise is that we are all deeply flawed thinkers and that we would do ourselves—and those around us—a huge favor if we came to grips with that fact and learned to laugh at ourselves. She sees the psychological obstacles but suggests that this is no impossible dream. We all have the potential to approach error in a convivial, as opposed to ego-defensive, spirit. After all, we enjoy perceptual illusions that dumbfound us, murder mysteries that stump us, magicians who violate our basic assumptions about cause and effect, comedies of error that play misperceptions off misperceptions. Schulz continually, gently prods us toward open-mindedness, and it works—but only up to a point. I fear she asks for superhuman detachment. I could easily imagine each side of a bitterly polarized argument, such as the great unconscious-racism debate, immersing itself in this delightful book but reemerging to engage with the other just as self-righteously as before.

Indeed, in the debates I have witnessed—and participated in—each side may have a sense of humor, but it is not playful, self-critical Schulzian humor; rather, it is of the Hobbesian sort, aimed at ridiculing the other side and relishing one’s moral and cognitive superiority. I know many scientists who pay homage to Karl Popper’s doctrine of “falsificationism”—the importance of being able to articulate the conditions under which you would change your mind and jettison your pet hypotheses. But the falsificationists I know hate to be falsified on any issue close to their core identity. They don’t find that prospect one bit funny. They have too much reputation at stake. The same applies to other professionals—and to none more than politicians, who spend large fractions of their working lives putting positive spins on negative outcomes. We take ourselves so seriously, in large part, because we think others care whether we are right or wrong, and that they will not be as forgiving of error as is Schulz. In the end, I wonder whether Schulz would really welcome a demonstration that her “optimistic meta-induction from the history of everything” is just so much wishful thinking, rooted in a misconception that humans are far more Schulzian than they are.

Schulz’s book is also the psychologically deepest of the three—so it is easy to see roughly where she would fall in the great unconscious-racism debate. On the one hand, she would see worrisome signs of hubris in the unconscious-bias movement—and a tone of aggrieved sarcasm on both sides. On the other hand, she knows a lot about the workings of the human mind and appreciates that there is a lot of evidence that much thought is driven by subconscious processes to which we have limited or no access (a well-established proposition in my view). And, as a liberal, she should also be reflexively sympathetic to the effort to identify subtle unconscious processes that are harming the traditionally disadvantaged.

 

SO WHERE does all this leave us? After the rhetorical dust settles, it leaves us roughly where we would have been without the metaexperts: conservatives and liberals will mostly hold firm to their original reactions (chastened a bit, I hope, by the turnabout thought experiment and the authors’ warnings about motivated-reasoning biases). Conservatives will still be prone to laugh off the liberal-academic mischief makers. We have come a long way from Selma, Alabama, if we have to measure prejudice in millisecond differentials of how long it takes people to classify words as good or bad and faces as black or white. And liberals will still be prone to furrow their brows over potential ways such rapid-fire “biases” could affect important decisions. Some of the scenarios that liberals conjure up—differential eye blinking as a source of racial bias in interviews—will invite conservative ridicule. But other scenarios should give even hard-line conservatives pause. The most plausible worst-case scenario comes, in my view, from shooter-bias studies in which millisecond differentials do matter—and can cause police (black as well as white) to be quicker to shoot black suspects in identical experimental situations.

Of course, beyond ideological perceptions, there is an underlying reality. My best guess is that there are significant kernels of truth in the claims about unconscious bias. But no one yet knows the final resolution of the controversy over the much-publicized IAT—and there are good reasons to be suspicious of the claims linked to the test about the pervasiveness and potency of unconscious bias in American life and of spin-off claims about the necessity to resort to numerical goals and quotas to check such prejudice. The truth will probably be far more qualified than the original claims of the most ardent proselytizers of the IAT. To be sure, there may be some quite-artificial conditions under which the test does, to a very limited degree, allow us to identify racial discriminators who could not have been identified using explicit measures of conscious attitudes. But even that modest accomplishment will almost certainly come at a steep cost in the false-positive labeling of the fair-minded. Media hype aside, all of this hardly amounts to a clincher case for rewriting American civil-rights laws. But don’t count on banner-headline retractions.

 

IF WISHES were horses, beggars would ride. Reading these books will not transform the citizenry into thoughtful policy analysts. The start-up costs of becoming a sophisticated consumer of expertise are nontrivial. The authors try to make it look easy, but, as any bona fide expert on human cognition can tell you (trust me), it is hard to make critical-reasoning skills stick and harder still to make them transfer to new domains. We rely on experts, in part, because we don’t have the time and energy to think things through on our own.

Even more so, some promised benefits will be elusive. Behavioral game theorists (what you get when you cross a psychologist with a mathematician) teach us that there will always be a predator-prey relationship between the more and less cognitively sophisticated, regardless of whether we peg the average policy IQ at 80, 100 or 120. In this view, we are—whether we know it or not—perpetually enmeshed in an intellectual arms race with those around us. It would be nice to be as clever as Freedman, Schulz and Seife, or, for that matter, Martin Feldstein or Cass Sunstein or Larry Summers, but in the long run, we are merely upgrading to better classes of fools—and even if we were to get ourselves up to speed on all the ways in which the proverbial statistical wool can be pulled over our less knowledgeable eyes, the masters of the dark arts of pseudoexpertise deception will quickly learn to ramp up their game.

It is unrealistic to expect average citizens to work through a long checklist of warning indicators every time they confront a new expert. People routinely and rapidly decide how to decide—and, given the harried lives we lead, the most attractive option is often the one that requires the least effort. Citizens understandably count on experts to relieve them of the moral and intellectual burdens of thinking through each issue on their own. Laudable, then, though the efforts of these authors are, people will continue to rely on experts about as uncritically as before.

A perhaps-apocryphal story has it that Bertrand Russell, a brilliant atheist provocateur of yesteryear, was lecturing on the structure of the cosmos—how our planet orbits our sun which, in turn, orbits the center of our galaxy which, in turn . . . After he finished, a little old lady declared: “What you have told us, Sir, is rubbish. The world is a flat plate supported on the back of a giant tortoise.” Russell condescended to reply: “Well, Madam, what is the tortoise standing on?” To which the lady delivered her fatal riposte: “You’re very clever, young man, very clever. But I am afraid it’s turtles all the way down!”

Our claims to knowledge are far more precarious than we can normally stand to admit—and, on close inspection, often rest on faith in experts who have banked their faith in other experts who . . .

 

Philip E. Tetlock is the Mitchell Endowed Professor at the University of California, Berkeley, and the author of Expert Political Judgment: How Good Is It? How Can We Know? (Princeton University Press, 2005).
