On Causality

For the last twelve years, I've dealt with causality issues frequently. Microsoft runs Surface sales at back-to-school season and the holidays, the two times when demand is highest. Any attempt to estimate demand -- necessary to improve revenue via pricing -- necessarily confounds the timing of Microsoft's sales with its prices. If Microsoft lowers a price in, say, March, what will happen? That is, what is the causal effect of the price change, disentangled from the seasonal effect?

For a second example, spam filters involve two tradeoffs -- one between letting spam through and classifying legitimate mail as spam, and a second between investment in filtering and spam-clearing effectiveness. Optimizing both tradeoffs requires assessing how much spam harms the user. But experiments -- exposing people to more spam to see how they react -- create a terrible user experience. And the observational data is problematic because the kind of person who gets a lot of spam behaves differently from the person who doesn't, so it is hard to disentangle the effects of spam from other aspects of the person's behavior.

In 1875, W.S. Jevons hypothesized that sunspots might cause recessions. He proposed a plausible mechanism: sunspots influence agricultural output, and agriculture was then a large enough sector that a bad harvest could tip an economy into recession. We now know that the correlation of sunspots with corn prices and other agricultural output was spurious. Nonetheless, because recessions are partly self-fulfilling prophecies, a spurious correlation can induce a lasting relationship. We see this today in the discussion of the inverted yield curve, with the bulls (optimists) insisting that 'this time is different, it doesn't signal what it used to signal,' while the bears (pessimists) prepare for recession. Preparing for recession involves reducing hiring, cutting capital expenditures, and tightening belts -- that is, engaging in activities that cause recessions. Even if the bulls are right that an inverted yield curve need not lead to recession, the behavior of the bears may be enough to ensure that it does.

Economics faces thorny problems with causality for three reasons. First, we have a limited ability to do experiments about things that actually matter. Second, the main causality principle that works in the physical sciences just doesn't work in economics. In the physical sciences, events that happen later cannot cause events that happen earlier: causality flows downstream in time. In contrast, future events can cause earlier events in economic data, and regularly do, through the mechanism of beliefs. If enough people believe the Federal Reserve will raise interest rates too much, or that the government will increase tariffs or follow other harmful policies, people take action today. Thus the future event (the interest rate or tariff increase) causes actions that happen earlier. (Strictly speaking, an increase in the likelihood of a future event has repercussions that happen earlier; if the event were completely unforeseen, it wouldn't cause earlier events.) Third, economics faces a great deal of heterogeneity -- not compared to other social sciences, but compared to the physical sciences. Electrons appear to be identical to each other, while people vary greatly, even controlling for observable traits. As a consequence of these problems, economists have invested heavily in trying to tease out causation from correlation.

Herbert Simon won the Nobel Prize in economics in part for his analysis of causality in an influential book, Models of Man, published in 1957. Simon showed mathematically that causality cannot be extracted from data alone; researchers must impose some structure on the analysis (e.g. that certain variables are exogenously determined, or that one variable could cause another but not vice versa) if anything is to be learned. Otherwise, all the data can generate is correlation, not causation. Simon's concept is now known as an identifying restriction, and economists tend to be very clear about the identifying restrictions needed to justify data-driven conclusions, so that a reader of a study can decide whether the assumptions are plausible.

An early example of the problem of causation arose in agricultural economics. If you look at the effect of temperature on corn output, you find an inverse relationship: higher temperatures are associated with lower corn output. But this finding is incorrect, and the problem is rain. Rain is good for corn, increasing yields, but it tends to lower temperatures. Holding the amount of rain fixed, increasing the temperature is good for corn (within limits). Looking only at the effect of temperature confounds the direct effect of temperature with the amount of water provided to the corn. Rain is sometimes referred to as a "third cause," meaning something that both matters and is correlated with other explanatory variables. The presence of third causes that are not accounted for leads to incorrect conclusions.
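The corn example can be sketched numerically. Below is a minimal simulation with an invented data-generating process -- all coefficients are made up for illustration -- in which rain raises yield and lowers temperature, while temperature itself helps yield. A regression of yield on temperature alone gets the sign wrong; controlling for the third cause recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data-generating process: rain raises yield but lowers temperature.
rain = rng.normal(0, 1, n)
temp = -0.8 * rain + rng.normal(0, 0.6, n)               # rain cools the air
yield_ = 2.0 * rain + 0.5 * temp + rng.normal(0, 1, n)   # both help the corn

# Naive regression of yield on temperature alone: the slope comes out
# negative, because high temperature proxies for the absence of rain.
naive = np.polyfit(temp, yield_, 1)[0]

# Controlling for rain (multiple regression) recovers the true +0.5 effect.
X = np.column_stack([np.ones(n), temp, rain])
controlled = np.linalg.lstsq(X, yield_, rcond=None)[0][1]

print(f"naive slope:      {naive:+.2f}")       # negative: confounded
print(f"controlled slope: {controlled:+.2f}")  # close to +0.5
```

The particular numbers don't matter; the point is that omitting the third cause flips the sign of the estimated effect.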

The best way to deal with causation, when possible, is with an experiment. With corn, we could irrigate the corn and grow it in greenhouses, thereby fixing the amount of water the corn gets. Then, by varying the temperature, we can find the effect of temperature on corn. But as noted above, experiments are often not so easy to carry out. Consider trying to assess the value, in future income, of a college education. The challenge is that people choose whether to go to college, and the kind of people who choose to go are different from the kind who don't. In particular, on average, people who go to college come from wealthier backgrounds, score higher on standardized tests, and have parents with higher educational achievement. All of these things, as well as others that may be very hard to measure, matter to future income. Thus, if we look at the income difference between people who go to college and people who don't, it isn't accurate to say college caused that difference, when some of it is caused by other factors. In principle, we could experiment, sending some people chosen at random to college and prohibiting others from going. Sensible governments prohibit such experiments with people's lives.

Economists use statistical modeling -- following Simon -- to try to control for variables that might influence outcomes. While this has produced some significant successes (e.g. several distinct ways of measuring the value of a college education, one involving identical twins, come up with similar values), the approach is limited both by assumptions that must hold but can't be verified, and by the possibility that variables that matter are not observed -- like rain when trying to understand the effect of temperature on corn yields. Models are necessary to tease out causality, but such models are typically not known to be true.

Macroeconomic models (dealing with inflation, unemployment, GDP, and other broad measures) use a technique called calibration, wherein a specific model is selected from a broad family of models by how well it fits the historical data, as a means of ensuring that the theory is consistent with the data. Microeconomic models tend to rely on the key assumption that people act in their own self-interest. As a practical matter, this assumption is routinely violated: people are slow to adapt to changing circumstances -- that is, they act in their own self-interest eventually, not immediately as usually assumed. Various hacks are used to accommodate this hysteresis in the data, but for the most part the problem is ignored.
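Calibration can be sketched with a deliberately simple family of models (everything below is a toy, not a real macro model): AR(1) processes indexed by a persistence parameter rho. We keep the member of the family whose simulated behavior best matches a historical statistic -- here, first-order autocorrelation:

```python
import numpy as np

rng = np.random.default_rng(3)

def ar1_series(rho, n):
    """Simulate x[t] = rho * x[t-1] + shock, a one-parameter model family."""
    x = np.zeros(n)
    shocks = rng.normal(0, 1, n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + shocks[t]
    return x

def autocorr(x):
    """First-order autocorrelation, the statistic we match to history."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]

# Stand-in "historical" data, generated with persistence 0.6.
history = ar1_series(0.6, 5_000)
target = autocorr(history)

# Calibration: search the family and keep the member whose simulated
# autocorrelation best matches the historical value.
grid = np.linspace(0.0, 0.95, 20)
best = min(grid, key=lambda rho: abs(autocorr(ar1_series(rho, 20_000)) - target))
print(f"historical autocorrelation: {target:.2f}; calibrated rho: {best:.2f}")
```

Real calibrations match many moments of much richer models, but the logic is the same: the data selects the model, rather than testing it.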

Medicine suffers greatly from the problem of relevant unobserved variables, or third causes. People who drink a single glass of wine every night live longer than others. Is this a causal statement -- that is, if you change your consumption and drink a single glass of wine every night, will you live longer? The problem is identical to that of valuing a college education: the kind of person who consumes a single glass of wine is likely different from the kind of person who, say, goes on a binge once a month or who doesn't drink at all. In particular, drinking a single glass is probably correlated with other things that extend life, like moderate food consumption, more sleep, and safer driving. Medical researchers have mostly quit using the term 'cause' in favor of 'associated with' because they have been wrong so often in the past. Nonetheless, medicine remains something of a causality train wreck, due to third causes. An important thing to understand is that using more subjects doesn't help. If a researcher attributes the effects of a moderate lifestyle to the consumption of wine, it doesn't matter whether the researcher studied fifty people or fifty million. All the larger sample does is estimate the wrong thing more precisely.
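A quick simulation makes the sample-size point concrete (the setup is entirely invented): suppose a "moderate lifestyle" trait both extends life and makes nightly wine more likely, while wine itself does nothing. The naive wine-vs-no-wine gap then converges, as the sample grows, to a wrong, nonzero number:

```python
import numpy as np

rng = np.random.default_rng(2)

def apparent_wine_effect(n):
    # Invented model: a "moderate lifestyle" trait adds years of life and
    # also makes nightly wine more likely; wine itself adds nothing.
    lifestyle = rng.normal(0, 1, n)
    wine = (lifestyle + rng.normal(0, 1, n)) > 0
    lifespan = 78 + 5 * lifestyle + rng.normal(0, 3, n)
    # Naive comparison attributes the lifestyle effect to wine.
    return lifespan[wine].mean() - lifespan[~wine].mean()

for n in (50, 50_000, 1_000_000):
    print(f"n = {n:>9,}: apparent wine effect = {apparent_wine_effect(n):+.2f} years")
# The gap does not shrink toward zero as n grows; it is simply
# estimated more precisely -- around the wrong value.
```

More data tightens the confidence interval around the biased answer; only accounting for the third cause would move the answer itself.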

In contrast to economics, psychology relies heavily on experiments. Psychology has a replicability problem that appears somewhat greater than that of economics, but psychology's problem is probably due to small sample sizes rather than poor experiments. Psychologists generally have developed good experimental design and take classes in it; they have a long history of eliminating experimenter bias, observer effects, and the like. Probably all social sciences would benefit from raising the criterion for finding a meaningful effect, following the physicists' requirement of five standard deviations. Importantly, though, the use of experiments as a foundation in psychology usually obviates the need for the fancy causal-inference tools developed by economists.

Astronomy is arguably the closest parallel to economics, yet it is significantly different. Astronomers also cannot use experiments (e.g. inserting a Jupiter-sized planet into an orbit as close to a star as Mercury is to the sun -- a so-called hot Jupiter -- to study solar system formation, or merging two black holes to assess gravitational waves, is beyond our ability). As a result, astronomers rely heavily on models that are consistent with terrestrial experiments (particles must behave as experiments say they do), and on looking backward in time, which astronomers can do because light has a fixed speed that isn't fast by galactic, much less intergalactic, standards. That work relies heavily on relativity, which is acceptable because relativity has passed exacting terrestrial tests with extraordinary precision. In contrast, microeconomics relies heavily on theory that tends to perform poorly in experiments. Part of the reason the theory fails is that experiments are performed at low stakes; experiments themselves show that as the stakes increase, people behave more closely to the theory. Nevertheless, the experimental foundation of economics is not so much lacking as false, or at least error-prone. Calibrated models are probably more reliable than experiment-grounded economic models, for the simple reason that deviations between theory and experiment are mitigated by calibration.

Climatologists use models in precisely the same way as astronomers -- we've only got the one planet to experiment on -- and have good models of the chemistry and physics involved. Like macroeconomists, climatologists calibrate their models, choosing the parameters that best fit historical data. But because climate models are grounded in replicable experiments, climatology is more like astronomy than economics.

Physics and chemistry have mostly been immune to causality problems. For molecules, causality flows downstream in time. Physics and chemistry rest on repeated experiments that give the same answer with delightfully high precision. Of course, very small particles behave weirdly (to say the least), and a recent experiment suggests that objective measurement itself, on which the entire edifice of physics is founded, may not be reliable. I will be interested to see how this plays out.
