In Search of the Perfect Recession Indicator

The downturn in the energy sector and persistent economic weakness abroad have caused the investment community to become increasingly focused on the possibility of a U.S. recession.  In this piece, I’m going to examine a historically powerful indicator that would seem to rule out that possibility, at least for now.

The following chart (source: FRED) shows the seasonally-adjusted U.S. civilian unemployment rate (UE) from  January 1948 to January 2016:


As the chart illustrates, the unemployment rate is a lagging indicator of recession.  By the time high unemployment takes hold in an economy, a recession has usually already begun.

In contrast with the absolute level, the trend in the unemployment rate–the direction that the rate is moving in–is a coincident indicator of recession, and can sometimes even be a leading indicator.  As the table below shows, in each of the eleven recessions that occurred since 1948, the trend in the unemployment rate turned higher months before the recession began.  The average lead for the period was 3.45 months.


Admittedly, the phrase “turning higher” is ambiguous.  We need to be more precise, and so we’re going to define the phrase in terms of trailing moving averages.  That is, we’re going to say that the unemployment rate trend has turned higher whenever its current value crosses above the moving average of its trailing values over some period, and that the unemployment rate trend has turned lower whenever its current value falls below the average of its trailing values over some period.
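To make the definition concrete, here is a minimal Python sketch of the crossover test.  It assumes that the moving average window ends at the current month (the definition above leaves that detail open), and the function names and sample readings are hypothetical, chosen only for illustration:

```python
def trailing_mean(values, window):
    """Mean of the last `window` values ending at each index (None until enough history)."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def trend_is_upward(values, window=12):
    """True where the current value sits above its trailing moving average.

    Assumption: the window includes the current observation.
    """
    averages = trailing_mean(values, window)
    return [None if m is None else v > m for v, m in zip(values, averages)]


# Made-up unemployment readings (not FRED data), with a short window for readability.
sample = [5.0, 5.0, 5.0, 5.3, 5.6, 5.2, 4.8]
signal = trend_is_upward(sample, window=3)
```

An upward turn corresponds to the signal flipping from False to True, and a downward turn to the reverse flip.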

In the following chart, we plot the unemployment rate alongside its trailing 12 month moving average from January 1948 to January 2016.  The red and green circles delineate important crossover points, with red crossovers delineating upward (bearish) turns, and green crossovers delineating downward (bullish) turns:


As you can see, historically, whenever the unemployment rate has crossed above the moving average, a recession has almost always followed shortly thereafter.  Similarly, for every recession that actually did occur in the period, the unemployment rate foreshadowed the recession by crossing above its moving average beforehand.

The following chart takes the indicator back farther, from April 1929 to April 1947:


In contrast with the earlier chart, the indicator here appears to be a bit late. After capturing the onset of the Great Depression almost perfectly, the indicator misses the onset of the 1937 and 1945 recessions by a few months.   It’s not alone in that respect–the 1937 and 1945 recessions were missed by pretty much every other recession indicator on the books.

The Fed is well aware of the recession forecasting power of the trend in the unemployment rate.  New York Fed president William Dudley discussed the matter explicitly in a speech just last month:

“Looking at the post-war period, whenever the unemployment rate has increased by more than 0.3 to 0.4 percentage points, the economy has always ended up in a full-blown recession with the unemployment rate rising by at least 1.9 percentage points. This is an outcome to avoid, especially given that in an economic downturn the last to be hired are often the first to be fired. The goal is the maximum sustainable level of employment—in other words, the most job opportunities for the most people over the long run.”

As far as the U.S. economy is concerned, the indicator’s current verdict is clear: no recession.  We may enter a recession later this year, or next year, but we’re not in a recession right now.

Individual energy-exposed regions of the country, however, are in recession, and the indicator is successfully flagging that fact.  The following chart shows the unemployment rate for Houston (source: FRED):


Per the indicator, Houston’s economy is solidly in recession.  We know the reason why: the plunge in oil prices.

Dallas is tilting in a similar direction.  But it’s a more diversified economy, with less exposure to oil and gas production, so the tilt isn’t as strong (source: FRED):


If the larger U.S. economy is not in a recession, what is the investing takeaway?  The takeaway is that we should be constructive on risk, with a bias towards being long equities, given the reduced odds of a large market drop.  Granted, recessions aren’t the only drivers of large market drops, but they’re one of the few drivers that give clear signs of their presence before the drops happen, so that investors can get out of the way. Where they can be ruled out, the risk-reward proposition of being long equities improves dramatically.

Now, the rest of this piece will be devoted to a rigorous analysis of the unemployment rate trend as a market timing indicator.  The analysis probably won’t make sense to those readers that haven’t yet read the prior piece on “Growth-Trend Timing”, so I would encourage them to stop here and go give it a skim.  What I say going forward will then make more sense.

To begin, recall that GTT seeks to improve on the performance of a conventional trend-following market timing strategy by turning off the trend-following component of the strategy (i.e., going 100% long no matter what) during periods where the probability of recession is low.  In this way, GTT avoids substantial whipsaw losses, while incurring only a slightly increased downside risk.

Using the unemployment rate as an input, the specific trading rule for GTT would be:

(1) If the unemployment rate trend is downward, i.e., not indicating an oncoming recession, then go 100% long U.S. equities.

(2) If the unemployment rate trend is upward, indicating an oncoming recession, then defer to the price trend.  If the price trend is upward, then go 100% long U.S. equities.  If the price trend is downward, then go to cash.

To summarize, GTT will be 100% invested in the market unless the unemployment rate trend is upward at the same time that the price trend is downward.  Together, these indicators represent a double confirmation of danger that forces the strategy to take a safe position.
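The two-part rule can be written as a small decision function.  This is only a sketch of the logic as stated above; the function name and the string labels are hypothetical:

```python
def gtt_position(ue_trend_up: bool, price_trend_up: bool) -> str:
    """Return the GTT allocation for one period.

    ue_trend_up:    True if the unemployment rate is above its moving average.
    price_trend_up: True if the total return index is above its moving average.
    """
    if not ue_trend_up:
        # Rule (1): no recession signal, so stay fully invested regardless of price.
        return "equities"
    # Rule (2): recession signal is on, so defer to the price trend.
    return "equities" if price_trend_up else "cash"
```

Of the four possible combinations of inputs, only one (unemployment trend upward and price trend downward) sends the strategy to cash.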

The following chart shows the strategy’s performance in U.S. equities from January 1930 to January 2016.  The unemployment rate trend is measured in terms of the position of the unemployment rate relative to its trailing 12 month moving average, where above signifies an upward trend, and below signifies a downward trend.  The price trend is measured in a similar way, based on the position of the market’s total return index relative to the trailing 10 month moving average of that index:


The blue line is the performance of the strategy, GTT.  The green line is the performance of a pure and simple moving average strategy, without GTT’s recession filter.  The dotted red line is the outperformance of GTT over the simple moving average strategy. The yellow line is a rolled portfolio of three month treasury bills. The gray line is buy and hold.  The black line is GTT’s “X/Y” portfolio–i.e., a portfolio with the same net equity and cash exposures as GTT, but achieved through a constant allocation over time, rather than through in-and-out timing moves (see the two prior pieces for a more complete definition).  The purple bars indicate periods where the unemployment rate trend is downward, ruling out recession.  During those periods, the moving average strategy embedded in GTT gets turned off, directing the strategy to take a long position no matter what.

As the chart illustrates, the strategy beats buy and hold (gray) as well as a simple moving average (green) strategy by over 150 basis points per year.  That’s enough to triple returns over the 86 year period, without losing any of the moving average strategy’s downside protection.

In the previous piece, we looked at the following six inputs to GTT:

  • Real Retail Sales Growth (yoy, RRSG)
  • Industrial Production Growth (yoy, IPG)
  • Real S&P 500 EPS Growth (yoy, TREPSG), modeled on a total return basis.
  • Employment Growth (yoy, JOBG)
  • Real Personal Income Growth (yoy, RPIG)
  • Housing Start Growth (yoy, HSG)

We can add a seventh input to the group: the unemployment rate trend (UE vs. 12 MMA). The following table shows GTT’s excess performance over a simple moving average strategy on each of the seven inputs, taken individually:


As the table shows, the unemployment rate trend beats all other inputs. To understand why it performs better, we need to more closely examine what GTT is trying to accomplish.

Recall that the large market downturns that drive the outperformance of trend-following strategies tend to happen in conjunction with recessions.  When a trend-following strategy makes a switch that is not associated with an ongoing or impending recession, it tends to incur whipsaw losses.  (Note: these losses were explained in thorough detail in the prior piece).

What GTT tries to do is use macroeconomic data to distinguish periods where a recession is likely from periods where a recession is unlikely.  In periods where a recession is unlikely, the strategy turns off its trend-following component, taking a long position in the market no matter what the price trend happens to be.  It’s then able to capture the large downturns that make trend-following strategies profitable, without incurring the frequent whipsaw losses that would otherwise detract from returns.

The ideal economic indicator to use in the strategy is one that fully covers the recessionary period, on both sides.  The following chart illustrates using the 2008 recession as an example:


We want the red area, where the recession signal is in and where the trend-following component is turned on, to fully cover the recessionary period, from both ends. If the signal comes in early, before the recession begins, or goes out late, after the recession has ended, the returns will not usually be negatively impacted.  The trend-following component of the strategy will take over during the period, and will ensure that the strategy profitably trades around the ensuing market moves.

What we categorically don’t want, however, is a situation where the red area fails to fully cover the recessionary period–in particular, a situation where the indicator is late to identify the recession.  If that happens, the strategy will not be able to exit the market on the declining trend, and will risk getting caught in the ensuing market downturn.  The following chart illustrates the problem using the 1937 recession as an example:


As you can see, the indicator flags the recession many months after it has already begun. The trend-following component therefore doesn’t get turned on until almost halfway through the recessionary period.  The risk is that during the preceding period–labeled the “danger zone”–the market will end up suffering a large downturn.  The strategy will then be stuck in a long position, unable to respond to the downward trend and avoid the losses.  Unfortunately for the strategy, that’s exactly what happened in the 1937 case.  The market took a deep dive in the early months of the recession, before the indicator was flagging.  The strategy was therefore locked into a long position, and suffered a large drawdown that a simple unfiltered trend-following strategy would have largely avoided.

We can frame the point more precisely in terms of two concepts often employed in the area of medical statistics: sensitivity and specificity.  These concepts are poorly-named and very easy to confuse with each other, so I’m going to carefully define them.

The sensitivity and specificity of an indicator are defined as follows:

  • Sensitivity: the percentage of actual positives that the indicator identifies as positive.
  • Specificity: the percentage of actual negatives that the indicator identifies as negative.

To use an example, suppose that there are 100 recessionary months in a given data set.  In 86 of those months, a recessionary indicator comes back positive, correctly indicating the recession.  The indicator’s sensitivity to recession would then be 86 / 100 = 86%.

Alternatively, suppose that there are 700 non-recessionary months in a given data set.  In 400 of those non-recessionary months, a recessionary indicator comes back negative, correctly indicating no recession. The indicator’s specificity to recession would then be 400 / 700 = 57%.
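The two calculations can be reproduced in a few lines of Python.  In this sketch, the two examples are stitched together into a single hypothetical 800-month data set purely for convenience; the function names are illustrative:

```python
def sensitivity(actual, flagged):
    """Share of actual recession months that the indicator flags as recession."""
    hits = sum(1 for a, f in zip(actual, flagged) if a and f)
    positives = sum(1 for a in actual if a)
    return hits / positives


def specificity(actual, flagged):
    """Share of actual non-recession months that the indicator flags as no recession."""
    correct = sum(1 for a, f in zip(actual, flagged) if not a and not f)
    negatives = sum(1 for a in actual if not a)
    return correct / negatives


# 100 recession months (86 flagged), then 700 non-recession months (400 not flagged).
actual = [True] * 100 + [False] * 700
flagged = [True] * 86 + [False] * 14 + [False] * 400 + [True] * 300

sens = sensitivity(actual, flagged)  # 86 / 100
spec = specificity(actual, flagged)  # 400 / 700
```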

More than anything else, what GTT needs is an indicator with a high sensitivity to recession–an indicator that rarely gives false negatives, and that will correctly indicate that a recession is happening whenever a recession is, in fact, happening.

Having a high specificity to recession, in contrast, isn’t as important to the strategy, because the strategy has the second layer of the price trend to protect it from unnecessary switches.   If the indicator sometimes overshoots with false positives, indicating a recession when there is none, the strategy won’t necessarily suffer, because if there’s no recession, then the price trend will likely be healthy.  The healthy price trend will keep the strategy from incorrectly exiting the market on the indicator’s mistake.

Of all the indicators in the group, the unemployment rate trend delivers the strongest performance for GTT because it has the highest recession sensitivity.  If there’s a recession going on, it will almost always tell us–better than any other single recession indicator.  In situations where no recession is happening, it may give false positives, but that’s not a problem, because unless the false positives coincide with a downward trend in the market price–an unlikely coincidence–then the strategy will stay long, avoiding the implied whipsaw.

For comparison, the following tables show the sensitivity and specificity of the different indicators across different time periods:




As the tables confirm, the unemployment rate  has a very strong recession sensitivity, much stronger than any other indicator.  That’s why it produces the strongest performance.

Now, we can still get good results from indicators that have weaker sensitivities.  We just have to aggregate them together, treating a positive indication from any of them as a positive indication for the aggregate signal.  That’s what we did in the previous piece. We put real retail sales growth and industrial production growth together, housing start growth and real personal income growth together, and so on, increasing the sensitivity of the aggregate signal at the expense of its specificity.
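As a rough sketch, the aggregation is just a month-by-month logical OR across the individual signals.  The function name and the flag sequences below are made up for illustration and are not drawn from the actual data:

```python
def aggregate_signal(*signals):
    """Flag a month as recessionary if any one of the input indicators flags it."""
    return [any(month) for month in zip(*signals)]


# Hypothetical monthly flags for two indicators (True = indicating recession).
rrsg_flags = [False, True, False, False, True, False]
ipg_flags = [False, False, True, False, True, True]

combined = aggregate_signal(rrsg_flags, ipg_flags)
```

Because a month only needs one positive to be flagged, a recession month missed by one indicator can still be caught by another, which raises sensitivity.  Every false positive carries over too, which is where the loss of specificity comes from.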

Right now, only two of the seven indicators are flagging recession: industrial production growth and total return EPS growth.  We know why those indicators are flagging recession–they’re getting a close-up view of the slowdown in the domestic energy sector and the unrelated slowdown in the larger global economy.  Will they prove to be right?  In my view, no.  Energy production is a small part of the US economy, even when multipliers are considered.  Similarly, the US economy has a relatively low exposure to the global economy, even though a significant portion of the companies in the S&P 500 are levered to it.

Even if we decide to go with industrial production growth (or one of its ISM siblings) as the preferred indicator, recent trends in that indicator are making the recession call look shakier.  In the most recent data point, the indicator’s growth rate has turned up, which is not what we would expect to be seeing right now if the indicator were right and the other indicators were wrong:


Now, the fact that a U.S. recession is unlikely doesn’t mean that the market is any kind of buying opportunity.  Valuation can hold a market back on the upside, and the market’s current valuation is quite unattractive.  At a price of 1917, the S&P 500’s trailing operating P/E ratio is 18.7.  Its trailing GAAP P/E ratio is 21.5.  Those numbers are being achieved on peaking profit margins–leaving two faultlines for the market to crack on, rather than just one.  Using non-cyclical valuation measures, which reflect both of those vulnerabilities, the numbers get worse.

My view is that as time passes, the market will continue to acclimatize to the two issues that it’s been most worried about over the last year: (1) economic weakness and potential instability in China and (2) the credit implications of the energy downturn.  A similar acclimatization happened with the Euro crisis.  It always seems to happen with these types of issues.  The process works like this.  New “problems” emerge, catching investors off-guard.  Many investors come to believe that this is it, the start of the “big” move lower. The market undergoes a series of gyrations as it wrestles with the problems. Eventually, market participants get used to them, accustomed to their presence, like a swimmer might get accustomed to cold water.  The sensitivity, fear and reactivity gradually dissipate. Unless the problems continue to deteriorate, investors gravitate back into the market, even as the problems are left “unsolved.”

Right now, there’s a consensus that an eventual devaluation of the yuan, with its attendant macroeconomic implications, is itself a “really bad thing”, or at least a consequence of a “really bad thing” that, if it should come to pass, will produce a large selloff in U.S. equities.  But there’s nothing privileged or compelling about that consensus, no reason why it should be expected to remain “the” consensus over time.  If we keep worrying about devaluations, and we don’t get them, or we do get them, and nothing bad happens, we will eventually grow less concerned about the prospect, and will get pulled back into the market as it grinds higher without us.  In actuality, that seems to be what’s already happening.

Valuation-conscious investors that are skeptical of the market’s potential to deliver much in the way of long-term returns–and I would include myself in that category–do have other options.  As I discussed in a piece from last September, we can take advantage of elevated levels of volatility and sell puts or covered calls on a broad index such as the S&P 500 or the Russell 2000.  By forgoing an upside that we do not believe to be attractive to begin with, we can significantly cushion our losses in a potential downturn, while earning a decent return if the market goes nowhere or up (the more likely scenario, in my view).

To check in on the specific trade that I proposed, on September 6th, 2015, with $SPY at 192.59, the bid on the 165 September 2016 $SPY put was 8.79.   Today, $SPY is at essentially the same price, but the value of the put has decayed substantially.  The ask is now 4.92.  On a mark-to-market basis, an investor that put $1,000,000 into the trade earned roughly $4 per share–$24,000, or roughly 6% annualized, 6% better than the market, which produced nothing.

For the covered call version of the trade, the bid on the 165 September 2016 call was 33.55.  As of Friday, the ask is now 30.36.  On a mark-to-market basis, then, the investor has earned roughly $3.20 in the trade.  The investor also pocketed two $SPY dividends, worth roughly $2.23.   In total, that’s $5.43 per share, or roughly 8% annualized.  If the market continues to churn around 1900, the investor will likely avoid assignment and get to stay in the trade, if not through both of the upcoming dividends, then at least through the one to be paid in March.
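For readers who want to check the arithmetic on the two trades, here is a short sketch using the quotes given above.  One assumption is mine rather than stated in the text: that the $1,000,000 is posted as cash collateral at the 165 strike, which determines the share count:

```python
# Put trade: sold at the September bid, marked at the current ask.
put_gain = 8.79 - 4.92                 # gain per share

# Assumption: $1,000,000 of cash collateral at the 165 strike.
shares = 1_000_000 / 165
put_gain_dollars = put_gain * shares

# Covered call trade: call sold at the bid, marked at the current ask.
# With $SPY at essentially the same price, the stock leg is treated as flat.
call_gain = 33.55 - 30.36              # gain per share on the short call
dividends = 2.23                       # two $SPY dividends, per share
covered_call_total = call_gain + dividends
```

The exact differences come out slightly below the rounded figures quoted above, which is consistent with the “roughly” in each case.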

To summarize, right now, the data is telling us that a large, recessionary downturn is unlikely.  So we want to be long.  At the same time, the heightened state of valuations and the increasing age of the current cycle suggest that strong returns from here are unlikely.   In that kind of environment, it’s attractive to sell downside volatility.  Of course, in selling downside volatility, we lose the ability to capitalize on short-term trading opportunities. Instead of selling puts last September, for example, we could have bought the market, sold it at the end of the year at the highs, and then bought it back now, ready to repeat again. But that’s a difficult game to play, and an even more difficult game to win at.  For most of us, a better approach is to identify the levels that we want to own the market at, and get paid to wait for them.

While we’re on the topic of GTT and recession-timing, I want to address a concern that a number of readers have expressed about GTT’s backtests.  That concern pertains to the impact of data revisions.  GTT may work well with the revised macroeconomic data contained in FRED, but real-time investors don’t have access to that data–all they have access to is unrevised data.  But does the strategy work on unrevised data?

Fortunately, it’s possible (though cumbersome) to access unrevised data through FRED. Starting with the unemployment rate, the following chart shows the last-issue revised unemployment rate alongside the first-issue unrevised unemployment rate from March 1961 to present:


As you can see, there’s essentially no difference between the two rates.  They overlap almost perfectly, confirming that the revisions are insignificant.

Generally, in GTT, the impact of revisions is reduced by the fact that the strategy pivots off of trends and year-over-year growth rates, rather than absolute levels and monthly growth rates, where small changes would tend to have a larger effect.

The following chart shows the performance of GTT from March 1961 to present using first-issue unrevised unemployment rate data (orange) and last-issue revised unemployment rate data (blue).  Note that unrevised data prior to March 1961 is not available, which is why I’ve chosen that date as the starting point:


Interestingly, in the given data set, the strategy actually works better on unrevised data. Of course, that’s likely to be a random occurrence driven by luck, as there’s no reason for unrevised data to produce a superior performance.

The following chart shows the performance of GTT using available unrevised and revised data for industrial production growth back to 1928:


In this case, the strategy does better under the revised data, even though both versions outperform the market and a simple moving average strategy.  The difference in performance is worth about 40 basis points annually, which is admittedly significant.

One driver of the difference between the unrevised and revised performance for the industrial production case is the fact that the unrevised data produced a big miss in late 2011, wrongly going negative and indicating recession when the economy was fine.  Recently, a number of bearish commentators have cited the accuracy of the industrial production growth indicator as a reason for caution, pointing out that the indicator has never produced sustained negative year over year growth outside of recession.  That may be true for revised data, but it isn’t true for unrevised data, which is all we have to go on right now.  Industrial production growth wrongly called a recession in 2011, only to get revised upwards several months later.

The following chart shows the performance of GTT using available unrevised and revised data for real retail sales growth:


The unrevised version underperforms by roughly 20 basis points annually.

The following chart shows the performance of GTT using available unrevised and revised data for job growth:


For job growth, the two versions perform about the same.

Combining indicators shrinks the impact of inaccuracies, and reduces the difference between the unrevised and revised cases.  The following chart illustrates, combining industrial production and job growth into a single “1 out of 2” indicator:


Unfortunately, unrevised data is unavailable for EPS (the revisions would address changes to SEC 10-Ks and 10-Qs), real personal income growth, and housing start growth.  But the tests should provide enough evidence to allay the concerns.  The first-issue data, though likely to be revised in small ways, captures the gist of what is happening in the economy, and can be trusted in market timing models.

In a future piece, I’m going to examine GTT’s performance in local currency foreign equities.  GTT easily passes out-of-sample testing in credit securities, different sectors and industries, different index constructions (where, for example, the checking days of the month are chosen randomly), and individual securities (which simple unfiltered trend-following strategies do not work in).  However, the results in foreign securities are mixed.

If we use U.S. economic data as a filter to time foreign securities, the performance turns out to be excellent.  But if we use economic data from the foreign countries themselves, then the strategy ends up underperforming a simple unfiltered trend-following strategy.  Among other things, this tells us something that we could probably have already deduced from observation: the health of our economy and our equity markets is more relevant to the performance of foreign equity markets than the health of their own economies.  This is especially true with respect to large downward moves–the well-known global “crises” that drag all markets down in unison, and that make trend-following a historically profitable strategy.


Growth and Trend: A Simple, Powerful Technique for Timing the Stock Market

Suppose that you had the magical ability to foresee turns in the business cycle before they happened.  As an investor, what would you do with that ability?  Presumably, you would use it to time the stock market.  You would sell equities in advance of recessions, and buy them back in advance of recoveries.

The following chart shows the hypothetical historical performance of an investment strategy that times the market on perfect knowledge of future recession dates.  The strategy, called “Perfect Recession Timing”, switches from equities (the S&P 500) into cash (treasury bills) exactly one month before each recession begins, and from cash back into equities exactly one month before each recession ends (first chart: linear scale; second chart: logarithmic scale):


As you can see, Perfect Recession Timing strongly outperforms the market.  It generates a total return of 12.9% per year, 170 bps higher than the market’s 11.2%.  It experiences annualized volatility of 12.8%, 170 bps less than the market’s 14.5%.  It suffers a maximum drawdown of -27.2%, roughly half of the market’s -51.0%.
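The mechanics of the hypothetical strategy can be sketched as follows.  The month-boundary convention here is an assumption on my part: a month is spent in cash whenever the following month is recessionary, which shifts both the exit and the re-entry one month ahead of the recession dates.  The function name and sample figures are hypothetical:

```python
def perfect_recession_timing(equity_returns, cash_returns, in_recession):
    """Growth of 1 unit under a timing rule that peeks one month ahead.

    Month t is spent in cash when month t + 1 is flagged as recessionary,
    and in equities otherwise. The final month has no look-ahead and is
    assumed to be spent in equities.
    """
    wealth = 1.0
    path = []
    n = len(equity_returns)
    for t in range(n):
        recession_next = in_recession[t + 1] if t + 1 < n else False
        monthly_return = cash_returns[t] if recession_next else equity_returns[t]
        wealth *= 1.0 + monthly_return
        path.append(wealth)
    return path


# Made-up monthly returns; the recession covers months 3 and 4 (zero-indexed).
equity = [0.02, 0.01, -0.10, -0.05, 0.03, 0.04]
cash = [0.0] * 6
recession = [False, False, False, True, True, False]

path = perfect_recession_timing(equity, cash, recession)
```

In this toy example the strategy sits out months 2 and 3, sidestepping both losses, and is back in equities for the recovery in month 4.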

In this piece, I’m going to introduce a market timing strategy that will seek to match the performance of Perfect Recession Timing, without relying on knowledge of future recession dates.  That strategy, which I’m going to call “Growth-Trend Timing”, works by adding a growth filter to the well-known trend-following strategies tested in the prior piece.  The chart below shows the performance of Growth-Trend Timing in U.S. equities (blue line) alongside the performance of Perfect Recession Timing (red line):


The dotted blue line is Growth-Trend Timing’s outperformance relative to a strategy that buys and holds the market.  In the places where the line ratchets higher, the strategy is exiting the market and re-entering at lower prices, locking in outperformance.  Notice that the line ratchets higher in almost perfect synchrony with the dotted red line, the outperformance of Perfect Recession Timing.  That’s exactly the intent–for Growth-Trend Timing to successfully do what Perfect Recession Timing does, using information that is fully available to investors in the present moment, as opposed to information that will only be available to them in the future, in hindsight.

The piece will consist of three parts:  

  • In the first part, I’m going to construct a series of random models of security prices. I’m going to use the models to rigorously articulate the geometric concepts that determine the performance of trend-following market timing strategies.  In understanding the concepts in this section, we will understand what trend-following strategies have to do in order to be successful.  We will then be able to devise specific strategies to optimize their performance.
  • In the second part, I’m going to use insights from the first part to explain why trend-following market timing strategies perform well on aggregate indices (e.g., S&P 500, FTSE, Nikkei, etc.), but not on individual stocks (e.g., Disney, BP, Toyota, etc.). Recall that we encountered this puzzling result in the prior piece, and left it unresolved.
  • In the third part, I’m going to use insights gained from both the first and second parts to build the new strategy: Growth-Trend Timing.  I’m then going to do some simple out-of-sample tests on the new strategy, to illustrate the potential.  More rigorous testing will follow in a subsequent piece.

Before I begin, I’m going to make an important clarification on the topic of “momentum.”

Momentum: Two Versions

The market timing strategies that we analyzed in the prior piece (please read it if you haven’t already) are often described as strategies that profit off of the phenomenon of “momentum.”  To avoid confusion, we need to distinguish between two different empirical observations related to that phenomenon:

  • The first is the observation that the trailing annual returns of a security predict its likely returns in the next month.  High trailing annual returns suggest high returns in the next month, low trailing annual returns suggest low returns in the next month. This phenomenon underlies the power of the Fama-French-Asness momentum factor, which sorts the market each month on the basis of prior annual returns.
  • The second is the observation that when a security exhibits a negative price trend, the security is more likely to suffer a substantial drawdown over the coming periods than when it exhibits a positive trend.  Here, a negative trend is defined as a negative trailing return on some horizon (i.e., negative momentum), or a price that’s below a trailing moving average of some specified period.  In less refined terms, the observation holds that large losses–“crashes”–are more likely to occur after an aggregate index’s price trend has already turned downward.
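The two definitions of a negative trend mentioned in the second observation can be sketched in Python as follows.  The function names, the sample prices, and the choice to include the current price in the moving average window are all assumptions made for illustration:

```python
def negative_trailing_return(prices, horizon):
    """True where the price is below its level `horizon` periods earlier."""
    return [
        None if i < horizon else prices[i] < prices[i - horizon]
        for i in range(len(prices))
    ]


def below_moving_average(prices, window):
    """True where the price is below its trailing `window`-period average.

    Assumption: the window includes the current price.
    """
    out = []
    for i, price in enumerate(prices):
        if i + 1 < window:
            out.append(None)
        else:
            average = sum(prices[i + 1 - window : i + 1]) / window
            out.append(price < average)
    return out


prices = [100, 104, 108, 105, 99, 101]  # made-up total return index levels
by_return = negative_trailing_return(prices, horizon=3)
by_average = below_moving_average(prices, window=3)
```

Note that the two definitions need not agree in every period: in this example the moving average version turns negative one period before the trailing return version does.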

These two observations are related to each other, but they are not identical, and we should not refer to them interchangeably.  Unlike the first observation, the second observation does not claim that the degree of negativity in the trend predicts anything about the future return.  It doesn’t say, for example, that high degrees of negativity in the trend imply high degrees of negativity in the subsequent return, or that high degrees of negativity in the trend increase the probability of subsequent negativity.  It simply notes that negativity in the trend–of any degree–is a red flag that substantially increases the likelihood of a large subsequent downward move.

Though the second observation is analytically sloppier than the first, it’s more useful to a market timer.  The ultimate goal of market timing is to produce equity-like returns with bond-like volatility.  In practice, the only way to do that is to sidestep the large drawdowns that equities periodically produce.  We cannot reliably sidestep drawdowns unless we know when they are likely to occur.  When are they likely to occur?  The second observation gives the answer: after a negative trend has emerged.  So if you see a negative trend, get out.

The backtests conducted in the prior piece demonstrated that strategies that exit risk assets upon signs of a negative trend tend to outperform buy and hold strategies.  Their successful avoidance of large drawdowns more than makes up for the relative losses that they incur by switching into lower-return assets.  What we saw in the backtest, however, is that this result only holds for aggregate indices.  When the strategies are used to time individual securities, the opposite result is observed–the strategies strongly underperform buy and hold, to an extent that far exceeds the level of underperformance that random timing with the same exposures would be expected to produce.

How can an approach work on aggregate indices, and then not work on the individual securities that make up those indices?  That was the puzzle that we left unsolved in the prior piece, a puzzle that we’re going to try to solve in the current piece.  The analysis will be tedious in certain places, but well worth the effort in terms of the quality of market understanding that we’re going to gain as a result.

Simple Market Timing: Stop Loss Strategy

In this section, I’m going to use the example of a stop loss strategy to illustrate the concept of “gap losses”, which are typically the main sources of loss for a market timing strategy.  

To begin, consider the following chart, which shows the price index of a hypothetical security that oscillates as a sine wave.


(Note: The prices in the above index, and all prices in this piece, are quoted and charted on a total return basis, with the accrual of dividends and interest payments already incorporated into the prices.)

The precise equation for the price index, quoted as a function of time t, is:

(1) Index(t) = Base + Amplitude * ( Sin (2 * π / Period * t ) )

The base, which specifies the midpoint of the index’s vertical oscillations, is set to 50.  The amplitude, which specifies how far in each vertical direction the index oscillates, is set to 20.  The period, which specifies how long it takes for the index to complete a full oscillation, is set to 40 days.  Note that the period, 40 days, is also the distance between the peaks.
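For readers who want to reproduce the index, here is a minimal Python sketch of equation (1) with the parameters just described.  The function name index_price and the choice to sample one close per day are my own, made purely for illustration.

```python
import math

def index_price(t, base=50.0, amplitude=20.0, period=40.0):
    """Equation (1): price of the hypothetical sine-wave security on day t."""
    return base + amplitude * math.sin(2 * math.pi / period * t)

# Daily closes for days 0 through 100 (sampling once per day is an assumption).
closes = [index_price(day) for day in range(101)]

peak = max(closes)    # top of the oscillation, base + amplitude
trough = min(closes)  # bottom of the oscillation, base - amplitude
```

With these parameters the index starts at 50, tops out at 70 on day 10, bottoms at 30 on day 30, and is back at 50 on day 40.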

Now, I want to participate in the security’s upside, without exposing myself to its downside.  So I’m going to arbitrarily pick a “stop” price, and trade the security using the following “stop loss” rule:

(1) If the price of the security is greater than or equal to the stop, then buy or stay long.

(2) If the price of the security is less than the stop, then sell or stay out.

Notice that the rule is bidirectional–it forces me out of the security when the security is falling, but it also forces me back into the security when the security is rising.  In doing so, it not only protects me from the security’s downside below the stop, it also ensures that I participate in any upside above the stop that the security achieves.  That’s perfect–exactly what I want as a trader.

To simplify the analysis, we assume that we’re operating in a market that satisfies the following two conditions:

Zero Bid-Ask Spread: The difference between the highest bid and the lowest ask is always infinitesimally small, and therefore negligible.  Trading fees are also negligible.

Continuous Prices:  Every time a security’s price changes from value A to value B, it passes through all values in between A and B.  If traders already have orders in, or if they’re quick enough to place orders, they can execute trades at any of the in-between values. 

The following chart shows the performance of the strategy on the above assumptions.  For simplicity, we trade only one share.


The blue line is the value of the strategy. The orange line is the stop, which I’ve arbitrarily set at a price of 48.5.  The dotted green line is the strategy’s outperformance relative to a strategy that simply buys and holds the security.  The outperformance is measured against the right y-axis.

As you can see, when the price rises above the stop, the strategy buys in at the stop, 48.5. For as long as the price remains above that level, the strategy stays invested in the security, with a value equal to the security’s price.  When the price falls below the stop, the strategy sells out at the stop, 48.5.  For as long as the price remains below that level, the strategy stays out of it, with a value steady at 48.5, the sale price.

Now, you’re probably asking yourself, “what’s the point of this stupid strategy?” Well, let’s suppose that the security eventually breaks out of its range and makes a sustained move higher.  Let’s suppose that it does something like this:


How will the strategy perform?  The answer: as the price rises above the stop, the strategy will go long the security and stay long, capturing all of the security’s subsequent upside. We show that result below:


Now, let’s suppose that the opposite happens.  Instead of breaking out and growing exponentially, the security breaks down and decays to zero, like this:


How will the strategy perform?  The answer: as the price falls below the stop, the strategy will sell out of the security and stay out of it, avoiding all of the security’s downside below the stop.


We can express the strategy’s performance in a simple equation:

(2) Strategy(t) = Max(Security Price(t), Stop)

Equation (2) tells us that the strategy’s value at any time equals the greater of either the security’s price at that time, or the stop.  Since we can place the stop wherever we want, we can use the stop loss strategy to determine, for ourselves, what our downside will be when we invest in the security.  Below the stop, we will lose nothing; above it, we will gain whatever the security gains.
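Under the two simplifying assumptions, equation (2) reduces to a one-line function.  The sketch below applies it to daily values of the sine-wave index; the helper names are hypothetical, and the result only holds if we really can trade exactly at the stop.

```python
import math

STOP = 48.5  # the stop used in the text

def index_price(t, base=50.0, amplitude=20.0, period=40.0):
    return base + amplitude * math.sin(2 * math.pi / period * t)

def continuous_strategy_value(price, stop=STOP):
    """Equation (2): valid only if every trade executes exactly at the stop."""
    return max(price, stop)

values = [continuous_strategy_value(index_price(day)) for day in range(101)]
floor = min(values)    # the strategy never falls below the stop
ceiling = max(values)  # the strategy keeps all of the upside above the stop
```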

Stepping back, we appear to have discovered something truly remarkable, a timing strategy that can allow us to participate in all of a security’s upside, without having to participate in any of its downside.  Can that be right?  Of course not.  Markets do not offer risk-free rewards, and therefore there must be a tradeoff somewhere that we’re missing, some way that the stop loss strategy exposes us to losses.  It turns out that there’s a significant tradeoff in the strategy, a mechanism through which the strategy can cause us to suffer large losses over time.  We can’t see that mechanism because it’s being obscured by our assumption of “continuous” prices.

Ultimately, there’s no such thing as a “continuous” market, a market where every price change necessarily entails a movement through all in-between prices.  Price changes frequently involve gaps–discontinuous jumps or drops from one price to another.  Those gaps impose losses on the strategy–called “gap losses.”

To give an example, if new information is introduced to suggest that a stock priced at 50 will soon go bankrupt, the bid on the stock is not going to pass through 49.99… 49.98… 49.97 and so on, giving each trader an opportunity to sell at those prices if she wants to. Instead, the bid is going to instantaneously drop to whatever level the stock finds its first interested buyer at, which may be 49.99, or 20.37, or 50 cents, or even zero (total illiquidity).  Importantly, if the price instantaneously drops to a level below the stop, the strategy isn’t going to be able to sell exactly at the stop.  The best it will be able to do is sell at the first price that the security gaps down to.  In the process, it will incur a “gap loss”–a loss equal to the “gap” between that price and the stop.

The worst-case gap losses inflicted on a market timing strategy are influenced, in part, by the period of time between the checks that it makes.  The strategy has to periodically check on the price, to see if the price is above or below the stop.  If the period of time between each check is long, then valid trades will end up taking place later in time, after prices have moved farther away from the stop.  The result will be larger gap losses.

Given the importance of the period between the checks, we might think that a solution to the problem of gap losses would be to have the strategy check prices continuously, at all times. But even on continuous checking, gap losses would still occur.  There are two reasons why. First, there’s a built-in discontinuity between the market’s daily close and subsequent re-opening.  No strategy can escape from that discontinuity, and therefore no strategy can avoid the gap losses that it imposes.  Second, discontinuous moves can occur in intraday trading–for example, when new information is instantaneously introduced into the market, or when large buyers and sellers commence execution of pre-planned trading schemes, spontaneously removing or inserting large bids and asks.

In the example above, the strategy checks the price of the security at the completion of each full day (measured at the close).  The problem, however, is that the stop–48.5–is not a value that the index ever lands on at the completion of a full day.  Recall the specific equation for the index:

(3) Index(t) = 50 + 20 * Sin ( 2 * π / 40 * t)

Per the equation, the closest value above 48.5 that the index lands on at the completion of a full day is 50.0, which it reaches on days 0, 20, 40, 60, 80, 100, 120, and so on.  The closest value below 48.5 that it lands on is 46.9, which it reaches on days 21, 39, 61, 79, 101, 119, and so on.

It follows that whenever the index price rises above the stop of 48.5, the strategy sees the event when the price is already at 50.0.  So it buys into the security at the available price: 50.0.  Whenever the index falls below the stop of 48.5, the strategy sees the event when the price is already at 46.9.  So it sells out of the security at the available price: 46.9.  Every time the price interacts with the stop, then, a buy-high-sell-low routine ensues. The strategy buys at 50.0, sells at 46.9, buys again at 50.0, sells again at 46.9, and so on, losing the difference, roughly 3 points, on each “round-trip”–each combination of a sell followed by a buy.  That difference, the gap loss, represents the primary source of downside for the strategy.
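The buy-high-sell-low routine is easy to reproduce.  The following is a rough sketch, not the code behind the charts: it applies the stop loss rule to the sine-wave index once per check and records the prices at which trades actually occur.  Rounding prices to the cent and the check_every parameter are my own assumptions; the latter is there to illustrate the earlier point that longer intervals between checks produce larger gaps.

```python
import math

STOP = 48.5

def index_price(t, base=50.0, amplitude=20.0, period=40.0):
    # Rounded to the cent (my assumption) so that floating-point noise does
    # not disturb days on which the index lands exactly on 50.
    return round(base + amplitude * math.sin(2 * math.pi / period * t), 2)

def run_stop_loss(closes, stop=STOP, check_every=1):
    """Apply the stop loss rule, looking at the price every `check_every` days.

    Returns a list of (day, action, price) tuples.
    """
    trades, invested = [], False
    for day in range(0, len(closes), check_every):
        price = closes[day]
        if price >= stop and not invested:
            trades.append((day, "buy", price))
            invested = True
        elif price < stop and invested:
            trades.append((day, "sell", price))
            invested = False
    return trades

def gap_losses(trades):
    """Re-entry price minus the preceding exit price, per round trip."""
    sells = [p for _, action, p in trades if action == "sell"]
    # Skip the first trade, which is the initial purchase rather than a re-entry.
    rebuys = [p for _, action, p in trades[1:] if action == "buy"]
    return [round(buy - sell, 2) for sell, buy in zip(sells, rebuys)]

closes = [index_price(day) for day in range(101)]
daily = run_stop_loss(closes)                        # sells at 46.87, buys at 50.0
every_5_days = run_stop_loss(closes, check_every=5)  # sells at 35.86, buys at 50.0
```

On daily checks, each round trip costs 3.13 points.  If the strategy only looks every fifth day, it does not notice the break below the stop until the price is already at 35.86, and the cost per round trip grows to 14.14 points.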

Returning to the charts, the following chart illustrates the performance of the stop loss strategy when the false assumption of price continuity is abandoned and when gap losses are appropriately reflected:


The dotted bright green line is the buy price.  The green shaded circles are the actual buys. The dotted red line is the sell price.  The red shaded circles are the actual sells.  As you can see, with each round-trip transaction, the strategy incurs a loss relative to a buy and hold strategy equal to the gap: roughly 3 points, or 6%.

The following chart makes the phenomenon clearer.  We notice the accumulation of gap losses over time by looking at the strategy’s peaks.  In each cycle, the strategy’s peaks are reduced by an amount equal to the gap:


It’s important to clarify that the gap loss is not an absolute loss, but rather a loss relative to what the strategy would have produced in the “continuous” case, under the assumption of continuous prices and continuous checking.  Since the stop loss strategy would have produced a zero return in the “continuous” case–selling at 48.5, buying back at 48.5, selling at 48.5, buying back at 48.5, and so on–the actual return, with gap losses included, ends up being negative.

As the price interacts more frequently with the stop, more transactions occur, and therefore the strategy’s cumulative gap losses increase.  We might therefore think that it would be good for the strategy if the price were to interact with the stop as infrequently as possible.  While there’s a sense in which that’s true, the moments where the price interacts with the stop are the very moments where the strategy fulfills its purpose–to protect us from the security’s downside.  If we didn’t expect the price to ever interact with the stop, or if we expected interactions to occur only very rarely, we wouldn’t have a reason to bother implementing the strategy.

We arrive, then, at the strategy’s fundamental tradeoff.  In exchange for attempts to protect investors from a security’s downside, the strategy takes on a different kind of downside–the downside of gap losses.  When the gains that the strategy generates elsewhere fail to offset those losses, the strategy produces a negative return.  In the extreme, the strategy can whittle an investor’s capital down to almost nothing, just as buying and holding a security might do in a worst case loss scenario.

In situations where the protection from downside proves to be unnecessary–for example, because the downside is small and self-reversing–the strategy will perform poorly relative to buy and hold.  We see that in the following chart:


In exchange for protection from downside below 48.5–downside that proved to be minor and self-reversing–the strategy incurred 12 gap losses.  Those losses reduced the strategy’s total return by more than half and saddled it with a maximum drawdown that ended up exceeding the maximum drawdown of buy and hold.

Sometimes, however, protection from downside can prove to be valuable–specifically, when the downside is large and not self-reversing.  In such situations, the strategy will perform well relative to buy and hold.  We see that in the following chart:


As before, in exchange for protection from downside in the security, the strategy engaged in a substantial number of unnecessary exits and subsequent reentries.  But one of those exits, the final one, proved to have been well worth the cumulative cost of the gap losses, because it protected us from a large downward move that did not subsequently reverse itself, and that instead took the stock to zero.

To correctly account for the impact of gap losses, we can re-write equation (2) as follows:

(4) Strategy(t) = Max(Security Price(t), Stop) – Cumulative Gap Losses(t)

What equation (4) is saying is that the strategy’s value at any time equals the greater of either the security’s price or the stop, minus the cumulative gap losses incurred up to that time.  Those losses can be re-written as the total number of round-trip transactions up to that time multiplied by the average gap loss per round-trip transaction.  The equation then becomes:

(5) Strategy(t) = Max(Security Price(t), Stop) – # of Round-Trip Transactions(t) * Average Gap Loss Per Round-Trip Transaction.

The equation is not exactly correct, but it expresses the concept correctly.  The strategy is exposed to the stock’s upside, it’s protected from the majority of the stock’s downside below the stop, and it pays for that protection by incurring gap losses on each transaction, losses which subtract from the overall return.

Now, the other assumption we made–that the difference between the bid and the ask was infinitesimally small–is also technically incorrect.  There’s a non-zero spread between the bid and the ask, and each time the strategy completes a round-trip market transaction, it incurs that spread as a loss.  Adding the associated cost to the equation, we get a more complete equation for a bidirectional stop loss strategy:

(6) Strategy(t) = Max(Security Price(t), Stop) – # of Round-Trip Transactions(t) * (Average Gap Loss Per Round-Trip Transaction + Average Bid-Ask Spread).

Again, not exactly correct, but very close.
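As a quick worked example of equation (6), here is a small sketch that plugs in numbers from the sine-wave case.  The function name and the 0.05 spread are hypothetical, chosen only to show how the two costs stack; as noted, the equation itself is an approximation.

```python
def approx_strategy_value(price, stop, round_trips, avg_gap_loss, avg_spread=0.0):
    """Equation (6): an approximation, as the text notes, not an exact identity."""
    return max(price, stop) - round_trips * (avg_gap_loss + avg_spread)

# Sine-wave example after two round trips, with the index back at 50
# and a gap of roughly 3.13 points per round trip:
no_spread = approx_strategy_value(50.0, 48.5, round_trips=2, avg_gap_loss=3.13)

# Same case with a hypothetical 0.05 bid-ask spread paid per round trip:
with_spread = approx_strategy_value(50.0, 48.5, 2, 3.13, avg_spread=0.05)
```

The first call gives roughly 43.74 and the second roughly 43.64, so in this illustration the gap losses account for nearly all of the shortfall.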

The cumulative cost of traversing the bid-ask spread can be quite significant, particularly when the strategy checks the price frequently (e.g., daily) and engages in a large number of resultant transactions.  But, in general, the cumulative cost is not as impactful as the cumulative cost of gap losses.  And so even if bid-ask spreads could be tightened to a point of irrelevancy, as appears to have happened in the modern era of sophisticated market making, a stop loss strategy that engaged in frequent, unnecessary trades would still perform poorly.

To summarize:

  • A bidirectional stop loss strategy allows an investor to participate in a security’s upside without having to participate in the security’s downside below a certain arbitrarily defined level–the stop.
  • Because market prices are discontinuous, the transactions in a stop loss strategy inflict gap losses, which are losses relative to the return that the strategy would have produced under the assumption of perfectly continuous prices.  Gap losses represent the primary source of downside for a stop loss strategy.
  • Losses associated with traversing the bid-ask spread are also incurred on each round-trip transaction.  In most cases, their impacts on performance are not as pronounced as the impacts of gap losses.

A Trailing Stop Loss Strategy: Otherwise Known As…

In this section, I’m going to introduce the concept of a trailing stop loss strategy.  I’m going to show how the different trend-following market timing strategies that we examined in the prior piece are just different ways of implementing that concept.  

The stop loss strategy that we introduced in the previous section is able to protect us from downside, but it isn’t able to generate sustained profit.  The best that it can hope to do is sell at the same price that it buys in at, circumventing losses, but never actually achieving any durable gains.


The circumvention of losses improves the risk-reward proposition of investing in the security, and is therefore a valid contribution.  But we want more.  We want total return outperformance over the index.

For a stop loss strategy to give us that, the stop cannot stay still, stuck at the same price at all times.  Rather, it needs to be able to move with the price.  If the price is above the stop and rises, the stop needs to be able to rise as well, so that any subsequent sale occurs at higher prices.  If the price is below the stop and falls, the stop needs to be able to fall as well, so that any subsequent purchase occurs at lower prices.  If the stop is able to move in this way, trailing behind the price, the strategy will lock in any profits associated with favorable price movements.  It will sell high and buy back low, converting the security’s oscillations into relative gains on the index.

The easiest way to implement a trailing stop loss strategy is to set the stop each day to a value equal to yesterday’s closing price.  So, if yesterday’s closing price was 50, we set the stop for today–in both directions–to be 50.  If the index is greater than or equal to 50 at the close today, we buy in or stay long.  If the index is less than 50, we sell or stay out.  We do the same tomorrow and every day thereafter, setting the stop equal to whatever the closing price was for the prior day.  The following chart shows what our performance becomes:


Bingo!  The performance ends up being fantastic.  In each cycle, the price falls below the trailing stop near the index peak, around 70, triggering a sell.  The price rises above the trailing stop near the index trough, around 30, triggering a buy.  As the sine wave moves through its oscillations, the strategy sells at 70, buys at 30, sells at 70, buys at 30, over and over again, ratcheting up a 133% gain on each completed cycle.  After 100 days, the value of the strategy ends up growing to almost 7 times the value of the index, which goes nowhere.

In the following chart, we increase the trailing period to 7 days, setting the stop each day to the security’s price seven days ago:


The performance ends up being good, but not as good.  The stop lags the index by a greater amount, and therefore the index ends up falling by a greater amount on its down leg before moving down through the stop and triggering a sell.  Similarly, the index ends up rising by a greater amount on its up leg before moving up through the stop and triggering a buy.  The strategy isn’t able to sell as high or buy back as low as in the 1 day case, but it still does well.

The following is a general rule for a trailing stop loss strategy:

  • If the trailing period between the stop and the index is increased, the stop will end up lagging the index by a greater amount, capturing a smaller portion of the up leg and down leg of the index’s oscillation, and generating less outperformance over the index.
  • If the trailing period between the stop and the index is reduced, the stop will end up hugging the index more closely, capturing a greater portion of the up leg and down leg of the index’s oscillation, and generating more outperformance over the index.
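To make the rule concrete, here is a rough sketch that runs the trailing stop on the sine-wave index with a 1 day and a 7 day trail.  It is a simplified stand-in for whatever produced the charts: the function names are mine, prices are rounded to the cent, and the strategy is assumed to check the close once per day.

```python
import math

def index_price(t, base=50.0, amplitude=20.0, period=40.0):
    # Rounded to the cent (my assumption) to keep comparisons free of
    # floating-point noise.
    return round(base + amplitude * math.sin(2 * math.pi / period * t), 2)

def run_trailing_stop(closes, lag):
    """Each day's stop is the close from `lag` days earlier; check once a day."""
    trades, invested = [], False
    for day in range(lag, len(closes)):
        price, stop = closes[day], closes[day - lag]
        if price >= stop and not invested:
            trades.append((day, "buy", price))
            invested = True
        elif price < stop and invested:
            trades.append((day, "sell", price))
            invested = False
    return trades

def gain_per_cycle(trades):
    """Return earned from the first re-entry to the exit that follows it."""
    buys = [p for _, action, p in trades if action == "buy"]
    sells = [p for _, action, p in trades if action == "sell"]
    return sells[1] / buys[1] - 1

closes = [index_price(day) for day in range(101)]
trades_1 = run_trailing_stop(closes, lag=1)  # exits at 69.75, re-enters at 30.25
trades_7 = run_trailing_stop(closes, lag=7)  # exits at 66.18, re-enters at 33.82
```

On daily closes, the 1 day trail exits at 69.75 and re-enters at 30.25, close to the 70 and 30 cited above, for a gain of about 131% per cycle.  The 7 day trail exits at 66.18 and re-enters at 33.82, for a gain of about 96% per cycle.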

Given this rule, we might think that the way to optimize the strategy is to always use the shortest trailing period possible–one day, one minute, one second, however short we can get it, so that the stop hugs the index to the maximum extent possible, capturing as much of the index’s upward and downward “turns” as it can.  This, of course, is true for an idealized price index that moves as a perfect, squeaky clean sine wave.  But as we will later see, using a short trailing period to time a real price index–one that contains the messiness of random short-term volatility–will increase the number of unnecessary interactions between the index and the stop, and therefore introduce new gap losses that will tend to offset the timing benefits.

Now, let’s change the trailing period to 20 days.  The following chart shows the performance:


The index and the stop end up being a perfect 180 degrees out of phase with each other, with the index crossing the stop every 20 days at a price of 50.  We might therefore think that the strategy will generate a zero return–buying at 50, selling at 50, buying at 50, selling at 50, buying at 50, selling at 50, and so on ad infinitum.  But what are we forgetting?  Gap losses.  As in the original stop loss case, they will pull the strategy into a negative return.

The trading rule that defines the strategy has the strategy buy in or stay long when the price is greater than or equal to the trailing stop, and sell out or stay in cash when the price is less than the trailing stop.  On the up leg, the price and the stop cross at 50, triggering a buy at 50.  On the down leg, however, the first level where the strategy realizes that the price is less than the stop, 50, is not 50.  Nor is it 49.99, or some number close by.  It’s 46.9.  The strategy therefore sells at 46.9.  On each oscillation, it repeats: buying at 50, selling at 46.9, buying at 50, selling at 46.9, accumulating a loss equal to the gap on each completed round-trip.  That’s why you see the strategy’s value (blue line) fall over time, even though the index (black line) and the stop (orange line) cross each other at the exact same point (50) in every cycle.
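The asymmetry is easiest to see by lining up the price and the stop on the days around each crossing.  The sketch below does that for the 20 day trail; the helper names are hypothetical, and prices are again rounded to the cent.

```python
import math

LAG = 20

def index_price(t, base=50.0, amplitude=20.0, period=40.0):
    # Rounded to the cent (my assumption); without rounding, floating-point
    # noise can make 50 compare as slightly less than 50 on crossing days.
    return round(base + amplitude * math.sin(2 * math.pi / period * t), 2)

def position(day, lag=LAG):
    """Return (price, stop, 'long' or 'out') for one daily check."""
    price, stop = index_price(day), index_price(day - lag)
    return price, stop, "long" if price >= stop else "out"

# Days around the down-leg crossing (20, 21) and the up-leg crossing (39, 40).
checks = {day: position(day) for day in (20, 21, 39, 40)}
```

On day 20 the price and the stop are both 50, so the rule keeps the strategy long.  On day 21 the price is 46.87 against a stop of 53.13, so the sale happens at 46.87.  On day 39 the strategy is still out, and on day 40 the price and the stop meet again at 50, so the purchase happens at 50.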

Now, to be clear, gap losses of the same magnitude were present earlier, when we set the stop’s trail at 1 day and 7 days.  The difference is that we couldn’t see them, because they were offset by the large gains that the strategy was generating through its trading.  On a 20 day trail, there is zero gain from the strategy’s trading–the index and the stop cross at the same value every time, 50–and so the gap losses show up clearly as net losses for the strategy.  Always remember: gap losses are not absolute losses, but losses relative to what a strategy would have produced on the assumption of continuous prices and continuous checking.

Now, ask yourself: what’s another name for the trailing stop loss strategy that we’ve introduced here?  The answer: a momentum strategy.  The precise timing rule is:

(1) If the price is greater than or equal to the price N days ago, then buy or stay long.

(2) If the price is less than the price N days ago, then sell or stay out.

This rule has the same structure as the timing rule of a moving average strategy, which uses averaging to smooth out single-point noise in the stop:

(1) If the price is greater than or equal to the average of the last N days’ prices, then buy or stay long.

(2) If the price is less than the average of the last N days’ prices, then sell or stay out.
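Side by side, the two rules look like this in code.  This is a minimal sketch with hypothetical function names; in particular, whether the moving average window includes the latest close is my assumption, since the rule as stated can be read either way.

```python
def momentum_signal(closes, n):
    """True (buy or stay long) if the latest close >= the close n days ago."""
    return closes[-1] >= closes[-1 - n]

def moving_average_signal(closes, n):
    """True (buy or stay long) if the latest close >= the n-day average.

    Assumption: the average includes the latest close itself.
    """
    window = closes[-n:]
    return closes[-1] >= sum(window) / len(window)

# Hypothetical closes, oldest first.
closes = [10.0, 11.0, 12.0, 13.0, 12.0]
mom = momentum_signal(closes, n=3)       # compares 12 with 11
ma = moving_average_signal(closes, n=3)  # compares 12 with (12 + 13 + 12) / 3
```

Both functions compare the latest price with a trailing stop; they differ only in how the stop is computed.  On this toy series the momentum rule stays long while the moving average rule sells, which shows that the two stops need not agree on any given day.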

The takeaway, then, is that the momentum and moving average strategies that we examined in the prior piece are nothing more than specific ways of implementing the concept of a trailing stop loss.  Everything that we’ve just learned about that concept extends directly to their operations.

Now, to simplify, we’re going to ignore the momentum strategy from here forward, and focus strictly on the moving average strategy.  We will analyze the difference between the two strategies–which is insignificant–at a later point in the piece.

To summarize:

  • We can use a stop loss strategy to extract investment outperformance from an index’s oscillations by setting the stop to trail the index.
  • When the stop of a trailing stop loss strategy is set to trail very closely behind the index, the strategy will capture a greater portion of the upward and downward moves of the index’s oscillations.  All else equal, the result will be larger trading gains.  But all else is not always equal.  The larger trading gains will come at a significant cost, which we have not yet described in detail, but will discuss shortly.
  • The momentum and moving average strategies that we examined in the prior piece are nothing more than specific ways of implementing the concept of a trailing stop loss. Everything that we’ve learned about that concept extends directly over to their operations.

Determinants of Performance: Captured Downturns and Whipsaws

In this section, I’m going to explain how the moving average interacts with the price to produce two types of trades for the strategy: a profitable type, called a “captured downturn”, and an unprofitable type, called a “whipsaw.”  I’m going to introduce a precise rule that we can use to determine whether a given oscillation will lead to a captured downturn or a whipsaw.  

Up to now, we’ve been modeling the price of a security as a single sine wave with no vertical trend.  That’s obviously a limited simplification.  To take the analysis further, we need a more accurate model.

To build such a model, we start with a security’s primary fundamental: its potential payout stream, which, for an equity security, is its earnings stream.  Because we’re working on a total return basis, we assume that the entirety of the stream is retained internally. The result is a stream that grows exponentially over time.  We set the stream to start at 100, and to grow at 6% per year:


To translate the earnings into a price, we apply a valuation measure: a price-to-earnings (P/E) ratio, which we derive from an earnings yield, the inverse of a P/E ratio.  To model cyclicality in the price, we set the earnings yield to oscillate in sinusoidal form with a base or mean of 6.25% (inverse: P/E of 16), and a maximum cyclical deviation of 25% in each direction.   We set the period of the oscillation to be 7 years, mimicking a typical distance between business cycle peaks.  Prices are quoted on a monthly basis, as of the close:


The product of the security’s earnings and price-to-earnings ratio is just the security’s price index.  That index is charted below:


Admittedly, the model is not a fully accurate approximation of real security prices, but it’s adequate for the concepts that I’m now going to illustrate.  The presentation may at times seem overdone, in terms of emphasizing the obvious, but the targeted insights are absolutely crucial to understanding the functionality of the strategy, so the emphasis is justified.
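The three building blocks of the model can be sketched in a few lines.  Several details here are my assumptions rather than anything specified above: the 25% deviation is treated as relative to the 6.25% mean, the oscillation is taken to start at the mean, earnings compound smoothly within each year, and 20 years of monthly prices are generated.

```python
import math

def earnings(month, start=100.0, annual_growth=0.06):
    """Earnings stream: starts at 100 and compounds at 6% per year."""
    return start * (1 + annual_growth) ** (month / 12)

def earnings_yield(month, mean=0.0625, deviation=0.25, period_months=84):
    """Earnings yield oscillating +/-25% around 6.25% over a 7 year cycle."""
    cycle = math.sin(2 * math.pi * month / period_months)
    return mean * (1 + deviation * cycle)

def model_price(month):
    """Price = earnings * P/E = earnings / earnings yield."""
    return earnings(month) / earnings_yield(month)

# Monthly closes for 20 years (the horizon is an arbitrary choice).
prices = [model_price(m) for m in range(241)]
```

At month 0 the earnings yield sits at its mean, so the price is 100 times a P/E of 16, or 1600.  On these assumptions the yield ranges between roughly 4.69% and 7.81%, which corresponds to a P/E between about 12.8 and 21.3.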

In the chart below, we show the above price index with a 10 month moving average line trailing behind it in orange, and a 60 month moving average line trailing behind it in purple:


We notice two things.  First, the 10 month moving average line trails closer to the price than the 60 month moving average line.  Second, the 10 month moving average line responds to price changes more quickly than the 60 month moving average line.

To understand why the 10 month moving average trails closer and responds to changes more quickly than the 60 month moving average, all we need to do is consider what happens in a moving average over time.  As each month passes, the last monthly price in the average falls out, and a new monthly price, equal to the price in the most recent month (denoted with a * below), is brought in.

The following chart shows this process for the 10 month moving average:

As each month passes, the price in box 10 (from 10 months ago) is thrown out.  All of the prices shift one box to the left, and the price in the * box goes into box 1.

The following chart shows the same process for the 60 month moving average.  Note that the illustration is abbreviated–we don’t show all 60 numbers, but abbreviate with the “…” insertion:


A key point to remember here is that the prices in the index trend higher over time.  More recent prices therefore tend to be higher in value than more distant prices.  The price from 10 months ago, for example, tends to be higher than the price from 11 months ago, 12 months ago, 13 months ago, …, and especially the price from 57 months ago, 58 months ago, 59 months ago, and so on.  Because the 60 month moving average has more of those older prices inside its “average” than the 10 month moving average, its value tends to trail (i.e., be less than) the current price by a greater amount.

The 60 month moving average also has a larger quantity of numbers inside its “average” than the 10 month moving average–60 versus 10.  For that reason, the net impact on the average of tossing a single old number out, and bringing a single new number in–an effect that occurs once each month–tends to be less for the 60 month moving average than for the 10 month moving average.  That’s why the 60 month moving average responds more slowly to changes in the price.  The changes are less impactful to its average, given the larger number of terms contained in that average.

These two observations represent two fundamental insights about the relationship between the period (length) of a moving average and its behavior.  That relationship is summarized in the bullets and table below:

  • As the period (length) of a moving average is reduced, the moving average tends to trail closer to the price, and to respond faster to changes in the price.
  • As the period (length) of a moving average is increased, the moving average tends to trail farther away from the price, and to respond more slowly to changes in the price.
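Both insights can be checked numerically.  The sketch below uses a hypothetical index that rises 0.5% per month, measures how far each average sits below the latest price, and then measures how far each average moves when a single new price arrives 10% below the previous one.  All of the numbers are illustrative assumptions.

```python
def trailing_average(prices, n):
    """Simple average of the most recent n prices."""
    window = prices[-n:]
    return sum(window) / len(window)

# Hypothetical index rising 0.5% per month for 10 years.
prices = [100 * 1.005 ** m for m in range(120)]

# Insight 1: distance between the latest price and each average.
gap_10 = prices[-1] - trailing_average(prices, 10)
gap_60 = prices[-1] - trailing_average(prices, 60)

# Insight 2: how far each average moves when one new price arrives.
# Here the new monthly price is 10% below the previous one.
shocked = prices + [prices[-1] * 0.9]
move_10 = trailing_average(shocked, 10) - trailing_average(prices, 10)
move_60 = trailing_average(shocked, 60) - trailing_average(prices, 60)
```

The 60 month average sits farther below the latest price than the 10 month average, and the single new price shifts it by a smaller amount.  In each case the shift equals the new price minus the price that drops out, divided by the number of terms in the average.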

The following chart illustrates the insights for the 10 and 60 month cases:


With these two insights in hand, we’re now ready to analyze the strategy’s performance. The following chart shows the performance of the 10 month moving average strategy on our new price index.  The value of the strategy is shown in blue, and the outperformance over buy and hold is shown dotted in green (right y-axis):


The question we want to ask is: how does the strategy generate gains on the index? Importantly, it can only generate gains on the index when it’s out of the index–when it’s invested in the index, its return necessarily equals the index’s return.  Obviously, the only way to generate gains on the index while out of the index is to sell the index and buy back at lower prices.  That’s what the strategy tries to do.

The following chart shows the sales and buys, circled in red and green respectively:


As you can see, the strategy succeeds in its mission: it sells high and buys back low.  For a strategy to be able to do that, something very specific has to happen after the sales.  The price needs to move down below the sale price, and then, crucially, before it turns back up, it needs to spend enough time below that price to bring the moving average line down with it.  Then, when it turns back up and crosses the moving average line, it will cross at a lower point, causing the strategy to buy back in at a lower price than it sold at.  The following chart illustrates with annotations:


Now, in the above drawing, we’ve put the sells and buys exactly at the points where the price crosses over the moving average, which is to say that we’ve shown the trading outcome that would ensue if prices were perfectly continuous, and if our strategy were continuously checking them.  But prices are not perfectly continuous, and our strategy is only checking them on a monthly basis.  It follows that the sells and buys are not going to happen exactly at the crossover points–there will be gaps, which will create losses relative to the continuous case.  For a profit to occur on a round-trip trade, then, not only will the moving average need to get pulled down below the sale price, it will need to get pulled down by an amount that is large enough to offset the gap losses that will be incurred.

As we saw earlier, in the current case, the 10 month moving average responds relatively quickly to the changes in the price, so when the price falls below the sale price, the moving average comes down with it.  When the price subsequently turns up, the moving average is at a much lower point.  The subsequent crossover therefore occurs at a much lower point, a point low enough to offset inevitable gap losses and render the trade profitable.
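For readers who want to experiment, the mechanics described above can be sketched in a few lines of Python. This is only a minimal sketch under assumptions that are mine, not part of the original backtest: the moving average is a simple average of month-end closes, every trade executes at the close of the month in which the check occurs, cash earns nothing, and there is no bid-ask cost. The function names and the sample prices are hypothetical.

```python
def sma(prices, period, t):
    """Simple average of the `period` closes ending at index t."""
    return sum(prices[t - period + 1 : t + 1]) / period


def run_ma_strategy(prices, ma_period):
    """Backtest the rule 'hold the index only while close > moving average'.

    Assumptions (not from the article): cash earns nothing, there is no
    bid-ask cost, and every trade executes at the close of the check.
    Returns (strategy growth, index growth, list of (sale, buy) prices).
    """
    first = ma_period - 1                      # first month with a full window
    invested = prices[first] > sma(prices, ma_period, first)
    value = 1.0
    sale_price = None
    round_trips = []
    for t in range(first + 1, len(prices)):
        if invested:                           # earn the index return this month
            value *= prices[t] / prices[t - 1]
        signal = prices[t] > sma(prices, ma_period, t)
        if invested and not signal:            # close fell below the average: sell
            sale_price = prices[t]
        elif signal and not invested and sale_price is not None:
            round_trips.append((sale_price, prices[t]))   # buy back
        invested = signal
    return value, prices[-1] / prices[first], round_trips


# Hypothetical closes: a decline long enough to drag a 3 month average down.
closes = [98, 99, 100, 90, 80, 70, 72, 85, 95]
strategy, index, trips = run_ma_strategy(closes, 3)
print(trips)              # [(90, 85)]
print(strategy / index)   # about 1.059, i.e. 90 / 85
```

In this toy series the sale executes at 90 even though the 3 month average sat near 96 at the time of the check. That difference is the gap loss. The trade is still profitable relative to the index because the average was pulled down far enough for the buy to execute at 85.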

Now, it’s not guaranteed that things will always happen in this way.  In particular, as we saw earlier, if we increase the moving average period, the moving average will respond more slowly to changes in the price.  To come down below the sale price, it will need the price to spend more time at lower values after the sale.  The price may well turn up before that happens.  If it does, then the strategy will not succeed in buying at a lower price.

To see the struggle play out in an example, let’s look more closely at the case where the 60 month moving average is used.  The following chart shows the performance:


As you can see, the strategy ends up underperforming.  There are two aspects to the underperformance.

  • First, because the moving average trails the price by such a large amount, the price ends up crossing the moving average on the down leg long after the peak is in, at prices that are actually very close to the upcoming trough.  Only a small portion of the downturn is therefore captured for potential profit.
  • Second, because of the long moving average period, which implies a slow response, the moving average does not come down sufficiently after the sales occur.  Therefore, when the price turns back up, the subsequent crossover does not occur at a price that is low enough to offset the gap losses incurred in the eventual trade that takes place.

On this second point, if you look closely, you will see that the moving average actually continues to rise after the sales.  The subsequent crossovers and buys, then, are triggered at higher prices, even before gap losses are taken into consideration.  The following chart illustrates with annotations:


The reason the moving average continues to rise is that it’s throwing out very low prices from five years ago (60 months), and replacing them with newer, higher prices from today. Even though the newer prices have fallen substantially from their recent peak, they are still much higher than the older prices that they are replacing.  So the moving average continues to drift upward.  When the price turns back up, it ends up crossing the moving average at a price higher than the sale price, completing an unprofitable trade (sell high, buy back higher), even before the gap losses are added in.
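The "throwing out old prices" effect is easy to verify numerically. Each month, a simple moving average changes by the newest price minus the price that drops out of the window, divided by the period. The sketch below uses a hypothetical index of my own construction (60 months of 1% gains followed by 6 months of 3% declines), not the index shown in the charts.

```python
def sma_series(prices, period):
    """Trailing simple moving average, one value per month with a full window."""
    return [sum(prices[t - period + 1 : t + 1]) / period
            for t in range(period - 1, len(prices))]


# Hypothetical index: 60 months of 1% gains, then 6 months of 3% declines.
prices = [100 * 1.01 ** m for m in range(61)]
for _ in range(6):
    prices.append(prices[-1] * 0.97)

ma60 = sma_series(prices, 60)
ma10 = sma_series(prices, 10)

# The latest monthly change in the 60 month average, computed two ways.
step = ma60[-1] - ma60[-2]
shortcut = (prices[-1] - prices[-61]) / 60   # (newest price - dropped price) / period

print(abs(step - shortcut) < 1e-9)   # True
print(ma60[-1] > ma60[-7])           # True: the 60 month average rose through the decline
print(ma10[-1] < ma10[-2])           # True: the 10 month average has turned down
```

Six months into the decline, the price is still far above the prices from five years earlier that are leaving the 60 month window, so that average keeps climbing. The 10 month window, in contrast, is already dropping prices from near the peak.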

In light of these observations, we can categorize the strategy’s trades into two types: captured downturns and whipsaws.

  • In a captured downturn, the price falls below the moving average, triggering a sale. The price then spends enough time at values below the sale price to pull the moving average down below the sale price.  When the price turns back up, it crosses the moving average at a price below the sale price, triggering a buy at that price.  Crucially, the implied profit in the trade exceeds the gap losses and any other costs incurred, including the cost of traversing the bid-ask spread.  The result ends up being a net gain for the strategy relative to the index.  This gain comes in addition to the risk-reduction benefit of avoiding the drawdown itself.
  • In a whipsaw, the price falls below the moving average, triggering a sale.  It then turns back up above the moving average too quickly, without having spent enough time at lower prices to pull the moving average down sufficiently below the sale price.  A subsequent buy is then triggered at a price that is not low enough to offset gap losses (and other costs) incurred in the transaction.  The result ends up being a net loss for the strategy relative to the index.  It’s important to once again recognize, here, that the dominant component of the loss in a typical whipsaw is the gap.  In a perfectly continuous market, where gap losses did not exist, whipsaws would typically cause the strategy to get out and get back in at close to the same prices.

Using these two trade categories, we can write the following equation for the performance of the strategy:

(7) Strategy(t) = Index(t) + Cumulative Gains from Captured Downturns(t) – Cumulative Losses from Whipsaws(t)

What equation (7) is saying is that the value of the strategy at any given time equals the value of the index (buy and hold) at that time, plus the sum total of all gains from captured downturns up to that time, minus the sum total of all losses from whipsaws up to that time.  Note that we’ve neglected potential interest earned while out of the security, which will slightly boost the strategy’s return.
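The bookkeeping behind equation (7) can be sketched as follows. Note the assumptions: I express each round trip as a ratio (sale price divided by buy price) and compound the ratios, which is a multiplicative restatement of the additive equation rather than the equation itself. The `slip` parameter and the sample trades are hypothetical.

```python
def classify_round_trip(sale_price, buy_price, slip=0.0):
    """Label one sell/buy-back pair and return its gain relative to the index.

    `slip` is a hypothetical round-trip cost fraction (e.g. 0.006 for 0.6%).
    """
    relative_gain = (sale_price / buy_price) * (1 - slip) - 1
    label = "captured downturn" if relative_gain > 0 else "whipsaw"
    return label, relative_gain


def relative_performance(round_trips, slip=0.0):
    """Compound the round trips into strategy value / index value."""
    ratio = 1.0
    for sale_price, buy_price in round_trips:
        ratio *= 1 + classify_round_trip(sale_price, buy_price, slip)[1]
    return ratio


trips = [(100, 80), (90, 95), (110, 112)]   # hypothetical (sale, buy) prices
for sale, buy in trips:
    print(sale, buy, classify_round_trip(sale, buy, slip=0.006))
print(relative_performance(trips, slip=0.006))
```

One captured downturn of this size more than pays for the two whipsaws that follow it, which is the pattern the strategy depends on.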

Now, there’s a simple rule of thumb that we can use to determine whether the strategy will produce a captured downturn or a whipsaw in response to a given oscillation.  We simply compare the period of the moving average to the period of the oscillation.  If the moving average period is substantially smaller than the oscillation period, the strategy will produce a captured downturn and a profit relative to the index, with the profit getting larger as the moving average period gets smaller.  If the moving average period is in the same general range as the oscillation period–or worse, greater than the oscillation period–then the strategy will produce a whipsaw and a loss relative to the index.

Here’s the rule in big letters (note: “<<” means significantly less than):


To test the rule in action, the following chart shows the strategy’s outperformance on the above price index using moving average periods of 70, 60, 50, 40, 30, 20, 10, 5, and 1 month(s):


As you can see, a moving average period of 1 month produces the greatest outperformance. As the moving average period is increased from 1, the outperformance is reduced.  As the moving average period is increased into the same general range as the period of the price oscillation, 84 months (7 years), the strategy begins to underperform.

The following chart shows what happens if we set the oscillation period of the index to equal the strategy’s moving average period (with both set to a value of 10 months):


The performance devolves into a cycle of repeating whipsaws, with ~20% losses on each iteration.  Shockingly, the strategy ends up finishing the period at a value less than 1% of the value of the index.  This result highlights the significant risk of using a trend-following strategy–large amounts of money can be lost in the whipsaws, especially as they compound on each other over time.
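Both experiments can be roughly reproduced with a short script. This is a sketch under assumptions of my own: earnings grow at 6% per year, the earnings yield oscillates as a sine wave with an amplitude of 50% of its base (so the P/E is an inverse sine wave), trades execute at monthly closes, and cash earns nothing. The amplitude and the function names are hypothetical, so the exact figures will not match the charts.

```python
import math


def sine_index(months, growth=0.06, cycle=84, amplitude=0.5):
    """Earnings growing at `growth` per year, divided by an earnings yield
    that oscillates sinusoidally with a period of `cycle` months."""
    prices = []
    for m in range(months):
        earnings = (1 + growth) ** (m / 12)
        earnings_yield = 1 + amplitude * math.sin(2 * math.pi * m / cycle)
        prices.append(earnings / earnings_yield)
    return prices


def relative_performance(prices, ma_period, start):
    """Strategy value / buy-and-hold value, both measured from month `start`.

    Requires start >= ma_period - 1 so the first window is full.
    """
    relative = 1.0
    for t in range(start, len(prices) - 1):
        ma = sum(prices[t - ma_period + 1 : t + 1]) / ma_period
        if prices[t] <= ma:                    # out of the index from t to t+1
            relative *= prices[t] / prices[t + 1]
    return relative


seven_year_cycle = sine_index(300)             # oscillation period of 84 months
for period in (60, 30, 10):
    print(period, relative_performance(seven_year_cycle, period, start=59))

ten_month_cycle = sine_index(300, cycle=10)    # oscillation period = MA period
print(relative_performance(ten_month_cycle, 10, start=59))
```

The function only tracks the ratio of the strategy to the index. While the strategy is invested, the ratio is unchanged. While it is out, the ratio moves inversely to the index. In the matched case, the 10 month average spans exactly one full oscillation, so it rises steadily with the growth trend and every buy executes above the preceding sale.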

Recall that for each of the markets backtested in the prior piece, I presented tables showing the strategy’s entry dates (buy) and exit dates (sell).  The example for U.S. equities is shown below (February 1928 – November 2015):


We can use the tables to categorize each round-trip trade as a captured downturn or a whipsaw.  The simple rule is:

  • Green Box = Captured Downturn
  • Red Box = Whipsaw

An examination of the tables reveals that in equities, whipsaws tend to be more frequent than captured downturns, typically by a factor of 3 or 4.  But, on a per unit basis, the gains of captured downturns tend to be larger than the losses of whipsaws, by an amount sufficient to overcome the increased frequency of whipsaws, at least when the strategy is working well.

Recall that in the daily momentum backtest, we imposed a relatively large 0.6% slip (loss) on each round-trip transaction.  As we explained in a related piece on daily momentum, we used that slip in order to correctly model the average cost of traversing the bid-ask spread during the tested period, 1928 – 2015.  To use any other slip would be to allow the strategy to transact at prices that did not actually exist at the time, and we obviously can’t do that in good faith.

Now, if you look at the table, you will see that the average whipsaw loss across the period was roughly 6.0%.  Of that loss, 0.6% is obviously due to the cost of traversing the bid-ask spread.  We can reasonably attribute the other 5.4% to gap losses.  So, using a conservative estimate of the cost of traversing the bid-ask spread, the average gap loss ends up being roughly 9 times as large as the cost of traversing the bid-ask spread.  You can see, then, why we’ve been emphasizing the point that gap losses are the more important type of loss to focus on.
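The arithmetic is simple enough to check in a couple of lines. The two inputs are the rounded figures quoted above, and the variable names are mine.

```python
average_whipsaw_loss = 0.060   # average loss per whipsaw round trip
spread_cost = 0.006            # slip imposed for the bid-ask spread

gap_loss = average_whipsaw_loss - spread_cost
print(round(gap_loss, 3))                 # 0.054
print(round(gap_loss / spread_cost, 1))   # 9.0
```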

To finish off the section, let’s take a close-up look at an actual captured downturn and whipsaw from a famous period in U.S. market history–the Great Depression.  The following chart shows the 10 month moving average strategy’s performance in U.S. equities from February 1928 to December 1934, with the entry-exit table shown on the right:


The strategy sells out in April 1930, as the uglier phase of the downturn begins.  It buys back in August 1932, as the market screams higher off of the ultimate Great Depression low.  Notice how large the gap loss ends up being on the August 1932 purchase.  The index crosses the moving average at around 0.52 in the early part of the month (1.0 equals the price in February 1928), but the purchase on the close happens at 0.63, roughly 20% higher.  The gap loss was large because the market had effectively gone vertical at the time. Any amount of delay between the crossover and the subsequent purchase would have been costly, imposing a large gap loss.

The strategy exits again in February 1933 as the price tumbles below the moving average. That exit proves to be a huge mistake, as the market rockets back up above the moving average over the next two months.  Recall that March 1933 was the month in which FDR took office and began instituting aggressive measures to save the banking system (a bank holiday, gold confiscation, etc.).  After an initial scare, the market responded very positively.  As before, notice the large size of the gap loss on both the February sale and the March purchase.  If the strategy had somehow been able to sell and then buy back exactly at the points where the price theoretically crossed the moving average, there would hardly have been any loss at all for the strategy.  But the strategy can’t buy at those prices–the market is not continuous.

To summarize:

  • On shorter moving average periods, the moving average line trails the price index by a smaller distance, and responds more quickly to its movements.
  • On longer moving average periods, the moving average line trails the price index by a larger distance, and responds more slowly to its movements.
  • For the moving average strategy to generate a relative gain on the index, the price must fall below the moving average, triggering a sale.  The price must then spend enough time below the sale price to bring the moving average down with it, so that when the price subsequently turns back up, it crosses the moving average at a lower price than the sale price.  When that happens by an amount sufficient to offset gap losses (and other costs associated with the transaction), we say that a captured downturn has occurred.  Captured downturns are the strategy’s source of profit relative to the index.
  • When the price turns back up above the moving average too quickly after a sale, triggering a buy without having pulled the moving average line down by a sufficient amount to offset the gap losses (and other costs incurred), we call that a whipsaw. Whipsaws are the strategy’s source of loss relative to the index.
  • Captured downturns occur when the strategy’s moving average period is substantially less than the period of the oscillation that the strategy is attempting to time.  Whipsaws occur whenever that’s not the case.
  • The strategy’s net performance relative to the index is determined by the balance between the effects of captured downturns and the effects of whipsaws.

Tradeoffs: Changing the Moving Average Period and the Checking Period

In this section, I’m going to add an additional component to our “growing sine wave” model of prices, a component that will make the model into a genuinely accurate approximation of real security prices.  I’m then going to use the improved model to explain the tradeoffs associated with changing (1) the moving average period and (2) the checking period.  

In the previous section, we modeled security prices as a combination of growth and cyclicality: specifically, an earnings stream growing at 6% per year multiplied by a P/E ratio oscillating as an inverse sine wave with a period of 7 years.  The resultant structure, though useful to illustrate concepts, is an unrealistic proxy for real prices–too clean, too smooth, too easy for the strategy to profitably trade.

To make the model more realistic, we need to add short-term price movements to the security that are unrelated to its growth or long-term cyclicality.  There are a number of ways to do that, but right now, we’re going to do it by adding random short-term statistical deviations to both the growth rate and the inverse sine wave.  Given the associated intuition, we will refer to those deviations as “volatility”, even though the term “volatility” has a precise mathematical meaning that may not always accurately apply.
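One way to build such an index is sketched below. The exact way the deviations enter the model is an assumption on my part: I add a normally distributed shock to each month's earnings growth and a second, independent shock to each month's earnings yield. The parameter names, the 25% amplitude, and the 2% shock size are hypothetical defaults, not the settings behind the charts.

```python
import math
import random


def noisy_index(months, growth=0.06, cycle=84, amplitude=0.25, vol=0.02, seed=0):
    """Growth x inverse sine wave P/E, with random monthly deviations in both.

    Assumption: deviations are independent normal shocks of size `vol`.
    """
    rng = random.Random(seed)                  # seeded so runs are repeatable
    monthly_growth = (1 + growth) ** (1 / 12) - 1
    earnings = 1.0
    prices = []
    for m in range(months):
        # Random deviation in the growth rate.
        earnings *= 1 + monthly_growth + rng.gauss(0, vol)
        # Random deviation in the sine wave that drives the earnings yield.
        wave = amplitude * math.sin(2 * math.pi * m / cycle)
        earnings_yield = 1 + wave + rng.gauss(0, vol)
        prices.append(earnings / earnings_yield)
    return prices


index = noisy_index(300, seed=7)
print(len(index), min(index) > 0)   # 300 True
```

Setting `vol=0` recovers the clean growing sine wave of the previous section, which makes it easy to compare the two versions side by side.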


The resultant price index, shown below, ends up looking much more like the price index of an actual security.  Note that the index was randomly generated:


Now would probably be a good time to briefly illustrate the reason why we prefer the moving average strategy to the momentum strategy, even though the performances of the two strategies are statistically indistinguishable.  In the following chart, we show the index trailed by a 15 month momentum line (hot purple) and a 30 month moving average line (orange).  The periods have been chosen to create an overlap:


As you can see, the momentum (MOM) line is just the price index shifted to the right by 15 months.  It differs from the moving average (MA) line in that it retains all of the index’s volatility.  The moving average line, in contrast, smooths that volatility away by averaging.
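The two lines are simple to compute, and doing so shows why a 15 month momentum line and a 30 month moving average line overlap. A simple average of the last N prices lags a steady trend by (N - 1) / 2 periods, which is 14.5 months for N = 30. The sketch below uses a hypothetical straight-line index so that the lag can be read off directly. The function names are mine.

```python
def momentum_line(prices, lookback):
    """The price series shifted to the right by `lookback` periods (lookback >= 1)."""
    return [None] * lookback + prices[:-lookback]


def moving_average_line(prices, period):
    """Trailing simple moving average, padded with None until the window fills."""
    averages = [sum(prices[t - period + 1 : t + 1]) / period
                for t in range(period - 1, len(prices))]
    return [None] * (period - 1) + averages


prices = list(range(100))                   # hypothetical index rising 1 per month
print(momentum_line(prices, 15)[40])        # 25   (the price from 15 months ago)
print(moving_average_line(prices, 30)[40])  # 25.5 (the price from ~14.5 months ago)
```

On a real index the two lines sit in roughly the same place, but the momentum line carries every wiggle of the price with it, while the moving average line smooths those wiggles away.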

Now, on an ex ante basis, there’s no reason to expect the index’s volatility, when carried over into the momentum line, to add value to the timing process.  Certainly, there will be individual cases where the specific movements in the line, by chance, help the strategy make better trades.  But, statistically, there will be just as many cases where those movements, by chance, will cause the strategy to make worse trades.   This expectation is confirmed in actual testing.  Across a large universe of markets and individual stocks, we find no statistically significant difference between the performance results of the two strategies.

For convenience, we’ve chosen to focus on the strategy that has the simpler, cleaner look–the moving average strategy.  But we could just as easily have chosen the momentum strategy.  Had we done so, the same insights and conclusions would have followed.  Those insights and conclusions apply to both strategies without distinction.


Now, there are two “knobs” that we can turn to influence the strategy’s performance.  The first “knob” is the moving average period, which we’ve already examined to some extent, but only on a highly simplified model of prices.  The second “knob” is the period between the checks, i.e., the checking period, whose effect we have yet to examine in detail.  In what follows, we’re going to examine both, starting with the moving average period.  Our goal will be to optimize the performance of the strategy–“tweak” it to generate the highest possible relative gain on the index.

We start by setting the moving average period to a value of 30 months.  The following chart shows the strategy’s performance:


As you can see, the strategy slightly outperforms.  It doesn’t make sense for us to use 30 months, however, because we know, per our earlier rule, that shorter moving average periods will capture more of the downturns and generate greater outperformance:

So we shorten the moving average period to 20 months, expecting a better return.  The following chart shows the performance:


As we expected, the outperformance increases.  Using a 30 month moving average period in the prior case, the outperformance came in at 1.13 (dotted green line, right y-axis).  In reducing the period to 20 months, we’ve increased the outperformance to 1.20.

Of course, there’s no reason to stop at 20 months.  We might as well shorten the moving average period to 10 months, to improve the performance even more.  The following chart shows the strategy’s performance using a 10 month moving average period:


Uh-oh!  To our surprise, the performance gets worse, not better, contrary to the rule.

What’s going on?

Before we investigate the reasons for the deterioration in the performance, let’s shorten the moving average period further.  The following two charts show the performances for moving average periods of 5 months and 1 month respectively:


As you can see, the performance continues to deteriorate.  Again, this result is not what we expected.  In the prior section, when we shortened the moving average period, the performance got better. The strategy captured a greater portion of the cyclical downturns, converting them into larger gains on the index.  Now, when we shorten the moving average period, the performance gets worse.  What’s happening?

Here’s the answer.  As we saw earlier, when we shorten the moving average period, we cause the moving average to trail (hug) more closely to the price.  In the previous section, the price was a long, clean, cyclical sine wave, with no short-term volatility that might otherwise create whipsaws, so the shortening improved the strategy’s performance.  But now, we’ve added substantial short-term volatility to the price–random deviations that are impossible for the strategy to successfully time.  At longer moving average periods–30 months, 20 months, etc.–the moving average trails the price by a large amount, and therefore never comes into contact with that volatility.  At shorter moving average periods, however, the moving average is pulled in closer to the price, where it comes into contact with the volatility.  It then suffers whipsaws that it would not otherwise suffer, incurring gap losses that it would not otherwise incur.

Of course, it’s still true that shortening the moving average period increases the portion of the cyclical downturns that the strategy captures, so there’s still that benefit.  But the cumulative harm of the additional whipsaws introduced by the shortening substantially outweighs that benefit, leaving the strategy substantially worse off on net.

The following two charts visually explain the effect of shortening the moving average period from 30 months to 5 months:


If the “oscillations” associated with random short-term price deviations could be described as having a period (in the way that a sine wave would have a period), the period would be very short, because the oscillations tend to “cycle” back and forth very quickly.  Given our “MA Period << Oscillation Period” rule, then, it’s extremely difficult for a timing strategy to profitably time the oscillations.  In practice, the oscillations almost always end up producing whipsaws.

Ultimately, the only way for the strategy to avoid the implied harm of random short-term price deviations is to avoid touching them.  Strategies that use longer moving average periods are more able to do that than strategies that use shorter ones, which is why strategies that use longer moving average periods often outperform, even though they aren’t as effective at converting downturns into gains.

The following table describes the fundamental tradeoff associated with changing the moving average period.  Green is good for performance, red is bad:


As the table illustrates, when we shorten the moving average period, we increase both the good stuff (captured downturns) and the bad stuff (whipsaws).  When we lengthen the moving average period, we reduce both the good stuff (captured downturns) and the bad stuff (whipsaws).

Ultimately, optimizing the strategy is about finding the moving average period that brings the moving average as close as possible to the price, so that it maximally captures tradeable cyclical downturns, but without causing it to get so close to the price that it comes into contact with untradeable short-term price volatility.  We can imagine the task as being akin to the task of trying to pull a rose out of a rose bush.  We have to reach into the bush to pull out the rose, but we don’t want to reach so deep that we end up getting punctured by thorns.


Now, the index that we built is quoted on a monthly basis, at the close.  If we wanted to, we could change the period between the quotes from one month to one day–or one hour, or one minute, or one second, or less.  Doing that would allow us to reduce the moving average period further, so that we capture cyclical downturns more fully than we may have otherwise been capturing them.  But it would also bring us into contact with short-term volatility that we were previously unaware of and unexposed to, volatility that will increase our whipsaw losses, potentially dramatically.

We’re now ready to examine the strategy’s second “knob”, the period of time that passes between the checks, called the “checking period.”  In the current case, we set the checking period at one month.  But we could just as easily have set it at one year, five days, 12 hours, 30 minutes, 1 second, and so on–the choice is ours, provided that we have access to price quotes on those time scales.

The effects of changing the checking period are straightforward.  Increasing the checking period, so that the strategy checks less frequently, has the effect of reducing the quantity of price information that the strategy has access to.  The impact of that reduction boils down to the impact of having the strategy see or not see the information:

  • If the information is useful information, the type that the strategy stands to benefit from transacting on, then not seeing the information will hinder the strategy’s performance. Specifically, it will increase the gap losses associated with each transaction.  Prices will end up drifting farther away from the moving average before the prescribed trades take place.
  •  If the information is useless information, the type that the strategy does not stand to benefit from transacting on, then not seeing the information will improve the strategy’s performance.  The strategy will end up ignoring short-term price oscillations that would otherwise entangle it in whipsaws.

The converse is true for reducing the checking period, i.e., conducting checks more frequently.  Reducing the checking period has the effect of increasing the quantity of price information that the strategy has access to.  If the information is useful, then the strategy will trade on it more quickly, reducing the extent to which the price “gets away”, and therefore reducing gap losses.  If the information is useless, then it will push the strategy into additional whipsaws that will detract from performance.
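To make the "see or not see" idea concrete, here is a small sketch in which lengthening the checking period simply means discarding the quotes that fall between checks. The example series is deliberately artificial: a price that flips between 101 and 99 on every quote, which is pure untradeable chop. The function names are hypothetical.

```python
def checks(prices, checking_period):
    """Keep only the quotes the strategy looks at: every `checking_period`-th
    quote, aligned so that the most recent quote is always included."""
    offset = (len(prices) - 1) % checking_period
    return prices[offset::checking_period]


def count_trades(prices, ma_period):
    """Count how often the 'price > moving average' signal flips."""
    trades = 0
    previous = None
    for t in range(ma_period - 1, len(prices)):
        ma = sum(prices[t - ma_period + 1 : t + 1]) / ma_period
        signal = prices[t] > ma
        if previous is not None and signal != previous:
            trades += 1
        previous = signal
    return trades


chop = [101, 99] * 10                       # hypothetical: useless oscillation
print(count_trades(chop, 2))                # 18 trades when every quote is checked
print(count_trades(checks(chop, 2), 2))     # 0 trades when every other quote is checked
```

Here the information between checks is useless, so ignoring it helps. Had the skipped quotes contained the start of a genuine downturn, the same thinning would instead have delayed the sale and widened the gap loss.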

The following table illustrates the tradeoffs associated with changing the checking period.  As before, green is good for performance, red is bad:


The following two charts show the performances of the strategy in the total U.S. equity market index from January of 1945 to January of 1948.  The first chart runs the strategy on the daily price index (checking period of 1 day) using a 100 day moving average (~5  months).  Important captured downturns are circled in green, and important whipsaws are circled in red:


The second chart runs the strategy on the monthly price index (checking period of 1 month) using a 5 month moving average.


Notice the string of repeated whipsaws that occur in the left part of the daily chart, around the middle of 1945 and in the early months of 1946.  The monthly strategy completely ignores those whipsaws.  As a result, across the entire period, it ends up suffering only 2 whipsaws.  The daily strategy, in contrast, suffers 14 whipsaws.  Crucially, however, those whipsaws come with average gap losses that are much smaller than the average gap losses incurred on the whipsaws in the monthly strategy.  In the end, the two approaches produce a result that is very similar, with the monthly strategy performing slightly better.

Importantly, on a daily checking period, the cumulative impact of losses associated with traversing the bid-ask spread becomes significant, almost as significant as the impact of gap losses, which, of course, are smaller on a daily checking period.  That’s one reason why the monthly strategy may be preferable to the daily strategy.  Unlike in the daily strategy, in the monthly strategy we can accurately model large historical bid-ask spreads without destroying performance.

We see this in the following two charts, which show the performance of the daily strategy from February 1928 to July 2015 using (1) zero bid-ask spread losses and (2) bid-ask spread losses equal to the historical average of roughly 0.6%:


The impact on the strategy ends up being worth 1.8%.  That contrasts with an impact of 0.5% for the monthly strategy using the equivalent 10 month moving average preference of the strategy’s inventor, Mebane Faber.  We show the results of the monthly strategy for 0% bid-ask spread losses and 0.6% bid-ask spread losses in the two charts below, respectively:


From looking at the charts, the daily version of the strategy appears to be superior.  But it’s hard to be confident in the results of the daily version, because in the last several decades, its performance has significantly deteriorated relative to the performance seen in prior eras of history.  The daily version generated frequent whipsaws and no net gains in the recent downturns of 2000 and 2008, in contrast with the substantial gains that it generated in the more distant downturns of 1929, 1937, 1970 and 1974.  The deterioration may be related to the deterioration of the one day momentum strategy (see a discussion here), whose success partially feeds into the success of all trend-following strategies that conduct checks of daily prices.

To summarize:

  • For the moving average period:


  • For the checking period:


Aggregate Indices vs. Individual Securities: Explaining the Divergent Results

In this section, I’m going to use insights gained in previous sections to explain why the trend-following strategies that we tested in the prior piece work well on aggregate indices, but not on the individual securities that make up those indices.   Recall that this was a key result that we left unresolved in the prior piece.  

To understand why the momentum and moving average strategies that we tested in the prior piece work on aggregate indices, but not on the individual securities that make up those indices, we return to our earlier decomposition of stock prices.  An understanding of how the phenomenon of indexing differentially affects the two latter variables in the decomposition–cyclicality and volatility–will give the answer.


We first ask, what effect does increasing each component of the decomposition, while leaving all other components unchanged, have on the strategy’s performance?  We first look at growth.

Growth: Increasing the growth, which is the same thing as postulating higher future returns for the index, tends to impair the strategy’s performance.  The higher growth makes it harder for the strategy to capture downturns–the downturns themselves don’t go as far down, because the growth is pushing up on them to a greater extent over time.  The subsequent buys therefore don’t happen at prices as low as they might otherwise happen, reducing the gain on the index.

Importantly, when we’re talking about the difference between annual growth numbers of 6%, 7%, 8%, and so on, the effect of the difference on the strategy’s performance usually doesn’t become significant. It’s only at very high expected future growth rates–e.g., 15% and higher–that the growth becomes an impeding factor that makes successful timing difficult.  When the expected future return is that high–as it was at many past bear market lows–1932, 1941, 1974, 2009, etc.–it’s advisable to abandon market timing altogether, and just focus on being in for the eventual recovery.   Market timing is something you want to do when the market is expensive and when likely future returns are weak, as they unquestionably are right now.

Cyclicality:  Admittedly, I’ve used the term “cyclicality” somewhat sloppily in the piece. My intent is to use the term to refer to the long, stretched-out oscillations that risk assets exhibit over time, usually in conjunction with the peaks and troughs of the business cycle, which is a multi-year process.  When the amplitude of these oscillations is increased, the strategy ends up capturing greater relative gains on the downturns.  The downturns end up going farther down, and therefore after the strategy exits the index on their occurrence, it ends up buying back at prices that have dropped by a larger amount, earning a greater relative gain on the index.

In the following slideshow, we illustrate the point for the 10 month moving average strategy (click on any image for a carousel to open).  We set the volatility at a constant value of 2% (ignore the unit for now), and increase the amplitude of the 7 year sinusoidal oscillation in the earnings yield from 10% to 50% of the sine wave’s base or height:

As you can see, the strategy gains additional outperformance on each incremental increase in the sine wave’s amplitude, which represents the index’s “cyclicality.”  At a cyclicality of 10%, which is almost no cyclicality at all, the strategy underperforms the index by almost 80%, ending at a laughable trailing return ratio of 0.2.  At a cyclicality of 50%, in contrast, the strategy outperforms the index by a factor of 9.

Volatility:  I’ve also used the term “volatility” somewhat sloppily.  Though the term has a defined mathematical meaning, it’s associated with an intuition of “choppiness” in the price. In using the term, my intention is to call up that specific intuition, which is highly relevant to the strategy’s performance.

Short-term volatility produces no net directional trend in price over time, and therefore it cannot be successfully timed by a trend-following strategy.  When a trend-following strategy comes into contact with it, the results are useless gap losses and bid-ask traversals, both of which detract substantially from performance.  The following slideshow illustrates the point.  We hold the cyclicality at 25%, and dial up the volatility from 2% to 5% (again, ignore the meaning of those specific percentages, just focus on the fact that they are increasing):

As the intensity of the volatility–the “choppiness” of the “chop”–is dialed up, the moving average comes into contact with a greater portion of the volatility, for more whipsaws. Additionally, each whipsaw comes with larger gap losses, as the price overshoots the moving average by a larger amount on each cross.  In combination, these two hits substantially reduce the strategy’s performance.

We can understand the strategy’s performance as a tug of war between cyclicality and volatility.  Cyclicality creates large sustained downturns that the strategy can profitably capture.  Volatility, in contrast, creates whipsaws that the strategy suffers from being involved with.  When cyclicality proves to be the greater presence, the strategy outperforms.  When volatility proves to be the greater presence, the strategy underperforms.


Now, to the reason why indexing improves the performance of the strategy.  The reason is this.  Indexing filters out the volatility contained in individual securities, while preserving their cyclicality.

When an index is built out of a constituent set of securities, the price movements that are unique to the individual constituents tend to get averaged down.  One security may be fluctuating one way, but if the others are fluctuating another way, or are not fluctuating at all, the fluctuations in the original security will get diluted away in the averaging. The price movements that are common to all of the constituents, in contrast, will not get diluted away, but will instead get pulled through into the overall index average, where they will show up in full intensity.

In risk assets, the cyclicality associated with the business cycle tends to show up in all stocks and risky securities.  All stocks and risky securities fall during recessions and rise during recoveries.  That cyclicality, then, tends to show up in the index.  Crucially, however, the short-term volatility that occurs in the individual securities that make up the index–the short-term movements associated with names like Disney, Caterpillar, Wells Fargo, Exxon Mobil, and so on, where each movement is driven by a different, unrelated story–does not tend to show up in the index.

The 100 individual S&P 500 securities that we tested in the prior piece exhibit substantial volatility–collectively averaging around 30% for the 1963 to 2015 period.  Their cyclicality–their tendency to rise and fall every several years in accordance with the ups and downs of the business cycle–is not enough to overcome this volatility, and therefore the strategy tends to trade unprofitably.  But when the movements of all of the securities are averaged together into an index–the S&P 500–the divergent movements of the individual securities dilute away, reducing the volatility of the index by half, to a value of 15%.  The cyclicality contained in each individual constituent, however, is fully retained in the index.  The preserved cyclicality comes to dominate the diminished volatility, allowing the strategy to trade profitably.
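The halving of volatility can be sanity-checked with a standard textbook formula.  The sketch below rests on simplifying assumptions that are mine, not part of the original test: an equal-weighted index (the actual S&P 500 is cap-weighted), identical volatility for every constituent, and a single uniform pairwise correlation.  The function names are hypothetical.  Under those assumptions, index volatility equals sigma * sqrt(rho + (1 - rho) / n), and we can back out the correlation implied by a drop from 30% to 15%:

```python
import math


def index_volatility(sigma, rho, n):
    """Volatility of an equal-weighted index of n securities, each with
    volatility sigma and a uniform pairwise correlation rho (assumed model)."""
    return sigma * math.sqrt(rho + (1.0 - rho) / n)


def implied_correlation(sigma, index_sigma, n):
    """Invert the formula above to find the rho that produces index_sigma."""
    ratio = (index_sigma / sigma) ** 2
    return (ratio - 1.0 / n) / (1.0 - 1.0 / n)


rho = implied_correlation(0.30, 0.15, 500)
print(round(rho, 3))  # → 0.248
```

On this reading, constituents only need to share roughly a quarter of their variation for the index to come out half as volatile as its typical member.  The shared part is what survives the averaging; the rest is diluted away.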

We illustrate this process below.  The six different securities in the following six charts (click for a slideshow) combine a common cyclicality (long-term oscillations on a 7 year cycle) with large amounts of random, independent volatility.  In each security, the volatility is set to a very high value, where it consistently dominates over the cyclicality, causing the strategy to underperform:

When we take the different securities and build a combined index out of them, the volatility unique to each individual security drops out, but the cyclicality that the securities have in common–their tendency to rise and fall every 7 years in accordance with our simplified sinusoidal model of the business cycle–remains.  The strategy is therefore able to outperform in the combined index.

The following chart shows that outperformance.  The black line is an equal-weighted index built out of the six different high-volatility securities shown above:


As you can see, the volatility in the resultant index ends up being minimal.  The 7 year sinusoidal cyclicality, however, is preserved in fullness.  The strategy therefore performs fantastically in the index, even as it performs terribly in the individual constituents out of which the index has been built.  QED.
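For readers who want to reproduce the effect, here is a minimal sketch of the averaging step.  It is not the code behind the charts: it assumes each security is a common 7 year sine wave plus independent Gaussian noise, and the amplitude, noise level, and function names are hypothetical.  It measures how far each series strays from the shared cycle, before and after the six securities are averaged into an equal-weighted index:

```python
import math
import random
import statistics


def common_cycle(n_months, cycle_months=84, amplitude=0.25):
    """Shared sinusoidal cycle, expressed as a deviation from trend."""
    return [amplitude * math.sin(2.0 * math.pi * t / cycle_months)
            for t in range(n_months)]


def noisy_security(cycle, noise_sd, rng):
    """The common cycle plus noise that is independent across securities."""
    return [c + rng.gauss(0.0, noise_sd) for c in cycle]


def residual_sd(series, cycle):
    """Standard deviation of whatever is left after removing the cycle."""
    return statistics.pstdev([s - c for s, c in zip(series, cycle)])


rng = random.Random(42)
cycle = common_cycle(840)
securities = [noisy_security(cycle, 0.30, rng) for _ in range(6)]

# Equal-weighted index: the month-by-month average of the six securities.
index = [sum(values) / len(values) for values in zip(*securities)]

individual_sds = [residual_sd(s, cycle) for s in securities]
index_sd = residual_sd(index, cycle)
print("individual noise:", [round(sd, 3) for sd in individual_sds])
print("index noise:", round(index_sd, 3))
```

With independent noise, the index’s residual should shrink by roughly the square root of the number of securities, while the cycle passes through untouched.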

To summarize:

  • The moving average strategy’s performance is driven by the balance between cyclicality and volatility. When cyclicality is the greater presence, the strategy captures large, profitable downturns and outperforms.  When volatility is the greater presence, the strategy gets caught in recurrent whipsaws and underperforms.
  • The strategy underperforms in individual securities because the volatility in those securities is too high relative to the cyclicality.  The securities “chop” around too much, creating too many whipsaws.  When they do experience the kinds of sustained downturns that the strategy might profit from, the downturns end up not being deep enough to offset the plentitude of whipsaws.
  • When different individual securities are combined into an index, the movements that the securities have in common carry through to the index.  The movements that they do not have in common get averaged down and diluted away in the index.
  • There’s a cyclicality common to all risky securities–the cyclicality of the business cycle, which tends to inflate and depress valuations of the entire market in unison.  When an aggregate index is built out of individual securities, that cyclicality carries through into the final index result.  But, crucially, the random price deviations that constitute short-term volatility, deviations that the securities do not have in common with each other, do not carry through.  Rather, they get averaged down and diluted away in the index.
  • As a result, the balance between cyclicality and volatility in the index, in contrast to the individual securities, proves to be favorable to the strategy, allowing it to outperform.

An Improved Trend-Following Strategy: Growth-Trend Timing

In what follows, I’m going to introduce the modification to the strategy that most readers are probably here to see: Growth-Trend Timing (GTT).  As a warning, GTT is hardly an investment panacea.  Like any other strategy, it carries a set of risks and vulnerabilities. That said, its total risk-reward proposition, in my view, is significantly more attractive than the risk-reward propositions of the other systematic long-horizon strategies that have been proposed.

What I find particularly compelling about the strategy is that it makes sense.  It tries to systematically do what any good human trend-following trader has to do to time the market well–distinguish between those downward price breaks that are real, that are going to be met with sustained follow-throughs to lower prices, and those downward price breaks that are fake-outs, that are only going to produce whipsaw losses.  Granted, there may be more efficient ways for us to make that distinction than to use GTT–maybe the best way is for us to use our market intuitions directly, in real-time, abandoning systematic approaches altogether.  But GTT, in my opinion, represents a great place for an investor to start.

Recall the equation for the moving average strategy’s total return performance, an equation that also describes the total return performance of the momentum strategy. Looking at that equation, it’s clear that for the strategy to outperform on total return, the cumulative gains from captured downturns have to exceed the cumulative losses from whipsaws:

(8) Strategy(t) = Index(t) + Cumulative Gains from Captured Downturns(t) – Cumulative Losses from Whipsaws(t)

We said in the previous piece that we wanted market timing strategies to have a sound analytic basis. So, let’s ask: What is the sound analytic basis for believing that if we time the market each month using the 10 month moving average, as Mebane Faber recommends, or using a monthly 12 month momentum signal, as others have recommended, our gains from captured downturns will exceed, or at least minimally keep up with, our losses from whipsaws?  Is there any a priori basis at all for that belief?

Simply pointing to the empirical result itself is not enough, for without an explanation for the result, an ability to understand why it is achieved, we have no way to estimate the probability that it will persist into future data.  We have to just cross our fingers and hope the patterns of the past persist into the future.  But that might not happen, especially if the pattern has been extracted from a very small sample size (which, in this case, it has been).

The most common explanation given for why the strategy outperforms involves an appeal to behavior.  Here’s my preferred way of framing that explanation.  For maximal clarity, consider a simplified flow-based model of security pricing:


The model delineates two categories of market participants: market takers and market makers.  The market takers want to change the asset compositions of their portfolios, so they place market orders to buy and sell securities.  The market makers do not want to change the asset composition of their portfolios, but instead simply want to collect the spread between the prices that the market takers buy and sell at, remaining positionally neutral themselves.  So the market makers lay out simultaneous bids and asks at price midpoints that they think will lead to matching buying and selling order flows.  They continually change these price midpoints in order to ensure that the incoming order flows match.  They are financially incented to execute this task correctly, for if they execute it incorrectly–if they put their price midpoints in the wrong places–then they will buy more than they sell, or sell more than they buy, building up inventory or debt that they themselves will have to liquidate at a loss.

Ultimately, that’s where “price” in the modern market comes from.  Market makers, usually computers, try to find the general price midpoints that will lead to matching market taker order flow.  They lay out bids and asks at those midpoints, and collect the spread between them as that flow comes in.  Price changes, then, are the result of order flow imbalances.  If order flow proves to be imbalanced at some price, market makers will have to change the price–the midpoints of the bids and asks that they are laying out–to get the flows to balance again.  If they don’t do that, they will build up inventory or debt that they will have to liquidate into the market later, losing money themselves.

Now, suppose that there’s a negative fundamental trend taking place somewhere in the market.  Maybe a “recession” is beginning–an economic downturn that will generate falling revenues and earnings, tightening credit conditions, rising strains in the financial system, deteriorating social mood and investor sentiment, and ultimately, lower prices and valuations.  Surely, if investors become aware of that development, they will sell in advance of it.

But, crucially, they are not going to all become aware of it at the same time.  At first, only a small number of investors will see it.  So they will stop buying and start selling.  Their actions will redirect a portion of the previously existing buying flow–say, 0.1% of it–into selling flow.  A flow imbalance–consisting of too much selling flow, not enough buying flow–will then result.  In response to the imbalance, the market makers will lower the price.  In response, other people who aren’t yet aware of what’s coming, and who are psychologically anchored to higher prices, will see “value” in the lower prices, and will increase their buying and reduce their selling, restoring the balance. This stabilizing feedback response will attenuate the drop and keep prices from falling too quickly or too far.  Instead of plunging, prices will simply “act heavy.”

As the deteriorating fundamental picture increasingly reveals itself, caution will increase, and more order flow will shift from buying to selling.  In response to the shift, market makers will reduce the price further.  But there will still be those that don’t see or appreciate what’s happening.  They will step in and buy the perceived bargains that are on their hands.

Of course, that can only go on for so long.  As the price continues to behave poorly, more and more previously constructive investors will become conditioned–in the Skinnerian sense–into a worried, risk-averse mindset.  More and more buyers will feel compelled to tap out and sell.  To keep the order flow balanced, market makers will have to continue to reduce their price midpoints.  Eventually, a positive feedback loop of fear will take hold, and the market’s fragile equilibrium will snap.  The market will take a strong plunge lower, to a level that offers genuine bargains that fit with the deteriorating fundamental outlook.

Crucially, before the eventual plunge takes place, three factors will prevent the system from unraveling: (1) uncertainty, kept in place by the limited in-trickle of information, (2) the reward conditioning of the market gains that investors will have experienced in the cycle up to that point, which will encourage an initially constructive mentality towards the downward price trend (“it’s a buying opportunity!”), and (3) the stabilizing feedback response of anchoring, which causes people to see “opportunity” rather than “danger” in lower prices, and to therefore provide buying support to markets that would otherwise fall more quickly.

Before the plunge occurs, then, a slower, more gentle, more benign negative price trend will take hold.  The market’s momentum, captured in its trailing annual return, will go negative.  Prices will fall below key long-term moving averages.  Internals will deteriorate. Crucially, market timing strategies that pivot off of these signals will then successfully get out, avoiding the bulk of the coming downturn.

On the eventual recovery, the process will happen in reverse.  Crucially, the full upturn will not take place instantaneously.  Rather, it will show early clues and signs.  The market’s momentum will go positive.  Prices will rise above key long-term moving averages. Internals will improve.  Market timing strategies that pivot off of these signals will then get back in, usually at meaningfully lower prices than they got out at.  The result will be outperformance over the index.

As we saw earlier, the only condition that needs to be met for the strategies to successfully get out and get back in at lower prices is that their periods–i.e., the lengths of their momentum horizons and moving averages–be substantially shorter than the period of the oscillation being timed.  If the oscillation being timed is the business cycle, then the period simply has to be less than a few years, which is how long it takes for expansions to morph into recessions and back into expansions.  A 12 month momentum period, and a 10 month moving average period, obviously qualify–they are significantly less than the typical 7 year (84 month) business cycle length.

It would seem, then, that we’ve identified the reason why the momentum and moving average strategies outperform.  The recommended periods–12 months for the momentum strategy, and 10 months for the moving average strategy–have been set to values that are short enough to allow the strategies to successfully capture the downturns associated with the business cycle, given the amount of time that it takes for those downturns to occur and reverse into recoveries, an amount of time on the order of years rather than months.

But wait.  We haven’t said anything about whipsaws.  Per the equation, it’s not enough for the strategy to simply extract gains from downturns.  The gains have to exceed the losses that will inevitably be incurred in whipsaws.  What reason do we have to believe that the profits that the strategy will capture from downturns using a 12 month momentum period, or a 10 month moving average period, will consistently exceed the cumulative effect of the many whipsaws that will be suffered on those short periods?  That is where the explanation is lacking.

Every market participant knows what the business cycle is, and is aware of the types of large downward price movements that it can produce.  Every active market participant is eager to “time” it, where possible.  That may not have been the case in the year 1918, but it’s definitely the case in the year 2016.  Market participants in the year 2016 are extremely sensitive to the market’s history and well-defined cyclical tendencies. That sensitivity, if it leads to volatile short-term trading around the feared prospect of turns in the cycle, has the potential to create substantial whipsaw losses for the strategy.

Frequently, market participants will sense impending downturns and sell, pushing price into a negative trend, when there’s no fundamental process to carry the downturn through. The negative price trend will then reverse itself, inflicting gap losses on the strategy. That’s exactly what has happened in the four documented whipsaws that the moving average strategy has suffered in U.S. equities since the 2009 low:


In the late summer of 2010, as investors collectively became scared of a “double-dip” scenario, the price trend went negative.  But the move petered out and reversed itself, because it wasn’t supported by the fundamental picture, which turned out to be much more optimistic.  The same thing happened again in the fall of 2011, when additional fears related to the potential impact and calamity of a dissolution and financial meltdown in the Eurozone took hold.  Those fears turned out to be unfounded, and the market quickly recovered its losses, inflicting whipsaws on whoever tried to exit.  The final whipsaw, of course, occurred just a few months ago, in the August market scare.  In total, the four whipsaws have imposed a cumulative 25% loss on the strategy relative to buy and hold–a very big hit.  Will profitable trades in the coming downturn–whenever it comes–make up for that loss?  We don’t know.

As I write this piece, the market’s price trend is well below key momentum and moving average boundaries.  But it wasn’t below those boundaries at the end of last month, so the strategy hasn’t yet exited.  If things stay where they are, the strategy will sell at the end of the current month at very low prices relative to where the moving average crossover actually happened, taking on a large gap loss.  The likelihood of another whipsaw for the strategy is therefore high, particularly if current fears about the energy sector and China move to the backburner.

What remains lacking, then, is a clear analytic explanation for why, on a going-forward basis, we should expect captured downturns to exceed whipsaws in the strategy.  In the last several years, the strategy was lucky to get a big downturn that it was able to exploit–the 2008 Great Recession.  Before that, it was gifted with big drops associated with the recessions of 1974, 1970, 1937, and 1929.  Beyond those drops–which collectively amount to a sample size of only 5–every other transaction has either been a negligible gain, or a whipsaw loss, of which there have been a huge number (see the red boxes below):


My own gut sense is that the kind of deep, profitable downturns that we saw in 2008, 1974, 1937 and 1929 will not happen again for a long time–on the order of decades.  The consequence of secular stagnation–the reality of which has become an almost indisputable economic fact–is that you get weaker expansions, and also weaker downturns–weaker cyclicality in general, which is exactly what we’ve seen in the current cycle.  To a trend-following strategy, that’s the equivalent of poison.  It attenuates the sources of gains–large downturns that get captured for profit–without attenuating the sources of losses: choppy volatility that produces whipsaws.

We therefore need to find a way to improve the strategy.  The best place to start is to examine the above table and take note of the clear fact that the strategy outperforms by successfully timing the business cycle–recessions.  That’s its primary strategy–recession-timing.  When the strategy switches inside of recessions, it tends to be profitable.  When it switches outside of recessions, it tends to be unprofitable.

The following two charts make the point more clear.  The first chart shows the strategy’s cumulative performance relative to buy and hold inside recession, the second, outside recession:

Inside recession, the strategy produces a cumulative return equal to 3.5 times the return of buy and hold. Outside recession, the strategy produces a cumulative return equal to 0.4 times the return of buy and hold.  That’s a cumulative performance difference of almost 10X.

A natural way to improve the strategy, then, is to try to teach it to differentiate between situations where the fundamental backdrop makes recession likely, and situations where the fundamental backdrop makes recession unlikely.   If recession is likely, then the negative price trend is likely to be met with sustained follow through, resulting in profitable captured downturns.  But if recession is unlikely, as it was in the summer of 2010, the fall of 2011, the fall of 2015, and possibly now, then the negative price trend is likely to peter out and reverse itself, inflicting whipsaw losses on the strategy.  If the strategy can distinguish between the two, then it can turn itself off in the latter case, where recession is unlikely, so that it avoids the whipsaw losses.

That’s exactly what Growth-Trend Timing does.   It takes various combinations of high quality monthly coincident recession signals, and directs the moving average strategy to turn itself off during periods when those signals are unanimously disconfirming recession, i.e., periods where they are all confirming a positive fundamental economic backdrop.

The available monthly signals are:

  • Real Retail Sales Growth (yoy)
  • Industrial Production Growth (yoy)
  • Real S&P 500 EPS Growth (yoy), modeled on a total return basis.
  • Employment Growth (yoy)
  • Real Personal Income Growth (yoy)
  • Housing Start Growth (yoy)

The precise timing criterion for GTT is as follows.  Take a reliable monthly growth signal, or better, a collection of reliable monthly growth signals that overlap well to describe the total state of the economy:

  • If, at the close of the month, the growth signals for the prior month are unanimously positive, then go long or stay long for the next month, and ignore the next step.
  • If, at the close of the month, the growth signals for the prior month are not unanimously positive, then check the price trend.  If price is above the 10 month moving average, go long or stay long for the next month.  If price is below the 10 month moving average, sell or stay out for the next month.
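The two steps above can be sketched as a single decision function.  This is a minimal illustration rather than the code used for the backtests in this piece: the function name and argument layout are hypothetical, and because the rule doesn’t say what to do when price sits exactly on the moving average, the sketch assumes that case counts as “not above” and stays out.

```python
from typing import Sequence


def gtt_position(growth_signals: Sequence[float],
                 price: float,
                 moving_average: float) -> bool:
    """Return True to be long for the next month, False to be out.

    growth_signals holds the prior month's yoy growth readings, one per
    indicator in use. price and moving_average are month-end values.
    """
    if not growth_signals:
        raise ValueError("at least one growth signal is required")

    # Step 1: unanimously positive growth turns the timing function off.
    if all(growth > 0 for growth in growth_signals):
        return True

    # Step 2: otherwise, fall back on the moving average criterion.
    # Assumption: price exactly equal to the average counts as "not above".
    return price > moving_average


print(gtt_position([0.02, 0.01], price=90.0, moving_average=100.0))   # → True
print(gtt_position([0.02, -0.01], price=90.0, moving_average=100.0))  # → False
print(gtt_position([-0.02], price=110.0, moving_average=100.0))       # → True
```

Note that a single non-positive reading is enough to switch the timing function on, but the strategy still only sells if price is also in a downtrend.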

Importantly, in backtesting, the growth signals need to coincide with the actual time in which the data used to produce them becomes available.  When monthly economic numbers are published, they’re usually published for the prior month.  So, in any backtest, the strategy needs to trade off of the previous month’s economic numbers, not the current month’s numbers, which are unavailable.
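One simple way to enforce that in a backtest is to shift each growth series forward by a month before it ever touches the trading logic.  The helper below is a hypothetical sketch of that idea.  It assumes a one month publication lag, which is a simplification, since some series are released later or revised after the fact.

```python
def lag_signal(values, months=1):
    """Shift a monthly series forward so that position i holds the reading
    for month i - months, i.e. what was actually published by then.
    Months with nothing published yet are filled with None."""
    if months < 1:
        raise ValueError("months must be at least 1")
    values = list(values)
    return ([None] * months + values)[:len(values)]


retail_sales_growth = [0.03, 0.01, -0.02, -0.01]
print(lag_signal(retail_sales_growth))  # → [None, 0.03, 0.01, -0.02]
```

The backtest then reads the lagged series at each month-end, and skips (or simply stays long in) the opening months where the lagged value is None.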

Empirically, for the U.S. economy, the strongest historical recession indicator is Real Retail Sales Growth.  Since the Census Bureau began to collect and publish it as a series in the late 1940s, it has expeditiously nailed every recession that has occurred, giving only a few false positives.  Notice how the blue line consistently crosses below zero at the left sides of the gray columns, the beginnings of the recessions. (Link: FRED)


Real Retail Sales Growth is a truly fantastic indicator, a direct line into the health of the U.S. consumer, the engine of the U.S. economy.  We’re therefore going to use it as a fundamental, preferred signal for GTT.

The following chart shows GTT’s performance on the monthly S&P 500 using real retail sales growth as a single growth signal from January 1947 to November 2015:


The purple bars at the bottom show periods where real retail sales growth is declaring “no recession.”  In those periods, the strategy turns itself off.  It stops timing altogether, and simply stays long, to avoid unnecessary whipsaws.  In the places where there are no purple bars, real retail sales growth is negative, suggesting that recession is likely. The strategy therefore puts its timing protection back on, switching into and out of the market based on the usual moving average criterion.  Notice that the purple bars overlap quite well with the grey columns, the recessions.  The close overlap confirms the power of real retail sales growth as a coincident recession indicator.

As you can see in the chart, GTT (blue line) outperforms everything: buy and hold (gray line), the X/Y portfolio (black line), and the 10 month moving average strategy (abbreviated “MMA”, and shown in the green line).  The dotted red line shows the cumulative outperformance of GTT over MMA.  As expected, GTT continually ratchets higher over the period.  The driver of its consistent uptrend is its avoidance of the useless whipsaws that MMA repeatedly entangles itself in.

Another accurate recession indicator is Industrial Production Growth.  It’s available much farther back in time–specifically, to the year 1919.  With the exception of a single costly omission in the 1974 recession, it’s done a very good job of accurately calling out recessions as they’ve happened. (Link: FRED)


The following chart shows GTT’s performance on the monthly S&P 500 using industrial production growth as a single growth signal from February 1928 to November 2015:


Again, GTT strongly outperforms both buy and hold and MMA.  However, the outperformance isn’t as strong as it was when real retail sales growth was used, primarily because the industrial production signal misses the 1974 recession, and also because the strategy is late to exit in the 1937 recession.

Real retail sales growth and industrial production growth represent two diverse, independently reliable indicators of the health of the two fundamental segments of the overall economy: consumption and production.  The best result comes when they are put together, in combination.  The problem, of course, is that retail sales data isn’t available prior to 1947, so we can’t test the two signals together back to the beginning of each of their time series.  Fortunately, there’s a fix to that problem.  To get data on real retail sales before 1947, we can use real department store sales and real shoe sales as a proxy. Both were published by the government on a monthly basis back to the late 1910s. (Link: FRED)


Using that proxy, and combining the two signals together, we observe the following performance for GTT  back to the inception of the data.  Note that a recessionary indication from either metric turns the strategy’s timing function on.  Otherwise, the timing function is off, and the strategy stays long:


The above construction of GTT–using the dual signals of real retail sales growth and industrial production growth–is the preferred “recession-timing” construction shown in the beginning of the piece.  As you can see, the strategy on that construction consistently outperforms everything, by a very large margin.  Its only weakness is its failure to expeditiously exit prior to the 1937 recession, a recession that was almost impossible to predict using data.

The following table shows GTT’s entries and exits:


Relative to MMA, the win rate improves from roughly 24% to roughly 39%.  That improvement drives the bulk of the improvement in the strategy’s overall performance.  As intended, the strategy successfully captures all of the downturns that MMA captures, but without all of the whipsaws.

The following chart shows the timing power of the retail sales and industrial production growth signals taken individually, outside of a larger trend-following framework.  Real retail sales growth timing is shown in red, and industrial production growth timing is shown in yellow.  When growth is positive, the strategies go long, when negative, they go to cash:


Evidently, the individual timing performance of the growth signals is quite weak.  The main reason for the weakness is that the signals are overly sensitive to recession risk, particularly on the back end.  They stay negative after the recession is over, and are therefore late to re-enter the market on upturns, at a significant cost to performance. Their lateness doesn’t hurt the result for GTT, however, because they are being used as mere overlays for a more price-cognizant trend-following approach.  GTT respects price and re-enters the market whenever the trend goes positive, no matter what the growth signals happen to be saying.

An additional reason for the weak timing performance of the growth signals taken in isolation is that they have noise in them–they sometimes go falsely negative outside of recession.  When the signals are used to time the market in isolation, outside of a trend-following approach, the noise leads to whipsaws.  In GTT, however, the noise is modulated by the trend-following criterion–a growth signal might go negative, but nothing will happen unless the market also happens to be in a downtrend.  What are the odds that a negative growth signal and a price downtrend will both occur by coincidence, when everything else in the economy is just fine?  Very low, which is the reason that GTT is able to be so efficient in its timing.

A third coincident recession indicator is corporate earnings growth.  The best way to model that indicator is to use real S&P 500 Total Return EPS, which corrects for both inflation and for changes in dividend payout ratios.  The following chart shows GTT’s performance on the single indicator of Total Return EPS growth from February 1928 to November 2015:


The strategy again outperforms everything, but not by as large of a margin.  It underperforms other iterations of the strategy in that it misses both the 1937 recession and the 1974 recession.

Another recession indicator worth examining is employment growth.  Economists normally categorize employment growth as a lagging indicator, but lagging indicators can work reasonably well for GTT.  To improve the recessionary accuracy of the indicator, we adjust it for the size of the labor force. (Link: FRED)


The following chart shows GTT’s performance using employment growth as a single growth indicator from February of 1949 to November of 2015:


Once again, the strategy outperforms everything.  The red line, which shows GTT’s cumulative outperformance over MMA, consistently marches upward over the period.

The final two coincident recession indicators worth looking at are housing start growth and real personal income growth.  Housing start growth is a leading indicator, and is somewhat erratic, so we combine it with the more mellow metric of real personal income growth. We also adjust it for the size of the labor force. The recessionary breakpoints are 3% for real personal income growth, and -10% for housing start growth. (Link: FRED)


The following chart shows GTT’s performance using the two metrics in a combined growth signal from January 1960 to November 2015.  If either is negative, the strategy turns its timing function on. Otherwise, it stays invested:
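Because these two metrics use non-zero breakpoints, “negative” here is best read as “below its breakpoint.”  That reading is my assumption in the sketch below, as are the constant and function names.  The breakpoint values themselves are the ones given above.

```python
# Recessionary breakpoints from the text, expressed as yoy growth rates.
INCOME_BREAKPOINT = 0.03    # real personal income growth
HOUSING_BREAKPOINT = -0.10  # housing start growth


def timing_on(real_income_growth: float, housing_start_growth: float) -> bool:
    """Return True when either metric is below its breakpoint, which turns
    the moving average timing function on. False means stay invested."""
    return (real_income_growth < INCOME_BREAKPOINT
            or housing_start_growth < HOUSING_BREAKPOINT)


print(timing_on(0.04, 0.05))   # → False
print(timing_on(0.02, 0.05))   # → True
print(timing_on(0.04, -0.15))  # → True
```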


Once again, the strategy outperforms everything.

Now, the reason that many investors prefer to use a monthly checking period when executing trend-following strategies is that daily checking periods produce too many unnecessary trades and therefore too many gap and bid-ask spread losses.  But Growth-Trend Timing dramatically reduces the number of unnecessary trades, filtering them out with the growth signal.  It can therefore afford to check prices and transact on a daily basis.

The following charts show GTT’s performance from February 1st, 1928 to July 31st, 2015 using a 200 day moving average timing criterion alongside our preferred GTT signal combination of real retail sales growth and industrial production growth:



Once again, the strategy does very well, consistently outperforming everything.  The strategy’s outperformance over the simple 200 day moving average strategy is 120 bps per year–not as large as in the monthly version, but only because the moving average strategy performs more poorly in the monthly version.  My suspicion, which I cannot prove, is that the simple 200 day moving average strategy is somehow benefiting from residuals associated with the daily momentum phenomenon evaluated in a prior piece. That phenomenon began to disappear from the market in the 1970s, which, not coincidentally, is when the 200 day moving average strategy began to consistently underperform.

What’s especially important, in all of these charts, is the way in which the strategy has shown recent strength.  MMA and other momentum-focused strategies have tended to deteriorate in efficacy over time, especially on a daily horizon.  GTT, in contrast, has not weakened at all–it’s only gotten stronger.  That’s crucial, since a strategy that works in recent data is more likely to be pivoting off of causalities in the system that are still there to be exploited.

In looking at these charts, the reader may get the impression that it’s easy to improve a trend-following strategy by adding appendages after the fact.  But it’s not so easy.  To illustrate, here’s an appendage that we might expect to work, but that clearly doesn’t work: Shiller CAPE valuation.  The following chart shows the performance of Value-Trend Timing (VTT) using the Shiller CAPE from January of 1947 to July of 2015.  Growth-Trend Timing using retail sales and industrial production is shown for comparison:


VTT functions in the exact same manner as GTT, except that instead of using a growth signal, it uses a valuation signal.  If, at a given point in time, the market’s valuation as measured by the Shiller CAPE is “cheap”, i.e., cheaper than the historical average valuation up to that point, VTT locks in the bargain and goes long.  If the market’s valuation is expensive, i.e., more expensive than the historical average valuation up to that point, then VTT times the market using the 10 month moving average strategy.

As you can see, VTT adds literally nothing to the performance of MMA.  That’s because the market’s valuation relative to its historical average does not represent a reliable timing signal.  The economy’s likelihood of being in recession, in contrast, does represent such a signal.  If you want to successfully time the market, what you need to be is a strong macroeconomist, not a valuation expert.
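
For readers who want to see the VTT rule spelled out, here is a rough sketch. The function name and the input layout are hypothetical. The “historical average valuation up to that point” is assumed to be a simple average of every CAPE reading through the current month, and the moving average is assumed to cover the 10 monthly closes before the current one.

```python
from statistics import mean


def vtt_position(cape_history, price_history, ma_window=10):
    """Return "long" or "cash" for the coming month.

    Both inputs are monthly series, oldest first, ending with the
    current month (an assumed layout).
    """
    current_cape = cape_history[-1]
    average_cape = mean(cape_history)  # expanding average up to this point
    if current_cape < average_cape:
        # Market is cheap relative to its own history: go long.
        return "long"
    # Market is expensive: fall back on the moving average trend rule.
    prior_closes = price_history[-(ma_window + 1):-1]
    if len(prior_closes) < ma_window:
        raise ValueError("not enough price history for the moving average")
    return "long" if price_history[-1] > mean(prior_closes) else "cash"
```

The structure is identical to GTT; only the switch that turns the timing function on has changed.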

Readers probably want to know what GTT is saying about the market right now.  The answer is predictably mixed.  Real retail sales growth, employment growth, real personal income growth, and housing start growth are all healthily positive, reflecting strength in the domestic U.S. household sector.  If you choose to build GTT on those signals, then you will be long right now, even though the market’s current price trend is negative.  Of course, if weakness in the energy sector, China, and the global economy more generally wins out in the current tug of war with domestic U.S. strength, then the strategy, in failing to sell here, or in failing to have sold at higher levels, is going to take deeper losses.

At the same time, real Total Return EPS growth, industrial production growth, and production proxies that might be used in place of industrial production growth (e.g., ISM readings), are flashing warning signals, consistent with known stresses in the energy sector and in the global economy more generally (especially emerging markets), which those signals are more closely tied to.  If you choose to build GTT using those signals individually or in combination with others, as I chose to do at the beginning of the piece, then given the market’s current negative trend, you will be in treasury bills right now–especially if you are using the daily version of the strategy, which is advisable, since the daily version shrinks the strategy’s gap losses at essentially no cost.  Bear in mind that if strength in the domestic economy wins out in the current tug of war, then the strategy on this construction is likely to get whipsawed.

As with any market timing strategy, GTT carries risks.  There’s a sense in which it’s been data-mined–shaped to conform to the observed historical patterns that have made recession timing profitable.  Of course, the construction of any timing strategy will involve an element of data-mining, if the strategy is going to get the calls right.  The designer will have to look back at the past and see how the system worked, modeling the strategy to exploit the recurrent patterns that it produced.  But still, those patterns may have been coincidental, and therefore the success of the approach may not persist reliably into the future.

Equally importantly, in the past, investors were not as attuned to the predictive power of the economic readings that the strategy uses.  The market’s increased general awareness of the predictive power of those readings may hinder the performance of the strategy going forward.  To give an example of how the hindrance might happen, volatility could increase around the public release of the readings, as everyone focuses on them.  The volatility would then inflict whipsaw losses on the strategy as it tries to trade on the readings.

What GTT has in its favor, in contrast to other popular trend-following approaches, is that it is highly efficient in its trading and extremely long-biased in its market disposition (spending roughly 87% of the time invested).  Even if, by chance, the future turns out to be substantially different from the past, such that the strategy gets certain timing calls wrong, investors in the strategy are unlikely to spend large amounts of time camped outside of the market.  That’s the biggest risk associated with market timing–that the investor will never find a way to get back in, and will therefore miss out on a lifetime’s worth of market earnings.  By timing the market on a set of signals that flash negative warnings only very rarely, GTT mitigates that risk.

Most importantly, GTT makes sense analytically.  It systematically does what any human trend-following market timer would have to do in order to be successful–distinguish between negative price trends that will give way to large downturns that support profitable exits and reentries, and negative price trends that will prove to be nothing more than short-term noise, head-fakes that quickly reverse, inflicting whipsaw losses on whoever tries to trade them.  My reason for introducing the strategy is not so much to tout its efficacy, but to articulate that task as the primary task of trend-following, a task that every trend-follower should be laser-focused on, in the context of the current negative trend, and all future ones.

Trend Following In Financial Markets: A Comprehensive Backtest

Bill Griffeth interviews Paul Tudor Jones, Black Monday, October 19th, 1987.

“My metric for everything I look at is the 200-day moving average of closing prices.  I’ve seen too many things go to zero, stocks and commodities.  The whole trick in investing is: ‘How do I keep from losing everything?’  If you use the 200-day moving average rule, then you get out.  You play defense, and you get out.” — Paul Tudor Jones, as interviewed by Tony Robbins in Money: Master the Game.


Everyone agrees that it’s appropriate to divide the space of a portfolio between different asset classes–to put, for example, 60% of a portfolio’s space into equities, and 40% of its space into fixed income.  “Market Timing” does the same thing, except with time.  It divides the time of a portfolio between different asset classes, in an effort to take advantage of the times in which those asset classes tend to produce the highest returns.

What’s so controversial about the idea of splitting the time of a portfolio between different asset classes, as we might do with a portfolio’s space?  Why do the respected experts on investing almost unanimously discourage it?

  • The reason can’t be transaction costs.  Those costs have been whittled down to almost nothing over the years.
  • The reason can’t be negative tax consequences.  An investor can largely avoid those consequences through the use of futures contracts.  Suppose, for example, that an investor owns shares of an S&P 500 ETF as a core long-term position in a taxable account, and wants to temporarily go to cash in advance of some expected period of market turbulence.  To do that, she need not sell the shares themselves.  Instead, she can sell an S&P 500 futures contract in an amount equal to the size of the ETF position. The sale will perfectly offset her exposure to the S&P 500, bringing it down to exactly zero, without triggering a taxable capital gain.  When she wants to re-enter the market, she can simply buy back the futures contract, removing the hedge.  The only negative tax implication is that during the period in which she holds the hedge, her position will count as a section 1092 “straddle”, and any qualified dividends that she receives will be taxed as ordinary income.  But that’s a very small impact, especially if the hedged period is brief.
  • The reason can’t be poor timing.  For if markets are efficient, as opponents of market timing argue, then it shouldn’t be possible for an individual to time the market “poorly.”  As a rule, any choice of when to exit the market should be just as good as any other.  If a person were able to consistently defy that rule, then reliably beating the market would be as simple as building a strategy to do the exact opposite of what that person does.

In my view, the reason that market timing is so heavily discouraged is two-fold:

(1) Market timing requires big choices, and big choices create big stress, especially when so much is at stake.  Large amounts of stress usually lead to poor outcomes, not only in investing, but in everything.

(2) The most vocal practitioners of market timing tend to perform poorly as investors.

Looking at (2) specifically, why do the most vocal practitioners of market timing tend to perform poorly as investors? The answer is not that they are poor market timers per se. Rather, the answer is that they tend to always be underinvested.  By nature, they’re usually more risk-averse to begin with, which is what sends them down the path of identifying problems in the market and stepping aside.  Once they do step aside, they find it difficult to get back in, especially when the market has gone substantially against them.  It’s painful to sell something and then buy it back at a higher price, locking in a loss.  It’s even more difficult to admit that the loss was the result of one’s being wrong.  And so instead of doing that, the practitioners entrench.  They come up with reasons to stay on the course they’re on–a course that ends up producing a highly unattractive investment outcome.

To return to our space-time analogy, if an investor were to allocate 5% of the space of her portfolio to equities, and 95% to cash, her long-term performance would end up being awful.  The reason would be clear–she isn’t taking risk, and if you don’t take risk, you don’t make money.  But notice that we wouldn’t use her underperformance to discredit the concept of “diversification” itself, the idea that dividing the space of a portfolio between different asset classes might improve the quality of returns.  We wouldn’t say that people that allocate 60/40 or 80/20 are doing things wrong.  They’re fine.  The problem is not in the concept of what they’re doing, but in her specific implementation of it.

Well, the same point extends to market timing.  If a vocal practitioner of market timing ends up spending 5% of his time in equities, and 95% in cash, because he got out of the market and never managed to get back in, we shouldn’t use his predictably awful performance to discredit the concept of “market timing” itself, the idea that dividing a portfolio’s time between different asset classes might improve returns.  We shouldn’t conclude that investors that run market timing strategies that stay invested most of the time are doing things wrong.  The problem is not in the concept of what they’re doing, but in his specific implementation of it.

In my view, the practice of market timing, when done correctly, can add meaningful value to an investment process, especially in an expensive market environment like our own, where the projected future returns to a diversified buy and hold strategy are so low.  The question is, what’s the correct way to do market timing?  That’s the question that I’m going to try to tackle in the next few pieces.

In the current piece, I’m going to conduct a comprehensive backtest of three popular trend-following market timing strategies: the moving average strategy, the moving average crossover strategy, and the momentum strategy.  These are simple, binary market timing strategies that go long or that go to cash at the close of each month based on the market’s position relative to trend.  They produce very similar results, so after reviewing their performances in U.S. data, I’m going to settle on the moving average strategy as a representative strategy to backtest out-of-sample.

The backtest will cover roughly 235 different equity, fixed income, currency, and commodity indices, and roughly 120 different individual large company stocks (e.g., Apple, Berkshire Hathaway, Exxon Mobil, Procter and Gamble, and so on).  For each backtest, I’m going to present the results in a chart and an entry-exit table, of the type shown below (10 Month Moving Average Strategy, Aggregate Stock Market Index of Sweden, January 1920 – July 2015):


(For a precise definition of each term in the chart and table, click here.)

The purpose of the entry-exit tables is to put a “magnifying glass” on the strategy, to give a close-up view of what’s happening underneath the surface, at the level of each individual trade.  In examining investment strategies at that level, we gain a deeper, more complete understanding of how they work.  In addition to being gratifying in itself, such an understanding can help us more effectively implement the concepts behind the strategies.
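
To make the idea concrete, here is a minimal sketch of how a table of this kind might be assembled from a binary position series. The layout is hypothetical and is not the code used to generate the charts and tables in this piece. It pairs each exit with the next re-entry and reports how far above or below the re-entry price the exit was. Interest earned on cash while out of the market is ignored, which is a simplifying assumption.

```python
def exit_entry_table(dates, prices, positions):
    """Pair each exit with the following re-entry.

    positions[i] is True if the strategy is long after the close on
    dates[i], False if it is in cash (an assumed convention).
    """
    rows = []
    exit_index = None
    for i in range(1, len(positions)):
        if positions[i - 1] and not positions[i]:
            exit_index = i  # switched from long to cash at this close
        elif not positions[i - 1] and positions[i] and exit_index is not None:
            exit_price, entry_price = prices[exit_index], prices[i]
            rows.append({
                "exit_date": dates[exit_index],
                "exit_price": exit_price,
                "entry_date": dates[i],
                "entry_price": entry_price,
                # Positive when the strategy got back in below its exit price.
                "switch_gain": exit_price / entry_price - 1,
            })
            exit_index = None
    return rows
```

A positive `switch_gain` means the round trip helped; a negative one is a whipsaw.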

I’ve written code that allows me to generate the charts and tables very quickly, so if readers would like to see how the strategy performs in a specific country or security that wasn’t included in the test, I encourage them to send in the data.  All that’s needed is a total return index or a price index with dividend amounts and payment dates.

The backtest will reveal an unexpected result: that the strategy works very well on aggregate indices–e.g., the S&P 500, the FTSE, the Nikkei, etc.–but works very poorly on individual securities.  For perspective on the divergence, consider the following chart and table of the strategy’s performance (blue line) in the S&P 500 from February of 1963 to July of 2015:

As you can see, the strategy performs very well, exceeding the return of buy and hold by over 60 bps per year, with lower volatility and roughly half the maximum drawdown. Compare that performance with the strategy’s performance in the six largest S&P 500 stocks that have trading histories back to 1963.  Ordered by market cap, they are: General Electric $GE, Walt Disney $DIS, Coca-Cola $KO, International Business Machines $IBM, Dupont $DD and Caterpillar $CAT.

(For a precise definition of each term in the charts and tables, click here.)

As you can see in the charts, the strategy substantially underperforms buy and hold in every stock except $GE.  The pattern is not limited to these 6 cases–it extends out to the vast majority of stocks in the S&P 500.  The strategy performs poorly in almost all of them, despite performing very well in the index.

The fact that the strategy performs poorly in individual securities is a significant problem, as it represents a failed out-of-sample test that should not occur if popular explanations for the strategy’s efficacy are correct.  The most common explanations found in the academic literature involve appeals to the behavioral phenomena of underreaction and overreaction.  Investors allegedly underreact to new information as it’s introduced, and then overreact to it after it’s been in the system for an extended period of time, creating price patterns that trend-following strategies can exploit.  But if the phenomena of underreaction and overreaction explain the strategy’s success, then why isn’t the strategy successful in individual securities?  Individual securities see plenty of underreaction and overreaction as new information about them is released and as their earnings cycles progress.

There’s a reason why the strategy fails in individual securities, and it’s quite fascinating. In the next piece, I’m going to try to explain it.  I’m also going to try to use it to build a substantially improved version of the strategy.  For now, my focus will simply be on presenting the results of the backtest, so that readers can come to their own conclusions.

Market Timing: Definitions and Restrictions

We begin with the following definitions:

Risk Asset: An asset that exhibits meaningful price volatility.  Examples include: equities, real estate, collectibles, foreign currencies expressed in one’s own currency, and long-term government bonds.  Note that I insert this last example intentionally. Long-term government bonds exhibit substantial price volatility, and are therefore risk assets, at least on the current definition.  

Safe Asset: An asset that does not exhibit meaningful price volatility.  There is only one truly safe asset: “cash” in base currency.  For a U.S. investor, that would include: paper dollars and Fed balances (base money), demand and time deposits at FDIC insured banks, and short-term treasury bills.

Market Timing Strategy: An investment strategy that seeks to add to the performance of a Risk Asset by switching between exposure to that asset and exposure to a Safe Asset.

The market timing strategies that we’re going to examine in the current study will be strictly binary.  At any given time, they will either be entirely invested in a single risk asset, or entirely invested in a single safe asset, with both specified beforehand.  There will be no degrees of exposure–only 100% or 0%.
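
In code, the binary restriction amounts to something like the following sketch. The names are hypothetical, and the returns are assumed to be simple per-period total returns.

```python
def binary_strategy_growth(risk_returns, cash_returns, in_risk_asset):
    """Growth of $1 for a strategy that is 100% in one asset each period.

    in_risk_asset[i] is True when the strategy holds the risk asset
    during period i, False when it holds the safe asset.
    """
    value = 1.0
    for risk_return, cash_return, long in zip(risk_returns, cash_returns,
                                              in_risk_asset):
        # No partial exposure: the whole portfolio earns one return or the other.
        value *= 1 + (risk_return if long else cash_return)
    return value
```

Because exposure is all-or-nothing, every change in the `in_risk_asset` series is a discrete switching event that can be examined on its own.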

From a testing perspective, the advantage of a binary strategy is that the ultimate sources of the strategy’s performance–the individual switching events–can be analyzed directly. When a strategy alters its exposure in degrees, such an analysis becomes substantially more difficult–every month in the series becomes a tiny “switch.”

The disadvantage of a binary strategy is that the strategy’s performance will sometimes end up hinging on the outcome of a small number of very impactful switches.  In such cases, it will be easier for the strategy to succeed on luck alone–the luck of happening to switch at just the right time, just as the big “crash” event is beginning, when the switch itself was not an expression of any kind of reliable skill at avoiding the event.

To be clear, this risk also exists in “degreed” market timing strategies, albeit to a reduced degree.  Their outperformance will sometimes result from rapid moves that they make in the early stages of market downturns, when the portfolio moves can just as easily be explained by luck as by genuine skill at avoiding downturns.

In our case, we’re going to manage the risk in two ways: (1) by conducting out of sample testing on a very large, diverse quantity of independent data sets, reducing the odds that consistent outperformance could have been the result of luck, and (2) by conducting tweak tests on the strategy–manipulations of obviously irrelevant details in the strategy, to ensure that the details are not driving the results.

In addition to being binary, the market timing strategies that we’re going to test will only switch into cash as the safe asset.  They will not switch into other proxies for safety, such as long-term government bonds or gold.  When a strategy switches from one volatile asset (such as equities) into another volatile asset that is standing in as the safe asset (e.g., long-term government bonds or gold), the source of the strategy’s performance is obscured: it could be a favorable or unfavorable timing of either or both assets, or simply a strong independent performance from the stand-in safe asset. Both assets, after all, are fluctuating in price.

An example will help illustrate the point.  Suppose that we’re examining a market timing strategy that is either all stocks or all cash.  If the strategy outperforms, we will know that it is outperforming by favorably timing the price movements of stocks and stocks alone. There’s nothing else for it to favorably time.  It cannot favorably time the price movements of cash, for example, because cash is a safe asset whose “price” exhibits no movement. Suppose, alternatively, that we’re examining a strategy that switches from stocks into bonds.  If the strategy outperforms, we will not know whether it outperformed by favorably timing stocks, by favorably timing bonds, or by favorably timing both.  Similarly, we won’t know how much of its strength was a lucky byproduct of strength in bonds as an asset class.  We therefore won’t know how much was a direct consequence of skill in the timing of stocks, which is what we’re trying to measure.

Now, to be clear, adding enhancements to a market timing strategy–degreed exposure, a higher-returning uncorrelated safe asset to switch into (e.g., long-term government bonds), leverage, and so on–can certainly improve performance.  But the time to add them is after we’ve successfully tested the core performance of a strategy, after we’ve confirmed that the strategy exhibits timing skill.  If we add them before we’ve successfully tested the core performance of the strategy, before we’ve verified that the strategy exhibits timing skill, the risk is that we’re going to introduce noise into the test that will undermine the sensitivity and specificity of the result.

Optimizing the Risk-Reward Proposition of Market Timing

The right way to think about market timing is in terms of risk and reward.  Any given market timing strategy will carry a certain risk of bad outcomes, and a certain potential for good outcomes.  We can increase the probability of seeing good outcomes, and reduce the probability of seeing bad outcomes, by seeking out strategies that manifest the following five qualities: analytic, generic, efficient, long-biased, and recently-successful.

I will explain each in turn:

Analytic:  We want market timing strategies that have a sound analytic basis, whose efficacy can be shown to follow from an analysis of known facts or reasonable assumptions about a system, an analysis that we ourselves understand.  These properties are beneficial for the following reasons:

(1) When a strategy with a sound analytic basis succeeds in testing, the success is more likely to have resulted from the capturing of real, recurrent processes in the data, as opposed to the exploitation of coincidences that the designer has stumbled upon through trial-and-error. Strategies that succeed by exploiting coincidences will inevitably fail in real world applications, when the coincidences get shuffled around.

(2) When we understand the analytic basis for a strategy’s success, we are better able to assess the risk that the success will fail to carry over into real-world applications. That risk is simply the risk that the facts or assumptions that ground the strategy will turn out to be incorrect or incomplete.  Similarly, we are better able to assess the risk that the success will decay or deteriorate over time.  That risk is simply the risk that conditions relevant to the facts or assumptions will change in relevant ways over time.

To illustrate, suppose that I’ve discovered a short-term trading strategy that appears to work well in historical data.  Suppose further that I’ve studied the issue and am able to show why the strategy works, given certain known specific facts about the behaviors of other investors, with the help of a set of reasonable simplifying assumptions.  To determine the risk that the strategy’s success will fail to carry over into real-world applications, I need only look at the facts and assumptions and ask, what is the likelihood that they are in some way wrong, or that my analysis of their implications is somehow mistaken?  Similarly, to determine the risk that the success will decay or deteriorate over time, I need only ask, what is the likelihood that conditions relevant to the facts and assumptions might change in relevant ways?  How easy would it be for that to happen?

If I don’t understand the analytic basis for a strategy’s efficacy, I can’t do any of that. The best I can do is cross my fingers and hope that the observed past success will show up when I put real money to work in the strategy, and that it will keep showing up going forward.  If that hope doesn’t come true, if the strategy disappoints or experiences a cold spell, there won’t be any place that I can look, anywhere that I can check, to see where I might have gone wrong, or what might have changed.  My ability to stick with the strategy, and to modify it as needed in response to changes, will be significantly curtailed.

(3) An understanding of the analytic basis for a strategy’s efficacy sets boundaries on a number of other important requirements that we need to impose.  We say, for example, that a strategy should succeed in out-of-sample testing.  But some out-of-sample tests do not apply, because they do not embody the conditions that the strategy needs in order to work.  If we don’t know how the strategy works in the first place, then we have no way to know which tests those are.

To offer an example, the moving average, moving average crossover, and momentum strategies all fail miserably in out-of-sample testing in individual stocks. Should the strategies have performed well in that testing, given our understanding of how they work, how they generate outperformance?  If we don’t have an understanding of how they work, how they generate outperformance, then we obviously can’t answer the question.

Now, many would claim that it’s unreasonable to demand a complete analytic understanding of the factors behind a strategy’s efficacy.  Fair enough.  I’m simply describing what we want, not what we absolutely have to have in order to profitably implement a strategy.  If we can’t get what we want, in terms of a solid understanding of why a strategy works, then we have to settle for the next best thing, which is to take the strategy live, ideally with small amounts of money, and let the consequences dictate the rest of the story.  If the strategy works, and continues to work, and continues to work, and continues to work, and so on, then we stick with it.  When it stops working for a period of time that exceeds our threshold of patience, we abandon it.  I acknowledge that this is a perfectly legitimate empirical approach that many traders and investors have been able to use to good effect.  It’s just a difficult and sometimes costly approach to use, particularly in situations where the time horizon is extended and where it takes a long time for the “results” to come in.

The point I’m trying to make, then, is not that the successful implementation of a strategy necessarily requires a strong analytic understanding of the strategy’s mechanism, but that such an understanding is highly valuable, worth the cost of digging to find it.  We should not just cross our fingers and hope that past patterning will repeat itself.  We should dig to understand.


In the early 1890s, when the brilliant physicist Oliver Heaviside discovered his operator method for solving differential equations, the mathematics establishment dismissed it, since he couldn’t give an analytic proof for its correctness.  All he could do was put it to use in practice, and show that it worked, which was not enough.  To his critics, he famously retorted:

“I do not refuse my dinner simply because I do not understand the process of digestion.”

His point is relevant here.  The fact that we don’t have a complete analytic understanding of a strategy’s efficacy doesn’t mean that the strategy can’t be put to profitable use.  But there’s an important difference between Heaviside’s case and the case of someone who discovers a strategy that succeeds in backtesting for unknown reasons.  If you make up a brand new differential equation that meets the necessary structure, an equation that Heaviside has never used his method on, that won’t be an obstacle for him–he will be able to use the method to solve it, right in front of your eyes.  Make up another one, he’ll solve that one. And so on. Obviously, the same sort of on-demand demonstration is not possible in the context of a market timing strategy that someone has discovered to work in past data.  All that the person can do is point to backtesting in that same stale data, or some other set of data that is likely to have high correlations to it.  That doesn’t count for very much, and shouldn’t.

Generic:  We want market timing strategies that are generic.  Generic strategies are less likely to achieve false success by “overfitting” the data–i.e., “shaping” themselves to exploit coincidences in the data that are not going to reliably recur.

An example of a generic strategy would be the instruction contained in a simple time series momentum strategy: to switch into and out of the market based on the market’s trailing one year returns.  If the market’s trailing one year returns are positive, go long, if they’re negative, go to cash or go short.  Notice that one year is a generic whole number. Positive versus negative is a generic delineation between good and bad.  The choice of these generic breakpoints does not suggest after-the-fact overfitting.
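
As a sketch, the momentum instruction might look like this. The function name is hypothetical, and the input is assumed to be a monthly total return index, so that one year corresponds to 12 observations. A trailing return of exactly zero is sent to cash, which is an arbitrary choice, and the go-short variant is left out.

```python
def momentum_position(monthly_index, lookback=12):
    """Return "long" if the trailing return over `lookback` months is positive.

    monthly_index is an assumed monthly total return index, oldest first,
    ending with the current month.
    """
    if len(monthly_index) < lookback + 1:
        raise ValueError("not enough history for the lookback period")
    trailing_return = monthly_index[-1] / monthly_index[-1 - lookback] - 1
    return "long" if trailing_return > 0 else "cash"
```

The only parameters are a whole-number lookback and a zero threshold, which is what makes the rule generic.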

An example of the opposite of a generic strategy would be the instruction to be invested in the market if some highly refined condition is met: for example, if trailing one year real GDP growth is above 2.137934%, or if the CAPE is less than 17.39, or if bullish investor sentiment is below 21%.  Why were these specific breakpoints chosen, when so many others were possible?  Is the answer that the chosen breakpoints, with their high levels of specificity, just-so-happen to substantially strengthen the strategy’s performance in the historical data that the designer is building it in?  A yes answer increases the likelihood that the performance will unravel when the strategy is taken into the real-world.

A useful test to determine whether a well-performing strategy is sufficiently generic is the “tweak” test.  If we tweak the rules of the strategy in ways that should not appreciably affect its performance, does its performance appreciably suffer?  If the answer is yes, then the strength of the performance is more likely to be specious.
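
One simple way to run a tweak test is to vary a parameter that should not matter much, such as the length of the moving average, and compare the results over the same evaluation period. The sketch below is illustrative only. The function names are hypothetical, cash is assumed to earn nothing, and costs are ignored.

```python
from statistics import mean


def ma_strategy_growth(prices, window, start):
    """Growth of $1 from index `start`, long only when price > trailing average."""
    value = 1.0
    for t in range(start, len(prices) - 1):
        if prices[t] > mean(prices[t - window:t]):
            value *= prices[t + 1] / prices[t]  # long through the next period
        # otherwise in cash, assumed here to earn zero
    return value


def tweak_test(prices, windows):
    """Run the same strategy with each window over a common period."""
    start = max(windows)  # every window has enough history from here on
    return {window: ma_strategy_growth(prices, window, start)
            for window in windows}
```

If a call such as `tweak_test(prices, [8, 9, 10, 11, 12])` shows strong results at one window and weak results at its neighbors, that is a warning sign that the chosen value was fitted to the data.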

Efficient:  Switching into and out of the market inflicts gap losses, slip losses, transaction costs, and, if conducted unskillfully, realized tax liabilities, each of which represents a guaranteed negative hit to performance.  However, the positive benefits of switching–the generation of outperformance through the avoidance of drawdowns–are not guaranteed. When a strategy breaks, the positive benefits go away, and the guaranteed negative hits become our downside.  They can add up very quickly, which is why we want market timing strategies that switch efficiently, only when the probabilities of success in switching are high.
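
A back-of-the-envelope sketch of how switching frictions compound is shown below. The function names are hypothetical, and lumping gap losses, slippage, and commissions into a single per-switch cost is a simplification.

```python
def count_switches(positions):
    """Number of times the position changes from one period to the next."""
    return sum(1 for previous, current in zip(positions, positions[1:])
               if previous != current)


def growth_after_costs(gross_growth, n_switches, cost_per_switch):
    """Apply a proportional cost to the portfolio at every switch."""
    return gross_growth * (1 - cost_per_switch) ** n_switches
```

With a hypothetical all-in cost of 0.5% per switch, a strategy that switches 100 times keeps only about 61% of its gross ending value (0.995 raised to the 100th power), whether or not the switches added anything.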

Long-Biased:  Over long investment horizons, risk assets–in particular, equities–have a strong track record of outperforming safe assets.  They’ve tended to dramatically overcompensate investors for the risks they’ve imposed, generating total returns that have been significantly higher than would have been necessary to make those risks worth taking.  As investors seeking to time the market, we need to respect that track record.  We need to seek out strategies that are “long-biased”, i.e., strategies that maximize their exposure to risk assets, with equities at the top of the list, and minimize their exposure to safe assets, with cash at the bottom of the list.

Psychologists tell us that, in life, “rational optimists” tend to be the most successful.  We can probably extend the point to market timing strategies.  The best strategies are those that are rationally optimistic, that default to constructive, long positions, and that are willing to shift to other positions, but only when the evidence clearly justifies it.

Recently Successful: When all else is equal, we should prefer market timing strategies that test well in recent data and that would have performed well in recent periods of history. Those are the periods whose conditions are the most likely to share commonalities with current conditions, which are the conditions that our timing strategies will have to perform in.

Some would pejoratively characterize our preference for success in recent data as a “This Time is Different” approach–allegedly the four most dangerous words in finance.  The best way to come back at this overused cliché is with the words of Josh Brown: “Not only is this time different, every time is different.”  With respect to testing, the goal is to minimize the differences.  In practice, the way to do that is to favor recent data in the testing. Patterns that are found in recent data are more likely to have arisen out of causes that are still in the system.  Such patterns are therefore more likely to arise again.

The Moving Average Strategy

In 2006, Mebane Faber of Cambria Investments published an important white paper in which he introduced a new strategy for implementing the time-honored practice of trend-following.  His proposed strategy is captured in the following steps:

(1) For a given risk asset, at the end of each month, check the closing price of the asset.

(2) If the closing price is above the average of the 10 prior monthly closes, then go long the asset and stay long through the next month.

(3) If the price is below the average of the 10 prior monthly closes, then go to cash, or to some preferred proxy for a safe asset, and stay there through the next month.

Notably, this simple strategy, if implemented when Faber proposed it, would have gone on to protect investors from a 50% crash that began a year later.  After protecting investors from that crash, the strategy would have placed investors back into long positions in the summer of 2009, just in time to capture the majority of the rebound.  It’s hard to think of many human market timers that managed to perform better, playing both sides of the fence in the way that the strategy was able to do.  It deserves respect.

To make the strategy cleaner, I would offer the following modification: that the strategy switch based on total return rather than price. When the strategy switches based on total return, it puts all security types on an equal footing: those whose prices naturally move up over time due to the retention of income (e.g., growth equities), and those that do not retain income and whose prices therefore cannot sustainably move upwards (e.g., high-yield bonds).

Replacing price with total return, we arrive at the following strategy:

(1) For a given risk asset, at the end of each month, check the closing level of the asset’s total return index.  (Note: you can quickly derive a total return index from a price index by subtracting, from each price in the index, the cumulative dividends that were paid after the date of that price.)

(2) If the closing level of the total return index is above the average of the 10 prior monthly closing levels, then go long the asset and stay long through the next month.

(3) If the closing level of the total return index is below the average of the 10 prior monthly closing levels, then go to cash, or to some preferred proxy for a safe asset, and stay there through the next month.
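The three steps above can be sketched in a few lines of Python.  This is a minimal illustration, not the code behind the charts: the function name `mma_positions` and the "RISK"/"SAFE" labels are hypothetical, the phrase “10 prior monthly closing levels” is read literally as the ten month-end levels before the current one, and a tie with the average is sent to the safe asset, which is an assumption the steps do not settle.

```python
from statistics import mean

def mma_positions(tr_index, period=10):
    """For each month-end, return the position to hold through the next month.

    tr_index: month-end closing levels of a total return index, oldest first.
    Returns "RISK", "SAFE", or None while fewer than `period` prior closes exist.
    """
    positions = []
    for t, level in enumerate(tr_index):
        if t < period:
            positions.append(None)  # not enough history to form the average
            continue
        prior_avg = mean(tr_index[t - period:t])  # the `period` closes before month t
        # Assumption: a level exactly equal to the average goes to the safe asset.
        positions.append("RISK" if level > prior_avg else "SAFE")
    return positions
```

Passing a price index instead of a total return index gives Faber's original price-based version under the same assumptions.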

We will call this strategy MMA, which stands for Monthly Moving Average strategy. The following chart shows the performance of MMA in the S&P 500 from February of 1928 to November of 2015.  Note that we impose a 0.6% slip loss on each round-trip transaction, which was the average bid-ask spread for large company stocks in the 1928 – 2015 period:


(For a precise definition of each term in the chart, click here.)

The blue line in the chart is the total return of MMA.  The gray line is the total return of a strategy that buys and holds the risk asset, abbreviated RISK.  In this case, RISK is the S&P 500.  The black line on top of the gray line, which is difficult to see in the current chart, but which will be easier to see in future charts, is the moving average line.  The yellow line is the total return of a strategy that buys and holds the safe asset, abbreviated SAFE.  In this case, SAFE is the three month treasury bill, rolled over on a monthly basis. The purple line is the opposite of MMA–a strategy that is out when MMA is in, and in when MMA is out.  It’s abbreviated ANTI.  The gray columns are U.S. recession dates.
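For readers who want to reproduce lines like these, here is a rough sketch of how the MMA, RISK, SAFE, and ANTI total return lines could be compounded from monthly returns.  The function name `backtest` is hypothetical, and charging half of the 0.6% round-trip slip loss at each individual switch is an assumption about how the cost is applied.

```python
def backtest(risk_returns, safe_returns, in_risk, round_trip_cost=0.006):
    """Compound a switching strategy into a total return line starting at 1.0.

    in_risk[t] is True when the strategy holds the risk asset during month t.
    """
    level, levels, previous = 1.0, [], None
    for r, s, held in zip(risk_returns, safe_returns, in_risk):
        if previous is not None and held != previous:
            # Assumption: each switch pays half of the round-trip slip loss.
            level *= 1 - round_trip_cost / 2
        level *= 1 + (r if held else s)
        levels.append(level)
        previous = held
    return levels

# Made-up monthly returns, purely to show the four lines side by side.
risk = [0.10, -0.20, 0.05]
safe = [0.01, 0.01, 0.01]
signal = [True, False, True]

mma_line = backtest(risk, safe, signal)
anti_line = backtest(risk, safe, [not held for held in signal])  # out when MMA is in
risk_line = backtest(risk, safe, [True] * len(risk))             # buy and hold RISK
safe_line = backtest(risk, safe, [False] * len(risk))            # buy and hold SAFE
```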

The dotted green line shows the timing strategy’s cumulative outperformance over the risk asset, defined as the ratio of the trailing total return of the timing strategy to the trailing total return of a strategy that buys and holds the risk asset.  It takes its measurement off of the right y-axis, with 1.0 representing equal performance.  When the line is ratcheting up to higher numbers over time, the strategy is performing well.  When the line is decaying down to lower numbers over time, the strategy is performing poorly.
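As a small sketch of that definition, the green line is just the element-by-element ratio of two total return lines.  The function names here are hypothetical, and both lines are assumed to start from the same base value.

```python
def outperformance_ratio(strategy_levels, risk_levels):
    """Ratio of the strategy's total return line to the risk asset's line."""
    return [s / r for s, r in zip(strategy_levels, risk_levels)]

def outperformed_between(ratio, start, end):
    """True if the ratio is higher at `end` than at `start`."""
    return ratio[end] > ratio[start]
```

A ratio of 1.0 at any point means the two lines have produced equal cumulative returns up to that point.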

We can infer the strategy’s outperformance between any two points in time by examining what happens to the green line.  If the green line ends up at a higher place, then the strategy outperformed.  If it ends up at a lower place, then the strategy underperformed.  As you can see, the strategy dramatically outperformed from the late 1920s to the trough of the Great Depression (the huge spike at the beginning of the chart).  It then underperformed from the 1930s all the way through to the late 1960s.  From that point to now, it has performed roughly in line with the risk asset, enjoying large periods of outperformance during market crashes, offset by periods of underperformance during the subsequent rebounds, and a long swath of underperformance during the 1990s.

Now, it’s not entirely fair to be evaluating the timing strategy’s performance against the performance of the risk asset.  The timing strategy spends a significant portion of its time invested in the safe asset, which has a lower return, and a lower risk, than the risk asset. We should therefore expect the timing strategy to produce a lower return, with a lower risk, even when the timing strategy is improving the overall performance.

The appropriate way to measure the performance of the timing strategy is through the use of what I call the “X/Y portfolio”, represented by the red line in the chart.  The X/Y portfolio is a mixed portfolio with an allocation to the risk asset and the safe asset that matches the timing strategy’s cumulative ex-post exposure to each asset.  In the present case, the timing strategy spends roughly 72% of its time in the risk asset, and roughly 28% of its time in the safe asset.  The corresponding X/Y portfolio is then a 72/28 risk/safe portfolio, a portfolio continually rebalanced to hold 72% of its assets in the S&P 500, and 28% of its assets in treasury bills.

If a timing strategy were to add exactly zero value through its timing, then its performance–its return and risk–would be expected to match the performance of the corresponding X/Y portfolio.  The performances of the two strategies would be expected to match because their cumulative asset exposures would be identical–the only difference would be in the specific timing of the exposures.  If a timing strategy can consistently produce a better return than the corresponding X/Y portfolio, with less risk, then it’s necessarily adding value through its timing.  It’s taking the same asset exposures and transforming them into “something more.”
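The X/Y construction can be sketched as follows.  The name `xy_portfolio` is hypothetical, exposure is measured as the fraction of months spent in the risk asset, and “continually rebalanced” is approximated by rebalancing once per month; both are assumptions made for illustration.

```python
def xy_portfolio(risk_returns, safe_returns, in_risk):
    """Build the X/Y benchmark that matches a timing strategy's ex-post exposure.

    Returns (x, levels), where x is the fraction of months the strategy spent
    in the risk asset and levels is the benchmark's total return line.
    """
    x = sum(in_risk) / len(in_risk)  # e.g. roughly 0.72 for MMA in the S&P 500
    level, levels = 1.0, []
    for r, s in zip(risk_returns, safe_returns):
        # Assumption: weights are reset to x / (1 - x) at the start of every month.
        level *= 1 + x * r + (1 - x) * s
        levels.append(level)
    return x, levels
```

Note that x is only known after the fact, so the X/Y portfolio is a yardstick for measuring timing skill rather than something an investor could have held in real time.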

When looking at the charts, then, the way to assess the strategy’s skill in timing is to compare the blue line and the red line.  If the blue line is substantially above the red line, then the strategy is adding positive value and is demonstrating positive skill.  If the blue line equals the red line to within a reasonable statistical error, then the strategy is adding zero value and is demonstrating no skill–the performance equivalent of randomness.  If the blue line is substantially below the red line, then the strategy is adding negative value and is demonstrating negative skill.

The following table shows the entry-exit dates associated with the previous chart:


(For a precise definition of each term in the table, click here.)

Each entry-exit pair (a sale followed by a purchase) produces a relative gain or loss on the index.  That relative gain or loss is shown in the boxes in the “Gain” column, which are shaded in green for gains and in red for losses.  You can quickly look at the table and evaluate the frequency of gains and losses by gauging the frequency of green and the red.

What the table is telling us is that the strategy makes the majority of its money by avoiding large, sustained market downturns.  To be able to avoid those downturns, it has to accept a large number of small losses associated with switches that prove to be unnecessary. Numerically, more than 75% of all of MMA’s trades turn out to be losing trades. But there’s a significant payout asymmetry to each trade: the average winning trade produces a relative gain of 26.5% on the index, whereas the average losing trade only inflicts a relative loss of -6.0%.
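A quick back-of-the-envelope check shows why the asymmetry matters.  The sketch below takes the win rate as roughly 25% (the text only says that more than 75% of trades lose, so this is an approximation) and uses a simple arithmetic average that ignores compounding.  The function names are hypothetical.

```python
def expected_relative_gain(win_rate, avg_win, avg_loss):
    """Average relative gain per trade, as a plain probability-weighted mean."""
    return win_rate * avg_win + (1 - win_rate) * avg_loss

def breakeven_win_rate(avg_win, avg_loss):
    """Win rate at which the average relative gain per trade is exactly zero."""
    return -avg_loss / (avg_win - avg_loss)

per_trade = expected_relative_gain(0.25, 0.265, -0.060)  # about +2.1% per trade
breakeven = breakeven_win_rate(0.265, -0.060)            # about 18.5%
```

Under these rough inputs, the strategy could lose on more than four out of five trades and still come out ahead on average, which is why a low win rate by itself does not condemn it.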

Comparing the Results: Two Additional Strategies

In addition to Faber’s strategy, two additional trend-following strategies worth considering are the moving average crossover strategy and the momentum strategy.  The moving average crossover strategy works in the same way as the moving average strategy, except that instead of comparing the current value of the price or total return to a moving average, it compares a short horizon moving average to a long horizon moving average. When the short horizon moving average crosses above the long horizon moving average, a “golden cross” occurs, and the strategy goes long.  When the short horizon moving average crosses below the long horizon moving average, a “death cross” occurs, and the strategy exits.  The momentum strategy also works in the same way as the moving average strategy, except that instead of comparing the current value of the price or total return to a moving average, it compares the current value to a single prior value–usually the value from 12 months ago.
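The two alternative rules can be sketched in the same style as the moving average rule.  The function names, the default 2 month short horizon, and the choice to include the current month in both moving averages are illustrative assumptions, not details taken from the tests reported here.

```python
from statistics import mean

def cross_positions(index, short=2, long=10):
    """Moving average crossover: long while the short MA is above the long MA."""
    positions = []
    for t in range(len(index)):
        if t + 1 < long:
            positions.append(None)  # not enough history for the long average
            continue
        short_ma = mean(index[t + 1 - short:t + 1])
        long_ma = mean(index[t + 1 - long:t + 1])
        positions.append("RISK" if short_ma > long_ma else "SAFE")
    return positions

def momo_positions(index, lookback=12):
    """Momentum: long while the current level is above the level `lookback` months ago."""
    positions = []
    for t, level in enumerate(index):
        if t < lookback:
            positions.append(None)  # no value from `lookback` months ago yet
            continue
        positions.append("RISK" if level > index[t - lookback] else "SAFE")
    return positions
```

In this form, a “golden cross” is a month where the crossover position flips from "SAFE" to "RISK", and a “death cross” is the reverse flip.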

The following table shows the U.S. equity performance of Faber’s version of the moving average strategy (MMA-P), our proposed total return modification (MMA-TR), the moving average crossover strategy (CROSS), and the momentum strategy (MOMO) across a range of possible moving average and momentum periods:


If you closely examine the table, you will see that MMA-TR, MMA-P, and MOMO are essentially identical in their performances.  The performance of CROSS diverges negatively in certain places, but the comparison is somewhat artificial, given that there’s no way to put CROSS’s two moving average periods onto the same basis as the single periods of the other strategies.

Despite similar performances in U.S. equities, we favor MMA-TR over MMA-P because MMA-TR is intuitively cleaner, particularly in the fixed income space. In that space, MMA-P diverges from the rest of the strategies, for the obvious reason that fixed income securities do not retain earnings and therefore do not show an upward trend in their prices over time.  MMA-TR is also easier to backtest than MMA-P–only one index, a total return index, is needed. For MMA-P, we need two indices–a price index that decides the switching, and a total return index that calculates the returns.

We favor MMA-TR over MOMO for a similar reason.  It’s intuitively cleaner than MOMO, since it compares the current total return level to an average of prior levels, rather than a single prior level.  A strategy that makes comparisons to a single prior level is vulnerable to single-point anomalies in the data, whereas a strategy that makes comparisons to an average of prior levels will smooth those anomalies out.

We’re therefore going to select MMA-TR to be the representative trend-following strategy that we backtest out-of-sample.  Any conclusions that we reach will extend to all of the strategies–particularly MMA-P and MOMO, since their structures and performances are nearly identical to those of MMA-TR.  We’re going to use 10 months as the moving average period, but not because 10 months is special.  We’re going to use it because it’s the period that Faber used in his original paper, and because it’s the period that just-so-happens to produce the best results in U.S. equities.

Changing Moving Average Periods: A Tweak Test

Settling on a 10 month moving average period gives us our first opportunity to apply the “tweak” test.  With respect to the chosen moving average period, what makes 10 months special?  Why not use a different number: say, 6, 7, 8, 9, 11, 15, 20, 200 and so on?  The number 10 is ultimately arbitrary, and therefore the success of the strategy should not depend on it.

Fortunately, when we apply a reasonable range of numbers other than 10 to the strategy, we obtain similarly positive results, in satisfaction of the “tweak” test.  The following table shows the performance of the strategy under moving average periods ranging from 1 month to 300 months, with the performance of 10 months highlighted in yellow:


Evidently, the strategy works well for all moving average periods ranging from around 5 months to around 50 months.  When periods below around 5 months are used, the strategy ends up engaging in excessive unnecessary switching.  When periods greater than around 50 months are used, the moving average ends up lagging the index by such a large amount that it’s no longer able to switch when it needs to, in response to valid signs of impending downtrends.
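One way to see the over-switching problem at short periods is to count how often the signal flips for a given moving average period.  The sketch below is a hypothetical helper, not the code behind the table; it again reads the moving average as the mean of the closes before the current month.

```python
from statistics import mean

def switch_frequency(tr_index, period):
    """Fraction of month-ends at which the moving average signal flips."""
    signals = [tr_index[t] > mean(tr_index[t - period:t])
               for t in range(period, len(tr_index))]
    if len(signals) < 2:
        return 0.0  # too little data to observe any switch
    flips = sum(a != b for a, b in zip(signals, signals[1:]))
    return flips / (len(signals) - 1)

# Synthetic zigzag uptrend (up 4, down 2), used only to illustrate the effect.
zigzag = [100 + t + (3 if t % 2 else 0) for t in range(40)]
short_period = switch_frequency(zigzag, 1)   # flips every single month
long_period = switch_frequency(zigzag, 10)   # never flips
```

Running the same helper over a range of periods on a real total return index is one simple way to carry out the tweak test described above.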

The following two charts illustrate the point.  In the first chart, a 1 month period is used. The strategy ends up switching in roughly 46% of all months–an egregiously high percentage that indicates significant inefficiency.  In the second chart, a 300 month period is used.  The strategy ends up completely impotent–it never switches, not even a single time.

[Charts: MMA with a 1 month moving average period; MMA with a 300 month moving average period]

(For a precise definition of each term in the chart, click here.)

Evaluating the Strategy: Five Desired Qualities

Earlier, we identified five qualities that we wanted to see in market timing strategies.  They were: analytic, generic, efficient, long-biased, and recently-successful.  How does MMA fare on those qualities?  Let’s examine each individually.

Here, again, are the chart and table for the strategy’s performance in U.S. equities:



(For a precise definition of each term in the chart and table, click here.)

Here are the qualities, laid out with grades:

Analytic?  Undecided.  Advocates of the strategy have offered behavioral explanations for its efficacy, but those explanations leave out the details, and will be cast into doubt by the results of the testing that we’re about to do.  Note that in the next piece, we’re going to give an extremely rigorous account of the strategy’s functionality, an account that will hopefully make all aspects of its observed performance–its successes and its failures–clear.

Generic?  Check.  We can vary the moving average period anywhere from 5 to 50 months, and the strategy retains its outperformance over buy and hold.  Coincidences associated with the number 10 are not being used as a lucky crutch.

Efficient?  Undecided.  The strategy switches in 10% of all months.  On some interpretations, that might be too much.  The strategy has a switching win rate of around 25%, indicating that the majority of the switches–75%–are unnecessary and harmful to returns.  But, as the table confirms, the winners tend to be much bigger than the losers, by enough to offset them in the final analysis.  We can’t really say, then, that the strategy is inefficient.  We leave the verdict at undecided.

Long-Biased?  Check.  The strategy spends 72% of its time in equities, and 28% of its time in cash, a healthy ratio.  The strategy is able to maintain a long-bias because the market has a persistent upward total return trend over time, a trend that causes the total return index to spend far more time above the trailing moving average than below.

On a related note, the strategy has a beneficial propensity to self-correct.  When it makes an incorrect call, the incorrectness of the call causes it to be on the wrong side of the total return trend.  It’s then forced to get back on the right side of the total return trend, reversing the mistake.  This propensity comes at a cost, but it’s beneficial in that it prevents the strategy from languishing in error for extended periods of time. Other market timing approaches, such as approaches that try to time on valuation, do not exhibit the same built-in tendency.  When they get calls wrong–for example, when they wrongly estimate the market’s correct valuation–nothing forces them to undo those calls.  They get no feedback from the reality of their own performances.  As a consequence, they have the potential to spend inordinately long periods of time–sometimes decades or longer–stuck out of the market, earning paltry returns.

Recently Successful?  Check.  The strategy has outperformed, on net, since the 1960s.

Cowles Commission Data: Highlighting a Key Testing Risk

Using data compiled by the Cowles Commission, we can conduct our first out-of-sample test on the strategy.  The following chart shows the strategy’s performance in U.S. equities back to the early 1870s.  We find that the strategy performs extremely well, beating the X/Y portfolio by 210 bps, with a substantially lower drawdown.


The strong performance, however, is the consequence of a hidden mistake.  The Cowles Commission prices that are available for U.S. equities before 1927 are not closing prices, but averages of high and low prices for the month.  In allowing ourselves to transact at those prices, we’re effectively cheating.


The point is complicated, so let me explain.  When the index falls below the moving average, and we sell at the end of the month at the quoted Cowles monthly price, we’re essentially letting ourselves sell at the average price for that month, a price that’s no longer available, and that’s likely to be higher than the currently available price, given the downward price trend that we’re acting on.  The same holds true in reverse.  When the index moves above the average, and we buy in at the end of the month, we’re essentially letting ourselves buy in at the average price for the month, a price that’s no longer available, and that’s likely to be lower than the closing price, given the upward price trend that we’re acting on.  So, in effect, whenever we sell and buy in this way, we’re letting ourselves sell higher, and buy lower, than would have been possible in real life.

To use the Cowles Commission data and not cheat, we need to insert a 1 month lag into the timing.  If, at the end of a month, the strategy tells us to sell, we can’t let ourselves go back and sell at the average price for that month.  Instead, we have to take the entirety of the next month to sell, selling a little bit on each day.  That’s the only way, in practice, to sell at an “average” monthly price.  Taking this approach, we get a more truthful result.  The strategy still outperforms, but by an amount that is more reasonable:


(For a precise definition of each term in the chart and table, click here.)

To avoid this kind of inadvertent cheating in our backtests, we have to make extra sure that the prices in any index that we test our strategies on are closing monthly prices. If an index is in any way put together through the use of averaging of different prices in the month–and some indices are put together that way, particularly older indices–then a test of the moving average strategy, and of all trend-following strategies more generally, will produce inaccurate, overly-optimistic results.
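The 1 month lag used for the Cowles data can be sketched as follows.  The function name `timed_returns` is hypothetical, and the safe asset's return is set to zero purely to keep the example short.

```python
def timed_returns(avg_prices, hold_risk, lag=1):
    """Monthly strategy returns when the price series holds monthly averages.

    hold_risk[m] is the decision reached at the end of month m.
    lag=0 transacts at avg_prices[m], an average that is no longer available.
    lag=1 transacts at avg_prices[m + 1], which corresponds to trading a little
    on each day of the following month.
    """
    returns = []
    for m, hold in enumerate(hold_risk):
        entry = m + lag
        if entry + 1 >= len(avg_prices):
            break  # not enough data left to measure the next month's return
        risk_return = avg_prices[entry + 1] / avg_prices[entry] - 1
        returns.append(risk_return if hold else 0.0)  # safe return taken as 0 for brevity
    return returns

# Made-up averaged prices, only to show how the lag shifts the exposure.
prices = [100.0, 110.0, 90.0, 80.0, 80.0]
decisions = [True, False, False]
optimistic = timed_returns(prices, decisions, lag=0)  # transacts at stale averages
realistic = timed_returns(prices, decisions, lag=1)   # transacts one month later
```

Comparing the two outputs on an averaged index gives a rough sense of how much of a backtest's apparent edge comes from transacting at prices that were never actually available.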

The Results: MMA Tested in 235 Indices and 120 Individual Securities

We’re now ready for the results.  I’ve divided them into eleven categories: U.S. Equities, U.S. Factors, U.S. Industries, U.S. Sectors, Foreign Equities in U.S. Dollar Terms, Foreign Equities in Local Currency Terms, Global Currencies, Fixed Income, Commodities, S&P 500 Names, and Bubble Roadkill Names.

In each test, our focus will be on three performance measures: Annual Total Return (reward measure), Maximum Drawdown (risk measure), and the Sortino Ratio (reward-to-risk measure).  We’re going to evaluate the strategy against the X/Y portfolio on each of these measures.  If the strategy is adding genuine value through its timing, our expectation is that it will outperform on all of them.

For the three performance measures, we’re going to judge the strategy on its win percentage and its excess contribution.  The term “win percentage” refers to the percentage of individual backtests in a category that the strategy outperforms on.  We expect strong strategies to post win percentages above 50%.  The terms “excess annual return”, “excess drawdown”, and “excess Sortino” refer to the raw numerical amounts that the strategy increases those measures by, relative to the X/Y portfolio and fully invested buy and hold.  So, for example, if the strategy improves total return from 8% to 9%, improves drawdown from -50% to -25%, and increases the Sortino Ratio from 0.755 to 1.000, the excess annual return will be 1%, the excess drawdown will be +25%, and the excess Sortino will be 0.245.  We will calculate the excess contribution of the strategy for a group of indices by averaging the excess contributions of each index in the group.
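The two scoring measures can be sketched in a few lines.  The function names are hypothetical, and drawdowns are written as negative numbers so that a higher value is better on every measure, matching the worked example above.

```python
def excess(strategy_value, benchmark_value):
    """Raw amount by which the strategy improves a measure over a benchmark."""
    return strategy_value - benchmark_value

def win_percentage(strategy_values, benchmark_values):
    """Percentage of individual backtests in which the strategy beats the benchmark."""
    wins = sum(s > b for s, b in zip(strategy_values, benchmark_values))
    return 100.0 * wins / len(strategy_values)

def average_excess(strategy_values, benchmark_values):
    """Excess contribution for a group: the mean of the per-index excesses."""
    diffs = [excess(s, b) for s, b in zip(strategy_values, benchmark_values)]
    return sum(diffs) / len(diffs)

# The worked example from the text.
excess_return = excess(0.09, 0.08)      # 1 percentage point
excess_drawdown = excess(-0.25, -0.50)  # +25 percentage points
excess_sortino = excess(1.000, 0.755)   # 0.245
```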

The Sortino Ratio, which will turn out to be the same number for both the X/Y portfolio and a fully invested buy and hold portfolio, will serve as the final arbiter of performance. If a strategy conclusively outperforms on the Sortino Ratio–meaning that it delivers both a positive excess Sortino Ratio, and a win percentage on the Sortino Ratio that is greater than 50%–then we will deliver a verdict of “Outperform.”  Otherwise, we will deliver a verdict of “Underperform.”
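To see why the Sortino Ratio comes out the same for the X/Y portfolio and for buy and hold, here is a sketch under one common definition: average return in excess of the safe asset, divided by the downside deviation of that excess return.  The exact definition used in these tests is given in the linked guide, so treat this form, along with the monthly rebalancing of the X/Y portfolio, as an assumption.

```python
from math import sqrt

def sortino(returns, safe_returns):
    """Mean excess return over the safe asset divided by downside deviation."""
    excess = [r - s for r, s in zip(returns, safe_returns)]
    mean_excess = sum(excess) / len(excess)
    # Only months that fall short of the safe asset count as downside.
    downside = sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    return mean_excess / downside  # undefined if there are no shortfall months

def xy_returns(risk_returns, safe_returns, x):
    """Monthly returns of a portfolio rebalanced to x in RISK and 1 - x in SAFE."""
    return [x * r + (1 - x) * s for r, s in zip(risk_returns, safe_returns)]

# Made-up monthly returns, only to demonstrate the equality.
risk = [0.05, -0.03, 0.02, -0.04, 0.06]
safe = [0.002] * 5
buy_and_hold = sortino(risk, safe)
blended = sortino(xy_returns(risk, safe, 0.72), safe)  # matches buy_and_hold
```

Under this definition, every excess return of the X/Y portfolio is x times the risk asset's excess return, so the numerator and the denominator both scale by x and the ratio is unchanged.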

Now, to the results:

(Note: if you have questions on how to read the charts and tables, or on how terms are defined conceptually or mathematically, click here for a guide.)

U.S. Equities, 1871 – 2015: The strategy was tested in U.S. equities across different date ranges and under different choices of safe assets (treasury bills, 10 year treasury notes, investment-grade corporate bonds, and gold). Verdict: Outperform.  Click here and scroll down to see a slideshow of the charts and tables.

U.S. Size, Momentum, and Value Factor Indices, 1928 – 2015: The strategy was tested in 30 different U.S. factor indices–size, momentum, and value, each separated into 10 decile indices.  Verdict: Outperform.  Click here and scroll down to see a slideshow of the charts and tables.

30 U.S. Industries, 1928 – 2015: The strategy was tested in 30 different U.S. industry indices.  Verdict: Outperform.  Click here and scroll down to see a slideshow of the charts and tables.

10 U.S. Sectors, 1928 – 2015: The strategy was tested in 10 different U.S. sector indices. Verdict: Outperform.  Click here and scroll down to see a slideshow of the charts and tables.

Foreign Equities in U.S. Dollar Terms, 1971 – 2015: The strategy was tested in 77 foreign country equity indices, quoted in U.S. dollar terms.  A side test on popular iShares country ETFs was included.  Interestingly, the performance in the iShares ETFs was worse than the performance in the country indices.  Verdict: Outperform.  Click here and scroll down to see a slideshow of the charts and tables.

Foreign Equities in Local Currency Terms, 1971 – 2015: The strategy was tested in 32 different foreign country equity indices, quoted in local currency terms.  Verdict: Outperform.  Click here and scroll down to see a slideshow of the charts and tables.

Foreign Equities in Local Currency Terms, 1901 – 1971: The strategy was tested in 8 different foreign country equity indices, quoted in local currency terms, going back to a much earlier period of history.  Verdict: Outperform.

Global Currencies, 1973 – 2015: The strategy was tested in 22 global currency pairs. Verdict: Outperform.  The strategy’s performance in currency was its strongest performance of all.  Click here and scroll down to see a slideshow of the charts and tables.

Fixed Income, 1928 – 2015: The strategy was tested in 11 different fixed income indices. Verdict: Outperform.  Click here and scroll down to see a slideshow of the charts and tables.

Commodities, 1947 – 2015: The strategy was tested in 2 different commodity indices–spot gold and spot oil.  Testing in rolled futures contract indices was also conducted, but is not worth including, given the awful performance of a buy and hold strategy in these indices, particularly over the last 10 years, where the futures chains have spent most of their time in contango, inflicting negative roll yields.  Verdict: Outperform.  Click here and scroll down to see a slideshow of the charts and tables.

100 Largest S&P 500 Stocks, 1963 – 2015: The strategy was tested in the largest 100 S&P 500 stocks that have been continuously publicly traded for at least 20 years.  In contrast to the other tests, the strategy’s performance in this test was terrible.  Not only did the strategy fail to add any value, it actually subtracted value, producing significantly inferior return and risk numbers relative to the X/Y portfolio, despite taking on the same cumulative exposures.  Verdict: Underperform.  Click here and scroll down to see a slideshow of the charts and tables.

Bubble Roadkill Sample, 1981 – 2015: The strategy performed so poorly in the test on individual large company stocks that we decided to try and see if we could come up with a sample of individual company stocks in which the strategy did work.  So we ran the strategy in the context of individual companies that have experienced large boom-bust cycles, and that are now nearly worthless, at least relative to their prior market capitalizations.  Examples include notorious tech names that boomed in the 90s and busted at the turn of the century, notorious housing and finance names that boomed in the early-to-mid aughts and busted in the Global Financial Crisis, and notorious commodity names that boomed in the aughts and that are busting as we speak.  The expectation was that the strategy’s performance in these names would improve significantly, given the opportunity to ride a boom and exit prior to a terminal bust.  The results showed that the performance did, in fact, improve–but the improvement wasn’t as large as hoped for.  The strategy strongly underperformed in a number of busted names–e.g., Freeport-McMoRan, Aeropostale, MBIA, and QLogic.  Verdict: Outperform.  Click here and scroll down to see a slideshow of the charts and tables.

The following table summarizes the strategy’s performance across all tests on the criterion of Annual Total Return.  The excess total returns and total return win percentages are shown relative to the X/Y portfolio and a portfolio that’s fully invested in the risk asset, abbreviated RISK.


The performance is excellent in all categories except the individual S&P 500 stock category, where the performance is terrible.  In the individual S&P 500 stock category, the strategy produces consistently negative excess returns and a below 50% win percentage relative to X/Y.  In roughly 3 out of 4 of the sampled individual stocks, the strategy earns a total return that is less than the total return of a portfolio that takes on the same risk exposure without doing any timing.  What this means is that with respect to total return, the strategy’s timing performance in the category is worse than what random timing would be expected to produce.

The following table summarizes the strategy’s performance on the criterion of Maximum Drawdown.  The excess drawdowns and drawdown win percentages are shown relative to the X/Y portfolio and a portfolio that’s fully invested in the risk asset, abbreviated RISK.


Note that there’s some slight underperformance relative to the X/Y portfolio in foreign equities and global currencies.  But, as we will see when we look at the final arbiter of performance, the Sortino Ratio, the added return more than makes up for the increased risk.  Once again, the strategy significantly underperforms in the individual S&P 500 stock category, posting a below 50% win percentage and exceeding the X/Y portfolio’s drawdown.  As before, with respect to drawdown risk, the strategy’s timing decisions in the category end up being worse than what random timing would be expected to produce.

The following table summarizes the strategy’s performance on the criterion of the Sortino Ratio, which we treat as the final arbiter of performance.  The excess Sortinos and Sortino win percentages for the strategy are shown relative to the X/Y portfolio and a portfolio that’s fully invested in the risk asset, abbreviated RISK.


The performance is excellent in all categories except the individual S&P 500 stock category.  Importantly, the excess Sortinos for foreign equities and global currencies are firmly positive, confirming that the added return is making up for the larger-than-expected excess drawdown and lower-than-expected drawdown win percentages noted in the previous table.

The strategy’s performance in individual S&P 500 securities, however, is terrible.  In 4 out of 5 individual S&P 500 stocks, the strategy produces a Sortino ratio inferior to that of buy and hold.  This result again tells us that on the criterion of risk-reward, the strategy’s timing performance in the category is worse than what random timing would be expected to produce.

To summarize, MMA strongly outperforms the X/Y portfolio on all metrics and in all test categories except for the individual S&P 500 stock category, where it strongly underperforms.  If we could somehow eliminate that category, then the strategy would pass the backtest with flying colors.

Unfortunately, we can’t ignore the strategy’s dismal performance in the individual S&P 500 stock category.  The performance represents a failed out-of-sample test in what was an extremely large sample of independent securities–100 in total, almost a third of the entire backtest.  It is not a result that we predicted, nor is it a result that fits with the most common explanations for why the strategy works.  To make matters worse, most of the equity and credit indices that we tested are correlated with each other. And so the claim that the success in the 250 indices should count more in the final analysis than the failure in the 100 securities is questionable.

A number of pro-MMA and anti-MMA explanations can be given for the strategy’s failure in individual securities. On the pro-MMA side, one can argue that there’s survivorship bias in the decision to use continuously traded S&P 500 stocks in the test, a bias that reduces the strategy’s performance.  That category of stocks is likely to have performed well over the years, and unlikely to have included the kinds of stocks that generated deep drawdowns.  Given that the strategy works by protecting against downside, we should expect the strategy to underperform in the category.  This claim is bolstered by the fact that the strategy performed well in the different category of bubble roadkill stocks.

On the anti-MMA side, one can argue that the “stale price” effect discussed in an earlier piece creates artificial success for the strategy in the context of indices.  That success then predictably falls away when individual stocks are tested, given that individual stocks are not exposed to the “stale price” effect.  This claim is bolstered by the fact that the strategy doesn’t perform as well in iShares MSCI ETFs (which are actual tradeable individual securities) as it does in the MSCI indices that those ETFs track (which are idealized indices that cannot be traded, and that are subject to the “stale price” effect, particularly in illiquid foreign markets).

The following table shows the two performances side by side for 14 different countries, starting at the Ishares ETF inception date in 1996:


As the table confirms, the strategy’s outperformance over the X/Y portfolio is significantly larger when the strategy is tested in the indices than when it’s tested in the ETF securities that track the indices.  Averaging all 14 countries together, the total return difference between the strategy and the X/Y portfolio in the indices ends up being 154 bps higher than in the ETF securities.  Notice that 154 bps is roughly the average amount that the strategy underperforms the X/Y portfolio in the individual S&P 500 stock category–probably a coincidence, but still interesting.

In truth, none of these explanations capture the true reason for the strategy’s underperformance in individual stocks.  That reason goes much deeper, and ultimately derives from certain fundamental geometric facts about how the strategy operates.  In the next piece, I’m going to expound on those facts in careful detail, and propose a modification to the strategy based on them, a modification that will substantially improve the strategy’s performance.  Until then, thanks for reading.

Links to backtests: [U.S. Equities, U.S. Factors, U.S. Industries, U.S. Sectors, Foreign Equities in USD, Foreign Equities in Local Currency, Global Currencies, Fixed Income, Commodities, Largest 100 Individual S&P 500 Stocks, Bubble Roadkill]

(Disclaimer: The information in this piece is personal opinion and should not be interpreted as professional investment or tax advice.  The author makes no representations as to the accuracy, completeness, suitability, or validity of any of the information presented.)

Posted in Uncategorized | Comments Off on Trend Following In Financial Markets: A Comprehensive Backtest

The Impact of Taxes on Investor Returns

If you had invested $100,000 in Altria Group ($MO) on March 31st, 1980, and reinvested the dividends, the position today would be worth $93.6MM–a 21.0% annualized return. If you had invested the same amount in Berkshire Hathaway ($BRK-A), the position would be worth $77.0MM–a 20.4% annualized return.

But now let’s assume that you were a high net worth individual, living in beautiful Newport Beach, California, holding the position outside of a retirement account (which is where the vast majority of high-end wealth is held).  At current marginal rates, neglecting deductions, your dividends would have been taxed at 37.1%.  That’s 20% for federal income tax, 13.3% for California state income tax, and 3.8% for the new Obamacare tax.

How would these taxes have impacted your returns?  Instead of compounding at 21.0%, the Altria investment would have compounded at 19.0%.  The final value of the investment would have been reduced by almost half, to $50.6MM.  With Berkshire, however, the final investment value wouldn’t have changed at all.  It would still be $77.0MM, because Berkshire didn’t pay any dividends.  Of course, you would still owe taxes on the $77.0MM, but you would be able to pay them on your own schedule, whenever you wanted to use the money.  And bear in mind that you would also have to pay taxes on the $50.6MM held in Altria.
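As a rough check on how much a two point difference in the compounding rate matters over this kind of horizon, here is a minimal Python sketch.  It uses only the rounded figures quoted above, and backs out the holding period from them rather than from exact dates, so the result should land close to, but not exactly on, the $50.6MM figure:

```python
import math

initial = 100_000          # starting investment, from the text
pre_tax_final = 93.6e6     # pre-tax Altria value, from the text
pre_tax_rate = 0.210       # pre-tax annualized return, from the text
after_tax_rate = 0.190     # after-tax annualized return, from the text

# Back out the holding period implied by the (rounded) pre-tax figures.
years = math.log(pre_tax_final / initial) / math.log(1 + pre_tax_rate)

# Compound the same starting amount at the lower after-tax rate.
after_tax_final = initial * (1 + after_tax_rate) ** years

print(f"implied holding period: {years:.1f} years")
print(f"after-tax final value: ${after_tax_final / 1e6:.1f}MM")
```

The point of the exercise is simply that a seemingly small annual drag, compounded over three and a half decades, removes close to half of the ending wealth.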

We all recognize that taxes have a significant impact on long-term returns, especially when expected nominal returns are high.  But we normally assume that the impact is limited to cases of excessive short-term trading, where capital gains that could have otherwise been deferred are prematurely incurred at punitive short-term rates.  The truth, however, is that the detrimental impact of taxes extends beyond the space of capital gains, into the space of dividend income.  As the example illustrates, owning a portfolio heavy on dividend-paying stocks like Altria and light on cashflow recyclers like Berkshire can impose a substantial drag on returns over time.

To quantify the impact of dividend taxes, I built a model that computes what the hypothetical after-tax total return of the S&P 500 would have been on the assumption of different historical dividend payout ratios.  From 1871 to 2015, the S&P 500’s actual payout ratio averaged around 60%.  But what if that payout ratio had instead always been equal to 40%?  What if it had always been lower–say, 20%–or higher–say, 80%?  What would the ensuing after-tax total return have been?  The model computes the answer.  It takes the S&P 500 total return index and back-calculates a hypothetical price and dividend index that is consistent with the specified payout ratio.  It then subtracts dividend taxes from the dividend index and recomputes the total return, which is an after-tax total return.  Comparing this after-tax total return to the original pre-tax total return gives the impact of dividend taxes.
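To make the mechanics concrete, here is a simplified per-period sketch in Python.  It is not the actual model: instead of rebuilding price and dividend indices, it assumes that each period’s pre-tax total return is unchanged by the payout decision, that the dividend yield equals the payout ratio times an earnings yield, and that the tax is paid out of the dividend as it is received.  The function name, the earnings yield input, and the sample numbers are all hypothetical placeholders rather than historical data:

```python
def annualized_tax_drag(total_returns, earnings_yields, payout_ratio, tax_rate):
    """Return (pre_tax, after_tax, drag) as annualized rates.

    total_returns:   per-period pre-tax total returns (e.g. 0.09 for 9%)
    earnings_yields: per-period earnings / start-of-period price (assumed input)
    """
    pre_growth = 1.0
    post_growth = 1.0
    for r, ey in zip(total_returns, earnings_yields):
        dividend_yield = payout_ratio * ey           # hypothetical payout
        pre_growth *= 1.0 + r
        post_growth *= 1.0 + r - tax_rate * dividend_yield
    n = len(total_returns)
    pre = pre_growth ** (1.0 / n) - 1.0
    post = post_growth ** (1.0 / n) - 1.0
    return pre, post, pre - post


# Made-up inputs: a flat 9% total return and a flat 8% earnings yield.
pre, post, drag = annualized_tax_drag(
    total_returns=[0.09] * 5,
    earnings_yields=[0.08] * 5,
    payout_ratio=0.60,
    tax_rate=0.326,
)
print(f"pre-tax {pre:.2%}, after-tax {post:.2%}, drag {drag:.2%}")
```

In this simplified form the drag is just the tax rate times the payout ratio times the earnings yield, which is why higher payout ratios and higher tax rates both push it up.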

The following table shows what the annualized tax drag on the S&P 500’s total return would have been from 1871 to 2015 if different payout ratios and tax rates had been in place.  Note that the pre-tax return was 9%:


The “low” category, 15%, represents the lowest federal qualified dividend tax rate.  The “mid” category, 23.8%, represents the highest federal qualified dividend tax rate.  The “high” category, 32.6%, represents the highest total qualified dividend tax rate for an individual living in New York state paying New York state income taxes.  The “max” category, 56.7%, represents the expected dividend tax rate for an individual in California earning in the top bracket and paying the unqualified federal dividend tax rate in addition to California state income taxes. This last rate is the rate that a high net worth individual living in California would pay on bond income, or on short-term trading profits, assuming that she were trading individual securities, rather than section 1256 futures contracts.

As the table confirms, the impact of taxes is huge.  At a 60% payout ratio, a high net worth individual living in New York state who sticks to a disciplined buy and hold strategy would lose 1.56% per year to dividend taxes alone.  If the market’s payout ratio were increased to 100%, she would lose 2.59% per year.  Note that a 100% payout ratio essentially mimics a case where all gains come from trading.  So if a California resident were to earn all of her returns from trading, and were to pay the maximum marginal income tax rate on those gains, she would lose 4.53% per year relative to the untaxed alternative.

If markets are efficient and maximally aligned to shareholder interests, then we should expect the following two observed changes to continue over time:

First, dividend payout ratios should continue to fall on a cyclically-adjusted basis, particularly as wealth inequality increases. If investors, specifically high net worth investors, are paying attention to their costs, they will seek out companies that reinvest excess cash flows into acquisitions and buybacks rather than companies that pay out excess cash flows as dividends.  CEOs seeking to maximize their share prices will then be incented to favor reinvestment–to behave more like Berkshire and less like Altria.

Second, stocks should continue to trade at historically expensive prices.  For much of history, the tax benefit that equities conferred upon investors–specifically, the ability to use capital appreciation to defer taxation–was not adequately reflected in equity prices and valuations.  Those benefits have actually increased in recent decades, as dividend payout ratios have fallen, and as preferential tax rates for equity income and capital gains have been introduced into law.


As markets become more efficient over time, we should expect these tax benefits to more fully show up in equity prices.  This is particularly true in the current environment, where investors have become increasingly focused on reducing unnecessary costs, and where an increasing amount of societal wealth has concentrated into the hands of high net worth investors, those who stand to receive the greatest tax benefits from equities.

Consider the financial options of a high net worth individual living in California.  She can choose to hold her wealth in shares of Berkshire and pay no tax at all on the returns unless or until she needs to consume out of them, at which point she will be able to monetize them at a preferential long-term rate, or she can own fixed income securities–say, a 30 year treasury bond–and pay a whopping 56.7% tax rate on the earnings.  In the case of the 30 year bond, at a 2.90% yield, her after-tax yield would be a paltry 1.26%.  Unless equities were in an outright bubble, with extremely low imputed returns, what reason could she possibly have, as a long-term investor, to choose the fixed income option?


Momentum: Slip Counterfactuals, the “Stale Price” Effect, and the Future

The recent piece on the dangers of backtesting has attracted an unusual amount of attention for a piece on this blog.  I’d like to thank everyone who read and shared the piece, and also those who offered up commentary on it.

To be clear, my intent in presenting the Daily Momentum example was not to challenge the Fama-French-Asness momentum factor in specific, or the phenomenon of momentum in general.  Obviously, one failed version of a momentum strategy would not be sufficient to refute the mountain of evidence, both empirical and theoretical, that exists in support of the phenomenon.  My intent was simply to show, in an entertaining way, that robust patterns backed by extremely large amounts of historical data can spontaneously weaken and disappear, out of the blue.  The fact that the example involved momentum per se was incidental.

As investors, we do not intentionally search the data to find “fallen” strategies–strategies that worked for long periods of time, and that then stopped working. When we encounter such strategies, we discard them, because they are useless to us.  What we try to find are “successful” strategies–strategies that have worked consistently across the relevant periods of history, and that have not yet failed in the data.

If we did search specifically for “fallen” strategies, we would come to realize that there are more of them in the data than there are “successful” strategies.  Statistically, the odds are therefore good that when we do find a “successful” strategy, what we’ve actually found is a strategy that is going to become a “fallen” strategy as the future plays out.  We need to take that risk seriously, and engage the process of quantitative research with an appropriate level of skepticism.

To many, I’m probably stating the obvious–but the point is not obvious to everyone.  It certainly was not obvious to me when I was first introduced to the fun and exciting process of trying to use a computer to solve the puzzles of financial markets.

Slip: The Validity of a Counterfactual

To conserve space in the prior piece, I left out a discussion of an interesting philosophical question.  When we conduct a backtest, should we use a slip equal to what the bid-ask spread was at the time, or should we use a slip equal to what the bid-ask spread will be now, when we actually run the strategy in the real world?  In the context of the example, if it’s 1999, and I’m testing Daily Momentum to determine whether or not I should implement it as a timing strategy in a real portfolio, should I apply a slip equal to the current bid-ask spread of the security that I am going to use in the strategy, or should I apply a slip equal to the actual bid-ask spread that existed in the market during the years that I’m backtesting across–in this case, the years 1928 to 1999?

The market’s bid-ask spread in past eras was very wide, much wider than today, where it’s almost non-existent in many securities.  As the following chart shows, the average spread from 1928 to 1999 was north of 0.60%, a number that would have completely destroyed any strategy that traded daily–and that would have significantly impaired strategies that traded monthly or even quarterly.

The price quoted in a historical time series for a stock or an index is typically the midpoint between the highest bid and the lowest ask.  Importantly, that price is not a price that any investor ever had access to.  If an investor wanted to sell, she did not have the option of selling at the “midpoint”–she had to sell at the bid.  If she wanted to buy, she did not have the option of buying at the “midpoint”–she had to buy at the ask.

To give a specific example, if a time series for the stock of “Caterpillar Tractor” shows a price of 4.75 on June 15th, 1932, that number, 4.75, is not an actual price that anyone could have transacted at.  Rather, it is the midpoint between (1) the best ask price that buyers could have transacted at, which was 5, and (2) the best bid price that sellers could have transacted at, which was 4 1/2, quoted in that fraction.

To correct for the difference between the quoted price and the actual transactable price in the backtest, we apply a slip.  Unfortunately, when we apply a slip that reflects the current bid-ask spread, rather than the historical bid-ask spread, we effectively allow our backtest to trade at mythical prices, prices that no one that was actually present in the market had any ability to trade at.

In the Caterpillar Tractor example, if we use a 0.10% slip, we are letting our model buy Caterpillar Tractor for 4.755, when no one was actually offering to sell shares at that price. Similarly, we are letting our model sell Caterpillar Tractor for 4.745, when no one was actually offering to buy shares at that price.  This approach, if we were to use it, would obviously be inauthentic.  If our strategy were to perform well, we would not be able to accurately say:

“An investor would have outperformed using our strategy.”

Rather, we would have to say:

“If an investor could have traded at a price that didn’t actually exist, to a buyer or seller that was not actually willing to buy or sell at that price, then the investor would have outperformed using our strategy.”

If that’s what “success” in a backtest means, it’s hard to walk away impressed.
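The arithmetic behind the Caterpillar Tractor example can be sketched in a few lines of Python.  The function name is hypothetical, and the sketch assumes that the slip is applied as a fraction of the midpoint on each side of the trade, which is the reading that reproduces the 4.755 and 4.745 figures (after rounding):

```python
def transactable_prices(midpoint, slip):
    """Prices a backtest assumes when it applies a fractional slip per trade."""
    buy_price = midpoint * (1 + slip)    # model buys slightly above the midpoint
    sell_price = midpoint * (1 - slip)   # model sells slightly below the midpoint
    return buy_price, sell_price


midpoint = 4.75                    # quoted price in the example
best_bid, best_ask = 4.50, 5.00    # actual quotes in the example

assumed_buy, assumed_sell = transactable_prices(midpoint, slip=0.0010)
historical_spread = (best_ask - best_bid) / midpoint

print(f"backtest buys at  {assumed_buy:.5f}, real buyers paid {best_ask:.2f}")
print(f"backtest sells at {assumed_sell:.5f}, real sellers got {best_bid:.2f}")
print(f"actual spread as a fraction of the midpoint: {historical_spread:.1%}")
```

Both assumed prices sit well inside the actual bid and ask, which is exactly the sense in which the backtest is trading at prices that nobody in the 1932 market was offering.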

A backtest that transacts using the market’s current spread rather than its historical spread relies on a counterfactual–an assumed hypothetical state of affairs that did not actually exist.  This reliance alone, of course, is not the issue.  All backtests rely on counterfactuals–the very concept of a backtest requires us to assume that someone did something that no one actually did, i.e., executed a specific trade at a specific price in a specific amount.  The issue is whether the assumed counterfactual is compatible with the historical pattern that the tested strategy exploits.  And that’s where our use of a current bid-ask spread, rather than a historical one, gets us into trouble.

For a 0.10% bid-ask spread to have existed in the past, that spread would have had to have been profitable to market makers.  And for the spread to have been profitable to market makers–profitable enough for them to accept the risk to their capital of offering it–the market would have had to have seen dramatically increased levels of volume.  But if the market of the past had seen dramatically increased levels of volume, would it have gone on to produce the same Daily Momentum pattern that it actually produced?  Can we be sure that the increased volume–or any other perturbation that might have been required for a tighter spread–would have left the pattern unaffected?

The answer is no.  And therefore if we want our backtest of Daily Momentum, or of any strategy that exploits a technical price pattern, to be maximally reliable, we need to commit to applying a slip that matches the actual spread in place at the time.  For most of the pre-1990s period, this means a slip of 0.60% or above, applied to each round-trip transaction.

With that said, even though, in a strict sense, the slip assumptions in the backtest are inaccurate and cast significant doubt on the successful implementation of the strategy, the weird price pattern that the strategy exploits is very real, and demands an explanation. Moreover, the example still serves its intended purpose, which was to show that a seemingly robust pattern can persist in a market or an economy for a very long time, and then disappear.

The “Stale Price” Effect: Daily Momentum and Index Exaggerations

Cliff Asness, whose thoughts on the topic of momentum are obviously worth far more than mine, especially considering that he was among those who discovered the phenomenon (when I was still in middle school), offered a brilliant explanation for part of the success of Daily Momentum.  Historically, not all stocks have traded on every market trading day.  Some stocks experienced days of zero volume, driven either by illiquidity or suspensions.

The fact that not all stocks traded on every market trading day gives an artificial advantage to momentum strategies.  If an index goes up on a given day, a momentum strategy will buy the index, or at least be more likely to buy it.  If, inside the index, there are stocks that did not trade on that day, those stocks will remain quoted at stale prices–either yesterday’s price, or a price from the last day that a trade occurred or a quote was taken. The stale prices will then feed into the index price.  A momentum strategy, in buying the index at the quoted index price, will effectively get to buy the untraded stocks at their stale prices. Tomorrow, or some time in the future, when the stocks do trade again, their prices will rise to reflect the market’s upward movement on the missed trading day.  The momentum strategy, having bought the stocks at stale prices, will then register a profit–a profit that’s entirely fake and unattainable, but that nonetheless shows up as real in the backtest.
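The mechanism can be illustrated with a toy simulation in Python.  Everything here is an assumption made for illustration: a two-stock, equal-weighted index, a market that moves randomly with no momentum at all, one stock that trades every day, one stock that trades only every other day and is quoted at its last traded price in between, log returns that are simply averaged, and a cash return of zero:

```python
import random


def simulate(days=20000, sigma=0.01, seed=0):
    """Return (true_index_returns, quoted_index_returns) for a toy market."""
    rng = random.Random(seed)
    true_index, quoted_index = [], []
    pending = 0.0  # market moves not yet reflected in the illiquid stock's quote
    for t in range(days):
        move = rng.gauss(0.0, sigma)   # today's market move, no momentum
        pending += move
        if t % 2 == 1:                 # illiquid stock trades: quote catches up
            illiquid_return, pending = pending, 0.0
        else:                          # no trade: quote stays stale
            illiquid_return = 0.0
        true_index.append(move)
        quoted_index.append(0.5 * move + 0.5 * illiquid_return)
    return true_index, quoted_index


def daily_momentum_mean(index_returns):
    """Average daily gain from holding the index only after an up day."""
    gains = [
        index_returns[t] if index_returns[t - 1] > 0 else 0.0
        for t in range(1, len(index_returns))
    ]
    return sum(gains) / len(gains)


true_index, quoted_index = simulate()
true_edge = daily_momentum_mean(true_index)
stale_edge = daily_momentum_mean(quoted_index)
print(f"edge on true prices:   {true_edge:.5f} per day")
print(f"edge on quoted prices: {stale_edge:.5f} per day")
```

On the true prices the strategy has no edge, by construction.  On the quoted index it appears to earn a steady profit, purely because it gets to “buy” the untraded stock at a price that has not yet caught up with a move that has already happened.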

In testing, we find that daily momentum beats 2-day momentum beats 3-day momentum beats 4-day momentum and so on.  This observed ranking of the performances supports the view that Cliff’s effect–which, for convenience, we can name the “stale price” effect–is driving the result. All else equal, shorter horizon momentum strategies would be expected to perform better than longer horizon strategies because they leverage the “stale price” effect to a greater degree.  They trade more frequently, and therefore they register the effect’s gains more frequently.

The “stale price” effect allows for an elegant explanation of the decay and eventual implosion of Daily Momentum’s historical performance.  Recall that the question we were unable to answer in the prior piece was: why did the performance start to weaken in the 1980s and 1990s, and then implode in the 2000s?  What was the cause of the loss of efficacy?  We now have a potentially compelling answer: the cause was an increase in the broadness of stock market volume, provoked by the widespread adoption of technologically-assisted investment vehicles that trade all securities indiscriminately–index futures, index mutual funds, index ETFs, and so on. Broader stock market volume, brought about by these changes, would have reduced the “stale price” effect, removing a key driver of the strategy’s outperformance.

It turns out that we can assess the impact of the “stale price” effect by backtesting Daily Momentum on individual stocks.  If we get rid of indices, and just backtest the individual names themselves, we will have removed the effect altogether.  The strategy’s subsequent performance will then reveal what the effect’s true impact was.

The following six charts show the results of backtests of the Daily Momentum strategy from 1963 to 2015 on the total U.S. market and on five well-known individual large cap names: Caterpillar $CAT, General Electric $GE, International Business Machines $IBM, Dupont $DD, and Coca-Cola $KO.  All returns are total returns with dividends reinvested at market.  To make any potential “stale price” effect maximally apparent, a 0% slip is used.






The following two charts show the outperformance of Daily Momentum relative to Buy and Hold (ratio between the two) on a log scale for each of the names and for the total market.  The applied slip is 0% in the first chart and 0.10% in the second:


As you can see in the charts, the strategy continues to outperform in the early half of the period, so the “stale price” effect cannot be the entire story.  At the same time, with the exception of Caterpillar, the strategy’s outperformance in the individual names is less pronounced than it is for the index, which suggests that the “stale price” effect–or some other index-related quirk–is driving a portion of the strategy’s success in the index case.

Interestingly, the strategy’s outperformance died off at different times in different names. Using a 0% slip, the strategy’s outperformance died off in 1974 for IBM, in 1985 for GE and Coke, in 1988 for Dupont, in 1992 for Caterpillar, and in 2000 for the total market. This observation refutes the suggestion that the breakdown is uniquely related to something that happened in the market circa the year 2000, such as decimalization.  In individual securities, the phenomenon had already disappeared decades earlier.

Concerns About Momentum

In the prior piece, I presented a chart of the outperformance relative to the overall market of each value-weighted decile of the Fama-French-Asness Momentum Factor from 1928 to 1999, and then from 2000 to 2015.  The purpose of the chart was not so much to challenge the factor’s efficacy in the period, but simply to show the reasonable decay concern that caused me to look more closely at the performance of momentum after the year 2000, and that prompted me to stumble upon Daily Momentum, with its weird break around that date.

A number of readers have e-mailed in asking me to separate out the 1928 to 1999 chart into 15 year increments, to allow for an apples-to-apples comparison of the factor’s efficacy across all 15 year periods.  Here, then, are the requested charts, in 15 year increments, from 1927 to 2015:

Charts: 1927 - 1942, 1942 - 1957, 1957 - 1972, 1972 - 1987, 1987 - 2002, 2000 - 2015

Clearly, when it comes to the performance rankings, the last chart is different from the others.  Momentum still outperforms, but the outperformance isn’t as pronounced or as well-ordered as in prior periods.

The idea that the efficacy of momentum would decay over time shouldn’t come as a surprise.  How could it not decay?  For a strategy to retain outperformance, there have to be barriers to entry that prevent its widespread adoption. From 1928 to the early 1990s, momentum’s barrier to entry was a lack of knowledge. Nobody in the market, save for a few people, knew anything about the phenomenon.  What is momentum’s barrier to entry today, when every business school student in the country learns about the phenomenon, and where any investor that wants to directly harvest its excess returns has 10 different low-cost momentum ETFs to choose from?

Some have suggested that the counter-intuitive nature of momentum, the difficulty that people have in understanding how it could be a good investment strategy, might serve as an effective barrier to entry.  Maybe, but I’m skeptical.  In my experience, investors–both retail and professional–are perfectly willing, and often quite eager, to invest in things that they do not fully understand, so long as those things are “working.”

It seems, then, that one of two things will likely end up happening: either momentum will not work like it used to, or it will work like it used to, and money will flock into it, either through the currently available funds, or through funds that will be set up to harvest it in the future, as it outperforms.  The result will either be a saturation of the factor that attenuates its efficacy, or a self-supporting momentum bubble that eventually crashes and destroys everyone’s portfolio.

Ask yourself, as the multitude of new momentum vehicles that have been created in the last few years–for example, Ishares $MTUM, which now has over $1B in AUM and growing–accumulate performance histories that investors can check on Bloomberg and Morningstar, will it be possible for them to show investors the kinds of returns relative to the market seen in the purple line below, and not become the biggest funds on earth?

1957 - 1972

In my opinion, to avoid saturation and overcrowding, particularly in the increasingly commoditized investment world that we live in, it won’t be enough for momentum to be counter-intuitive.  If a fund or a manager’s performance looks like the purple line above, people will not care what the mechanics were.  They will simply invest, and be grateful that they had the opportunity.  Given that momentum’s counter-intuitiveness won’t work as a barrier, then, all that is going to be left is underperformance.  The factor will need to experience bouts of meaningful underperformance relative to the market, underperformance sufficient to make investors suspect that the strategy has lost its efficacy.  Then, investors will stay away.  The problem, however, is that the strategy may actually have lost its efficacy–that may be the reason for the underperformance. Investors won’t know.

To be clear, when I talk about momentum underperforming, I’m not talking about the underperformance of a long-short momentum strategy.  A long-short momentum strategy that rebalances monthly will experience severe momentum crashes during market downturns.  Those crashes are caused by rebalancing into 100% short positions on extremely depressed low momentum segments of the market. When the market recovers, those segments, which represent the junk of the market, explode higher, retracing the extreme losses. The increase in the 100% long position during the upturn fails to come close to making up for the extreme rise of the 100% short position, which is rebalanced to a 100% position right at the low.  The result ends up being a significant net loss for the overall portfolio during the period.

Instead, I’m talking about the underperformance of simple vanilla strategies that go long the high momentum segments of the market.  As the charts show, those segments have almost always outperformed the index.  Where they’ve underperformed, the underperformance hasn’t lasted very long.  For them to underperform in a meaningful way–enough to make the performance uninspiring to investors–would be a significant departure from past performance.

What is momentum’s sensitivity to saturation and overcrowding?  How much money would have to flow into the factor to dampen or eliminate its efficacy, or worse, turn it into an underperformer?  What amount of underperformance is needed to keep a sufficient number of investors out, so that the strategy can retain its efficacy?  How much will this underperformance detract from momentum’s overall excess returns over the market? What is the mechanism of the underperformance?  Is it a gradual decay, or a crash that occurs after a momentum bubble bursts?  Is the right answer to try to time the underperformance–to exit momentum when it’s popular, and re-enter it when it’s out of favor?  If so, what are the signs and signals?  These are all important questions. Since there isn’t any relevant data to go off of–the first experiment on the subject is being conducted right now, on us–investors will have to answer the questions directly, working out the complicated chess position themselves, without the help of historical testing.


Financial Backtesting: A Cautionary Tale

Consider the following market timing strategy, which we’ll call “daily momentum”:

(1) If the market’s total return for the day, measured from yesterday’s close to today’s close, is positive, then buy the market at today’s close and hold for one day.

(2) If the market’s total return for the day, measured from yesterday’s close to today’s close, is negative, then sell the market at today’s close and hold the proceeds in a short-term interest bearing deposit account for one day.
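The two rules above can be sketched in Python as follows.  This is a minimal illustration rather than the code behind the charts: the function name and the sample numbers are hypothetical, frictions are ignored, and a day with a return of exactly zero (which the rules don’t address) is treated here as a “not positive” day:

```python
def daily_momentum(market_returns, cash_returns):
    """Strategy returns for days 1..n-1, given daily market and cash returns.

    The position for day t is decided at the close of day t-1:
    hold the market if day t-1 was up, otherwise hold cash.
    """
    strategy = []
    for t in range(1, len(market_returns)):
        yesterday_up = market_returns[t - 1] > 0
        strategy.append(market_returns[t] if yesterday_up else cash_returns[t])
    return strategy


# Made-up sample data: five days of market returns and a flat cash rate.
market = [0.01, 0.02, -0.03, -0.01, 0.02]
cash = [0.0001] * 5

result = daily_momentum(market, cash)
print(result)
```

With this sample, the strategy holds the market on the second and third days (each follows an up day) and holds cash on the fourth and fifth days (each follows a down day).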

The two charts below show the hypothetical performance of the strategy in the aggregate, capitalization-weighted U.S. equity market from February 1st of 1928 to December 31st of 1999 (1st chart right y-axis: linear; 2nd chart right y-axis: log; data source: CRSP):

The blue line is the total return of daily momentum, the timing strategy being tested.  The black line is the total return of a buy and hold strategy.  The yellow line is the cash total return.  The gray columns are U.S. recession dates.

The red line is the total return of the X/Y portfolio.  The X/Y portfolio is a mixed portfolio with an allocation to equity and cash that matches the timing strategy’s cumulative ex-post exposure to each of those assets.  The timing strategy spends 55% of its time in equities, and 45% of its time in cash.  The corresponding X/Y portfolio is then a 55/45 equity/cash portfolio, a portfolio that is continually rebalanced to hold 55% of its assets in equities and 45% of its assets in cash at all times.

I introduce the concept of an X/Y portfolio to serve as a benchmark or control sample. I need that benchmark or control sample to be able to conduct appropriate statistical analysis on the timing strategy’s performance.  If “timing” itself were of no value, and all that mattered to returns were asset exposures, then the return of any timing strategy would be expected to match the return of its corresponding X/Y portfolio.  The returns would be expected to match because the cumulative asset exposures would be exactly the same–the only difference would be in the specific timing of the exposures.  If the timing strategy outperforms the X/Y portfolio in a statistically significant fashion, then we know that it’s adding value through its timing.  It’s taking the same cumulative asset exposures, and turning them into “something more.”
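For readers who want to see the construction spelled out, here is a minimal Python sketch of an X/Y benchmark.  The function name and the sample inputs are hypothetical, and the sketch assumes daily rebalancing, so that each day’s return is just the weighted average of that day’s equity and cash returns:

```python
def xy_portfolio(equity_returns, cash_returns, in_equity):
    """Build the X/Y benchmark for a timing strategy.

    in_equity: one bool per day, True when the timing strategy held equities.
    Returns (x, returns), where x is the fraction of days spent in equities.
    """
    x = sum(in_equity) / len(in_equity)   # ex-post equity exposure
    y = 1.0 - x                           # ex-post cash exposure
    returns = [x * e + y * c for e, c in zip(equity_returns, cash_returns)]
    return x, returns


# Made-up sample: the timing strategy was in equities on 3 of 4 days.
held = [True, True, False, True]
equity = [0.02, -0.01, 0.03, 0.00]
cash = [0.0, 0.0, 0.0, 0.0]

x, benchmark = xy_portfolio(equity, cash, held)
print(x, benchmark)
```

The benchmark takes the same overall exposure as the timing strategy but spreads it evenly across every day, so any gap between the two is attributable to when the exposure was taken.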

The green line is the most important line in the chart.  It shows the timing strategy’s cumulative outperformance over the market, defined as the ratio of the trailing total return of the timing strategy to the trailing total return of a buy and hold strategy.  It takes its measurement off of the right y-axis, shown in linear scale in the first chart, and logarithmic scale in the second.
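The outperformance ratio described above can be computed as in the following Python sketch.  The function names and sample returns are hypothetical, and both total return indices are assumed to start at 1:

```python
def total_return_index(returns, start=1.0):
    """Cumulative value of one unit invested, after each period."""
    values, level = [], start
    for r in returns:
        level *= 1.0 + r
        values.append(level)
    return values


def outperformance_ratio(strategy_returns, benchmark_returns):
    """Strategy total return index divided by benchmark total return index."""
    strategy = total_return_index(strategy_returns)
    benchmark = total_return_index(benchmark_returns)
    return [s / b for s, b in zip(strategy, benchmark)]


# Made-up sample: the strategy gains 10% first, the benchmark gains 10% second.
ratio = outperformance_ratio([0.10, 0.00], [0.00, 0.10])
print(ratio)
```

A ratio that rises over time means the strategy is pulling ahead of buy and hold, and a ratio that falls means it is giving ground back, which is why the line is most naturally read on a log scale over long periods.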

As you can see in the chart, the timing strategy performs unbelievably well.  From the beginning of 1928 to the end of 1999, it produces a total return more than 5,000 times larger than the market’s total return, with less volatility and a lower maximum drawdown. It earns 25.1% per year, 1400 bps more than the market.  The idea that a timing strategy would be able to beat the market by 14% per year, not only over the short or medium term, but over a period of seven decades, is almost inconceivable.
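As a quick consistency check on these figures, the following Python sketch compounds the two annualized returns over the test period.  The 71.9 year horizon is my approximation of February 1928 through December 1999, and the 11.1% market return is simply 25.1% minus the 14% gap:

```python
strategy_rate = 0.251                  # from the text
market_rate = strategy_rate - 0.14     # implied by the 1400 bps gap
years = 71.9                           # approx. Feb 1928 to Dec 1999 (assumption)

# Ratio of the two cumulative total returns after compounding.
cumulative_ratio = ((1 + strategy_rate) / (1 + market_rate)) ** years
print(f"strategy ends up roughly {cumulative_ratio:,.0f} times the market")
```

A 14% annual edge compounded over roughly seven decades does indeed produce a multiple in the neighborhood of 5,000, so the two headline numbers agree with each other.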

Now, imagine that it’s late December of 1999, and I’m trying to sell this strategy to you. What would be the best way for me to sell it?  If you’re familiar with the current intellectual vogue in finance, then you know the answer.  The best way for me to sell it would be to package it as a strategy that’s “data-driven.”  Other investors are employing investment strategies that are grounded in sloppy, unreliable guesses and hunches.  I, however, am employing an investment strategy whose efficacy is demonstrated in “the data.”  All of the success that you see in the well-established fields of science–physics, chemistry, biology, engineering, medicine–you can expect to see from my strategy, because my strategy originated in the same empirical, evidence-based approach.

On the totality of what I’ve seen, active investors that center their investment processes on “the data” do not perform any better in real-world investing environments than active investors that invest based on their own market analysis, or investors that simply index. Granted, some investors have done extremely well using data-driven approaches, but others have done poorly–some, spectacularly poorly, to a degree that was not expected beforehand.  The failure to see consistent outperformance from the group as a whole has made me increasingly skeptical of investment approaches that claim to be data-driven.  In my view, such approaches receive too much trust and respect, and not enough scrutiny.  They hold a reputation for scientific credibility that is not deserved.

In this piece, I’m going to use the timing strategy presented above to distinguish between valid and invalid uses of data in an investment process.  In the conventional practice, we take a claim or a strategy and we “backtest” it–i.e., we test it in historical data.  We then draw probabilistic conclusions about the future from the results, conclusions that become the foundations for investment decisions.  To use the timing strategy as an example, we take the strategy and test it back to 1928.  We observe very strong performance.  From that performance, we conclude that the strategy will “probably” perform well into the future. But is this conclusion valid?  If it is valid, what makes it valid?  What is its basis?  Those are the kinds of questions that I’m going to pursue in the piece.

Now, if we want to use the results of a backtest to make statements about the returns that investors are likely to receive if they put the strategy to use in the real world, the first thing we need to do is properly account for the real-world frictions associated with the strategy’s transactions.  The daily momentum strategy transacts extremely frequently, trading on 44% of all trading days and amassing a total of 8,338 trades across the tested period.  In addition to brokerage fees, these trades entail the cost of buying at the ask and selling at the bid, a cost equal to the spread between the two, incurred on each round-trip (buy-sell pair).

In 1999, the bid-ask spread for the market’s most liquid ETF–the SPDR S&P 500 ETF $SPY–was less than 10 cents, which equated to around 0.08% of market value. The lowest available transaction fee from an online broker was around $10, which, if we assume a trade size of $50,000, amounted to around 0.02% of assets. Summing these together, we arrive at 0.10% as a conservative friction, or “slippage” cost, to apply to each trade.  Of course, the actual average slippage cost in the 1928 to 1999 period was much higher than 0.10%.  But an investor who employs the strategy from 1999 onward, as we are about to do, is not going to see that higher cost; she is going to see the 0.10% cost, which is the cost we want to build into the test.
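To make the friction arithmetic concrete, here is a minimal sketch in Python. The function names are hypothetical, and the trading rule (hold the market tomorrow if today's return was positive, otherwise hold cash earning nothing) is a simplified assumption about the daily momentum strategy, not the exact rule used in the backtest. The sketch treats the 0.10% as a round-trip cost and charges half of it on each switch.

```python
def slippage_fraction(spread_fraction, fee_dollars, trade_size_dollars):
    """Friction as a fraction of market value: spread plus fee."""
    return spread_fraction + fee_dollars / trade_size_dollars


def momentum_backtest(daily_returns, round_trip_cost):
    """Toy daily momentum backtest with slippage.

    Assumed rule (a simplification): after an up day, hold the market
    the next day; after a down or flat day, hold cash (0% return).
    Each switch is charged half of the round-trip cost.
    """
    wealth = 1.0
    in_market = False
    trades = 0
    for r in daily_returns:
        if in_market:
            wealth *= 1.0 + r          # earn today's market return
        want_in = r > 0                # signal for tomorrow's position
        if want_in != in_market:
            wealth *= 1.0 - round_trip_cost / 2.0
            trades += 1
            in_market = want_in
    return wealth, trades


# Figures from the text: 0.08% spread, $10 fee, $50,000 trade size.
cost = slippage_fraction(0.0008, 10.0, 50_000.0)

# Made-up returns, purely for illustration.
toy_returns = [0.01, 0.02, -0.01, 0.03]
gross, n_trades = momentum_backtest(toy_returns, 0.0)
net, _ = momentum_backtest(toy_returns, cost)
print(f"cost per round trip: {cost:.4%}, trades: {n_trades}")
print(f"gross wealth: {gross:.6f}, net wealth: {net:.6f}")
```

With thousands of switches, even a small per-trade charge compounds into a large drag, which is why the friction assumption matters so much for a strategy that trades this often.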

The following charts show the strategy’s performance on an assumed round-trip slip of 0.10%:

As you can see, with slippage costs appropriately factored in, the annual return falls from 25.1% to 18.0%–a sizeable drop.  But the strategy still strongly outperforms, beating the market by more than 700 bps per year.  We therefore conclude that an investor who employs the strategy from 1999 onward is likely to enjoy strong returns–maybe not returns that equal or exceed 18%, but returns that will more than likely beat the market.

The chief threat to our conclusion is the possibility that randomness is driving the performance seen in the backtest.  Of course, we need to clarify what exactly the term “random” would mean in this context.  Consider an example.  In May of 2015, the New York Rangers played the Tampa Bay Lightning in Game 7 of the Stanley Cup semi-finals.  The game was a home game for the Rangers, held at Madison Square Garden (MSG).  The following table shows the Rangers’ performance in game sevens at MSG up to that point:


As you can see, the Rangers were a perfect 7 for 7 in at-home game sevens.  Given this past performance, would it have been valid to conclude that the Rangers would “probably” win the game seven that they were about to play?  Intuitively, we recognize the answer to be no.  The statistic “7 for 7 in at-home game sevens” is a purely random, coincidental occurrence that has little if any bearing on the team’s true probability of victory in any game (Note: the Rangers went on to lose the game).

But this intuition is hard to square with “the data.”  Suppose, hypothetically, that in every at-home game seven that the Rangers play, the true probability of victory is at most 50%–a coin flip.  For a given sampling of at-home game sevens, what is the probability that the team would win all of them?  The answer: 0.5^7 = 0.8%, an extremely low probability. The Rangers successfully defied that extremely low probability and won all seven at-home game sevens that they played over the period.  The implication, then, is that their true probability of winning at-home game sevens must have been higher than 50%–that the coin being flipped cannot have been a fair coin, but must have been a coin biased towards victory.
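The coin-flip arithmetic can be checked in a couple of lines of Python. This is only a sketch of the calculation described above; the function name is hypothetical, and the games are assumed to be independent.

```python
def prob_perfect_streak(p_win, games):
    """Probability of winning every one of `games` independent games."""
    return p_win ** games


p = prob_perfect_streak(0.5, 7)
print(f"{p:.2%}")  # about 0.78%, i.e. the 0.8% quoted in the text
```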

Consider the two possibilities:

(1) The Rangers’ probability of victory in any at-home game seven is less than or equal to 50%.  Seven at-home game sevens are played over the period, and the Rangers win all seven–an outcome with a probability of less than 1%.

(2) The Rangers’ probability of victory in any at-home game seven is greater than 50%.

Since, from a statistical perspective, (1) is exceedingly unlikely to occur, we feel forced to accept the alternative, (2).  The problem, of course, is that our delineation of those seven games as a “sampling” of the Rangers’ likelihood of winning the upcoming game is entirely invalid.  The sample is biased by the fact that we intentionally picked it non-randomly, out of a large group of possible samples, precisely because it carried the unique pro-Rangers results that we were looking for.

If the probability of victory in each competition in the NHL is exactly 50%, what is the likelihood that over a fifty year period, a few teams out of the group of 30 will secure an unusually long string of victories?  Extremely high.  If we search the data, we will surely be able to find one of those teams.  Nothing stops us from then picking out unique facts about the victories–for example, that they involved the Rangers, that the Rangers were playing game sevens, that the game sevens occurred at MSG–and arguing that those facts are somehow causally relevant, that they affected the likelihood that the team would win. What we should not do, however, is try to claim that “the data” supports this conclusion. The data simply produced an expected anomaly that we picked out from the bunch and identified as special, after the fact.

When we sample a system to test claims about the likelihood that it will produce certain outcomes, the sample needs to be blind.  We cannot choose our sample, and present it as a valid test, when we already know that the results confirm the hypothesis. And so if we believe that there is something special about the New York Rangers, Game Sevens, and MSG as a venue–if we believe that the presence of those variables in a game changes the probability of victory–the appropriate way to test that belief is not to cite, as evidence, the seven at-home game sevens that we know the Rangers did win, the very games that led us to associate those variables with increased victory odds in the first place. Rather, the appropriate way to test the belief is to identify a different set of Rangers games with those properties, a set that we haven’t yet seen and haven’t yet extracted a hypothesis from, and look to see whether that sample yields an outsized number of Rangers victories. If it does, then we can legitimately claim that we’ve tested our belief in “the data.”

A Rangers fan who desperately wants people to believe that the Rangers will beat the Lightning will scour the universe to find random patterns that support her desired view. If she manages to find a random pattern, the finding, in itself, will not tell us anything about the team’s true probability of victory.  The fan is not showing us the number of potential patterns that she had to sift through and discard in order to find a pattern that actually did what she wanted it to do, and therefore she is hiding the significant possibility that the pattern that she found is a generic anomaly that would be expected to randomly occur in any large population.

In the context of the timing strategy, how many other strategies did I have to sift through and discard–explicitly, or implicitly–in order to find the strategy that I showed to you, the strategy that I am now trying to make a big deal out of?  You don’t know, because I haven’t told you.  You therefore can’t put accurate odds on the possibility that the strategy’s impressive results were just a random anomaly that would be expected to be found in any sufficiently large population of potential strategies, when searched for.

In practice, the best way to rule out the possibility that we may have preferentially identified random success is to first define the strategy, and then, after we’ve defined it and committed ourselves to it, test it live, in real-world data, data that we have not yet seen and could not possibly have molded our strategy to fit with.  If I suspect that there is something special about the Rangers, game sevens, and MSG, then the solution is to pull those variables together in live experiments, and see whether or not the victories keep happening.  We set up, say, 50 game sevens in MSG for the Rangers to play, and 50 normal games in other stadiums for them to play as a control, and if they end up winning many more of the MSG game sevens than the control games, then we can correctly conclude that the identified game 7 MSG success was not an expected random anomaly, but a reflection of true causal impact in the relevant variables.

Unfortunately, in economic and financial contexts, such tests are not feasible, because they would take too long to play out.  Our only option is to test our strategies in historical data. Even though historical data are inferior for that purpose, they can still be useful.  The key is to conduct the tests out-of-sample, in historical data that we haven’t yet seen or worked with.  Out-of-sample tests prevent us from picking out expected random anomalies in a large population, and making special claims about them, when there is nothing that is actually special about them.

In a financial context, the optimal way to conduct an out-of-sample test on an investment strategy is to use data from foreign markets–ideally, foreign markets whose price movements are unrelated to the price movements in our own market. Unfortunately, in this case, the daily data necessary for such a test are difficult to find, particularly if we want to go back to the 1920s, as we did with the current test.

For an out-of-sample test, the best I can offer in the current case is a test in the 10 different sectors of the market, data that is available from CRSP.  The tests will not be fully independent of the previous test on the total market index, because the stocks in the individual sectors overlap with the stocks in the larger index.  But the sectors still carry a uniqueness and differentiation from the index that will challenge the strategy in a way that might shed light on the possibility that its success was, in fact, random.

The following table shows the performance of the strategy in ten separate market sectors back to 1928:


As the table reveals, the only sector in which the strategy fails to strongly outperform is the Telecom sector.  We can therefore conclude that the strategy’s success is unlikely to be random.

Now, the Rangers game 7 MSG streak was only a seven game streak.  On the assumption that the Rangers had an even chance of winning or losing each game, the probability of such a streak occurring in a seven game trial would have been 0.8%–a low number, but not a number so low as to preclude the generic occurrence of the streak somewhere in a large population.  If 100 or 200 or 300 different teams with the same probability of victory played seven games, it’s reasonable to expect that a few would win seven straight.
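A short Python sketch shows how quickly a 0.8% event becomes unremarkable once enough teams are in the pool. The function names are hypothetical, and the teams are assumed to be independent of one another, each with a 50% chance of winning any game.

```python
def expected_streak_teams(n_teams, games=7, p_win=0.5):
    """Expected number of teams that win all `games` games."""
    return n_teams * p_win ** games


def prob_at_least_one_streak(n_teams, games=7, p_win=0.5):
    """Chance that at least one of `n_teams` independent teams sweeps."""
    p_streak = p_win ** games
    return 1.0 - (1.0 - p_streak) ** n_teams


for n in (100, 200, 300):
    print(n,
          round(expected_streak_teams(n), 2),
          round(prob_at_least_one_streak(n), 3))
```

With 100 teams the chance of finding at least one perfect streak is already better than even, and with 300 teams it is close to certain.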

The same point cannot be made, however, about the timing strategy–at least not as easily. The timing strategy contained 19,135 days, which amounts to 19,135 independent tests of the strategy’s prowess.  In those tests, the strategy’s average daily excess return over the risk-free rate was 0.058%.  The average daily excess return of the control, the X/Y portfolio, was 0.0154%, with a standard deviation of 0.718%.  Statistically, we can ask the following question.  If my timing strategy did not add any value through its timing–that is, if it carried the same expected return as the X/Y portfolio, a portfolio with the same cumulative asset exposures, which is expected to produce an excess return of 0.0154% per day, with a standard deviation of 0.718%–what is the probability that we would conduct 19,135 independent tests of the timing strategy, and get an average return of 0.058% or better?  If we assume that daily returns follow a normal distribution, we can give a precise statistical answer.  The answer: as close to zero as you can possibly imagine. So close that not even separate trials on an inordinately large number of different strategies, repeated over and over and over, would be able to produce the outcome randomly.
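For readers who want to see the number, the question can be framed as a one-sided z-test on the mean daily excess return. The Python sketch below uses only the figures quoted above; the function name is hypothetical, and it leans on the same assumptions stated in the text (independent days, normally distributed returns, and the control portfolio's 0.718% standard deviation applied under the null).

```python
import math


def one_sided_p_value(mean_obs, mean_null, sd_daily, n_days):
    """z-score and upper-tail p-value for a sample mean under the null."""
    std_err = sd_daily / math.sqrt(n_days)     # standard error of the mean
    z = (mean_obs - mean_null) / std_err
    p = 0.5 * math.erfc(z / math.sqrt(2.0))    # normal upper-tail probability
    return z, p


# All inputs in percent per day, as quoted in the text.
z, p = one_sided_p_value(0.058, 0.0154, 0.718, 19135)
print(f"z = {z:.2f}, p = {p:.1e}")
```

The z-score comes out a little above 8, which puts the tail probability far below anything that a search across a few thousand candidate strategies could plausibly produce by luck.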

For us to find one strategy out of a thousand that adds no value over the X/Y portfolio, but that still manages to randomly produce such extreme outperformance over such a large sample size, would be like our finding one “average” team out of 100 in a sports league that manages to win 1,000 straight games, entirely by luck.  One “average” team out of 100 “average” teams will likely get lucky and win seven straight games in a seven game trial. But  one “average” team out of 100 average teams is not going to get lucky and win 1,000 straight games in a 1,000 game trial.  If there is an “average” team–out of 100, or 1,000, or even 10,000–that manages to win that many games in a row, then we were simply wrong to call the team “average.” The team’s probability of victory cannot realistically have been a mere 50%–if it had been, we would be forced to believe that “that which simply does not take place” actually took place.
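The contrast between a 7 game streak and a 1,000 game streak is easiest to see on a log scale. The sketch below is my own illustration, not part of the original analysis: it uses a simple union bound (number of teams times the probability of a single sweep) as an upper limit on the chance that any team in the pool gets that lucky. The function name is hypothetical.

```python
import math


def log10_any_team_sweeps(n_teams, games, p_win=0.5):
    """log10 of the union bound on P(at least one team wins every game)."""
    return math.log10(n_teams) + games * math.log10(p_win)


print(log10_any_team_sweeps(100, 7))        # about -0.11, a bound near 0.78
print(log10_any_team_sweeps(10_000, 1000))  # about -297
```

Even with 10,000 teams in the pool, the bound for a 1,000 game sweep is a number with nearly 300 zeros after the decimal point.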

I am willing to accept the same claim about the daily momentum strategy.  Indeed, I am forced to accept it.  The probability that daily momentum–or any timing strategy that we might conjure up–would be no better than its corresponding X/Y portfolio, and yet go on to produce such extreme outperformance over such a large number of independent trials, is effectively zero.  It follows that the strategy must have been capturing a non-random pattern in the data–some causally relevant fact that made prices more likely to go up tomorrow if they went up today, and down tomorrow if they went down today.

I should clarify, at this point, that what makes this conclusion forceful is the large number of independent trials contained in the backtest.  Crucially, a large number of independent trials is not the same as a large number of years tested.  If the time horizon of each trial in a test is long, the test can span a large number of years and yet still only contain a small number of independent trials.

To illustrate, suppose that I’ve formulated a technique that purports to predict returns on a 10 year time horizon.  If I backtest that technique on a monthly basis over a 50 year period, the number of independent trials in my test will not be 12 months per year * 50 years = 600. Rather, it will be 50 years / 10 years = 5, a much smaller number.  The reason is that the 10 year periods inside the 50 year period are not independent of each other–they overlap. The 10 year period that ranges from February 1965 to February 1975, for example, overlaps with the 10 year period that ranges from March 1965 to March 1975, in every month except one.  Given the overlap, if the technique works well in predicting the 10 year return from February 1965 onward, it’s almost certainly going to work well in predicting the 10 year return from March onward–and likewise for April, May, June, and so on.  The independence will increase until we get to February 1975, at which point full independence from the February 1965 trial will exist.
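The counting can be written out explicitly. This is a minimal Python sketch with hypothetical function names; it measures overlap simply as the share of months that two equal-length windows have in common.

```python
def naive_trial_count(years, starts_per_year=12):
    """One 'trial' per monthly start date: the misleading count."""
    return years * starts_per_year


def independent_trial_count(years, horizon_years):
    """Number of non-overlapping windows that fit in the sample."""
    return years // horizon_years


def window_overlap_fraction(shift_months, horizon_months=120):
    """Share of months common to two windows offset by `shift_months`."""
    shared = max(horizon_months - shift_months, 0)
    return shared / horizon_months


print(naive_trial_count(50))             # 600
print(independent_trial_count(50, 10))   # 5
print(window_overlap_fraction(1))        # 119 of 120 months shared
print(window_overlap_fraction(120))      # 0.0, fully independent
```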

To summarize where we are at this point, we accept that the source of the daily momentum strategy’s success is a real pattern in the data–a pattern that cannot have reasonably occurred by chance, and that must have some causal explanation underneath it, even though we don’t know what that explanation is.  The next question we need to address is the following.  What is our basis for concluding that because the system produced that pattern in the past, it will continue to produce the pattern in the future?

The answer, of course, is that we assume that the causal forces in the system that produced the pattern will remain in the system to keep producing it.  When we make appeals to the results of historical backtests, as I did in this case, that is the assumption that we are making.  Unfortunately, we frequently fail to appreciate how tenuous and unreliable that assumption can be, particularly in the context of a dynamic financial market influenced by an exceedingly large number of complicated forces.

To claim that all of the relevant causal forces in a system–all of the conditions that played a role in producing a particular pattern–remain in the system, and that the conditions will reliably produce the pattern again, we need to know, at a minimum, what those causal forces are.  And to know what those causal forces are, we need an accurate theoretical model of how the system works, how it produces the outcomes that we observe.  With respect to the timing strategy, what is the accurate theoretical model that explains how the market produces the daily momentum pattern that we’ve observed?  I’ve given you no model.  All I have to give you is “the data.”  Should you trust me?

Suppose that after deep investigation, we come to find out that the driver of the timing strategy’s success in the 1928 to 1999 period was a peculiarity associated with the way in which large market participants initiated (or terminated) their positions.  When they initiated (or terminated) their positions, they did so by sending out buy (or sell) orders to a broker many miles away.  Those orders were then fractionally executed over a period of many days, without communication from the sender, and without the possibility of being pulled.  The result might conceivably be a short-term momentum pattern in the price, a pattern that the daily momentum strategy could then exploit.  This phenomenon, if it were the true driver for the strategy’s success–and I’m not saying that it is, I made it up from nothing, simply to illustrate a point–would be an example of a driver that would be extremely fragile and ephemeral.  Any number of changes in market structure could cause it to disappear as a phenomenon. The strategy’s outperformance would then evaporate.

Until we work out an accurate account of what is going on with this peculiar result–and my guess is as good as yours, feel free to float your own theories–we won’t be able to rule out the possibility that the result is due to something fragile and ephemeral, such as a quirk in how people traded historically.  We won’t even be able to put a probability on that possibility.  We are flying blind.

Investors tend to be skeptical of theoretical explanations.  The reason they tend to be skeptical is that it is easy to conjure up flaky stories to explain observed data after the fact.  In the case of the daily momentum results, you saw how easy it was for me to make up exactly that type of story.  But the fact that flaky stories are easy to conjure up doesn’t mean that sound theoretical explanations aren’t important.  They’re extremely important–arguably just as important as “the data.”  Without an accurate understanding of how and why a system produces the patterns that we see, there’s no way for us to know whether or for how long the system will continue to produce those patterns.  And, if the system that we’re talking about is a financial market, it’s hardly a given that the system will continue to produce them.

Now, to be fair, if a system has not been perturbed, it’s reasonable to expect that the system will continue to produce the types of outcomes that it’s been producing up to now. But if we choose to use that expectation as a justification for extrapolating past performance into the future, we need to favor recent data, recent observations of the system’s functioning, in the extrapolation.  Successful performance in recent data is more likely to be a consequence of conditions that remain in the system to produce the successful performance again.  In contrast, successful performance that is found only in the distant past, and not in recent data, is likely to have resulted from conditions that are no longer present in the system.

Some investors like to pooh-pooh this emphasis on recency.  They interpret it to be a kind of arrogant and dismissive trashing of the sacred market wisdoms that our investor ancestors carved out for us, through their experiences.  But, hyperbole aside, there’s a sound basis for emphasizing recent performance over antiquated performance in the evaluation of data.  Recent performance is more likely to be an accurate guide to future performance, because it is more likely to have arisen out of causal conditions that are still there in the system, as opposed to conditions that have since fallen away.

This fact should give us pause in our evaluation of the strategy.  Speaking from the perspective of 1999, over the last two decades–the 1980s and 1990s–the strategy has failed to reliably outperform the market.  Why?  What happened to the pattern that it was supposedly exploiting?  Why are we no longer seeing that pattern in the data?  Given that we never had an understanding of the factors that brought about the pattern in the first place, we can’t even begin to offer up an answer.  We have to simply take it on faith that there is some latent structural property of the market system that causes it to produce the pattern that our strategy exploits, and that even though we haven’t seen the system produce that pattern in over 20 years, we’re eventually going to see the pattern come up again.  Good luck with that.
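One simple way to check for this kind of decay is to break a backtest into sub-periods and compare the strategy to the market in each one, rather than looking only at the full-sample number. The Python sketch below is an illustration under assumptions of my own: the function names are hypothetical, the returns are made-up toy numbers (not the actual backtest data), and 252 trading days per year is assumed for annualizing.

```python
def annualized_return(daily_returns, days_per_year=252):
    """Compound daily returns and express the result per year."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    years = len(daily_returns) / days_per_year
    return growth ** (1.0 / years) - 1.0


def excess_by_period(strategy, market, period_labels):
    """Annualized strategy-minus-market return for each labeled period."""
    periods = {}
    for s, m, label in zip(strategy, market, period_labels):
        bucket = periods.setdefault(label, ([], []))
        bucket[0].append(s)
        bucket[1].append(m)
    return {label: annualized_return(s) - annualized_return(m)
            for label, (s, m) in periods.items()}


# Toy data: the strategy beats the market early, then lags it.
strategy = [0.002] * 252 + [0.0003] * 252
market = [0.0005] * 504
labels = ["early"] * 252 + ["recent"] * 252

for label, excess in excess_by_period(strategy, market, labels).items():
    print(f"{label}: {excess:+.1%} per year versus the market")
```

A strategy whose full-sample outperformance comes entirely from the early buckets, as in this toy example, is showing exactly the warning sign described above.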

If you’ve made it this far, congratulations.  We’re now in a position to open up the curtain and see how the strategy would have performed from 1999 onward.  The following chart shows the performance:


As you can see, the strategy would have performed atrociously.  It would have inflicted a cumulative total return loss of 71%.  That loss would have been spread out over a multi-decade period in which almost all asset classes outside of the technology sector saw substantial price appreciation, and in which general prices in the economy increased by more than a third.

So much for basing an investment process on “the data.”  The pattern that the strategy had been exploiting was significantly more fragile than anticipated.  Something changed somewhere in time, and caused it to disappear.  We tend to assume that this kind of thing can’t happen, that a market system is like a physical system whose governing “laws” never change.  That assumption would be true, of course, if we were modeling a market system physically, at the level of the neurons in each participant’s brain, the ultimate source of everything that subsequently happens in the system.  But the assumption is not true if we’re modeling the system at a macroscopic level.  It’s entirely possible for the “macro” rules that describe outcomes in a market system to change in relevant ways over time.  As quantitative investors, we should worry deeply about that possibility.

With respect to the strategy’s dismal performance, the writing was on the wall.  The strategy itself–buying after daily gains and selling after daily losses–was weird and counter-intuitive. We had no understanding whatsoever of the causal forces that were driving its success.  We therefore had no reliable way to assess the robustness or ephemerality of those forces, no way to estimate the likelihood that they would remain in the system to keep the success going.  Granted, if we know, for a fact, that relevant conditions in the system have not been perturbed, we can reasonably extrapolate past performance into the future, without necessarily understanding its basis.  But in this case, the strategy’s recent historical performance–the performance that conveys the most information about the strategy’s likely future performance–had not been good.  If we had appropriately given that performance a greater weight in the assessment, we would have rightly set the strategy aside.

We are left with two important investing takeaways:

(1) From an investment perspective, a theoretical understanding of how the market produces a given outcome is important–arguably just as important as “the data” showing that it does produce that outcome.  We need such an understanding in order to be able to evaluate the robustness of the outcome, the likelihood that the outcome will continue to be seen in the future.  Those that have spent time testing out quantitative approaches in the real world can attest to the fact that the risk that a well-backtested strategy will not work in the future is significant.

(2) When we extrapolate future performance from past performance–a move that can be justified, if conditions in the system have remained the same–we need to favor recent data over data from the distant past.  Recent data is more likely to share common causal factors with the data of the future–the data that matter.

Now, a critic could argue that my construction here is arbitrary, that I went out and intentionally found a previously well-working strategy that subsequently blew up, specifically so that I could make all of these points.  But actually, I stumbled onto the result while playing around with a different test in the 1928 to 1999 period: a test of the famous Fama-French-Asness momentum factor, which sorts stocks in the market on the basis of prior one year total returns.  That factor also contains an apparent deviation in its performance that starts around the year 2000.

The following chart and table show the performance of the market’s 10 momentum deciles from 1928 up to the year 2000:



As the chart and table confirm, the returns sort perfectly on the momentum factor.  Higher momenta correspond to higher returns, and lower momenta correspond to lower returns.

But now consider the results for the period from the year 2000 to today:



The results are off from what we would have expected.  The top performer on total return ends up being the 4/10 decile, with the 5/10 decile, the Sharpe winner, a close second.  The highest momentum decile–10/10–ends up in 6th place, with the 9/10 decile in 5th place.

To be fair, it may be possible to explain the unexpected shuffling of the performance rankings as a random statistical deviation.  But the shuffling represents a reason for caution, especially given that the post-2000 period is a recent period like our own, a period in which momentum was already a known factor to market participants.  For all we know, momentum could be a highly fragile market phenomenon that could be perturbed out of existence if only a few smart investors with large footprints were to try to implement it.  Or it could disappear for entirely unrelated reasons–a butterfly could flap its wings somewhere else in the market, and mess things up.  Or it could be robust, and stick like glue in the system no matter how the financial world changes. Without an understanding of the causal drivers of its historical outperformance, it’s difficult to confidently assess the likelihood of any of these possibilities.

The daily momentum strategy’s outperformance was so weird that I asked a quant friend, @econompic, to do his own testing on the strategy, to see if he could reproduce the results. It turns out that he had already reproduced them.  In a short blog post from June, he tested the strategy, which is originally attributable to John Orford, in the daily S&P 500 price index (dividends excluded).  Lo and behold, he observed similarly extreme outperformance, with a massive unexplained break in the year 2000.  This result allayed my chief concern, which was that the outperformance was being driven by some unique quirk in the way that the CRSP indexes were being put together, a quirk that then abruptly changed in the year 2000.  But the same result is found in an index produced by a completely separate entity: S&P, i.e., the S&P 500.

In his testing, @econompic also found that the inverse of the daily momentum strategy–daily mean reversion–which had worked horribly up to the year 2000, has since outperformed, at least before frictions.  The chart below reproduces his result on the total market, with dividends included:


What should we conclude from all of this?  We don’t have to conclude anything–all I’ve offered is an example.  I’m sticking with what I had already concluded–that the currently fashionable project of using “the data” to build superior investment strategies, or to make claims about the future, deserves significant scrutiny.  It’s not useless, it’s not without a place, but it’s worthy of caution and skepticism–more than it typically receives.  Markets are too complex, too dynamic, too adaptive, for it to be able to succeed on its own.

As an investor evaluating a potential strategy, what I want to see is not just an impressive backtest, but a compelling, accurate, reductionistic explanation of what is actually happening in the strategy–who in the market is doing what, where, when and why, and how the agglomeration is producing the result, the pattern that the strategy is successfully exploiting.  I want an explanation that I know to be accurate, an explanation that will allow me to reliably gauge the likelihood that the pattern and the associated outperformance will persist into the future–which is the only thing I care about.

If I’m going to run a systematic strategy, I want the strategy to work now, when I run it, as I run it.  I don’t want to have to put faith in an eventual reversion to a past period of glory. That’s too risky–the exploited pattern could have been ephemeral, relevant conditions could have changed.  If a strategy can’t deliver success on a near-term basis, in the out-of-sample test that reality is putting it through, then I’d rather just abandon the systematic approach altogether and invest on my own concrete analysis of the situation, my own gut feel for where things are likely headed, given the present facts.  If I don’t have confidence in my own analysis, if I can’t trust my gut, then I shouldn’t be actively investing.  I should save the time and effort and just toss the money in an index.

To make the point with an analogy from the game of chess, suppose that there’s a certain position on the board.  You’ve done a large statistical analysis of historical games with similar positions, and you’ve observed that a certain move showed a high frequency of subsequent victories.  If you’re going to tell me that I should make that move, I want you to tell me why it’s a good move.  Explain it to me, in terms of the logic of the position itself.  If you can’t do that, I’m going to be skeptical.  And if I make that move, and things turn south for me in the game, well, I’m going to go back to working off of my own analysis of the position.  I’m not going to blindly trust in the recommendations that your statistical analysis is pumping out–nor should I.  Finally, if you’re going to tell me that my attempt to find the right move through a direct analysis of the position isn’t going to work–that my impulses and irrationalities will inevitably lead me astray–what you’re telling me is that I shouldn’t be playing chess.  And maybe you’re right.

Of course, the situation isn’t entirely the same in markets, but there’s a loose analogy.  The actual truth of what the right move is in a chess game is contained in the position that is there on the board, not in positions seen in prior games.  Likewise, the actual truth of what the right move is in a market is contained in the conditions that presently define the market, not in conditions observed in the past–and especially not in conditions observed in the distant past.  The goal is to get to that truth.  Looking to the past isn’t necessarily the best way to get to it.

Operant Conditioning, Market Trends, and Small Bets: A 2012 vs. 2008 Case Study

In this piece, I’m going to examine the question: why do markets trend in the way that they do? Part of the answer, in my view, can be found in the process of operant conditioning, a process that we explored in the prior piece on the market-relevant insights of the great B.F. Skinner.  The capacity to understand and identify that process–as it plays out in us, and in the rest of the market–can provide opportunities for outperformance.

Why Do Markets Trend?

The fact that markets “trend”–or at least “trended” in the past–is a well-established empirical fact, confirmed in tests on different periods of history, different countries, different asset classes, and even different individual stocks.

The question is, why do markets trend?  Here’s a possible answer.  We know that market participants invest on the basis of a fundamental outlook, looking anywhere from a few months to a few years out into the future.  Rationally, it would make sense for them to look out farther, a few decades, maybe longer–but the fundamental picture that far out is too difficult to confidently estimate and too distant to want to focus on, particularly when near to medium term price changes will determine the assessment of performance.

Crucially, fundamental outlooks–for earnings, interest rates, credit, investment, employment, and whatever else is relevant to the asset class or security in question–move in trends.  An important example would be the business cycle.  The economy sees an expansionary period of rising investment, employment, output, profits, wages, optimism, and so on.  Imbalances inevitably accumulate in the process.  The imbalances unwind in the form of a recession, a temporary period of contraction in the variables that were previously expanding.  Historically, the unwind has usually been nudged by the Fed, which tightens monetary policy in response to inflationary pressures.  But, the unwind doesn’t have to occur in that way–it can occur without any Fed action, or even amid aggressive Fed support.

The fundamental outlooks of investors influence their assessment of the fundamental value that is present in markets, and also dictate their expectations of what will happen in markets going forward.  These impacts, in turn, influence what investors choose to do in markets–whether they choose to buy or sell.  Buying and selling determine prices.  So we have the basic answer to our question.  Market prices trend because the fundamentals they track trend, or more precisely, because the fundamental outlooks that guide the buying and selling behaviors of their participants trend.

Time Delay: A Source of Opportunity

But there’s more to the story.  Over time, trends in the fundamental outlook, and in prices, condition market participants into certain mindsets and behaviors.  When these trends change, the prior mindsets and behaviors get unlearned, replaced by new mindsets and behaviors that are more congruent with the new fundamental outlooks that are emerging.

Crucially, this process of conditioning–which involves both the Operant and Pavlovian variety–is not instantaneous.  It takes time to occur, time for repeated observed associations, repeated instances of “behavior followed by result”, to take place.  The delay can represent a window of opportunity for those investors that can identify and act quickly on it.

Academic studies have noted that market responses to earnings surprises and other unexpected news events often lag, i.e., play out over extended periods of time.  That’s not a result that any rational economic theory would predict–responses to the instantaneous release of information should be instantaneous.  Responses are not instantaneous because the participants that execute them are conditioned creatures.  To gain the confidence to buy up to the right price, if there is such a thing, they need reinforcement–the experience of seeing the price rise, when they expected that it might, the experience of having its rise force them out of a prior way of thinking about the security, and into a new way of thinking, one that is more appropriate to the new information.

Case Study:  2012 vs. 2008

Let me illustrate these points with a concrete example.  The following tables show the fundamental pictures of the U.S. stock market and economy in the 1st half of 2008 and the 1st half of 2012.  The first is quantitative, the second qualitative:

Quantitative 1H 2008 vs. 1H 2012:


Qualitative 1H 2008 vs. 1H 2012:


Take a moment to peruse the tables.  Quantitatively, in the 1st half of 2008, we had an S&P priced in the 1300s.  GAAP EPS was falling.  Operating EPS was falling.  Job growth was negative–month after month after month.  Retail sales growth was negative.  Industrial production growth was falling, and had recently turned negative.  Home prices were falling by double digit percentages.  Housing starts were plunging.  Financial stress was rising.

The data necessary to appreciate the emerging trend was there for everyone to see.  And the media was discussing it, talking recession at every turn.  Here are 6 press articles (5 from CNN Money, 1 from Heritage Foundation) from the 1st half of 2008 analyzing the results of the monthly jobs reports: January, February, March, April, May, June.  Read a few of them, so that you can remember back to that time.

Qualitatively, in the 1st half of 2008, the expansion was long in the tooth.  Large price and investment excesses had built up in the housing sector–it had become what almost everyone at the time admitted was a “bubble.”  Those excesses were unwinding in a turbulent, poorly-controlled way–a perfect driver for recession.  A tight labor market was starting to loosen and unravel.  The economy was coming out of a period of constrictive monetary policy, evidenced by a yield curve that had fully inverted two years earlier, and that had only recently come out of inversion, in reaction to Fed easing.  The Fed’s response to the deterioration was highly complacent–the policy rate was still north of 2%, despite the lack of any sign of improvement.  In terms of the price trend, the market had fallen below its 200 day moving average, and hadn’t come back.  Worst of all, there was enormous risk in the financial system–or at least, from our perspective then, enormous uncertainty–related to the contagious effects that the cascading foreclosures and bankruptcies were going to have.

Despite this truly awful fundamental picture–a picture that is admittedly easier to describe as “awful” in hindsight, when we know the answer, but that actually was awful, objectively–investors weren’t able to get truly worried, worried enough to hit the sell button en masse and pull the market’s price down to where it was eventually headed.  If they knew anything about how cycles work, which they did, then they had every reason to abandon the mature, expensive, cracking market that they were holding onto. But they held on–evidenced by the fact that prices stayed high, north of the 1300 level reached less than two years earlier.

Fast-forward to the 1st half of 2012.  The fundamental picture could not have been more different.  Quantitatively, in the 1st half of 2012, corporate earnings were in a rising trend. Payrolls were consistently increasing, month after month.  Retail sales and industrial output were growing healthily.  Housing starts were strongly positive.  Home price growth was firming, and had just broken above zero.  Objective measurements of stress in the financial system were at record lows.

Qualitatively, corporate and residential investment were rising off of recessionary levels, after years of accumulated underinvestment.  An extremely loose labor market was just starting to get tighter.  Adjusting for QE’s distortion of the long end, the yield curve was steep, and had been steep since the end of the previous recession.  The Fed–given its own recent conditioning–was laser-focused on the risks of deflation and a new downturn, and therefore monetary policy was pinned at a generationally loose level, as low as it could go, with various permutations of QE already completed or in progress, and an endless version of QE on deck, being “discussed.”  The S&P was above its 200 day moving average, and had been above that average since the Eurozone and Debt Ceiling scare of the prior year.

The question screams at us.  Why, on earth, were these markets trading at the same price? What were investors thinking?  Why were they willing to hold the crack-infested 2008 market, at a price of 1300, a P/E north of 17 (26 by GAAP), when they could have held cash, and earned 2% to 3% with zero risk?  And why, only four years later, were they perfectly willing to hold cash at 0%, when they could have held a market at a P/E of 13 (15 by GAAP), a market with a fundamental economic backdrop that was clearly improving, in real-time, on every front?
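To put rough numbers on that comparison, a P/E can be inverted into an earnings yield and set against the cash rate.  The sketch below is only illustrative arithmetic: the function name is hypothetical, the P/E of 17 understates a figure that was "north of 17", the 2.5% cash rate is an assumed midpoint of the 2% to 3% range, and an earnings yield is not a promised return.

```python
def earnings_yield(pe: float) -> float:
    """Earnings yield is simply the inverse of the P/E ratio."""
    return 1.0 / pe

# Operating P/Es from the text; the 2008 cash rate is an assumed midpoint.
cases = {
    "1H 2008": {"pe": 17.0, "cash_rate": 0.025},
    "1H 2012": {"pe": 13.0, "cash_rate": 0.0},
}

spreads = {}
for label, c in cases.items():
    ey = earnings_yield(c["pe"])
    spreads[label] = ey - c["cash_rate"]  # yield pickup over riskless cash
    print(f"{label}: earnings yield {ey:.1%}, cash {c['cash_rate']:.1%}, "
          f"spread {spreads[label]:.1%}")
```

On these assumptions, the pickup over cash was roughly 3.4 percentage points in 2008 and 7.7 in 2012, more than double, even though the index level was about the same.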

The answer, of course, is that the participants in 2008 had not been put through the experiences that the participants in 2012 had been put through.  They had not yet been conditioned to think about markets bearishly, in terms of risk, crisis, and so on–the many things that can go terribly wrong.  They still had confidence in markets, and in the system–at least more confidence than they had in 2012.  In 2008, that confidence carried things for a while.  In 2012, the lack of it held things back.

When people are bearish, they will come up with good reasons to be bearish.  The best reason in 2012?  Surely, the Eurozone crisis.  But, on an a priori basis, the subprime crisis, viewed from a 2008 perspective, was every bit as dangerous as the Eurozone crisis was from a 2012 perspective, especially after Draghi’s “whatever it takes” promise.  Still, the Eurozone crisis got people to sell the market down to very attractive valuations, and kept people out, despite those valuations, whereas the subprime crisis didn’t, at least not until everything came to a head with Lehman.  Why the difference?  Because the participants in 2008 were carrying different sensitivities relative to 2012, sensitivities that had been behaviorally conditioned by different recent histories.  To use the analogy of driving, market participants in 2008 were normal drivers, maybe even complacent drivers; market participants in 2012, in contrast, were drivers that had just suffered life-altering car accidents.  As happens, they were prone to systematically overestimate the risks that more accidents were to come.

Transition: Not a Rational Process

The process through which risk aversion is acquired, wears off, and is replaced by risk appetite, is not a rational process. It’s a behavioral process, a process that involves the Skinnerian phenomena of punishment, extinction and reinforcement.

  • Punishment: The investor engages in the investment behavior–say, positioning into equities–and the behavior is met with harm, pain, losses, red screens.  And then again. And then again.  Confidence breaks down.  Fear develops.  The investor becomes risk-averse.
  • Extinction: The investor engages in the investment behavior, or watches someone else engage in it–but this time, the bad consequences don’t follow.  The investor engages in it again, or again watches someone else engage in it–again, no bad consequences. Where are they?  Where is the meltdown?  With time, the conditioning starts to weaken.
  • Reinforcement: The investor engages in the investment behavior, or watches someone else engage in it, and the behavior is met with a reward–rising prices, gains, profit, green screens.  So the investor repeats the behavior, or watches someone else repeat it. Again–good consequences follow.  The investor continues, ups his exposure, puts more in, and good consequences continue to follow–at least more often than bad.  Wow, this works!  Confidence develops.  Trust develops.  Belief develops.  A bullish mindset emerges.

Because the process is behavioral and experiential, it takes time to unfold.  It’s not a transition that investors can make in an instant.  That’s why an improving picture cannot instantaneously create the prices that it fundamentally warrants.  It’s why markets can only get to those prices by trending to them, gradually, as the fundamentals win out over the inertia of prior conditioning.

This transition, this learning process, is further obstructed, further slowed, by the behavioral biases of anchoring and the disposition effect.  When prices fall from where they’ve been, they look cheaper, more appetizing to us.  We envision the possibility that they might return to where they recently were, rewarding us with a nice gain.  And so we experience an increased urge to buy.  It can be hard for us to fight that urge, to be patient, to stand back, when standing back is the right course of action.  Similarly, we don’t like the idea of selling and cementing a loss, or at least the idea of selling for less today than we could have sold for yesterday–we feel like we’re short-changing ourselves.  So we experience an increased hesitation to sell.

A similar dynamic takes place in reverse, when prices increase.  They look more expensive, less appetizing, to us.  We visualize them returning to where they recently were, inflicting a loss on us.  We don’t like paying more today than we could have recently paid–it doesn’t feel like a smart, profitable move.  So we experience an increased hesitation to buy.

The behavioral biases of anchoring and the disposition effect make it more difficult to buy a rising market that should be bought, and more difficult to sell a falling market that should be sold.  Combined with the effects of prior conditioning, they can prevent a market from fully reflecting its fundamental strengths. But eventually, the conditioning gets unlearned, replaced with the continued positive reinforcement of good outcomes. The increased confidence and risk appetite that those outcomes give rise to overtake the inertial behavioral forces that were holding things back.

My Experience in 2012

I got out of the market in late 2010, with the S&P in the low 1200s, and stayed out, for the vast majority of the next two years.  I wasn’t as bearish during that period as I was in 2008, but I wasn’t too far off.  I saw a market that was over 100% off the lows, not far from the previous “bubble” highs.  What upside was left?  Were we going to go right back to the prior excesses?  Right back into another bubble?

On the issues front, I saw plenty to worry about: record high profit margins, a looming fiscal tightening to revert them, a potential crisis in Europe, weakness in China, and so on. The market, at 13-15 times earnings, was not enticing to me, at least not enough to make me want to walk into those risks.  I was more than able to come up with reasons why valuations “should” have been lower, and were headed lower.  And that’s what I wanted to do–come up with reasons to stay out, because I didn’t want to have to get in.  The pool was too cold.

In the last few weeks of 2012, a number of reliable people were telling me that the fiscal cliff was probably going to get resolved.  That was not what my bearish plan had called for. At least subconsciously, I wanted a large fiscal tightening to happen, so that profit margins and earnings would fall, sending the market to a lower valuation that I could then buy back in at.

I was still able to find reasons to stay out–there are always reasons to be out or in, for those that want to find them.  But for me, the fiscal cliff was the big one.  I couldn’t argue with the clear improvement in the U.S. economy, the clear uptrend in the data, nor could I argue with the prospect that Europe would stay together, given Draghi’s demonstration of resolve.  So with the fiscal cliff about to be off the table, I had a choice to make: do I stay out?  Or do I get back in, at a higher price than I had gotten out at two years prior?

As I wrestled with the choice, I began to more fully appreciate the effects of my own behavioral conditioning.  It dawned on me that the prior downturn had conditioned me into a bearish confidence, and that this confidence was waning with each tick higher, each failed prediction.  Not only me–but those around me.  In Skinnerian terms, what was taking place, within me, and in the market as a whole, was a gradual extinction of the conditioning of the 2007-2009 experience.

So at the end of 2012, with the S&P in the low 1400s, I got back in–not with everything, but with enough.  As the market started moving in early 2013, I started getting more confident, becoming more and more of a believer.  I remember thinking to myself, in early 2013, why not go all in–100% equities?  I was fully long in 2006 without any fear–why do I fear going fully long now?  Why does that feel irresponsible?  I knew the answer: because of all the things that had happened since then, all the lessons that I had “learned” about markets.  But the truth is that I hadn’t actually “learned” anything.  I had simply been conditioned into a certain risk-averse mindset, through preferential exposure to a generationally ugly period in the business and credit cycles.  Being conditioned into a mindset is not the same as gaining knowledge, wisdom, or understanding.  The fact was, the mindset that I was carrying was not helping me position correctly in the environment that I was currently in.  So I needed to get rid of it.

It dawned on me that the rest of the market was going through the same process–a process of extinction and reinforcement.  I saw it in my interactions with others–bulls and bears.  And so I realized, I needed to go now, get in front of the process as soon as possible, to capture all of what was there to take.  If I waited for the market’s continued positive reinforcement to condition me with the confidence to go all in–which I knew it was eventually going to do–I would have to pay a much higher price, forgoing precious upside.

I thought to myself, but what upside was left in buying the market north of 1500–at prices equal to the 2000 and 2007 tops?  Were we really going to go to 1600, 1700, 1800, 1900, 2000?  Those numbers felt dangerous.  But I realized that this line of thinking also did not represent any kind of special insight or wisdom–it was just the effect of a well-known behavioral bias: anchoring.  New highs always look and feel expensive, always look and feel risky–but still, markets find a way to bust through them.  It’s true that the valuation at those prices wasn’t going to be as attractive as it had recently been, but it wasn’t going to be any less attractive than it had been in the prior cycles that I had seen, and that I had been invested in.

So I pushed myself–why not 1600, why not 1700, why not 1800?  Get those numbers in your head, acclimatize to them, get comfortable with them, so that you can find the strength to position yourself correctly for what is obviously a market headed higher, a market buoyed by a fantastic backdrop underneath it: an improving housing market, rising employment, but with plenty of labor supply still available, no private sector excesses anywhere that might give way to a recession, historically easy monetary policy with a Fed years away from tightening, a rising earnings trend (despite the profit margin fears–which were consistently being refuted by the actual results), a cheap P/E valuation, and nowhere else to get a return but in equities, with market participants increasingly figuring that fact out, putting relentless upward pressure on prices.

By the summer of 2013, I was all in–100% equities, no bonds, no cash.  Of course, I would have been much better off if I could have come to these realizations earlier in the cycle, but I can’t complain, because I ended up very well positioned for the “second leg” of the bull market, far better than I otherwise would have been.

The Questions to Ask

Now, to be clear, Skinner’s behavioral insights, as applied to markets, should not be interpreted as some kind of investing cure-all.  You can understand behavior and still be completely wrong.  But a behavioral approach at least gives us a correct model, a model that allows us to ask the right questions.  The questions we need to ask are:

  • First — What is the trend in the fundamentals of the asset class or security in question? For equities as an asset class, we might look at the trend in earnings, interest rates, inflation, the business cycle, financial and credit conditions, employment, and so on. The trend may not be clear–but sometimes it is clear.
  • Second — How have market participants been conditioned to approach and view the asset class or security in question, given their recent experiences with it?  How is this conditioning evolving over time? Is it strengthening, based on confirmation from reality, confirmation from the fundamental trend?  Is it extinguishing, based on lack of confirmation?

If, as in 2012, the first answer is bullish, with the fundamental trend improving, going up, and the second answer bearish, with market participants stuck in an overly-cautious mindset, brought about by the recent experience of full-fledged crisis, then you have a buying opportunity.  The acquired risk-aversion will gradually die off and give way to risk-appetite, with rising prices both resulting from and fueling the process–basically, what we saw from late 2012 to now, or at least from late 2012 to late 2014, when small cracks in the fundamental picture (oil, the dollar, credit, falling earnings) began to emerge.

If, as in 2008, the first answer is bearish, with the fundamental trend deteriorating, going down, and the second answer bullish, with market participants exhibiting a complacent, inappropriately-trustful mindset, conditioned by a long prior period of market tranquility and comfort, then you have a selling opportunity.  So long as the fundamental trend continues down, the complacency and unwarranted trust will eventually turn into fear and selling–with concomitant pain for all.
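The two questions above amount to a small decision table.  As a rough sketch (the names `Lean` and `opportunity` are hypothetical, and collapsing each answer to a single label is a simplification, since the trend is not always clear), it might look like this:

```python
from enum import Enum

class Lean(Enum):
    BULLISH = "bullish"
    BEARISH = "bearish"
    MIXED = "mixed"

def opportunity(fundamental_trend: Lean, conditioning: Lean) -> str:
    """Combine the answers to the two questions into a rough read.

    fundamental_trend: where the fundamentals are heading (question one).
    conditioning: the mindset recent experience has left investors in
                  (question two).
    """
    if fundamental_trend is Lean.BULLISH and conditioning is Lean.BEARISH:
        return "buying opportunity"    # the 2012 pattern
    if fundamental_trend is Lean.BEARISH and conditioning is Lean.BULLISH:
        return "selling opportunity"   # the 2008 pattern
    # Assumption of this sketch: any other combination gives no clear edge.
    return "no clear opportunity"

print(opportunity(Lean.BULLISH, Lean.BEARISH))
print(opportunity(Lean.BEARISH, Lean.BULLISH))
```

The point of writing it this way is that an edge only appears when the two answers disagree; when fundamentals and conditioning point the same way, the price has probably already moved.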

Now, to the question we all want answered: where are we right now?  In my view, we’re in neither place.  With respect to the question of fundamentals, we have a domestic economy that’s doing fine, embarked on a balanced expansion that has room to run, an expansion that will continue to be supported by pent-up demand in the household sector, and by historically easy monetary policy from the Fed.  Credit conditions have tightened somewhat–but we know the reason why: the downturn in the energy sector, not a larger downturn in the business cycle (unless you think the energy downturn has already caused a larger downturn in the business cycle, or will cause one–I don’t).  Earnings growth is weak, but again, we know the reasons why–falling profits in the energy complex, and a rising dollar, temporary factors that are not indicative of any larger trend in corporate profitability.  We have a global economy that’s struggling in certain places, but it will probably manage to muddle through–which is all that a domestic investor needs it to do.  Finally, even though equity returns may not turn out to be attractive, there’s nowhere else for an investor to earn a decent return–and there isn’t going to be for a very long time. That puts upward pressure on prices.

With respect to the question of conditioning, we have a mature, expensive market that has enjoyed a prolonged, 6 year run, and that has slayed many dragons and doubters along the way.  The experience has made people confident–confident to buy dips, confident to chase yield, confident to be heavily invested in equities, even at rich prices, confident in the powers of the Fed and other central banks, confident that the U.S. corporate sector will continue to demonstrate unusual strength in profitability, confident to dismiss warnings from bears, because “they’re always wrong”, and so on.  That confidence was not learned through the reliable study of any kind of history, but through a long period of cyclical reinforcement.  It’s been unlearned many times before, and will be unlearned again.

Unfortunately, then, the combination that we’re faced with–a decent fundamental backdrop with some cracks in certain places, cracks that probably aren’t going to spread, plus signs of complacency and overconfidence, at least among the investor class that has been massaged with large gains for several years now–doesn’t create a clear opportunity either way.  That’s why, with the VIX high, I prefer selling downside volatility to being long, betting on a market that goes nowhere, and that gives nothing to anyone, bull or bear.  For all the noise, that’s exactly what the market of 2015 has been.

Conclusion: A Practical Technique

To conclude, there’s a practical technique, inspired by a reading of Skinner’s work, that we can use to help us find the strength to position correctly when prior conditioning and behavioral biases are preventing us from doing so.  That technique entails making what @gcarmell of CWS Capital calls Little Bets–bets sized down to very small, manageable levels–and letting the consequences of those bets operantly condition us.   

Suppose that the market has suffered a big correction that you correctly anticipated and positioned for. Suppose further that the correction is starting to get long in the tooth, with emerging signs of stabilization in the factors and forces that provoked it.  So you start to get a sense–a fear–that it’s ending, that you need to get back in.  You will probably find it difficult to act on that sense–especially if, like me, you dislike taking actions that can turn into big mistakes.  You will probably find yourself seeking out and embracing dubious, feel-good reasons that will confirm your inertia and allow you to stay put–whatever “verbal behavior” you have to engage in to avoid having to jump in:

“No, this isn’t over yet, what’s happening is just a sucker’s rally, the market is headed back down, that’s what so and so on TV is saying.  Stay out!”  

You need to stop and ask yourself: Do you really believe all this?  Is it likely to be true? Maybe it is.  If it is, then the answer is to stay the course.  But if it isn’t, if the wiser part of you realizes that it’s time to get back in, but you aren’t able to muster the willpower to actually do that, to actually take that plunge, then the solution is to go in with only a small amount–however small it needs to be in order to not be met with resistance.  See what happens, see what consequences follow.  You can rest assured that if the trade does well, you will find yourself with increased confidence and appetite to do more of it, in characteristically Skinnerian fashion.

The same holds true in the other direction–when you sense that it’s time to get out of the market, after a long period of market prosperity that is starting to crack.  If you can’t find the strength to make large bets against something that has been winning for so long, then don’t bet large, bet small.  Sell a few shares, see what happens.  Not only will you be giving the market an opportunity to condition you into the position that you think you should be taking, you will also be “selecting by consequence”–in our evolutionary world, that tends to be a pretty good strategy. 
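The mechanics of the technique can be sketched as a simple sizing loop: start with a small position, add to it when a bet is followed by a good outcome, and pull back when it isn’t.  This is only a toy illustration.  The function name and the sizing parameters (`start`, `step`, `cap`) are hypothetical, not taken from any rule described above.

```python
def little_bets(outcomes, start=0.05, step=0.10, cap=1.0):
    """Return the exposure path (fraction of the portfolio) over time.

    outcomes: iterable of booleans, True if the latest small bet paid off.
    start, step, cap: hypothetical sizing parameters for illustration only.
    """
    exposure = start
    path = [exposure]
    for won in outcomes:
        if won:
            # Reinforcement: a good outcome builds confidence to add more.
            exposure = round(min(cap, exposure + step), 2)
        else:
            # Punishment: a bad outcome prompts a step back.
            exposure = round(max(0.0, exposure - step), 2)
        path.append(exposure)
    return path

print(little_bets([True, True, False, True]))
```

The design choice that matters is the small starting size: it is what keeps the first step from being met with resistance, while the consequences do the rest.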

In my own case, I suspect that if I had not started small, with manageable bets at the end of 2012, I would not have been able to get to what, for me, was the appropriate bull market position–fully invested.  Those bets laid the foundation for a growing confidence that ultimately led to full conversion.


B.F. Skinner and Operant Conditioning: A Primer for Traders, Investors, and Economic Policymakers

Markets and economies are agglomerations of interconnected human behaviors.  It’s a surprise, then, that in the fields of finance and economics, the work of history’s most famous behavioral psychologist, B.F. Skinner, is rarely mentioned. In this piece, I’m going to present an introduction to Skinner’s general theory of behavior, drawing attention to insights from his research that can be applied to trading, investing, and economic policymaking.  The current piece will serve as a primer for the next one, in which I’m going to discuss the insights with a greater practical emphasis.

If you’re like most, you come to this blog to read about finance and economics, not about psychology or philosophy, so you’re probably ready to close the window and surf on to something else.  But I would urge you to read on.  Skinner’s work was deep and profound–brimming with insights into the way reality and human beings work.  Anyone interested in finance and economics will benefit from being familiar with it.

Pavlovian Conditioning, Operant Conditioning and Selection by Consequence

In the early 1900s, Russian physiologist Ivan Pavlov conducted experiments on canine digestion.  He exposed restrained dogs to the scent of meat powder, and measured the extent to which they salivated in response to it.  In the course of these experiments, he stumbled upon a groundbreaking discovery: Dogs that had been put through experiments multiple times would salivate before any meat powder was presented, in response to the mere sight of lab assistants entering the room.

Pavlov hypothesized that repeated associations between “lab assistants” and “the smell of meat” had conditioned the dogs to respond to the former in the same way as the latter–by salivating.  To test this hypothesis, Pavlov set up another experiment.   He rang a bell for the dogs to hear, and then exposed them to the scent of meat powder.  He found that after repeated associations, the dogs would salivate in response to the mere sound of the bell, before any meat powder was presented.

Around the same time that Pavlov conducted his experiments on salivation in dogs, the American psychologist Edward Thorndike conducted experiments on learning in cats.  In these experiments, Thorndike trapped cats inside of “puzzle” boxes that could only be opened by pushing on various built-in levers.  After trapping the cats, he timed how long it took them to push on the levers and escape.  When they escaped, he rewarded them with food and put them back inside the boxes to escape again. He noticed that cats that had successfully escaped took sequentially less time to escape on each subsequent trial.  He concluded that the cats were “learning” from the trials.

In the late 1930s, Harvard psychologist B.F. Skinner synthesized the discoveries of Pavlov, Thorndike, and others into a coherent system, called Behaviorism.  Behaviorism sought to explain the behaviors of organisms, to include the behaviors of human beings, purely mechanistically, in terms of causal interactions with the environment, rather than in terms of nebulous, unscientific concepts inherited from religious tradition: “soul”, “spirit”, “free-will”, etc.

Skinner distinguished between two types of conditioning:

Classical Conditioning: The kind of conditioning that Pavlov discovered, which involves the repeated association of two stimuli–an unconditioned stimulus (the smell of meat) and a conditioned stimulus (the sound of a bell)–in a way that causes the conditioned stimulus (the sound of a bell) to evoke the same response (salivation) as the unconditioned stimulus (the smell of meat).  The unconditioned stimulus (the smell of meat) is called “unconditioned” because its connection to the response (salivation) is hard-wired into the organism.  The conditioned stimulus (the sound of a bell) is called “conditioned” because its connection to the response (salivation) is not hard-wired, but rather is formed through the “conditioning” process, i.e., the process of changing the organism through exposure.

Operant Conditioning: The kind of conditioning that Thorndike discovered, wherein the subsequent frequency of an organism’s behavior is increased or decreased by the consequences of that behavior.  When behavior is followed by positive outcomes (benefit, pleasure), the behavior goes on to occur more often; when behavior is followed by negative outcomes (harm, pain), the behavior goes on to occur less often, if at all.  Operant conditioning differs from Pavlovian conditioning in that it involves the learning of a voluntary behavior by the consequences of that behavior, rather than the triggering of an automatic, involuntary response by exposure to repeated associations.

Skinner is known in popular circles for the fascinating experiments that he conducted on conditioning, experiments in which he used the technique to get animals to do all kinds of weird, unexpected things.  In the following clip, Skinner shares the result of one such experiment, an experiment in which he successfully taught pigeons to “read” English:

Skinner liked to explain operant conditioning in terms of the analogue of evolution.  Recall that in biological evolution, random imperfections in the reproductive process lead to infrequent mutations.  These mutations typically add zero or negative value to the organism’s fitness.  But every so often, purely by chance, the mutations end up conferring advantages that aid in survival and reproduction.  Organisms endowed with the mutations go on to survive and reproduce more frequently than their counterparts, leaving more copies of the mutations in subsequent generations, until the mutations become endemic to the entire reproductive population. That is how the adapted species is formed. We human beings, with these complex brains and bodies, are direct descendants of those organisms–human and pre-human–that were “lucky” enough to be endowed with the “best” mutations of the group.

Biological evolution involves what Skinner brilliantly called “selection by consequence.” Nature continually “tries out” random possible forms.  When the forms bring good consequences–i.e., consequences that lead to the survival and successful self-copying of the forms–it holds on to them.  When they bring bad consequences–i.e., consequences that lead to the death of the forms–it discards them.  Through this process of trial-and-error, it extracts order from chaos.  There is no other way, according to Skinner, for nature to create complex, self-preserving systems–biological or otherwise.  It has no innate “intelligence” from which to design them, no ability to foresee survivable designs beforehand based on a thought process.

Skinner viewed animal organisms, to include human beings, as a microcosm for the same evolutionary process–“selection by consequence.”  An animal organism, according to Skinner, is a highly complex behavior selection machine.  As it moves through its environment, it is exposed to different types of behaviors–some that it tries out on its own, randomly, or in response to causal stimuli, and some that it observes others engage in.  When the behaviors produce positive consequences (benefit, pleasure, etc.), its brain and psychology are modified in ways that cause it to engage in them more often.  When the behaviors produce negative consequences (harm, pain, etc.), its brain and psychology are modified in ways that cause it to refrain from them in the future.  Through this process, the process of operant conditioning, the organism “learns” how to interact optimally with the contingencies of its environment.

According to Skinner, brains with the capacity for operant conditioning are themselves consequences of evolution.  Environmental conditions are always changing, and therefore the specific environment that an organism will face cannot be fully known beforehand.  For this reason, Nature evolved brains that have the capacity to form optimal behavioral tendencies based on environmental feedback, rather than brains that have been permanently locked into a rigid set of behaviors from the get-go.

Contrary to popular caricature, Skinner did not think that animal organisms–human or otherwise–were “blank slates.”  He acknowledged that they have certain unchangeable, hard-wired biological traits, put in place by natural selection.  His point was simply that one of those traits, a hugely important one, is the tendency for certain behaviors of organisms–specifically, “voluntary” behaviors, those that arise out of complex information processing in higher regions of the brain–to be “learned” by operant conditioning, by the consequences that reality imposes.

The Mechanics of Operant Conditioning: Reinforcement and Punishment

Skinner categorized the feedback processes that shape behaviors into two general types: reinforcement and punishment.  Reinforcement occurs when a good–i.e., pleasurable–consequence follows a behavior, causing the behavior to become more frequent–or, in non-Skinnerian cognitive terms, causing the organism to experience an increased desire to do the behavior again.  Punishment occurs when a bad–i.e., painful–consequence follows a behavior, causing the behavior to become less frequent–or, in non-Skinnerian terms, causing the organism to experience an aversion to doing the behavior again.  

In the following clip, Skinner demonstrates the technique of operant conditioning, using it to get a live pigeon to turn 360 degrees:

Skinner starts by putting the pigeon near a machine that dispenses food on a push-button command.  He then waits for the pigeon to turn slightly to its left.  In terms of the analogue of biological evolution, this period of waiting is analogous to the period wherein Nature waits for the reproductive process to produce a mutation that it can then “select by consequence.”  The pigeon’s turn is not something that Skinner can force out of the pigeon–it’s a behavior that has to randomly emerge, as the pigeon tries out different things in its environment.

When Skinner sees the turn happen, he quickly pushes the button and dispenses the reward, food.  He then waits for the pigeon to turn again–which the pigeon does, because the pigeon starts to catch on.  But this time, before dispensing the food, he waits for the pigeon to turn a bit farther.  Each time, he waits for the pigeon to turn farther and farther before dispensing the food, until the pigeon has turned a full 360 degrees.  At that point, the task is complete.  The pigeon keeps fully turning, and he keeps feeding it after it does so.

What is actually happening in the experiment?  Answer: the pigeon’s brain and psychology are somehow being modified to associate turning 360 degrees with food, such that whenever the pigeon is hungry and wants food, it turns 360 degrees.  If we want, we can describe the modification as a modification in a complex neural system, a physical brain that gets rewired to send specific motor signals–“turn left, all the way around”–in response to biological signals of hunger.  We can also describe the process cognitively, as involving an acquired feeling that arises in the pigeon–that when the pigeon gets hungry, it feels an urge or impetus to turn 360 degrees to the left, either automatically, or because it puts two and two together in a thinking process that connects the idea of turning with the idea of receiving food, which it wants.  Skinner famously preferred the former, the non-cognitive description, arguing that cognitive descriptions are unobservable and therefore useless to a science of behavior.  But cognitive descriptions work fine in the current context.

To keep the conditioned behavior in place, the conditioner needs to maintain the reinforcement.  If the reinforcement stops–if the pigeon turns, and nothing happens, and then turns again, and nothing happens again, and so on–the behavior will eventually disappear.  This phenomenon is called “extinction.”  It’s a phenomenon that Pavlov also observed: if the association between the bell and the arrival of meat powder is not maintained over time, the dogs will stop salivating in response to the bell.  

Importantly, the capacity for conditioned behavior to go extinct in the absence of reinforcement is itself a biological adaption.  Learning to behave optimally isn’t just about learning to do certain things, it’s also about unlearning them when they stop working.  An organism that is unable to unlearn behaviors that have stopped working will waste large amounts of time and energy doing useless things, and will end up falling behind in the evolutionary race. 

Skinner noted that effective reinforcement needs to be clearly connectable to the behavior, preferably close to it in time.  If food appears 200 days after the pigeon turns, the pigeon is not going to develop a tendency to turn.  The connection between turning and receiving food is not going to get appropriately wired into the pigeon’s brain.  At the same time, the reward doesn’t have to be delivered after every successful instance of the behavior.  A “variable” schedule of reinforcement can be imposed, in which the reward is only delivered after a certain number of successful instances, provided that the number is not too high.

Skinner noted that when an organism observes a consequence in response to a behavior, it “generalizes.”  It experiments with similar behaviors, to see if they will produce the same consequence.  For example, the pigeon who received food by pecking a disk in the first video will start trying to peck similar objects, in the hopes that pecking them will produce a similar release of food.  Eventually, after sufficient modification by the environment, the organism learns to “discriminate.”  It learns that the behavior produces a consequence in one situation, but not in another.

Extension to Human Beings: The Example of Gambling

The natural inclination is to dismiss Skinner’s discoveries as only being applicable to the functioning of “lesser” organisms–rats, pigeons, dogs, and so on–and not applicable to the functioning of human beings.  But the human brain, Skinner argued, is just a more computationally advanced version of the brains of these other types of organisms.  The human brain comes from the same common place that they come from, having been progressively designed by the same designer, natural selection.  We should therefore expect the same kind of learning process to be present in it, albeit in a more complex, involved form. Skinner demonstrated that it was present, in experiments on both human children and human adults.

Psychologists have long struggled with the question, why do human beings gamble? Gambling is an obviously irrational behavior–an individual takes on risk in exchange for an expected return that is less than zero.  Why would anyone do that? Marx famously thought that people, particularly the masses, do it to escape from the stresses of industrialization.  Freud famously thought that people–at least certain men, the clients he diagnosed–do it to unconsciously punish themselves for unconscious guilt associated with the Oedipal complex–the sexual attraction that they unconsciously feel–or at least unconsciously felt, as children–for their mothers.

Contra Marx and Freud, Skinner gave the first intellectually respectable psychological answer to the question.  Human beings gamble, and enjoy gambling, even though the activity is pointless and irrational, because they’ve been subjected to a specific schedule of reinforcement–a “variable” schedule, where the reward is not provided every time, but only every so often, leaving just enough “connection” between the behavior and the reward to forge a link between the two in the brain and psychology of the subject.

Skinner showed that in order for the pigeon to maintain the pecking and turning behaviors, it doesn’t need to get the reward every time that those behaviors occur.  It just needs to get the reward every so often–that will be enough to keep the pigeon engaging in the behaviors on an ongoing basis.  Skinner noted that the same was true about gamblers. Gamblers don’t need to win every time, they just need to win every so often.  A grandiose victory–a jackpot–that occurs every so often is more than enough to imbue them with inspiring thoughts of winning, and an associated appetite to get in and play.  It is the business of a casino to optimize the schedule at which gamblers win, so that they win just enough to sense that victory is within their reach, just enough to feel the associated thrill and excitement each time they turn the lever.  An efficient casino operation will not afford gamblers any more victories than that–certainly not enough for them to actually make money on a net basis, which would represent the casino’s net loss.

The process through which the gambler is conditioned to gamble is obviously not as simple as the process through which the pigeon is conditioned to peck.  For the human being, there is the complex and vivid mediation of thought, memory, emotion, impulse, and the internal struggle that arises when these mental states push on each other in conflicting ways.  But the fact remains that the reinforcement of winning is ultimately what gives rise to the appetite to play, the psychological pull to engage in the behavior again.  If you were to completely take that reinforcement away, the appetite and pull would eventually disappear, go extinct–at least in normal, mentally healthy human beings.  If casinos were designed so that no one ever won anything, no one ever experienced the thrill and excitement of winning, then no one would ever bother with the activity.  Casinos would not have any patrons.

Skinner’s insights here have a clear application to the understanding of stock market behavior, an application that we’re going to examine more closely in the next piece.  To get a sense of the application, ask yourself: what, more than anything else, gives investors the confidence and appetite to invest in risky asset classes such as equities?  Answer: the experience of actually investing in them, and being rewarded for it, consistently.  Sure, you can tell people the many reasons why they should invest in equities–that’s all wonderful.  But to them, it’s just verbiage, someone’s personal opinion.  In itself, it’s not inspiring. What’s inspiring is the actual experience of taking the risk, and winning–making money, on a consistent basis.  Then, you come to trust the process, believe in it, viscerally.  You develop an appetite for more.  As many of us know from our own mistakes, the experience can be quite dangerous–enough to make clueless novices think they are seasoned experts.

On the flip side, what, more than anything else, causes investors to become averse to investing in risky asset classes such as equities?  Again, the experience of actually investing in them, and getting badly hurt.  A dark cloud of danger and guilt will then get attached to the activity.  The investor won’t want to even think about going back to it for another try–at least not until sufficient time has passed for extinction to occur.  This is operant conditioning in practice.  

The concepts of Classical Conditioning, Operant Conditioning, Extinction, Generalization, Discrimination, and many other concepts that Skinner researched have a role in producing the various trends and patterns that we see play out in markets.  Understanding these processes won’t give us a crystal ball to use in predicting the market’s future, but it can help us better understand, and more quickly respond to, some of the changes that happen as economic and market cycles play out.

Operant Conditioning: Observations Relevant to Traders, Investors, and Economic Policymakers

In this final section, I’m going to go over some unique observations that Skinner made in the course of his research that are relevant to traders, investors, and economic policymakers.  The observation for which Skinner is probably most famous is the observation that reinforcement is a more effective technique for producing a desired behavior than punishment.  We want the pigeon to turn.  We saw that giving it a reward–food–works marvelously to produce that behavior.  But now imagine that we were to try to use punishment to generate the behavior.  Suppose that we were to electrically shock the pigeon whenever it spent more than, say, a minute without turning.  Would the shocks cause the pigeon to turn?  No–at least not efficiently.

Instead of turning, as we want it to, the pigeon would continue to do whatever is natural to it, moving in whatever direction it feels an impulse to move in.  When shocks come in, it would simply try to avoid and escape from them.  It would tense up, flinch, flail around, flee, whatever it can do.  Importantly, it wouldn’t have anything to send it towards the desired behavior, and build a specific appetite for that behavior.  Punishment doesn’t create appetite; it creates fear.  Fear of doing something other than the desired behavior does not imply appetite to do the desired behavior.

Skinner was adamant in extending this insight to the human case.  Punishment–the imposition of painful consequences–cannot efficiently get a person to engage in a wanted behavior.  It is not effective at creating the internal drive and motivation that the person needs in order to whole-heartedly perform the behavior.  To the extent that the person does perform the behavior in response to the threat of punishment, the behavior will be awkward, unnatural, artificial, done out of duress, rather than out of genuine desire. Instead of cooperating, the individual will try to come up with ways to avoid the punishment–whatever it needs to do to get to a place where it can do what it actually wants to do, without suffering negative consequences.

Imagine that you are my overweight child.  I’m trying to get you to exercise.  Sure, if I threaten you with a painful punishment for not exercising, you might go exercise.  But your heart isn’t going to be in the activity.  You’re going to go through the motions half-assed, doing the absolute bare minimum to keep me off your back.  Ultimately, if you really don’t want to exercise, you’re going to try to get around my imposition–by faking, hiding, creating distractions, buying time, pleading, whatever.  You’re trapped in a situation where none of your options are perceived to be good.  Rather than accept the lesser evil, you’re going to try to find a way out.

To motivate you to exercise, the answer is not to punish you for not exercising, but to try to get you to see and experience the benefits of exercising for yourself, to try to put you on a positive trajectory, where you exercise, you make progress in losing weight, you end up looking and feeling better, and that reward gives you motivation to continue to exercise regularly.  If that’s not possible, then the answer is to provide you with other rewards that register in your value system–money, free time, whatever.  When people engage in an activity, and make progress towards their goals and values–whether related to the activity, or not–the progress becomes a source of strength, momentum, optimism, hope.  It sows the seeds for further progress.

Skinner’s observation here is particularly relevant to the debate on how best to stimulate a depressed economy–whether to use expansive fiscal policy or expansive monetary policy, a debate that I’m going to elaborate on in a subsequent piece.  Expansive fiscal policy is a motivating, reward-oriented stimulus–it motivates investors and corporations to invest in the real economy by directly creating demand and the opportunity for profit. Expansive monetary policy–to include the imposition of negative real and especially negative nominal interest rates–is a repressive, punishment-oriented, stimulus.  It tries to motivate investors and corporations to invest in the real economy by taking away their wealth if they don’t.

Do investors and corporations acquiesce to the punishment?  No, they try to find ways around it–recycling capital through buybacks and acquisitions, levering up safe assets, reaching for yield on the risk curve, and engaging in other economically-dubious behaviors designed to allow them to generate a return without requiring them to do what they don’t want to do–tie up new money in an environment that they don’t have confidence in. Reasonable people can disagree on the extent to which the repressive policies that provoke these behaviors are financially destabilizing, but it’s becoming more and more clear that they aren’t effective at achieving their policy goals.  They don’t work.  Skinner most definitely would have recommended against them, at least in scenarios where a powerful, reward-oriented stimulus–e.g., expansive fiscal policy–was available.

Another important observation that Skinner made, this one particularly relevant to human beings, pertains to the relationship between operant conditioning and rules.  Rules are ways that we efficiently codify behavioral lessons into language, to allow for easy transmission to others.  If I’m trying to teach you how to do something, I might give you a rule for how to do it.  You will then follow the rule–put it into practice.  Crucially, positive consequences will need to follow from your implementation of the rule–you will need to see the rule work in your own practice.  Without that reinforcement, continued adherence to the rule will become increasingly difficult.

As human beings, there’s nothing that we hate more than rules imposed on us that we don’t understand, and that we’ve never seen work.  Do X, don’t do Y, but do Z, but not if you’ve already done Q–and so on.  We might be able to gather the strength to follow through on these complex instructions, but unless we start seeing benefits, results, we’re not going to be able to maintain our adherence.

Our aversion to following rules that have not yet been operantly conditioned, i.e., tied in our minds to beneficial consequences, is the reason why we often prefer to shoot from the hip when doing things, as opposed to doing them by executing externally-provided instructions.  Take the example of a child that receives a new toy for Christmas that requires assembly.  The last thing the child will want to do is bust out the user’s manual, go to page one, and execute the complicated assembly instructions.  To the contrary, the child is going to want to try to put the toy together on her own–“no, let me do it!”–without the external burden of rules.  And children aren’t unique in that respect–we adults are the same way.  We prefer to come to solutions not by obediently carrying out other people’s orders, but by engaging in our own curious experimentation, allowing the observable consequences of our maneuvers–this worked, try it over there, that didn’t work, try something else–to naturally guide us to the right answers.

One of the reasons why investing is fun, in my opinion, is that you don’t need to follow any rules to do it.  You can wing it, go in and buy whatever you like, based on whatever your gut tells you to do–and still do well, sometimes just as well as seasoned professionals. This facet of the activity makes it uniquely enjoyable and entertaining, in contrast with activities where success requires tedious adherence to externally-imposed rules and constraints.

A final important observation that Skinner made, this one only relevant to humans and certain higher mammals, pertains to language and thought.  Skinner viewed language and thought as behaviors that are formed, in part, through conditioning–both Pavlovian and Operant.  From a Pavlovian perspective, linguistic connections between words and meanings are formed through exposure to repeated associations.  From an operant perspective, what we think and say is followed by consequences.  Those consequences condition what we think and say going forward.

How does an infant baby connect the oral sound “Daddy” to the man who just walked into the room?  He connects the two because Mommy says those words whenever Daddy walks in.  How does he learn to say them himself?  The answer, at least in part, is through a process of reinforcement.  Whenever he says “Da da” in response to seeing Daddy, everyone in the room turns their attention to him and expresses endearment and approval–“Oh, that’s so cute, Billy!  Say it again… say it again for Daddy!”  Though barely out of the womb, the organism is already able to “select” beneficial behavior from among different possibilities so as to perform it more frequently.

We might think that the influence of operant conditioning on our thinking and our speaking–or, to use Skinner’s preferred term for these activities, on our “verbal behavior”–ends in youth. It most certainly does not.  The influence continues throughout our lives, shaping us in subtle ways that we often fail to notice.  The emotional reactions that occur inside us, when we think and say things, and that occur outside us, in the form of the approval and disapproval of other members of our verbal communities, have a strong influence on where our thought processes and the statements that express them end up going.

Unfortunately, the internal and external contingencies that shape how we think and speak often aren’t truth-oriented.  They’re often oriented towards other values–the building and maintaining of positive relationships, the securing of desired resources, the demonstration of status, the achievement of resolution, and so on.  In many contexts, this lack of truth-orientation isn’t a problem, because there aren’t actual tangible harms associated with thinking and saying things that aren’t true.  “Do I look fat in this dress?” — “No, you look great honey, <cough>, <cough>.”  The world obviously isn’t going to end if a husband says that to his wife.

But in the arena of finance–at least the part of the arena behind the curtain, where actual financial decisions are made–the fact that our thinking and speaking can be shaped by factors unrelated to truth, or worse, factors opposed to truth, is a huge problem.  It’s a huge problem because there are actual, tangible consequences to being wrong.

Given that problem, we need to be vigilant about truth when making investment decisions. We need to routinely check to ensure that we’re thinking what we’re thinking, and saying what we’re saying, because we genuinely believe it to be true, or likely to be true, not because we’ve been conditioned to think it or say it by the effects of various hidden reinforcers.  We want our thoughts and statements to represent an honest description of reality, as we see it, and not devolve into ulterior mechanisms through which we try to look and sound a part, or earn status and credibility, or win approval and admiration, or acquire power in an organization, or make peace out of conflict, or secure the satisfaction of “being right”, or crush enemies and opponents, or smooth over past mistakes, or relish in the pride of having discovered something important, or preserve a sacred idea or worldview, and so on.  These hidden contingencies, to the extent that they are allowed to creep into the financial decision-making process and shape our verbal behaviors, can be costly.

Beer before Steel: Ranking 30 Industries by Fundamental Equity Performance, 1933 to 2015

From January of 1933 through July of this year, beer companies produced a real, inflation-adjusted total return of 10% per year. Steel companies, in contrast, produced a return of 5%.  This difference in performance, spread out over 82 years, is enormous–the difference between a $1,000 investment that turns into $26,000,000 in real terms, and a $1,000 investment that turns into $57,000.

Given the extreme difference in past performance, we might think that it would be a good idea to overweight beer stocks in our portfolios and underweight steel stocks.  Whether it would actually be a good idea would depend on the underlying reason for the performance difference. Two very different reasons are possible:

(1) Differences in Return on Investment (ROI): Beer companies might be better businesses than steel companies, with higher ROIs.  Wealth invested and reinvested in them might grow faster over time, and might be impaired and destroyed less frequently.

(2) Change in Valuation: Beer companies might have been cheap in 1933, and might be expensive in 2015.  Steel companies, in contrast, might have been expensive in 1933, and might be cheap in 2015.

If the reason for the historical performance difference is (1) Difference in ROI, and if we expect the difference to persist into the future, then we obviously want to overweight beer companies and underweight steel companies–assuming, of course, that they trade near the same valuations.  But if the reason is (2) Change in Valuation, then we want to do the opposite.

The distinction between (1) and (2) speaks to an important challenge in investing.  Asset returns tend to mean-revert.  We therefore want to own assets that have underperformed, all else equal.  Assets that have underperformed have more “room to run”, and will tend to generate stronger subsequent returns than favored assets that have already had their day. But, in seeking out assets that have underperformed, we need to distinguish between underperformance that is likely to continue into the future, and underperformance that is likely to reverse, i.e., revert to the mean.  That distinction is not always an easy distinction to make.  Making it requires distinguishing, in part, between (1) underperformance that’s driven by poor ROI–structural lack of profitability in the underlying business or industry, and (2) underperformance that’s driven by negative sentiment and the associated imposition of a low valuation.  The latter is likely to be followed by a reversion to the mean; the former is not.

In this piece, I’m going to share charts of the fundamental equity performances of different U.S. industries, starting in January 1933 and ending in July of 2015.  I put the charts together earlier today in an effort to ascertain the extent to which differences in the historical performances of different industries have been driven by factors that are structural to the industries themselves, rather than cyclical coincidences associated with the choice of starting and ending dates–the possibility, for example, that beer stocks were in a bear market in 1933, with severely depressed valuation, and are now in a bull market, with elevated valuation, where the change in valuation, and not any underlying strength in beer-making as a business, explains the strong performance.

The charts are built using data from the publicly available CRSP library of Dr. Kenneth French.  The only variables available back to 1933 are price and dividend–but they are all that are needed to do the analysis. Dividends are the original, true equity fundamental.

The benefit to using dividends as a fundamental is that they are concrete and unambiguous.  “What was actually paid out?” is a much easier question to accurately answer than the question “What was actually earned?” or “What is the book actually worth?”  There are no accounting differences across different industries and different periods of history that we have to work through to get to an answer.  The disadvantage to using dividends is that dividend payout ratios have fallen over time.  A greater portion of current corporate cash flow is recycled into the business than in the past, chiefly in the form of share buybacks and acquisitions that show up in increased per share growth.  For this reason, when we approximate growth using dividend growth, we end up underestimating the true growth of recent periods.  But that’s not a problem.  The underestimation will hit all industries, preserving the potential for comparison between them.  If we want a fully accurate picture of the fundamental return, we can get one by mentally bumping up the annualized numbers by around 100 basis points or so.

The first task in the project is to find a way to take changes in valuation out of the return picture.  We do that by building a total return index whose growth is limited to the two fundamental components of total return–growth in fundamentals (in this case, dividends–we use the 5 year smoothed average of monthly ttm dividends), and growth from reinvested dividend income.

We start by setting the index at 1.000 in January of 1933.  The smoothed dividends might have grown by 0.5% from January of 1933 to February of 1933, and the dividend yield for the month might have been 0.1%.  If that was the case, we would increase the index from January to February by 0.6%–the sum of the two.  The index entry for February would then be 1.000 * 1.006 = 1.006.  We calculate the index value for each month out to July of 2015 in this way, summing the growth contribution and the reinvested dividend income contribution together, and growing the index by the combined amount.  What we end up with is an index that has only fundamental total return in it–return due to growth and dividend income.  Any contribution that a change in valuation from start to finish might have made will end up removed.  (Of course, contributions from interim changes in valuation, which affect the rate of return at which dividends are reinvested, will not be removed.  Removing them requires making a judgment about “fair value”, so as to reinvest the dividends at that value.  That’s a difficult judgment to make across different industries and time periods when you only have dividend yields to work with as a valuation metric.  So we reinvest at market prices).
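To make the month-by-month compounding concrete, here is a minimal Python sketch of the index construction.  The function name `fundamental_index` and the input layout (a list of smoothed dividend levels and a list of monthly dividend yields) are assumptions made for illustration; they are not the actual spreadsheet or code used to build the charts.

```python
def fundamental_index(smoothed_dividends, monthly_yields, start=1.0):
    """Grow an index by smoothed-dividend growth plus dividend yield each month.

    smoothed_dividends: smoothed dividend level for each month (hypothetical input).
    monthly_yields: dividend yield earned in each month after the first,
                    as decimals, so its length is len(smoothed_dividends) - 1.
    """
    index = [start]
    for i in range(1, len(smoothed_dividends)):
        # Growth contribution: change in the smoothed dividend level.
        growth = smoothed_dividends[i] / smoothed_dividends[i - 1] - 1.0
        # Income contribution: the dividend yield for the month, reinvested.
        income = monthly_yields[i - 1]
        # The index grows by the sum of the two contributions.
        index.append(index[-1] * (1.0 + growth + income))
    return index


# The January-to-February 1933 illustration: 0.5% growth and a 0.1% yield.
example = fundamental_index([1.000, 1.005], [0.001])
print(round(example[1], 6))
```

Because the only inputs are dividend growth and dividend yield, a change in valuation between the first and last month never enters the calculation directly.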

Unfortunately, we face a potentially confounding variable in the cyclicality of dividends, a cyclicality that smoothing cannot fully eliminate.  If smoothed dividends were at a cyclical trough in 1933, and are at a cyclical peak now, our chart will show strong fundamental growth.  But that growth will not be the kind of growth we’re looking for, growth indicative of a structurally superior ROI.  It will instead be an artifact of the place in the profit cycle of the industry in question that 1933 and 2015 happened to coincide with.

There’s no good way to remove the influence of this potential confounder.  The best we can do is to make an effort to assess the performance not based on one chosen starting point, but based on many.  So, even though the return for the period is quoted as a single number for a single period, 1933 to 2015, it’s a good idea to visually look at how the index grew between different points inside that range.  How did it grow from 1945 to 1970?  From 1977 to 1990?  From 2000 to 2010?  If the strong performance of the industry in question is the result of a structurally elevated ROI–sustained high profitability in the underlying business–then we should see something resembling consistently strong performance across most or all dates.

Not to spoil the show, but we’re actually going to see that in industries like liquor and tobacco, which we suspect to be superior businesses with structurally higher ROIs.  In junky industries like steel and mining, however, what we’re going to see is crash and boom, crash and boom.  Periods of strong growth in those industries only seem to emerge from the rubble of large prior losses, leaving long-term shareholders who stick around for both with a subpar net gain.

The following legend clarifies the definition of each industry term.  There are 30 in total.


To the charts.  The following slideshow ranks each industry by fundamental return, starting with #30, and ending with #1.  All charts and numbers are real, inflation-adjusted to July of 2015.  Note that you can hit pause, and then move from slide to slide at your own pace:

The following table shows the industries and real fundamental annual total returns from 1933 to 2015 together, ranked from low to high:


To be clear, the charts and tables tell us which industries performed well from 1933 to 2015. They don’t tell us which industries will perform well from 2015 into the future.  Beer might have been a consistently great business over the last century, and steel might have been a consistently weak business.  But it doesn’t follow that the businesses are going to exhibit the same fundamental performances over the next century–conditions can change in relevant ways.  And even if we believe that the businesses are going to generate the same fundamental performances that they generated in the past, it doesn’t necessarily follow that we should overweight or underweight them.  The relative weighting that we assign them should depend on the extent to which their valuations already reflect the expected performance divergence.


Thoughts on Negative Interest Rates

The big surprise from Thursday’s Fed announcement was not the decision to hold interest rates at zero, which most Fed observers expected, but the revelation that an unidentified FOMC member–probably Narayana Kocherlakota, but possibly another dove–is now advocating the use of negative nominal interest rates as a policy tool.

What follows is a simplified explanation of how a policy of negative interest rates would work.  The central bank would begin by enacting a large scale program of “quantitative easing” that would entail the creation of new money and the use of that new money to buy assets from the private sector.  The new money would end up on deposit at banks, where it would represent an excess cash reserve held physically in vaults or electronically on deposit at the central bank.  The central bank would continue the program until the quantity of excess reserves in the banking system was very high.  Indeed, the higher the quantity of those reserves, the more powerful–or rather, the more punitive–a policy of negative interest rates would end up being.

After saturating the system with excess reserves, the central bank would require individual banks that hold excess reserves to pay interest on them–effectively, a tax.  Individual banks would then have three choices:

(1) Eat the associated expense, i.e., take it as a hit to profit,

(2) Pass the associated expense on to depositors, creating the equivalent of a negative interest rate on customer deposits, or

(3) Increase lending (or purchase assets), so that the excess reserves cease to be “excess”, but instead become “required” by the increased quantity of liabilities that the bank will bear.

This third option is important and confusing, so I’m going to spend some time elaborating on it.  Recall that the issuance of loans and the purchase of assets by banks create new deposits, which are bank liabilities.  Required reserves are calculated as a percentage of (certain types of) those liabilities.  Importantly, required reserves do not incur interest under the policy (at least as the policy is currently being implemented in Europe), and so increases in lending, which increase the quantity of reserves that get classified as “required” as opposed to “excess”, represent a way to avoid the cost, both for banks individually, and for the banking system in aggregate.

The following schematic illustrates with an example:

We have an individual bank–we’ll call it American Bank.  This bank begins with $110 in assets, $50 of which are cash reserves, and $100 in deposit liabilities, all of which are subject to the reserve requirement.  If we assume that the reserve requirement is 10% of deposit liabilities, then the bank will be required to hold $100 * 10% = $10 in reserves. But, in this case, American Bank is holding $50 in reserves–$40 more than it is required to hold.  In a negative interest rate regime, it will have to pay interest, or tax, on that excess.  So if the annual interest rate is -2%, it will have to pay $40 * .02 = 80 cents per year (8% of its $10 in capital, so no small amount!).  The payment will go to the central bank, or to the treasury, or whoever.

Now, let’s suppose that American Bank issues $400 in a new loan.  The way it would actually make this loan would be to simply create (from essentially nothing) a new deposit account for the borrower, with a $400 balance in it, that the borrower can draw from on demand.  Let’s suppose that the borrower keeps the $400 in that same account, i.e., doesn’t withdraw it as physical cash or move it to another bank by transferring it or writing a check on it that gets cashed elsewhere.  It follows that the bank will not have to come up with any actual money to fund the loan.  The money that is funding the loan is the money that the loan created, which has stayed inside the bank.  Only when that money leaves the bank does the bank have to “come up with it.”

After the loan is made, American Bank will have $510 in assets (the $110 in previous assets plus the $400 in new loans that the borrower owes it), and $500 in liabilities (the $100 in previous deposits plus the new $400 that the borrower is holding on deposit with it).  The required reserves will be $500 * 10% = $50, which is exactly the amount of cash that it has on hand.  So its excess reserves will be $0 and it will not have to pay any interest or tax. Problem solved.
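The arithmetic for American Bank before and after the loan can be checked with a short Python sketch.  The function names and constants below are hypothetical labels chosen for illustration; the numbers are the ones from the example (a 10% reserve requirement, $50 in cash reserves, and a 2% annual charge on excess reserves).

```python
RESERVE_RATIO = 0.10   # reserve requirement assumed in the example
NEGATIVE_RATE = 0.02   # annual charge on excess reserves (a -2% rate)


def excess_reserves(cash_reserves, deposits, reserve_ratio=RESERVE_RATIO):
    """Reserves held over and above the required amount (never below zero)."""
    required = deposits * reserve_ratio
    return max(cash_reserves - required, 0.0)


def annual_charge(cash_reserves, deposits, rate=NEGATIVE_RATE):
    """Yearly cost of the negative rate, which applies only to excess reserves."""
    return excess_reserves(cash_reserves, deposits) * rate


# Before the loan: $50 in reserves against $100 in deposits.
excess_before = excess_reserves(50.0, 100.0)
charge_before = annual_charge(50.0, 100.0)

# After the $400 loan: the same $50 in reserves against $500 in deposits.
excess_after = excess_reserves(50.0, 500.0)
charge_after = annual_charge(50.0, 500.0)

print(excess_before, charge_before, excess_after, charge_after)
```

Note that the reserves themselves never change in this sketch.  Only the deposit base grows, which reclassifies the same $50 from mostly “excess” to entirely “required”.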

Now, suppose that the borrower decides to move the $400 that it has on deposit at American Bank to some other bank. But American Bank only has $50 in actual cash reserves to move.  And those reserves are needed to meet the $50 in required reserves, so they can’t be moved.  How, then, will the bank satisfy the borrower’s demand to send $400 to another bank?  Easy–it will simply go into the Fed Funds market, borrow $400 from a bank that has excess funds to lend, and transfer the funds to the bank that the borrower wants them transferred to.  Its $400 liability to the borrower will then disappear, to be replaced by a $400 liability to the bank from which it borrowed.

Of course, nothing will actually physically move in this process.  The transfer will occur electronically, at the Fed, through the adjustment of the deposit balances of the involved banks–essentially, a spreadsheet operation.  The bank that is lending $400 to American Bank will see its deposit balance at the Fed fall by $400, American Bank will see no change to its deposit balance, and the bank that is receiving the $400, which is the bank that the borrower is moving the money to, will see its deposit balance increase by $400.
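The “spreadsheet operation” can be sketched in a few lines of Python.  The bank names other than American Bank, the starting balances of the other two banks, and the assumption that American Bank keeps all $50 of its reserves on deposit at the Fed are hypothetical, chosen only to make the bookkeeping visible.

```python
def settle_transfer(fed_balances, lender, borrowing_bank, receiver, amount):
    """Return new Fed deposit balances after borrowing_bank borrows `amount`
    from lender in the Fed Funds market and forwards it to receiver."""
    balances = dict(fed_balances)        # leave the caller's ledger untouched
    balances[lender] -= amount           # lender funds the interbank loan
    balances[borrowing_bank] += amount   # proceeds arrive at the borrowing bank...
    balances[borrowing_bank] -= amount   # ...and leave again with the customer's transfer
    balances[receiver] += amount         # receiving bank is credited
    return balances


# Hypothetical starting balances at the Fed.
start = {"Lender Bank": 600.0, "American Bank": 50.0, "Receiving Bank": 200.0}
end = settle_transfer(start, "Lender Bank", "American Bank", "Receiving Bank", 400.0)
print(end)
```

The total quantity of reserves in the ledger is the same before and after; the transfer only changes which bank holds them.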

But what if other banks lack sufficient funds, over and above the amount that they themselves have to hold to meet reserve requirements, to lend to American Bank?  That won’t happen.  The Fed uses asset purchases and asset sales to manipulate the quantity of funds in the banking system, over and above those that are needed to meet reserve requirements, so that there are always sufficient excess funds available to be lent, at the Fed Funds rate, the Fed’s target short-term rate.  When the Fed wants that rate to be higher, it uses asset sales, which take funds (money) out of the system and put assets (e.g., bonds) in, to make the supply of excess funds available to be lent tighter; if it wants that rate to be lower, it uses asset purchases, which take assets (e.g., bonds) out of the system and put new funds (money) in, to make the supply of excess funds available to be lent more plentiful.  Importantly, the targeted variable in the Fed Funds market is not the supply of excess funds available to be lent, but the price.  The Fed, through its operations, effectively guarantees that there will always be a sufficient supply of excess funds available to be lent at the target price–though it may be an expensive price, if the Fed wants less lending to take place.

But what if other banks refuse to lend to American Bank, even though excess funds are available to be lent?  Again, not a problem.  If the bank can prove that it is worthy of a loan, then it can borrow directly from the Fed, through the discount window, at a price slightly higher than the price targeted in the Fed Funds market.

To summarize, if the borrower moves his deposit to another bank, the picture changes to look like this:


As you can see, nothing really changes when the borrower moves the money except the composition of the bank’s deposit liabilities.  Previously, they were liabilities to an individual, now they are liabilities to other banks, or to the Fed.  They are still subject to the reserve requirement, and so the bank’s reserve excess remains zero.

It’s important to understand that when banks implement this third option, the cash reserves that were “excess” continue to exist as reserves–as physical cash held in storage or as balances held on deposit at the central bank.  Only the central bank can (legally) create or destroy them, which it does through its open market operations–its purchase and sale of assets from the private sector.  Because the reserves still exist, they still have to be held by some person or entity.  Unless customers extract them and hold them physically as metal coins and paper bills, the banking system in aggregate–some bank somewhere–continues to hold them.

The point, however, is that they no longer get classified as excess reserves.  They become required reserves, required by the larger quantity of deposit liabilities that the banking system ends up bearing.  Because required reserves do not incur interest expense under the policy, an increase in aggregate bank lending, which will increase the quantity of reserves that get classified as “required” as opposed to “excess”, represents a potential way for banks to avoid the cost–both individually, and in aggregate.

Returning to the question of how banks would respond, there’s obviously a limit to the cost increase that they will be willing to absorb.  If the negative rate is “negative” enough, they will try to pass that increase on to their customers, charging interest to those who hold deposits with them and who cause them to have unneeded excess reserves in the first place.  Proponents of the policy believe that this outcome would stimulate the economy by increasing the velocity of money.  Bank deposits would become an item that everyone wants to get rid of, but that someone has to hold.  They would get tossed around like a hot potato, moving from individual to individual, in the form of increased spending, trading and investing.

Of course, spending, trading and investing aren’t the only ways to get rid of a bank deposit. A depositor can take physical delivery of the money, and put it into her own storage, outside of the banking system–a piggy bank, a mattress, a safe, wherever.  The assumption is that this type of maneuver would be inconvenient and therefore rarely used.  If the assumption were to be proven wrong, the next step would be to eliminate physical money altogether, so that all cash ends up trapped inside the banking system, with the owners forced to pay interest on it for as long as they choose to continue to hold it.

With respect to the third option, which is to increase lending, banks aren’t always able to increase their lending in the normal ways–they need credible borrowing demand from borrowers.  That demand isn’t guaranteed to be there.  To generate it, however, they can offer to pay borrowers to borrow–funding the payments with the income that they generate from charging their depositors interest.  The negative interest rate regime will then come full circle, in a perverse reversal of the normal banking arrangement–instead of borrowers paying depositors to borrow their money, with the banking system acting as an intermediary, depositors will be paying borrowers to borrow their money, with the banking system again acting as an intermediary.

In assessing the third option, we can’t forget the impact of regulatory capital ratios, which can quickly become limiting.  In our example, American Bank had $50 in cash reserves, which carry a risk-weighting of 0%, $10 in bonds, which we assume are government bonds that also carry a risk-weighting of 0%, and $50 in retail loans, which carry a risk-weighting of 100%.  Simplistically, the bank has $10 in capital, so the bank’s capital ratio would be $10 / (0% * $50 + 0% * $10 + 100% * $50) = $10 / $50 = 20%, well above the 8% Basel requirement.  After the new $400 loan, however, the bank’s capital ratio would fall to $10 / (0% * $50 + 0% * $10 + 100% * $450) = $10 / $450, or roughly 2.2%, which is well below the 8% Basel requirement.  So American Bank would not actually be able to do what I just proposed.
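The same capital ratio calculation, written as a minimal Python sketch.  The function name and the list-of-pairs layout are assumptions for illustration, and the risk weights are the simplified 0% and 100% figures from the example, not a full Basel calculation.

```python
BASEL_MINIMUM = 0.08  # the 8% requirement cited in the example


def capital_ratio(capital, exposures):
    """Capital divided by risk-weighted assets.

    exposures: list of (amount, risk_weight) pairs, with weights as decimals.
    """
    risk_weighted_assets = sum(amount * weight for amount, weight in exposures)
    return capital / risk_weighted_assets


# Before the loan: $50 cash (0%), $10 government bonds (0%), $50 retail loans (100%).
ratio_before = capital_ratio(10.0, [(50.0, 0.0), (10.0, 0.0), (50.0, 1.0)])

# After the $400 loan: retail loans rise to $450; capital is unchanged at $10.
ratio_after = capital_ratio(10.0, [(50.0, 0.0), (10.0, 0.0), (450.0, 1.0)])

print(f"{ratio_before:.1%} before, {ratio_after:.1%} after")
print("meets minimum after loan:", ratio_after >= BASEL_MINIMUM)
```

Adding zero-weighted assets instead of loans would leave the denominator, and therefore the ratio, unchanged, which is why the constraint pushes banks toward risk-free securities.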

Given limits to regulatory capital ratios, the only way for banks to use the third option to substantially increase required reserves and reduce interest expense would be to buy assets that don’t carry a risk-weighting–banks would either have to do that, or raise capital, which no bank is going to want to do.  It follows that the types of interest rates that would see the biggest relative drop under a regime of substantially negative interest rates would not be the risk-bearing interest rates that real economic participants pay, but the risk-free interest rates that governments and other risk-free borrowers pay.  Those securities would get gobbled up by the banking system, pushed to rates as negative as the negative rates on excess reserves.

Proponents of the policy assume that if banks choose to respond by increasing their lending to the private sector, the increase will necessarily be stimulative to economic activity.  That may be true, but not necessarily.  It’s possible that banks could issue zero interest rate loans to highly creditworthy private sector borrowers who don’t want or need the money, and who have no plans to spend or invest it, but who agree to take and hold the loans in exchange for other perks–for example, a waiving of interest and fees on other deposits being held.  Such loans–even though they wouldn’t be doing anything economically–would increase the bank’s deposit liabilities and required reserves, and therefore decrease the portion of its reserves that get classified as “excess”, eliminating the associated interest expense.  Of course, these loans, even though safe, would carry a regulatory risk-weighting, and so the ability of the banking system to engage in them would be limited by regulatory capital.

Earlier, we noted that it’s important for the central bank to inject excess reserves into the system through quantitative easing prior to implementing a policy of negative interest rates.  The reason is obvious.  On the assumption that lending stays constant, every excess reserve in the system is going to have to be held, and paid interest on, by some bank, and ultimately, by some depositor, the person who actually owns the money, and who is holding it on deposit at the bank.  Quantitative easing increases the quantity of bank deposits and excess reserves in the system.  It therefore increases the number of assets in the system that are directly subject to negative rates, and that incur the obligation to pay those rates.

To explain with an example, if the private sector’s asset portfolio consists of $10 in cash and $1000 in fixed income assets, and the Fed imposes a negative interest rate, that rate will only directly hit $10.  But if the Fed goes in and buys 100% of the fixed income assets, swapping them for newly issued money, such that the private sector’s asset portfolio shifts to $1010 of cash and $0 in fixed income assets, and if it then imposes a negative interest rate, that rate will directly hit all $1010–the private sector’s entire asset portfolio.  It will cause that much more pain, and will therefore have that much more of an effect.
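A small Python sketch of this example shows how the share of the portfolio exposed to the negative rate changes with the size of the asset purchases.  The function name `cash_exposure` and its `fraction_bought` parameter are hypothetical, introduced only to illustrate the arithmetic.

```python
def cash_exposure(cash, fixed_income, fraction_bought):
    """Cash held after the central bank buys a fraction of the private sector's
    fixed income assets with newly issued money, and that cash as a share of
    the total portfolio."""
    bought = fixed_income * fraction_bought   # assets swapped for new money
    cash_after = cash + bought
    total = cash + fixed_income               # portfolio size is unchanged by the swap
    return cash_after, cash_after / total


# No purchases: only the original $10 in cash is directly hit.
cash_no_qe, share_no_qe = cash_exposure(10.0, 1000.0, 0.0)

# The central bank buys 100% of the fixed income assets: all $1010 is hit.
cash_full_qe, share_full_qe = cash_exposure(10.0, 1000.0, 1.0)

print(cash_no_qe, round(share_no_qe, 4), cash_full_qe, share_full_qe)
```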

Of course, this effect would come at a cost.  Psychologically, there’s a big difference between not making money, as inflation slowly erodes its value, and outright losing it–particularly meaningful amounts of it.  People tend to suffer much more at the latter, and are therefore likely to go to far greater lengths to avoid it.  The result of the policy, then, would not be an increase in what economies at the zero-lower-bound need–well-planned, productive, useful, job-creative investment–but rather panicky, rushed, impulsive financial speculation that leads to asset bubbles and the misallocation of capital, with detrimental long-term consequences on both output and well-being.

Worse, the policy is likely to be deflationary, not inflationary.  Like any tax, it destroys financial wealth–the financial wealth of the people that have to pay it.  That wealth is taken out of the system.  Granted, the wealth can be reintroduced into the system if the government that receives it resolves to take it and spend it.  But in the instances of unconventional monetary policy that have played out so far–quantitative easing globally and negative interest rates in Europe–that hasn’t happened.  Governments have pocketed the income from these programs, sending it into the financial “black hole” of deficit reduction.

Even if the wealth is reinjected into the system in the form of tax cuts or increased government spending elsewhere, we have to consider the behavioral effects on those that rely, at least in part, on returns on accumulated savings to fund their expenditures.  Those individuals–typically older people–represent a growing percentage of western society. Under conventional policy, they simply have to deal with low interest rates on their savings–tough, but manageable.  To require them to deal with negative interest rates–confiscation of a certain percentage of their savings as each year passes–would be a significant paradigm shift.  Their confidence in their ability to fund their futures–their future spending–would likely fall.  They would therefore spend less, not more, exacerbating the economic weakness.  Granted, the threat of punishment for holding risk-free assets might coax them into speculating  in the risky financial bubbles that will have formed–but then again, it might not.  If it does, they will suffer on the other end.

Hopefully at this point, the reader intuitively recognizes that imposing meaningfully negative interest rates on the population is a truly terrible idea.  If we’re only talking about a few basis points, a sort of “token” tiny negative rate that is put in place for optics, as has been done in Europe–fine, people will grow accustomed to it and eventually ignore it.  But a serious use of negative rates, that involves the imposition of levels meaningfully below zero–e.g., -2%, -3%, -4%, and so on–would be awful for the economy, and for people more generally.

The problem of how to stimulate a demand-deficient economy is fundamentally a behavioral problem.  It needs to be evaluated from a behavioral perspective.  We have to ask ourselves, what specific behavior do we want to encourage? We know the answer. We want corporations and the entrepreneurial class to invest in the production of useful, wanted things.  Their investment creates jobs, which produce incomes for working people, incomes that can then be used to purchase those useful, wanted things, completing a virtuous cycle in which everyone benefits and prospers.

The question is, if corporations and the entrepreneurial class aren’t doing enough of that, how do we get them to do more of it?  The answer, which I’m going to elaborate on in the next piece, is not by punishing them with a highly repressive monetary policy, a policy that goes so far as to confiscate their money unless they hand it off to someone else.  Rather, the answer is to put the economy in a condition that causes them to become confident that if they invest in the production of useful, wanted things, they will receive the due reward, profit.  In an economy like ours, a more-or-less structurally sound economy that happens to suffer from deficient aggregate demand associated with legacy private sector debt and wealth inequality issues, the way to do that is with fiscal policy.
