B.F. Skinner and Operant Conditioning: A Primer for Traders, Investors, and Economic Policymakers

Markets and economies are agglomerations of interconnected human behaviors.  It’s a surprise, then, that in the fields of finance and economics, the work of history’s most famous behavioral psychologist, B.F. Skinner, is rarely mentioned. In this piece, I’m going to present an introduction to Skinner’s general theory of behavior, drawing attention to insights from his research that can be applied to trading, investing, and economic policymaking.  The current piece will serve as a primer for the next one, in which I’m going to discuss the insights with a greater practical emphasis.

If you’re like most readers, you come to this blog to read about finance and economics, not about psychology or philosophy, so you’re probably ready to close the window and surf on to something else.  But I would urge you to read on.  Skinner’s work was deep and profound–brimming with insights into the way reality and human beings work.  Anyone interested in finance and economics will benefit from being familiar with it.

Pavlovian Conditioning, Operant Conditioning and Selection by Consequence

In the early 1900s, Russian physiologist Ivan Pavlov conducted experiments on canine digestion.  He exposed restrained dogs to the scent of meat powder, and measured the extent to which they salivated in response to it.  In the course of these experiments, he stumbled upon a groundbreaking discovery: Dogs that had been put through experiments multiple times would salivate before any meat powder was presented, in response to the mere sight of lab assistants entering the room.

Pavlov hypothesized that repeated associations between “lab assistants” and “the smell of meat” had conditioned the dogs to respond to the former in the same way as the latter–by salivating.  To test this hypothesis, Pavlov set up another experiment.   He rang a bell for the dogs to hear, and then exposed them to the scent of meat powder.  He found that after repeated associations, the dogs would salivate in response to the mere sound of the bell, before any meat powder was presented.

Around the same time that Pavlov conducted his experiments on salivation in dogs, the American psychologist Edward Thorndike conducted experiments on learning in cats.  In these experiments, Thorndike trapped cats inside “puzzle” boxes that could only be opened by pushing on various built-in levers.  After trapping the cats, he timed how long it took them to push on the levers and escape.  When they escaped, he rewarded them with food and put them back inside the boxes to escape again. He noticed that cats that had successfully escaped took progressively less time to escape on each subsequent trial.  He concluded that the cats were “learning” from the trials.

In the late 1930s, Harvard psychologist B.F. Skinner synthesized the discoveries of Pavlov, Thorndike, and others into a coherent system, his own version of Behaviorism, the school of psychology that John B. Watson had launched in 1913.  Behaviorism sought to explain the behaviors of organisms, including the behaviors of human beings, purely mechanistically, in terms of causal interactions with the environment, rather than in terms of nebulous, unscientific concepts inherited from religious tradition: “soul”, “spirit”, “free-will”, etc.

Skinner distinguished between two types of conditioning:

Classical Conditioning: The kind of conditioning that Pavlov discovered, which involves the repeated association of two stimuli–an unconditioned stimulus (the smell of meat) and a conditioned stimulus (the sound of a bell)–in a way that causes the conditioned stimulus (the sound of a bell) to evoke the same response (salivation) as the unconditioned stimulus (the smell of meat).  The unconditioned stimulus (the smell of meat) is called “unconditioned” because its connection to the response (salivation) is hard-wired into the organism.  The conditioned stimulus (the sound of a bell) is called “conditioned” because its connection to the response (salivation) is not hard-wired, but rather is formed through the “conditioning” process, i.e., the process of changing the organism through exposure.

Operant Conditioning: The kind of conditioning that Thorndike discovered, wherein the subsequent frequency of an organism’s behavior is increased or decreased by the consequences of that behavior.  When behavior is followed by positive outcomes (benefit, pleasure), the behavior goes on to occur more often; when behavior is followed by negative outcomes (harm, pain), the behavior goes on to occur less often, if at all.  Operant conditioning differs from Pavlovian conditioning in that it involves the learning of a voluntary behavior by the consequences of that behavior, rather than the triggering of an automatic, involuntary response by exposure to repeated associations.

Skinner is known in popular circles for the fascinating experiments that he conducted on operant conditioning, experiments in which he used the technique to get animals to do all kinds of weird, unexpected things.  In a well-known film clip, Skinner shares the result of one such experiment, an experiment in which he successfully taught pigeons to “read” English.

Skinner liked to explain operant conditioning by analogy with evolution.  Recall that in biological evolution, random imperfections in the reproductive process lead to infrequent mutations.  These mutations typically add zero or negative value to the organism’s fitness.  But every so often, purely by chance, the mutations end up conferring advantages that aid in survival and reproduction.  Organisms endowed with the mutations go on to survive and reproduce more frequently than their counterparts, leaving more copies of the mutations in subsequent generations, until the mutations spread throughout the entire reproductive population. That is how adapted species are formed. We human beings, with our complex brains and bodies, are direct descendants of those organisms–human and pre-human–that were “lucky” enough to be endowed with the “best” mutations of the group.

Biological evolution involves what Skinner brilliantly called “selection by consequence.” Nature continually “tries out” random possible forms.  When the forms bring good consequences–i.e., consequences that lead to the survival and successful self-copying of the forms–it holds on to them.  When they bring bad consequences–i.e., consequences that lead to the death of the forms–it discards them.  Through this process of trial-and-error, it extracts order from chaos.  There is no other way, according to Skinner, for nature to create complex, self-preserving systems–biological or otherwise.  It has no innate “intelligence” from which to design them, no ability to foresee survivable designs beforehand based on a thought process.
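
The trial-and-error loop described above can be made concrete with a toy simulation. What follows is a minimal sketch under stated assumptions, not anything Skinner wrote: the “form” is a hypothetical string of bits, its fitness is simply the number of 1s it contains (a standard textbook toy problem), random copying errors flip each bit with a small probability, and a variant is kept only if its consequences are at least as good as its parent’s. The names mutate, fitness, and select_by_consequence are illustrative inventions. Nothing in the loop knows in advance what a good form looks like; order emerges only from which random variants are retained.

```python
import random

def mutate(form, rate, rng):
    # Random copying errors: each bit flips independently with probability `rate`.
    return [bit ^ 1 if rng.random() < rate else bit for bit in form]

def fitness(form):
    # Hypothetical stand-in for "good consequences": more 1s means a fitter form.
    return sum(form)

def select_by_consequence(length=20, generations=10_000, seed=1):
    """Keep random variants that do at least as well as their parent (assumed toy setup)."""
    rng = random.Random(seed)
    form = [rng.randint(0, 1) for _ in range(length)]  # start from a random form
    for generation in range(generations):
        if fitness(form) == length:
            return form, generation  # a fully "adapted" form has emerged
        candidate = mutate(form, rate=1 / length, rng=rng)
        # Selection by consequence: retain the variant only if it does no worse.
        if fitness(candidate) >= fitness(form):
            form = candidate
    return form, generations

if __name__ == "__main__":
    best, gens = select_by_consequence()
    print(f"fitness {fitness(best)} of 20 after {gens} generations")
```

Computer scientists call this scheme a (1+1) evolutionary algorithm. If the acceptance test is removed and every variant is kept regardless of fitness, the form merely drifts at random, which is the sense in which selection, not variation alone, does the work.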

Skinner viewed animal organisms, including human beings, as a microcosm of the same evolutionary process–“selection by consequence.”  An animal organism, according to Skinner, is a highly complex behavior selection machine.  As it moves through its environment, it is exposed to different types of behaviors–some that it tries out on its own, randomly, or in response to causal stimuli, and some that it observes others engage in.  When the behaviors produce positive consequences (benefit, pleasure, etc.), its brain and psychology are modified in ways that cause it to engage in them more often.  When the behaviors produce negative consequences (harm, pain, etc.), its brain and psychology are modified in ways that cause it to refrain from them in the future.  Through this process, the process of operant conditioning, the organism “learns” how to interact optimally with the contingencies of its environment.

According to Skinner, brains with the capacity for operant conditioning are themselves consequences of evolution.  Environmental conditions are always changing, and therefore the specific environment that an organism will face cannot be fully known beforehand.  For this reason, Nature evolved brains that have the capacity to form optimal behavioral tendencies based on environmental feedback, rather than brains that have been permanently locked into a rigid set of behaviors from the get-go.

Contrary to popular caricature, Skinner did not think that animal organisms–human or otherwise–were “blank slates.”  He acknowledged that they have certain unchangeable, hard-wired biological traits, put in place by natural selection.  His point was simply that one of those traits, a hugely important one, is the tendency for certain behaviors of organisms–specifically, “voluntary” behaviors, those that arise out of complex information processing in higher regions of the brain–to be “learned” by operant conditioning, by the consequences that reality imposes.

The Mechanics of Operant Conditioning: Reinforcement and Punishment

Skinner categorized the feedback processes that shape behaviors into two general types: reinforcement and punishment.  Reinforcement occurs when a good–i.e., pleasurable–consequence follows a behavior, causing the behavior to become more frequent–or, in non-Skinnerian cognitive terms, causing the organism to experience an increased desire to do the behavior again.  Punishment occurs when a bad–i.e., painful–consequence follows a behavior, causing the behavior to become less frequent–or, in non-Skinnerian terms, causing the organism to experience an aversion to doing the behavior again.  

In another filmed demonstration, Skinner shows the technique of operant conditioning in action, using it to get a live pigeon to turn 360 degrees.

Skinner starts by putting the pigeon near a machine that dispenses food on a push-button command.  He then waits for the pigeon to turn slightly to its left.  In terms of the analogy with biological evolution, this period of waiting is analogous to the period wherein Nature waits for the reproductive process to produce a mutation that it can then “select by consequence.”  The pigeon’s turn is not something that Skinner can force out of the pigeon–it’s a behavior that has to randomly emerge, as the pigeon tries out different things in its environment.

When Skinner sees the turn happen, he quickly pushes the button and dispenses the reward, food.  He then waits for the pigeon to turn again–which the pigeon does, because the pigeon starts to catch on.  But this time, before dispensing the food, he waits for the pigeon to turn a bit farther.  Each time, he waits for the pigeon to turn farther and farther before dispensing the food, until the pigeon has turned a full 360 degrees.  At that point, the task is complete.  The pigeon keeps fully turning, and he keeps feeding it after it does so.
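
The shaping procedure just described, rewarding a slight turn and then holding out for progressively larger ones, can be sketched as a small simulation. Every number in it is an assumption chosen for illustration (the pigeon’s starting habit, how widely its turns vary, how strongly a reward pulls its habit toward the rewarded turn, and the 15-degree increments); none of them come from Skinner’s demonstration, and shape_turning is a hypothetical name.

```python
import random

def shape_turning(target=360.0, step=15.0, learning_rate=0.8,
                  noise=20.0, max_trials=10_000, seed=0):
    """Toy model of shaping by successive approximation (all parameters assumed).

    The simulated pigeon's turns scatter randomly around a habitual angle.
    A turn that meets the current criterion is rewarded, which pulls the
    habit toward that turn; the criterion is then raised a little.
    Returns (trials_used, rewards_given, final_habit), or None if unfinished.
    """
    rng = random.Random(seed)
    habit = 5.0        # typical turn in degrees before training (assumption)
    criterion = step   # first reward threshold: a slight turn
    rewards = 0
    for trial in range(1, max_trials + 1):
        # Behavior varies on its own; the trainer cannot force a turn.
        turn = max(0.0, rng.gauss(habit, noise))
        if turn >= criterion:
            rewards += 1                              # the button is pressed
            habit += learning_rate * (turn - habit)   # rewarded variant becomes typical
            if criterion >= target:
                return trial, rewards, habit          # full turn reached and rewarded
            criterion = min(target, criterion + step) # hold out for a bit farther
    return None

if __name__ == "__main__":
    print(shape_turning())
```

Because each reward drags the habit most of the way toward the criterion, the next criterion is never far out of reach. If the increments were made much larger than the natural spread of the turns, rewarded turns would become rare and training would slow to a crawl, which is why the criterion is raised only a little at a time.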

What is actually happening in the experiment?  Answer: the pigeon’s brain and psychology are somehow being modified to associate turning 360 degrees with food, such that whenever the pigeon is hungry and wants food, it turns 360 degrees.  If we want, we can describe the modification as a modification in a complex neural system, a physical brain that gets rewired to send specific motor signals–“turn left, all the way around”–in response to biological signals of hunger.  We can also describe the process cognitively, as involving an acquired feeling that arises in the pigeon–that when the pigeon gets hungry, it feels an urge or impetus to turn 360 degrees to the left, either automatically, or because it puts two and two together in a thinking process that connects the idea of turning with the idea of receiving food, which it wants.  Skinner famously preferred the former, the non-cognitive description, arguing that cognitive descriptions are unobservable and therefore useless to a science of behavior.  But cognitive descriptions work fine in the current context.

To keep the conditioned behavior in place, the conditioner needs to maintain the reinforcement.  If the reinforcement stops–if the pigeon turns, and nothing happens, and then turns again, and nothing happens again, and so on–the behavior will eventually disappear.  This phenomenon is called “extinction.”  It’s a phenomenon that Pavlov also observed: if the association between the bell and the arrival of meat powder is not maintained over time, the dogs will stop salivating in response to the bell.  
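
Acquisition and extinction can be illustrated with a simple linear-operator learning rule, loosely in the style of the mathematical learning models associated with Bush and Mosteller; it is a swapped-in illustration, not a model from Skinner’s work. The learning rates alpha and beta, the starting probability, and the trial counts are assumptions, and every trial is treated as one in which the pigeon turns.

```python
def update(p, reinforced, alpha=0.2, beta=0.1):
    """One linear-operator update of the probability p of turning (rates assumed).

    Reinforced turn: p moves a fraction alpha of the way toward 1.
    Unreinforced turn: p shrinks by a fraction beta (extinction).
    """
    if reinforced:
        return p + alpha * (1.0 - p)
    return p - beta * p

def run(p, outcomes):
    """Apply a sequence of reinforced/unreinforced trials; return every p along the way."""
    history = [p]
    for reinforced in outcomes:
        p = update(p, reinforced)
        history.append(p)
    return history

# The pigeon turns and food arrives, 30 times in a row.
acquisition = run(0.05, [True] * 30)
# Then the food stops: the pigeon turns, and nothing happens, 30 times.
extinction = run(acquisition[-1], [False] * 30)
print(f"after acquisition: {acquisition[-1]:.3f}, after extinction: {extinction[-1]:.3f}")
```

In this sketch the learned tendency never switches off abruptly; it decays a fixed fraction per unrewarded trial, a simple way to picture behavior that “eventually” disappears.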

Importantly, the capacity for conditioned behavior to go extinct in the absence of reinforcement is itself a biological adaptation.  Learning to behave optimally isn’t just about learning to do certain things, it’s also about unlearning them when they stop working.  An organism that is unable to unlearn behaviors that have stopped working will waste large amounts of time and energy doing useless things, and will end up falling behind in the evolutionary race. 

Skinner noted that effective reinforcement needs to be clearly connectable to the behavior, preferably close to it in time.  If food appears 200 days after the pigeon turns, the pigeon is not going to develop a tendency to turn.  The connection between turning and receiving food is not going to get appropriately wired into the pigeon’s brain.  At the same time, the reward doesn’t have to be delivered after every successful instance of the behavior.  An intermittent schedule of reinforcement can be imposed, in which the reward is delivered only after a certain number of successful instances, provided that the number is not too high.  On a “variable” schedule, that number changes unpredictably from one reward to the next.
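
A variable-ratio schedule can be sketched as a counter that demands a new, randomly drawn number of responses before each reward. The class name and the choice to draw that number uniformly from 1 to twice the average minus one (which makes the average exactly mean_ratio) are assumptions for illustration, not details from Skinner’s experiments.

```python
import random

class VariableRatioSchedule:
    """Deliver a reward after an unpredictable number of responses (hypothetical class).

    After each reward, the number of responses required for the next one is
    redrawn uniformly from 1 to 2 * mean_ratio - 1, so it averages mean_ratio.
    """

    def __init__(self, mean_ratio=5, seed=None):
        if not isinstance(mean_ratio, int) or mean_ratio < 1:
            raise ValueError("mean_ratio must be a positive integer")
        self.mean_ratio = mean_ratio
        self._rng = random.Random(seed)
        self._count = 0
        self._required = self._draw()

    def _draw(self):
        # Assumed distribution for the required count; its mean is mean_ratio.
        return self._rng.randint(1, 2 * self.mean_ratio - 1)

    def respond(self):
        """Register one response (a peck or a turn); return True if it is rewarded."""
        self._count += 1
        if self._count >= self._required:
            self._count = 0
            self._required = self._draw()
            return True
        return False

if __name__ == "__main__":
    schedule = VariableRatioSchedule(mean_ratio=5, seed=42)
    outcomes = [schedule.respond() for _ in range(30)]
    # "R" marks a rewarded response, "." an unrewarded one.
    print("".join("R" if rewarded else "." for rewarded in outcomes))
```

With mean_ratio=1 the schedule rewards every response (continuous reinforcement); larger values thin the rewards out while keeping their timing unpredictable.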

Skinner noted that when an organism observes a consequence in response to a behavior, it “generalizes.”  It experiments with similar behaviors, to see if they will produce the same consequence.  For example, a pigeon that has received food for pecking a disk will start trying to peck similar objects, in the hope that pecking them will produce a similar release of food.  Eventually, after sufficient modification by the environment, the organism learns to “discriminate.”  It learns that the behavior produces a consequence in one situation, but not in another.

Extension to Human Beings: The Example of Gambling

The natural inclination is to dismiss Skinner’s discoveries as only being applicable to the functioning of “lesser” organisms–rats, pigeons, dogs, and so on–and not applicable to the functioning of human beings.  But the human brain, Skinner argued, is just a more computationally advanced version of the brains of these other types of organisms.  It shares a common origin with their brains, having been progressively designed by the same designer, natural selection.  We should therefore expect the same kind of learning process to be present in it, albeit in a more complex, involved form. Skinner demonstrated that it was present, in experiments on both human children and human adults.

Psychologists have long struggled with the question, why do human beings gamble? Gambling is an obviously irrational behavior–an individual takes on risk in exchange for an expected return that is less than zero.  Why would anyone do that? Marx famously thought that people, particularly the masses, do it to escape from the stresses of industrialization.  Freud famously thought that people–at least certain men, the clients he diagnosed–do it to unconsciously punish themselves for unconscious guilt associated with the Oedipus complex–the sexual attraction that they unconsciously feel–or at least unconsciously felt, as children–for their mothers.
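
To make “an expected return that is less than zero” concrete, here is the arithmetic for one familiar wager, a single-number bet in roulette, which pays 35 to 1. The pocket counts (38 on an American wheel, 37 on a European wheel) are standard casino rules rather than figures from this piece, and expected_profit is an illustrative helper name.

```python
from fractions import Fraction

def expected_profit(p_win, net_payout, stake=1):
    """Expected profit of one bet: win net_payout * stake with probability p_win,
    otherwise lose the stake. Fraction keeps the arithmetic exact."""
    p = Fraction(p_win)
    return p * net_payout * stake - (1 - p) * stake

# Single-number roulette bets pay 35 to 1.
american = expected_profit(Fraction(1, 38), 35)  # 38 pockets: 1-36, 0 and 00
european = expected_profit(Fraction(1, 37), 35)  # 37 pockets: 1-36 and 0
print(american, float(american))
print(european, float(european))
```

The American bet works out to -1/19 of the stake, a loss of about 5.3 cents per dollar wagered on average, and the European bet to -1/37, about 2.7 cents. No amount of repetition turns either figure positive.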

Contra Marx and Freud, Skinner gave the first intellectually respectable psychological answer to the question.  Human beings gamble, and enjoy gambling, even though the activity is pointless and irrational, because they’ve been subjected to a specific schedule of reinforcement–a “variable” schedule, where the reward is not provided every time, but only every so often, leaving just enough “connection” between the behavior and the reward to forge a link between the two in the brain and psychology of the subject.

Skinner showed that in order for the pigeon to maintain the pecking and turning behaviors, it doesn’t need to get the reward every time that those behaviors occur.  It just needs to get the reward every so often–that will be enough to keep the pigeon engaging in the behaviors on an ongoing basis.  Skinner noted that the same was true of gamblers. Gamblers don’t need to win every time; they just need to win every so often.  A grandiose victory–a jackpot–that occurs every so often is more than enough to imbue them with inspiring thoughts of winning, and an associated appetite to get in and play.  It is the business of a casino to optimize the schedule at which gamblers win, so that they win just enough to sense that victory is within their reach, just enough to feel the associated thrill and excitement each time they pull the lever.  An efficient casino operation will not afford gamblers any more victories than that–certainly not enough for them to actually make money on a net basis, which would represent the casino’s net loss.

The process through which the gambler is conditioned to gamble is obviously not as simple as the process through which the pigeon is conditioned to peck.  For the human being, there is the complex and vivid mediation of thought, memory, emotion, impulse, and the internal struggle that arises when these mental states push on each other in conflicting ways.  But the fact remains that the reinforcement of winning is ultimately what gives rise to the appetite to play, the psychological pull to engage in the behavior again.  If you were to completely take that reinforcement away, the appetite and pull would eventually disappear, go extinct–at least in normal, mentally healthy human beings.  If casinos were designed so that no one ever won anything, no one ever experienced the thrill and excitement of winning, then no one would ever bother with the activity.  Casinos would not have any patrons.

Skinner’s insights here have a clear application to the understanding of stock market behavior, an application that we’re going to examine more closely in the next piece.  To get a sense of the application, ask yourself: what, more than anything else, gives investors the confidence and appetite to invest in risky asset classes such as equities?  Answer: the experience of actually investing in them, and being rewarded for it, consistently.  Sure, you can tell people the many reasons why they should invest in equities–that’s all wonderful.  But to them, it’s just verbiage, someone’s personal opinion.  In itself, it’s not inspiring. What’s inspiring is the actual experience of taking the risk, and winning–making money, on a consistent basis.  Then, you come to trust the process, believe in it, viscerally.  You develop an appetite for more.  As many of us know from our own mistakes, the experience can be quite dangerous–enough to make clueless novices think they are seasoned experts. 

On the flip side, what, more than anything else, causes investors to become averse to investing in risky asset classes such as equities?  Again, the experience of actually investing in them, and getting badly hurt.  A dark cloud of danger and guilt will then get attached to the activity.  The investor won’t want to even think about going back to it for another try–at least not until sufficient time has passed for extinction to occur.  This is operant conditioning in practice.  

The concepts of Classical Conditioning, Operant Conditioning, Extinction, Generalization, Discrimination, and many other concepts that Skinner researched have a role in producing the various trends and patterns that we see play out in markets.  Understanding these processes won’t give us a crystal ball to use in predicting the market’s future, but it can help us better understand, and more quickly respond to, some of the changes that happen as economic and market cycles play out.

Operant Conditioning: Observations Relevant to Traders, Investors, and Economic Policymakers

In this final section, I’m going to go over some unique observations that Skinner made in the course of his research that are relevant to traders, investors, and economic policymakers.  The observation for which Skinner is probably most famous is the observation that reinforcement is a more effective technique for producing a desired behavior than punishment.  We want the pigeon to turn.  We saw that giving it a reward–food–works marvelously to produce that behavior.  But now imagine that we were to try to use punishment to generate the behavior.  Suppose that we were to electrically shock the pigeon whenever it spent more than, say, a minute without turning.  Would the shocks cause the pigeon to turn?  No–at least not efficiently.

Instead of turning, as we want it to, the pigeon would continue to do whatever is natural to it, moving in whatever direction it feels an impulse to move in.  When shocks come in, it would simply try to avoid and escape from them.  It would tense up, flinch, flail around, flee, whatever it can do.  Importantly, it wouldn’t have anything to send it towards the desired behavior, and build a specific appetite for that behavior.  Punishment doesn’t create appetite; it creates fear.  Fear of doing something other than the desired behavior does not imply appetite to do the desired behavior.

Skinner was adamant in extending this insight to the human case.  Punishment–the imposition of painful consequences–cannot efficiently get a person to engage in a wanted behavior.  It is not effective at creating the internal drive and motivation that the person needs in order to whole-heartedly perform the behavior.  To the extent that the person does perform the behavior in response to the threat of punishment, the behavior will be awkward, unnatural, artificial, done out of duress, rather than out of genuine desire. Instead of cooperating, the individual will try to come up with ways to avoid the punishment–whatever they need to do to get to a place where they can do what they actually want to do, without suffering negative consequences.

Imagine that you are my overweight child.  I’m trying to get you to exercise.  Sure, if I threaten you with a painful punishment for not exercising, you might go exercise.  But your heart isn’t going to be in the activity.  You’re going to go through the motions half-assed, doing the absolute bare minimum to keep me off your back.  Ultimately, if you really don’t want to exercise, you’re going to try to get around my imposition–by faking, hiding, creating distractions, buying time, pleading, whatever.  You’re trapped in a situation where none of your options are perceived to be good.  Rather than accept the lesser evil, you’re going to try to find a way out.

To motivate you to exercise, the answer is not to punish you for not exercising, but to try to get you to see and experience the benefits of exercising for yourself, to try to put you on a positive trajectory, where you exercise, you make progress in losing weight, you end up looking and feeling better, and that reward gives you motivation to continue to exercise regularly.  If that’s not possible, then the answer is to provide you with other rewards that register in your value system–money, free time, whatever.  When people engage in an activity, and make progress towards their goals and values–whether related to the activity, or not–the progress becomes a source of strength, momentum, optimism, hope.  It sows the seeds for further progress.

Skinner’s observation here is particularly relevant to the debate on how best to stimulate a depressed economy–whether to use expansive fiscal policy or expansive monetary policy, a debate that I’m going to elaborate on in a subsequent piece.  Expansive fiscal policy is a motivating, reward-oriented stimulus–it motivates investors and corporations to invest in the real economy by directly creating demand and the opportunity for profit. Expansive monetary policy–including the imposition of negative real and especially negative nominal interest rates–is a repressive, punishment-oriented stimulus.  It tries to motivate investors and corporations to invest in the real economy by taking away their wealth if they don’t.

Do investors and corporations acquiesce to the punishment?  No, they try to find ways around it–recycling capital through buybacks and acquisitions, levering up safe assets, reaching for yield on the risk curve, and engaging in other economically-dubious behaviors designed to allow them to generate a return without requiring them to do what they don’t want to do–tie up new money in an environment that they don’t have confidence in. Reasonable people can disagree on the extent to which the repressive policies that provoke these behaviors are financially destabilizing, but it’s becoming more and more clear that they aren’t effective at achieving their policy goals.  They don’t work.  Skinner most definitely would have recommended against them, at least in scenarios where a powerful, reward-oriented stimulus–e.g., expansive fiscal policy–was available.

Another important observation that Skinner made, this one particularly relevant to human beings, pertains to the relationship between operant conditioning and rules.  Rules are ways that we efficiently codify behavioral lessons into language, to allow for easy transmission to others.  If I’m trying to teach you how to do something, I might give you a rule for how to do it.  You will then follow the rule–put it into practice.  Crucially, positive consequences will need to follow from your implementation of the rule–you will need to see the rule work in your own practice.  Without that reinforcement, continued adherence to the rule will become increasingly difficult.

As human beings, there’s nothing that we hate more than rules imposed on us that we don’t understand, and that we’ve never seen work.  Do X, don’t do Y, but do Z, but not if you’ve already done Q–and so on.  We might be able to gather the strength to follow through on these complex instructions, but unless we start seeing benefits, results, we’re not going to be able to maintain our adherence.

Our aversion to following rules that have not yet been operantly conditioned, i.e., tied in our minds to beneficial consequences, is the reason why we often prefer to shoot from the hip when doing things, as opposed to doing them by executing externally provided instructions.  Take the example of a child who receives a new toy for Christmas that requires assembly.  The last thing the child will want to do is bust out the user’s manual, go to page one, and execute the complicated assembly instructions.  To the contrary, the child is going to want to try to put the toy together on her own–“no, let me do it!”–without the external burden of rules.  And children aren’t unique in that respect–we adults are the same way.  We prefer to come to solutions not by obediently carrying out other people’s orders, but by engaging in our own curious experimentation, allowing the observable consequences of our maneuvers–this worked, try it over there, that didn’t work, try something else–to naturally guide us to the right answers.

One of the reasons why investing is fun, in my opinion, is that you don’t need to follow any rules to do it.  You can wing it, go in and buy whatever you like, based on whatever your gut tells you to do–and still do well, sometimes just as well as seasoned professionals. This facet of the activity makes it uniquely enjoyable and entertaining, in contrast with activities where success requires tedious adherence to externally-imposed rules and constraints.

A final important observation that Skinner made, this one only relevant to humans and certain higher mammals, pertains to language and thought.  Skinner viewed language and thought as behaviors that are formed, in part, through conditioning–both Pavlovian and Operant.  From a Pavlovian perspective, linguistic connections between words and meanings are formed through exposure to repeated associations.  From an operant perspective, what we think and say is followed by consequences.  Those consequences condition what we think and say going forward.

How does an infant connect the oral sound “Daddy” to the man who just walked into the room?  He connects the two because Mommy says that word whenever Daddy walks in.  How does he learn to say it himself?  The answer, at least in part, is through a process of reinforcement.  Whenever he says “Da da” in response to seeing Daddy, everyone in the room turns their attention to him and expresses endearment and approval–“Oh, that’s so cute, Billy!  Say it again… say it again for Daddy!”  Though barely out of the womb, the organism is already able to “select” beneficial behavior from among different possibilities so as to perform it more frequently.

We might think that the influence of operant conditioning on our thinking and our speaking–or, to use Skinner’s preferred term for these activities, on our “verbal behavior”–ends in youth. It most certainly does not.  The influence continues throughout our lives, shaping us in subtle ways that we often fail to notice.  The emotional reactions that occur inside us, when we think and say things, and that occur outside us, in the form of the approval and disapproval of other members of our verbal communities, have a strong influence on where our thought processes and the statements that express them end up going.

Unfortunately, the internal and external contingencies that shape how we think and speak often aren’t truth-oriented.  They’re often oriented towards other values–the building and maintaining of positive relationships, the securing of desired resources, the demonstration of status, the achievement of resolution, and so on.  In many contexts, this lack of truth-orientation isn’t a problem, because there aren’t actual tangible harms associated with thinking and saying things that aren’t true.  “Do I look fat in this dress?” — “No, you look great honey, <cough>, <cough>.”  The world obviously isn’t going to end if a husband says that to his wife.

But in the arena of finance–at least the part of the arena behind the curtain, where actual financial decisions are made–the fact that our thinking and speaking can be shaped by factors unrelated to truth, or worse, factors opposed to truth, is a huge problem.  It’s a huge problem because there are actual, tangible consequences to being wrong.

Given that problem, we need to be vigilant about truth when making investment decisions. We need to routinely check to ensure that we’re thinking what we’re thinking, and saying what we’re saying, because we genuinely believe it to be true, or likely to be true, not because we’ve been conditioned to think it or say it by the effects of various hidden reinforcers.  We want our thoughts and statements to represent an honest description of reality, as we see it, and not devolve into ulterior mechanisms through which we try to look and sound the part, or earn status and credibility, or win approval and admiration, or acquire power in organizations, or make peace out of conflict, or secure the satisfaction of “being right”, or crush enemies and opponents, or smooth over past mistakes, or relish the pride of having discovered something important, or preserve a sacred idea or worldview, and so on.  These hidden contingencies, to the extent that they are allowed to creep into the financial decision-making process and shape our verbal behaviors, can be costly.
