Impulsive Decisions

Who is not prone to this behaviour? A new piece of information is received and too much weight is put on its importance. With only slight verification (rather impulsively) a decision is made based on the supposed new circumstances. Or take it the other way round: one sticks to the prevailing opinion much too long, although new information has been suggesting a change of course for some time.

And what is true for single pieces of information is even more true for a cascade of incoming information revealing a certain pattern. As Nate Silver (see reference below) puts it, in a data-rich environment it is quite easy to find patterns. The crucial issue is to determine whether the patterns detected represent a signal to be taken into account or simply meaningless noise.

In other words, uncertainty in decision making already starts with considering which information could be important and which is an erratic glimpse of nothing. Nevertheless, there is a remarkably powerful statistical concept that helps to govern these kinds of situations: Bayesian reasoning. It is a thought process that helps to keep an eye on the big picture and makes one less prone to hasty decision making.

Skyscrapers

A tragic event like the 9/11 terror attacks on the World Trade Center in New York can show how Bayesian reasoning evolves in the midst of chaos. Let's consider the following set-up:

Before the 9/11 attacks, the assumed probability of such a terror event happening was very low, maybe 1/20,000 (x = 0.00005).

As soon as the first airplane hit one of the World Trade Center towers, one might assume with 100% certainty that it was an act of terror (y = 1).

And still, there is the possibility of an aircraft hitting a tower by accident. For the city of New York, the probability of such an accident amounts to 0.008% (z = 0.00008). This figure derives from the fact that in the 25,000 days before 9/11, exactly 2 planes had crashed into one of New York's skyscrapers.

Bayesian reasoning takes all these figures into account while trying to get a grip on the situation. With the first aircraft hitting one tower of the World Trade Center, the probability of a terror attack is just 38.5% (see the calculation below). This is surprising: although one impulsively thinks of a terror attack from the very beginning, the overall context paints a rather conservative picture. At this stage, a terror attack is far from being a certainty.

```
                           x * y
Pr(attack|1.airpl.) = ---------------
                      x*y + z*(1-x)

                                0.00005 * 1
                    = ----------------------------- = 0.385
                      0.00005*1 + 0.00008*0.99995
```
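The same arithmetic can be checked quickly in R (a minimal sketch using the x, y and z defined above):

```r
x <- 0.00005   # prior probability of a terror attack
y <- 1         # Pr(plane hits tower | terror attack)
z <- 0.00008   # Pr(plane hits tower | accident)

posterior <- x * y / (x * y + z * (1 - x))
round(posterior, 3)   # 0.385
```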

The overall picture changes rather quickly when the second airplane crashes into the other tower of the World Trade Center. At this point, the base probability of a terror attack is no longer the 0.005% initially assumed; after the first aircraft, it already stands at 38.5%. For the probability of a terror attack given the second plane crashing into a skyscraper, this means:

```
                           x * y
Pr(attack|2.airpl.) = ---------------
                      x*y + z*(1-x)

                               0.385 * 1
                    = ----------------------------- = 0.9999
                      0.385*1 + 0.00008*(1-0.385)
```

So with the second aircraft hitting the building, there is already a 99.99% certainty that a terror attack is under way. Subsequent decision making is thus based on a completely different level of quality and far from being impulsive.
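Chaining the two updates in R makes the mechanism explicit: the posterior after the first plane becomes the prior for the second, while the accident likelihood z stays the same.

```r
z  <- 0.00008                     # accident likelihood, unchanged
p1 <- 0.385                       # posterior after the first plane
p2 <- p1 / (p1 + z * (1 - p1))    # update given the second plane
round(p2, 4)   # 0.9999
```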

Bayesian Reasoning

In principle, Bayesian reasoning involves three stages:

the Prior: This is the basic assumption or current knowledge of a framework. Let's call it Pr(Theta).

the Likelihood: This is the observation of data, always put in the context of the basic assumptions/knowledge. It is written Pr(Data|Theta), i.e. the observed data given the basic assumptions.

the Posterior: This is the update of the assumptions/knowledge based on the newly observed data. It is symbolised by Pr(Theta|Data) and, in an analytical approach, can be derived as follows:

```
                          Pr(Data|Theta) * Pr(Theta)
Pr(Theta|Data) = -----------------------------------------------------
                 Pr(Data|Theta)*Pr(Theta) + Pr(Data|nTheta)*Pr(nTheta)
```
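For the two-hypothesis case (Theta vs. not-Theta), this formula can be wrapped in a small R helper; the function name and argument names are my own illustration, not taken from the references:

```r
# Posterior probability of Theta given the data, for a binary hypothesis.
#   prior:    Pr(Theta)
#   like:     Pr(Data | Theta)
#   like_not: Pr(Data | not Theta)
bayes_update <- function(prior, like, like_not) {
  like * prior / (like * prior + like_not * (1 - prior))
}

# The skyscraper example from above:
bayes_update(prior = 0.00005, like = 1, like_not = 0.00008)  # ~0.385
```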

Will Hipson (see reference below) gave a very illustrative example which I would like to share here:

There is a plot of land consisting of 10,000 parcels. The current expectation is that every parcel has a 30% chance of being filled with gold. In order to check this, a sample is taken with 100 parcels. Out of those 100 parcels, 14 parcels were filled with gold. Does the assumption of a 30% chance hold?

First of all, there is the gold/no-gold question. Hence, we are operating in a binomial environment. For our sample this means

sample size n = 100

number of successes k = 14

and the likelihood follows the binomial probability mass function

```
# likelihood over a grid of candidate success probabilities
likelihood <- dbinom(x = 14, size = 100, prob = seq(0.01, 1, by = 0.01))
```

The density for the prior is derived from the conjugate Beta distribution

```
# Beta(3, 7) prior: mean 3/(3+7) = 0.3, matching the 30% assumption
prior <- dbeta(x = seq(0.01, 1, by = 0.01), shape1 = 3, shape2 = 7)
```

Apparently, the distribution of the posterior gives a clear answer (see the green line in the graph below). Based on the sample, the assumption of a 30% probability of finding gold in a parcel can no longer be upheld. The chances are more in the 15% area.
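Thanks to conjugacy, the posterior is again a Beta distribution, and its parameters follow directly from the update rule Beta(a + k, b + n - k); a quick sketch of that check:

```r
a <- 3; b <- 7       # prior hyperparameters (prior mean 0.3)
n <- 100; k <- 14    # sample size and number of gold parcels

a.post <- a + k      # 17
b.post <- b + n - k  # 93
a.post / (a.post + b.post)   # posterior mean ~0.155
```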

Convergence - Curbing Uncertainty

In order to avoid impulsive or emotional moves in a decision-making process, new information should not be taken at face value alone. Rather, it should be put into the context of current knowledge. With each new piece of information incorporated in this way, the decision maker's level of certainty rises.

It can also be seen the other way round. Assumptions about the current environment may be true, but they may also be wrong; at this stage it is not certain. But every piece of incoming information is put into the context of 'data observed given the current assumption'. If the observed data does not match the current assumptions, each update will shift those assumptions until observed data and assumptions match again. This is another way of reducing uncertainty.

Here is a simple but instructive example of how initial assumptions converge as new information comes in and how the level of uncertainty is curbed more and more. Full credit for this example goes to Methods Bites (see reference below).

We have a coin which we assume to be fair, so the probability of tossing head (or tail) is 50% (the prior). In reality, the coin is flawed and comes up heads in 80% of cases. But this we do not know (yet). In order to check whether the assumption of a fair coin is reasonable, the coin is flipped 1,000 times.

Here are the considerations for the prior distribution (it is again a head/no-head, i.e. a yes/no question, so we are in the binomial family):

```
# prior
len.pi <- 1001L                       # number of candidate values for pi
pi <- seq(0, 1, length.out = len.pi)  # candidate values for pi
a <- b <- 5                           # hyperparameters of the Beta dist
prior <- dbeta(pi, a, b)              # Beta density over the grid
```

With this starting point, the (flawed) coin is flipped 1,000 times.

```
# likelihood
n <- 1000
pi_true <- 0.8
data <- rbinom(n, 1, pi_true)  # 1,000 Bernoulli draws (1 = head)
```

It is evident from the graph below that many more of those 1,000 tosses resulted in head (number 1 in the graph) than in tail (number 0 in the graph):

Putting every toss of the coin, i.e. each new piece of information, into the context of the assumption of a fair coin:

```
# posterior
posterior <- matrix(NA_real_, nrow = 3, ncol = n)
for (i in seq_len(n)) {
  current.sequence <- data[1:i]   # tosses up to the ith flip
  k <- sum(current.sequence)      # number of heads so far
  # updating the prior
  a.prime <- a + k
  b.prime <- b + i - k
  # analytical mean and 95% credible interval
  posterior[1, i] <- a.prime / (a.prime + b.prime)
  posterior[2, i] <- qbeta(0.025, a.prime, b.prime)
  posterior[3, i] <- qbeta(0.975, a.prime, b.prime)
}
```

results in the following development:

It can clearly be seen that the assumption of a fair coin cannot be kept. The expected value converges very quickly from the 50% area to the 80% area where the true value lies. It is astonishing how fast this convergence is established and, once there, how stably it stays around 80%.

The second point worth noting is the reduction of uncertainty with every new flip of the coin, represented here by the 95% credible interval at each stage. The level of uncertainty at the starting point, with the basic assumption of 50% (the red area in the graph above), is much larger than after incorporating all the new information (the blue shaded area in the graph).
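As a quick numerical check, the final posterior after all 1,000 tosses follows directly from the conjugate update; a self-contained sketch (the random seed is my own choice):

```r
set.seed(42)
n <- 1000
pi_true <- 0.8
data <- rbinom(n, 1, pi_true)   # simulate the flawed coin
a <- b <- 5                     # Beta(5, 5) prior, centred on 0.5

a.prime <- a + sum(data)
b.prime <- b + n - sum(data)

a.prime / (a.prime + b.prime)             # posterior mean, close to 0.8
qbeta(c(0.025, 0.975), a.prime, b.prime)  # narrow 95% credible interval
```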

Conclusion and its Application in Real Estate

This was, of course, just a short introduction to how Bayesian reasoning can improve the quality of decision making by reducing uncertainty and helping the decision maker suppress impulsive moves. That said, it should be noted that the analytical approach shown here is only feasible in one- or low-dimensional parameter spaces. As soon as the number of input parameters grows, numerical approximations such as Markov chain Monte Carlo (MCMC) algorithms have to be implemented.
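To give a flavour of such a numerical approximation, here is a minimal Metropolis sampler for the coin example; the proposal width, chain length and assumed data (800 heads in 1,000 tosses) are my own choices for illustration, not the RStan model mentioned below:

```r
set.seed(1)
n <- 1000; k <- 800             # assumed data: 800 heads in 1,000 tosses

# unnormalised log posterior: Beta(5, 5) prior times binomial likelihood
log.post <- function(p) {
  if (p <= 0 || p >= 1) return(-Inf)
  dbeta(p, 5, 5, log = TRUE) + dbinom(k, n, p, log = TRUE)
}

draws <- numeric(5000)
p <- 0.5                        # start at the fair-coin assumption
for (i in seq_along(draws)) {
  cand <- p + rnorm(1, 0, 0.05)                       # random-walk proposal
  if (log(runif(1)) < log.post(cand) - log.post(p)) {  # accept/reject step
    p <- cand
  }
  draws[i] <- p
}

mean(draws[-(1:1000)])  # close to the analytic posterior mean 805/1010 ~ 0.797
```

After discarding a burn-in of 1,000 draws, the chain mean approximates the analytic conjugate result; the same sampling logic scales to models where no closed-form posterior exists.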

In the real estate business we see quite a lot of possible applications of Bayesian reasoning: examining a new investment market, testing critical parameters for a successful portfolio development, or verifying signals of a market downturn, to name just a few.

The overall goal is to make decision makers' lives easier and to provide a counterweight to the 'expert's gut feeling'. Needless to say, DDARKS is already working on a model in the area of portfolio steering, using RStan as its basic modelling software.

References

The Signal and the Noise - The Art and Science of Prediction by Nate Silver, 2012

An Intuitive Look at Binomial Probability in a Bayesian Context by Will Hipson, published on R-bloggers, January 27, 2020

Applied Bayesian Statistics Using Stan and R by Methods Bites (Blog of the MZES Social Science Data Lab), published on R-bloggers, January 29, 2020
