Recent data science research from MIT shows that fake news travels faster than real news, which gets me thinking about a similar problem inside organisations: the spread of erroneous data. The research, carried out by MIT's Soroush Vosoughi, analysed over 12 years' worth of data and found that true stories took six times longer than fake ones to reach an audience of 1,500 people on social media. True stories rarely got shared beyond a thousand people, whereas equivalent fake news could easily spread to hundreds of thousands. Why was that? Why was the fake stuff so much more interesting?

When thinking about the equivalent in data terms, I am mindful of the phenomenon of anchoring bias, which is often cited as a strong argument for using machine learning algorithms as a complement to human judgement. Once an anchor is set, other judgements are made by adjusting away from it, and there is a clear bias towards interpreting new information relative to the anchor. Research has found that once the wrong data has started to circulate, it is very hard to persuade your audience to adjust their views towards the truth. This deep-seated emotional response, stemming from a reaction to the first set of data received, is very hard to undo. In a recent conversation with an underwriter, I listened to complaints that he couldn't understand why his competitors were making such good returns on certain lines of business that he had scaled back following a series of claims losses. The logic behind pulling back was that an increase in claims means a drop in profits, and therefore the natural reaction is to raise premiums across that line. Seen through the highly limited data points of the claims analysis available to him, his judgement seemed correct, and challenging that viewpoint would have been difficult.

The alternative argument, however, is this: had he had access to a greater number and diversity of data points, perhaps covering weather, social conditions, etc… a data set far beyond what one individual could easily ingest and analyse, then a predictive analytics model might have given a far more detailed and holistic view of the problem as a whole, helping the underwriter make a more nuanced and ultimately better response. The vital factor is that this analysis of all relevant data needs to happen at the moment of decision, because once an anchor is set, the bias can make it very hard, if not impossible, to see any other viewpoint. Studies show that individuals naturally hold on to a few facts in order to make a decision, and if those few facts are wrong in any way, the consequences can be disastrous. The idea is that by combining our occasionally flawed human tendencies with a useful form of algorithmic data output, we are more likely to reach a rational, informed and less impulsive decision. Getting the data right in the first place is essential, however you look at it.
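To make the contrast concrete, here is a minimal sketch of how a claims-only view can anchor a decision that a broader, multi-signal view would revise. Everything here is an illustrative assumption on my part: the function names, the signals chosen (weather severity, exposure trend) and all the numbers are invented for the example, and are not real actuarial figures or the underwriter's actual data.

```python
# Hypothetical sketch: a claims-only "anchor" view versus a multi-signal view.
# All signals, weights and figures below are illustrative assumptions.

def claims_only_view(recent_claims, premiums):
    """The anchor: loss ratio computed from claims alone."""
    return recent_claims / premiums

def multi_signal_view(recent_claims, premiums, weather_severity, exposure_trend):
    """A broader view: adjust the raw loss ratio with context signals.

    weather_severity: 0..1, share of losses attributable to a one-off weather event
    exposure_trend:   expected relative change in exposure (e.g. -0.1 = shrinking)
    """
    raw = recent_claims / premiums
    # Strip out losses explained by the one-off weather event,
    # then project forward using the exposure trend.
    return raw * (1.0 - weather_severity) * (1.0 + exposure_trend)

# A line that looks badly unprofitable on claims alone...
raw = claims_only_view(recent_claims=1_300_000, premiums=1_000_000)

# ...may look viable once one-off weather losses are discounted.
adjusted = multi_signal_view(1_300_000, 1_000_000,
                             weather_severity=0.4, exposure_trend=0.05)

print(round(raw, 2))       # 1.3  -> the anchor says: exit the line
print(round(adjusted, 2))  # 0.82 -> the fuller picture says: the line can be profitable
```

The point is not the arithmetic, which is deliberately trivial here, but the gap between the two numbers: a decision-maker anchored on the first figure will struggle to believe the second, even when the second rests on more of the relevant data.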

Fake data in organisations is epidemic. I have regularly worked with clients on an apparently intractable issue, only to discover that the reports on which they base their decisions have been consistently wrong for months or even years. That is a hard story for any organisation to hear, but it is one frequently told. You, however, may never get to hear about it, all thanks to another very common modern phenomenon: the burying of bad news!