How good is Data Science without Domain Expertise?
Is it wise to analyse data or to train machine-learning models without the understanding, context and reality-checking of domain expertise? And is it important to actually understand the relationships between variables that data analysis exposes before applying them?
This week I attended a webinar hosted by a vendor selling analytics software aimed at data science users. They presented a specific use case of predictive maintenance, which is what drew me to attend. Having founded an engineering consulting company over a decade ago, I was left wondering whether simply analysing lakes of data could actually be useful for predicting failure.
Since the late ’80s I have been puzzled by the limited connection between engineers (the domain experts, in this case) and data scientists. Indeed, the most innovative work I saw was an engineer at a BBC computer generating a probabilistic analysis of the likelihood of failure of furnace tubes for a refinery. He used parameters such as temperature, tube wall thickness and pressure to model failure mechanistically. In other words, as an engineer he understood the mechanics of failure before putting it all into a model to estimate the likelihood of failure.
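By way of illustration only (this is not the engineer's actual model), a minimal Monte Carlo sketch of such a mechanistic approach might compare the hoop stress in a tube wall against an allowable stress that degrades with temperature. All parameter values, distributions and the linear strength model below are assumptions for demonstration:

```python
import numpy as np

# Illustrative Monte Carlo sketch of a mechanistic failure-probability model
# (hypothetical parameter values; not the original furnace-tube analysis).
rng = np.random.default_rng(42)
n = 100_000

# Sampled operating conditions and geometry (assumed distributions).
pressure_mpa = rng.normal(10.0, 0.8, n)    # internal pressure (MPa)
wall_mm = rng.normal(8.0, 0.5, n)          # tube wall thickness (mm)
temp_c = rng.normal(550.0, 15.0, n)        # tube metal temperature (deg C)
radius_mm = 50.0                           # mean tube radius (mm)

# Thin-wall hoop stress: sigma = P * r / t.
hoop_stress = pressure_mpa * radius_mm / wall_mm

# Assumed allowable stress that falls linearly with temperature,
# with scatter representing material variability.
allowable = rng.normal(120.0 - 0.8 * (temp_c - 500.0), 8.0)

# Probability of failure = fraction of samples where stress exceeds strength.
prob_failure = np.mean(hoop_stress > allowable)
print(f"Estimated probability of failure: {prob_failure:.4f}")
```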
Data scientists, by contrast, seemed to approach the problem from the point of view of masses of data to be analysed for correlations. Of course, the hype is more about the data than the science (the domain expertise).
The traditional maintenance model is reactive maintenance: fix the break when it happens. We then moved to planned maintenance, which attempts to manage breakdowns by maintaining at set intervals, perhaps drawing on failure history to tune those intervals. One does not want to maintain too intensively, especially in a high-value plant, as maintenance by definition means lost production (at least at the deep-maintenance end). The maintenance interval is therefore managed to balance expected failure against run time, as the sketch below illustrates.
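To make that trade-off concrete, here is a hedged numerical sketch of the classic age-replacement policy: assuming a Weibull failure distribution and hypothetical cost figures (all made up for illustration), it sweeps candidate maintenance intervals and picks the one that minimises expected cost per hour of run time:

```python
import numpy as np

# Sketch of the classic age-replacement policy under an assumed
# Weibull failure distribution and hypothetical cost figures.
shape, scale = 2.5, 1000.0                 # Weibull parameters (assumed), hours
c_planned, c_failure = 1_000.0, 10_000.0   # hypothetical maintenance costs

def cost_rate(T, n_grid=2000):
    """Expected cost per hour if we maintain at age T or on failure."""
    t = np.linspace(0.0, T, n_grid)
    survival = np.exp(-(t / scale) ** shape)        # reliability R(t)
    r_T = survival[-1]                              # P(survive to T)
    # Expected cycle length E[min(life, T)] via trapezoidal integration.
    dt = t[1] - t[0]
    expected_cycle = dt * (survival.sum() - 0.5 * (survival[0] + survival[-1]))
    expected_cost = c_planned * r_T + c_failure * (1.0 - r_T)
    return expected_cost / expected_cycle

candidates = np.linspace(100.0, 2000.0, 96)
rates = [cost_rate(T) for T in candidates]
best = candidates[int(np.argmin(rates))]
print(f"Best maintenance interval ≈ {best:.0f} hours "
      f"(cost rate {min(rates):.2f}/hour)")
```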
Predictive maintenance takes this idea one step further by trying to predict failure through modelling of the system (hence the data science approach above), and scheduling maintenance using techniques such as real-time monitoring of vibration or temperature. The input from these measurements refines the model used in the predictive maintenance algorithm and, with luck and good design, optimises both maintenance and production. A simple condition-monitoring building block is sketched below.
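One common building block is an alarm that flags sensor readings drifting several standard deviations away from a healthy baseline. The synthetic signal, training window and threshold below are all assumptions for illustration, not a production monitoring system:

```python
import numpy as np

# Sketch: simple condition-monitoring alarm on a synthetic vibration signal.
# Baseline statistics are learnt from an assumed healthy period, then new
# readings are flagged when they drift several standard deviations away.
rng = np.random.default_rng(0)
healthy = rng.normal(1.0, 0.05, 900)                     # baseline (mm/s RMS)
degrading = 1.0 + np.linspace(0.0, 0.6, 100) + rng.normal(0, 0.05, 100)
signal = np.concatenate([healthy, degrading])            # wear sets in late

baseline_mean = healthy[:500].mean()                     # training window
baseline_std = healthy[:500].std()
threshold = 4.0                                          # assumed tuning value

z_scores = (signal - baseline_mean) / baseline_std
alarms = np.flatnonzero(z_scores > threshold)
print(f"First alarm at sample {alarms[0]}" if alarms.size else "No alarm")
```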
The use of Artificial Intelligence in all of this is where the real value lies. Supervised learning is about discovering relationships towards a specific goal, e.g. predicting when a component is likely to fail, whereas unsupervised learning can uncover relationships and categorisations that simply exist; in both cases, these may be relationships the domain expert was not aware of. The sketch below contrasts the two modes.
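As a hedged illustration of the two modes on the same synthetic sensor data: a supervised classifier is trained against known failure labels, while an unsupervised clustering step looks for structure with no labels at all. The features, the failure rule and the scikit-learn model choices are illustrative assumptions, not any vendor's pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic sensor readings for 1,000 machines (assumed features):
# vibration (mm/s) and bearing temperature (deg C).
rng = np.random.default_rng(1)
X = np.column_stack([
    rng.normal(1.0, 0.3, 1000),
    rng.normal(60.0, 8.0, 1000),
])
# Assumed ground truth: hot AND noisy machines are the ones that failed.
y = ((X[:, 0] > 1.3) & (X[:, 1] > 65.0)).astype(int)

# Supervised: learn the mapping from readings to the known failure label.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("Supervised test accuracy:", clf.score(X_te, y_te))

# Unsupervised: group machines by behaviour alone, with no labels at all.
X_scaled = StandardScaler().fit_transform(X)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
print("Cluster sizes:", np.bincount(clusters))
```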
Generally it is better if the domain experts can explain the relationships discovered through data science, as the explanations can be used to extend their mathematical models and build confidence in the outcomes. However, the particular strength of machine learning is that its algorithms can still exploit relationships that are too complex to be understood even by the domain experts (provided there is a high level of confidence in those relationships).
I think it is fair to say that the best situation combines the power of data science with the understanding, context and reality-checking of domain expertise. This usually requires close teamwork, as it is rare to find individuals with both!
Post Script: What is really exciting for me in the InsurTech world is then applying the “risk of failure” algorithm to the business and safety impact, such as is used in RIMAP (the FERMA risk certification programme), to derive real numbers which can then play a role in actual risk measurement, albeit relative risk to begin with. I look forward to participating in the very exciting field developing here, at the intersection of maintenance, risk and analytics.