Can automation work without good data supporting it? The simple answer is very likely to be “no.” Naturally, the next question would be: “Why?”

To understand this, we must first consider the impacts that good—and bad—data can have on automation.

What is automation?

Automation can come in many forms, but essentially it is taking something that is run manually (by a person) and developing a machine or program to run that process automatically. This is quite a complex achievement when you consider all the potential variables that need to be “managed” by the automated process (AP). Designers of the AP need a very detailed understanding of the physical parameters, mechanical parameters and quality parameters to properly deliver automation.

Some aspects of automation are quite easy to envisage – like car production automation – where we often see images and videos of cars on the production line being constructed automatically by an army of robot arms. Other areas, such as the monitoring of quality and outcomes, are not so readily seen, even though they are there in the background. The computer systems that power an AP are not just there to direct the robots – they are very often changing the way the AP runs – making subtle changes based on tolerance test outcomes.

When does data matter?

Analytical results and tolerance test outcomes are an area where data quality and management are critical. The AP will be required to deliver a product to a given specification, within certain tolerances. For example, in drug production, every pill must have a concentration of drug product within 0.01% of target, or a size within a certain range. These critical variables form the basis of success criteria and therefore product acceptance.

If the variables are not measured, stored and analysed correctly, then the AP will not deliver – meaning the product could have issues. Measuring a variable is quite a simple process, but how accurate, precise and true the measurements are is very important. Each of these three terms describes something slightly different, and you need to know the differences so that product quality can be assessed. And, since ‘trueness’ is derived from other measures, it must be calculated – and this is where the quality of the data is critical.
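As a rough illustration of why trueness has to be calculated rather than read off an instrument, the sketch below derives it from replicate measurements of a reference sample. The function name, the reference value and the readings are all hypothetical. Trueness is expressed here as bias (mean minus reference) and precision as the sample standard deviation, which is one common convention rather than the only one.

```python
from statistics import mean, stdev


def summarise_replicates(readings, reference):
    """Return (bias, spread) for repeated measurements of a known reference.

    bias   -- mean of the readings minus the reference value (trueness)
    spread -- sample standard deviation of the readings (precision)
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings to estimate precision")
    bias = mean(readings) - reference
    spread = stdev(readings)
    return bias, spread


# Hypothetical pill diameters in mm, measured against a 5.500 mm reference.
readings_mm = [5.52, 5.48, 5.51, 5.49, 5.50]
bias, spread = summarise_replicates(readings_mm, reference=5.500)
print(f"bias = {bias:+.3f} mm, spread = {spread:.3f} mm")
```

Both figures are only as good as the stored readings: a single value recorded in the wrong unit, or rounded to fewer significant figures, shifts the bias and inflates the spread.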

If the format and scale of the measured variable are not captured, you can expect complications. For example, if I collect data on pill size but don’t note the scale, 5.567 could mean 5.567 mm, cm or m. If the scale is not captured correctly, the value risks being uninterpretable by a human or a computer.

This ambiguity introduces risk into the data process – you’re likely to be either guessing or estimating the meaning of something, not using its real meaning. This also introduces risk into your decision-making processes, which could lead to the release of defective products. In pharmaceuticals, this could mean including the wrong concentration of an active ingredient in a drug product, which would have serious repercussions.

Every measure of a variable needs to have the value, known significant figures, scale, time and date of collection, in a computer readable format, as a bare minimum. This enables calculations to be conducted and the values obtained to be used for decision-making.
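One way to picture this bare minimum is as a small record type that refuses to exist without its context. The sketch below is illustrative only: the class name, field names, the list of allowed units and the example timestamp are assumptions, not part of any particular system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

ALLOWED_UNITS = {"mm", "cm", "m"}  # illustrative length units only


@dataclass(frozen=True)
class Measurement:
    value: float
    sig_figs: int           # known significant figures of the value
    unit: str               # the scale, e.g. "mm"
    collected_at: datetime  # time and date of collection

    def __post_init__(self):
        # Reject records that would be ambiguous later on.
        if self.unit not in ALLOWED_UNITS:
            raise ValueError(f"unknown unit: {self.unit!r}")
        if self.sig_figs < 1:
            raise ValueError("sig_figs must be at least 1")
        if self.collected_at.tzinfo is None:
            raise ValueError("collected_at must be timezone-aware")

    def to_record(self):
        """Serialise to a plain dict that can be stored as JSON."""
        return {
            "value": self.value,
            "sig_figs": self.sig_figs,
            "unit": self.unit,
            "collected_at": self.collected_at.isoformat(),
        }


# Hypothetical reading: a 5.567 mm pill measured at 09:30 UTC.
reading = Measurement(5.567, 4, "mm",
                      datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc))
print(reading.to_record())
```

Validating at the point of capture means a value such as 5.567 can never enter storage without its scale attached.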

Without this minimal information, decisions made about the data might be incorrect and the decisions become even trickier to automate. The goal of an AP is that all aspects are automated—the elimination of human intervention. The systems need to be able to make their own decisions.

Take the pill example. If a pill is too big, it gets removed from the process. Sometimes, this is as simple as letting the correct size pills fall through a hole which is too small for the larger pills. But in other processes, the analysis and decisions cannot be conducted using physical sorting. Here, the results of the variable test are critical and need to be captured, stored and time-stamped as described above.
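Where physical sorting is not possible, the decision has to be made from the recorded result. A minimal sketch of such a rule follows, assuming a hypothetical target diameter and tolerance. The function names and the "hold" outcome for results with a missing or unexpected unit are design assumptions, chosen so that the system never guesses a scale.

```python
def within_tolerance(value_mm, target_mm, tolerance_mm):
    """True if the value lies within target +/- tolerance (inclusive)."""
    return abs(value_mm - target_mm) <= tolerance_mm


def decide(value, unit, target_mm, tolerance_mm):
    """Return 'accept', 'reject' or 'hold' for one pill diameter.

    A result whose unit is missing or not millimetres is held for
    review instead of being interpreted by guesswork.
    """
    if unit != "mm":
        return "hold"
    if within_tolerance(value, target_mm, tolerance_mm):
        return "accept"
    return "reject"


# Hypothetical specification: 5.5 mm target, 0.1 mm tolerance.
print(decide(5.567, "mm", target_mm=5.5, tolerance_mm=0.1))
print(decide(5.900, "mm", target_mm=5.5, tolerance_mm=0.1))
print(decide(5.567, None, target_mm=5.5, tolerance_mm=0.1))
```

The third call shows the cost of missing context: a perfectly good pill cannot be accepted automatically, so a person has to step in.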

The format and context of results, including significant figures and units, are as critical as the values themselves, because the data is used in aggregate calculations to establish other parameters such as trending mean, precision and accuracy. Without this information, calculations can, and do, go wrong.
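To see how units feed into an aggregate, consider a trending mean computed over readings that arrive in mixed units. The sketch below is an assumption-laden illustration: the conversion table, the function name, the window size and the sample readings are all hypothetical.

```python
from collections import deque

# Conversion factors to millimetres (illustrative set of units).
TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0}


def trending_mean_mm(readings, window=3):
    """Yield the rolling mean, in mm, over the last `window` readings.

    `readings` is an iterable of (value, unit) pairs. A reading with
    an unknown unit stops the calculation rather than skewing it.
    """
    recent = deque(maxlen=window)
    for value, unit in readings:
        if unit not in TO_MM:
            raise ValueError(f"cannot convert unit {unit!r} to mm")
        recent.append(value * TO_MM[unit])
        yield sum(recent) / len(recent)


# Hypothetical readings; the second one was recorded in centimetres.
readings = [(5.50, "mm"), (0.552, "cm"), (5.48, "mm"), (5.51, "mm")]
for m in trending_mean_mm(readings):
    print(f"{m:.3f} mm")
```

Had the second reading been stored as a bare 0.552 with no unit, it would have been averaged as if it were millimetres and dragged the trend well below the true value.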

For any automation to be successful, there needs to be high-quality data for it to run on. Without good-quality data management, the data itself can give rise to risk and errors in the process – precisely what automation is intended to remove or significantly reduce.

Bad data and poor data management rigour introduce unwanted risk into automation and should be avoided at all costs. Management of the process data underpins many aspects of quality and product-based decisions, so its importance and subtleties should be considered when designing new automation processes or updating old ones. Some types of automation, like sorting pills by size, can exist without data-centred decisions. But those that rely on other variables, such as those intrinsic to the product composition, must be managed with good data. Without it, automation will just speed up the production of an unwanted product – wasting time, money and resources.

Paul Denny-Gouldson heads the overall strategic planning for the various market verticals and scientific domains at IDBS. He obtained his Ph.D. in Computational Biology from Essex University in 1996, and has authored more than 25 scientific papers and book chapters.