How We Got Into Process Data Quality

After this LinkedIn post, a lot of people asked us what we are up to at Waves. For those snoopy peers, and everyone else, this post explains why we care so much about process data quality.

But First, Why Process Mining?

At Waves, we use data to improve processes with our customers. We have found that a thorough understanding of the process and its data is key to creating business value from process data. Process mining helps us build this understanding, and it also helps us communicate our findings to stakeholders. A well-designed image can get a point across much better than words alone; the way process mining techniques structure process data into a neat process model is appealing and easy to understand, even for people who are not data-savvy. Lastly, process mining is technically challenging: it combines data transformation, data visualization and data analysis. Three things we like.

Sounds great! Where's the catch?

Before we can get into analysis, we need to find the right data, extract it from source systems, transform it into a format that’s understandable for process mining analysis, and then clean it. If that sounds challenging, you’re right. Here are a few challenges we see:

Data transformation requires knowledge of the system back-end and the domain.
In most systems, the data required for process mining is scattered over different tables, and there is usually no description of how high-quality process data can be extracted. This means the tables need to be transformed manually into a suitable format. Along the way, many important decisions must be made about what to include and how to include it, which requires deep knowledge of both the system and the process.
There is a disconnect between data transformation and data analysis.
Since data transformation requires a lot of knowledge of the system, the people who transform the data are usually not the same people who analyze it. The analyst is often unaware of all the decisions that were made to obtain the data.
Few practical guidelines exist for assessing process data quality in a structured way.
As a result, an analyst who starts working with the data has to assess its quality based on personal experience, by trial and error.
Trial-and-error data quality assessment is tedious and overwhelming.
As a result, people get stuck, or data quality issues go unnoticed.
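To make the first challenge more concrete, here is a minimal sketch of what flattening scattered tables into an event log (one row per case, activity and timestamp) can look like. The tables `orders` and `shipments`, their columns, and the function `build_event_log` are hypothetical illustrations, not taken from any real system. Note that even this toy example already contains a decision: we assume the order ID is the right case ID, so each shipment is attached to its order.

```python
from datetime import datetime

# Hypothetical source tables, as they might be extracted from two back-end tables.
orders = [
    {"order_id": "O1", "created_at": "2023-03-01 09:00"},
    {"order_id": "O2", "created_at": "2023-03-01 10:30"},
]
shipments = [
    {"shipment_id": "S1", "order_id": "O1", "shipped_at": "2023-03-02 14:00"},
]


def parse(ts):
    """Parse a timestamp string in the (assumed) source format."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")


def build_event_log(orders, shipments):
    """Flatten the two tables into (case_id, activity, timestamp) events."""
    events = []
    for order in orders:
        events.append((order["order_id"], "Create order", parse(order["created_at"])))
    for shipment in shipments:
        # Decision: use the order ID as case ID, so shipments join their order's case.
        events.append((shipment["order_id"], "Ship order", parse(shipment["shipped_at"])))
    # Sort by case, then by time, so each case's events read chronologically.
    events.sort(key=lambda e: (e[0], e[2]))
    return events


for event in build_event_log(orders, shipments):
    print(event)
```

Choosing a different case ID (for example, the shipment ID) would produce a different event log, and therefore a different process model, from the same tables.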

And here’s the catch: while proper data transformation and cleaning are difficult and time-consuming, they are key to a successful process mining endeavor. Without accurate and reliable data, the results of the analysis can be misleading or even flat-out wrong.

What's next?

We are working on a systematic approach to assessing process data quality. It will be wrapped up in a publicly available, independent tool that helps users detect data quality issues for process mining and highlights the potential impact of these issues on the process analysis.
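As a rough illustration of what automated detection of data quality issues can mean, here is a minimal sketch of three simple checks on an event log: missing timestamps, exact duplicate events, and different activities in the same case sharing a timestamp (which leaves their order ambiguous). The function name `find_quality_issues` and the (case_id, activity, timestamp) event format are our own assumptions for this example; this is not the tool described above.

```python
from collections import Counter, defaultdict
from datetime import datetime


def find_quality_issues(events):
    """Return a dict of simple data quality issues found in an event log.

    Each event is assumed to be a (case_id, activity, timestamp) tuple,
    where timestamp is None if it is missing in the source data.
    """
    issues = {}

    # Check 1: events without a timestamp cannot be placed in the process.
    missing = [e for e in events if e[2] is None]
    if missing:
        issues["missing_timestamp"] = missing

    # Check 2: identical events recorded more than once inflate frequencies.
    duplicates = [e for e, n in Counter(events).items() if n > 1]
    if duplicates:
        issues["duplicate_event"] = duplicates

    # Check 3: different activities in one case at the same moment have no clear order.
    activities_at = defaultdict(set)
    for case_id, activity, ts in events:
        if ts is not None:
            activities_at[(case_id, ts)].add(activity)
    tied = [key for key, acts in activities_at.items() if len(acts) > 1]
    if tied:
        issues["same_timestamp"] = tied

    return issues


log = [
    ("C1", "Register", datetime(2023, 1, 1, 9, 0)),
    ("C1", "Approve", datetime(2023, 1, 1, 9, 0)),
    ("C2", "Register", None),
    ("C3", "Register", datetime(2023, 1, 2, 8, 0)),
    ("C3", "Register", datetime(2023, 1, 2, 8, 0)),
]
for issue, examples in find_quality_issues(log).items():
    print(issue, examples)
```

Each check only flags a potential problem; whether it actually matters depends on the analysis question, which is exactly why the impact on the analysis needs to be highlighted as well.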

How can you help?

We are eagerly looking for people who recognize this problem. If you recognize it and are willing to share your thoughts, please reach out. If you know people who might recognize it, we would be very grateful if you could drop our name. And if you have a lot of clout within the process mining community (Wil van der Aalst, are you reading this?) or know how to get in touch with the right people, we would like to get to know you better. All recommendations are more than welcome.