On Process Mining and LeanXcale. Q&A with Alejandro Ramos Soto
“Adapting our data structures to LeanXcale helped us rethink and redesign our models, and moreover caused them to become more efficient.”
Q1. You are the Chief Scientific Officer at InVerbis Analytics. What is the main business of the company?
We are developing a cloud-based solution for business process analytics. This platform ingests process logs from companies, from pizza orders to car manufacturing, and runs a set of algorithms that provide insights into, and visualizations about, the actual execution of processes, as well as potential problems such as loops and bottlenecks. Due to the nature of our algorithms and the kind of information they provide, InVerbis can be considered a process mining solution. However, as a company we intend to extend beyond this field and provide a comprehensive solution that will enable the analysis of value streams in organizations, offering everything from data generation and capture tools to an extensive set of analytics and visualizations.
Q2. What are your current projects?
In addition to supervising the platform development and defining which features to implement, I am also undertaking consultancy and process mining projects. We use our own platform to perform the analysis for these projects, so I am provided with feedback from our clients, along with many ideas about features to add in future releases of the InVerbis platform. For instance, among others, we have already worked on medical (cardiology), car manufacturing, and electronic public service projects.
Q3. You are interested in fuzzy sets theory, natural language generation in its data-to-text specialization, and business process analytics. Do you use them in your current job? And if yes, how?
My academic and scientific background is in a very different area, although admittedly there are some similarities. Developing natural language generation systems, which produce texts from numeric data for human consumption, involves a prior analysis of the data to select what content to include in the texts and how to map numbers and symbols to actual words. This is similar to what InVerbis does when analyzing process data, with such analysis consisting of an abstraction mechanism to provide users with more intelligible information. Users can use this information to identify, and take appropriate action to solve, problems in their processes.
In the near future, we intend to integrate features of this kind to enrich our dashboards with written texts that complement the metrics and visualizations with more useful information.
Q4. What is Process Mining and what is it useful for?
Process Mining is a discipline focused on the research and development of data models and algorithms that can analyze how processes perform. This includes algorithms for determining the actual flow structures and obtaining associated metrics such as frequencies and durations, but also methods to check how executions follow or deviate from an expected process model, known as conformance techniques. There exist even more ambitious proposals, such as those that intend to predict future outcomes of processes, or those that try to determine if and how the process structure changes over time.
Commercial process mining tools, such as InVerbis, adapt some of these algorithms (which were not conceived based on the needs of actual companies) into a form that is technically ready for use in real business organizations.
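To make the idea of "flow structures with frequencies and durations" concrete, here is a minimal sketch of a directly-follows count over a small event log. It is an illustration only, not InVerbis code: the log contents, the tuple layout (case id, activity, ISO timestamp), and the function name `directly_follows` are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (case_id, activity, ISO timestamp). Illustrative data only.
EVENTS = [
    ("order-1", "Receive order", "2021-03-01T12:00:00"),
    ("order-1", "Bake pizza", "2021-03-01T12:05:00"),
    ("order-1", "Deliver", "2021-03-01T12:35:00"),
    ("order-2", "Receive order", "2021-03-01T12:10:00"),
    ("order-2", "Bake pizza", "2021-03-01T12:20:00"),
    ("order-2", "Bake pizza", "2021-03-01T12:40:00"),
    ("order-2", "Deliver", "2021-03-01T13:10:00"),
]


def directly_follows(events):
    """Return {(a, b): (count, mean_seconds)} for consecutive activities per case."""
    # Group events by case so that each case becomes one trace.
    traces = defaultdict(list)
    for case_id, activity, timestamp in events:
        traces[case_id].append((datetime.fromisoformat(timestamp), activity))

    # Accumulate [count, total_seconds] for every pair of consecutive activities.
    totals = defaultdict(lambda: [0, 0.0])
    for trace in traces.values():
        trace.sort(key=lambda event: event[0])  # order each trace by time
        for (t1, a), (t2, b) in zip(trace, trace[1:]):
            entry = totals[(a, b)]
            entry[0] += 1
            entry[1] += (t2 - t1).total_seconds()

    return {edge: (count, total / count) for edge, (count, total) in totals.items()}


if __name__ == "__main__":
    for (a, b), (count, mean_s) in directly_follows(EVENTS).items():
        print(f"{a} -> {b}: {count} time(s), mean {mean_s / 60:.0f} min")
```

Each key of the result is an edge of the discovered flow, and the pair attached to it gives the frequency and the mean elapsed time between the two activities. Real discovery algorithms go well beyond this, but the same two metrics sit underneath most process maps.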
Q5. A number of things could go wrong, for example mistakes that are not foreseen, tasks that are repeated, feedback loops, suppliers that make mistakes, or the design being incorrect. How do you help here?
Since our algorithms recreate the actual execution flow of a process, our platform can point companies to several kinds of issues that may be degrading execution performance. These include excessive durations in and between activities, extreme variations in duration that may suggest a lack of control, and sequences of activities that repeat within the same execution, across several different executions. Some of these troubling elements may be partially known to the company, but having the means to check and verify them objectively based on data brings process analysis and control to a new level.
Q6. How do you identify the origins and root causes of inefficiency in practice? How do you find bottlenecks and loops?
Bottlenecks and loops are automatically found by the tool based on the duration of activities and the flow structure of the process. However, it is up to the particular analyst involved, with their expert knowledge of the process, to decide whether these are acceptable behaviors or, on the contrary, whether they represent actual unexpected deviations from the ideal execution path that should not be occurring. Likewise, root causes can be identified by exploring subsets of the data based on the different filter types that we provide. We are also developing a new feature that will automatically suggest root causes based on attribute data linked to the process.
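As a rough illustration of how loops and bottlenecks can be flagged from durations and flow structure, the sketch below counts activities that repeat within a case and lists transitions whose mean waiting time exceeds a threshold. It is not the InVerbis implementation: the traces, the minute-based timestamps, the function names, and the 60-minute threshold are assumptions chosen for the example. Whether a flagged item is a real problem remains the analyst's call.

```python
from collections import Counter, defaultdict

# Hypothetical traces: case id -> ordered (activity, minutes since case start).
TRACES = {
    "case-1": [("Register", 0), ("Review", 10), ("Approve", 25)],
    "case-2": [("Register", 0), ("Review", 15), ("Rework", 200),
               ("Review", 230), ("Approve", 250)],
    "case-3": [("Register", 0), ("Review", 5), ("Rework", 180),
               ("Review", 200), ("Approve", 215)],
}


def repeated_activities(traces):
    """Count, per activity, how many cases execute it more than once."""
    cases_with_repeat = Counter()
    for steps in traces.values():
        counts = Counter(activity for activity, _ in steps)
        for activity, n in counts.items():
            if n > 1:
                cases_with_repeat[activity] += 1
    return dict(cases_with_repeat)


def slow_transitions(traces, threshold_minutes):
    """Return transitions whose mean wait exceeds the threshold, slowest first."""
    waits = defaultdict(list)
    for steps in traces.values():
        for (a, t1), (b, t2) in zip(steps, steps[1:]):
            waits[(a, b)].append(t2 - t1)
    means = {edge: sum(values) / len(values) for edge, values in waits.items()}
    slow = [(edge, mean) for edge, mean in means.items() if mean > threshold_minutes]
    return sorted(slow, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print("Loop candidates:", repeated_activities(TRACES))
    print("Bottleneck candidates:", slow_transitions(TRACES, threshold_minutes=60))
```

In this toy data, "Review" repeats in two cases and the hand-off from "Review" to "Rework" is the only transition above the threshold, which is the kind of candidate list an analyst would then accept or investigate.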
Q7. How do you make sure that when you reconstruct and visualize the actual execution, you do not introduce errors yourselves?
Our algorithms have been developed and tested over many iterations, and every change or addition we include is verified using a battery of test logs. In this way, we ensure that no errors make it to the production stage. There is, however, always the potential for mistakes and errors, whose origins tend to lie in the quality of the source data: missing values, swapped time fields, mistakes in activity names, or redundant registered events, for example. When the data has been processed and there is still something that does not make sense, we turn our attention to the raw data to find the answer. As in many other fields of computer science and AI, the actual usefulness of algorithms relies on the quality of the data with which they are fed, regardless of how cleverly they were designed and implemented.
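The data problems listed above can often be caught with simple checks before any mining takes place. The following sketch flags missing values, swapped time fields, and redundant events in a raw log. The field names (`case`, `activity`, `start`, `end`), the sample events, and the function `check_quality` are hypothetical and serve only to illustrate the idea; misspelled activity names are left out because they usually need domain knowledge to resolve.

```python
# Hypothetical raw events; start and end are minutes. Illustrative data only.
RAW_EVENTS = [
    {"case": "c1", "activity": "Register", "start": 0, "end": 5},
    {"case": "c1", "activity": "Review", "start": 20, "end": 10},  # swapped times
    {"case": "c2", "activity": None, "start": 0, "end": 3},        # missing value
    {"case": "c2", "activity": "Approve", "start": 4, "end": 9},
    {"case": "c2", "activity": "Approve", "start": 4, "end": 9},   # redundant event
]

REQUIRED_FIELDS = ("case", "activity", "start", "end")


def check_quality(events):
    """Return a list of (event_index, issue) pairs found in the raw events."""
    issues = []
    seen = set()
    for index, event in enumerate(events):
        # An event with any missing field cannot be checked further.
        if any(event.get(field) is None for field in REQUIRED_FIELDS):
            issues.append((index, "missing value"))
            continue
        # An end time earlier than the start time suggests swapped fields.
        if event["end"] < event["start"]:
            issues.append((index, "end before start"))
        # An exact repeat of an earlier event is treated as redundant.
        key = tuple(event[field] for field in REQUIRED_FIELDS)
        if key in seen:
            issues.append((index, "duplicate event"))
        seen.add(key)
    return issues


if __name__ == "__main__":
    for index, issue in check_quality(RAW_EVENTS):
        print(f"event {index}: {issue}")
```

Checks like these do not repair the data, but they show where to look in the raw log when a result does not make sense.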
Q8. Among your partners you have LeanXcale. What do you use the LeanXcale database for?
We had observed that both the computational requirements of our customers and the amount of data they used were increasing, and query times had grown accordingly. The user experience can be greatly affected by such an increase, so we needed a database that we could rely on. LeanXcale is ready to store our process data and run our queries on it effectively, allowing us to manage tens of millions of events and obtain information in (almost) real time.
Q9. In particular, can you tell us about your experience and the lessons learned in using LeanXcale ingestion, scalability and SQL integration features? What are the main benefits for you in using LeanXcale?
Adapting our data structures to LeanXcale helped us rethink and redesign our models, and moreover caused them to become more efficient. Under this new design, and thanks to the outstanding performance that LeanXcale provides, we can now delegate part of the processing to the database, to such an extent that we are considering getting rid of our Spark component altogether. Thus far, we have been using this component to address the distributed execution of our algorithms. It comes as a great relief, furthermore, to know that we can safely cover cases with a sudden increase in data ingestion without needing to worry about the performance decreasing abruptly or having to increase the number of available cores and memory. Finally, since LeanXcale is tightly integrated with SQL, and of course depending on the needs of our clients, we can keep the high flexibility that characterizes our platform and allow other database solutions to be integrated with InVerbis.
Alejandro Ramos Soto is Co-Founder and the current Chief Scientific Officer of InVerbis Analytics. Formerly, he worked as a researcher at the Universidade de Santiago de Compostela, first as a PhD student and later as a postdoctoral scholar. It was in this latter role that, together with other professors, he set up InVerbis as a spin-off company. He has worked on a wide range of computer science topics, from lung cancer nodule detection algorithms to a textual weather forecast generation system. He is currently focused on improving the process mining features included in the InVerbis platform.
Sponsored by LeanXcale.