Data management and data analytics for the process industry

  • Role of data management and data analytics for the process industry
  • Data management roadmap
  • Data analytics roadmap
  • System validation: Part 11 compliance
  • Case study: Data management and data analytics to bridge the gap between process development and manufacturing

Data management/data analytics strategy for the process industry

In recent years, lab digitalization has increased and sensor technology has improved tremendously. Managing and analyzing the data accumulated in R&D, Manufacturing, and Quality Control (QC) has immense potential to speed up time-to-market and to optimize your manufacturing processes in terms of quality and economics.

However, only well-prepared and well-analyzed data leads to process knowledge and, finally, to process control and continuous improvement. A robust and efficient data management/data analytics strategy is therefore one of the most valuable concepts for the process industry. The data analytics strategy defines how to:

  • Conduct data management for process development, manufacturing, and quality control.
  • Efficiently analyze the relationship between process parameters, product quality, and key performance indicators.
  • Develop platform knowledge through data analytics and mathematical modeling to continuously improve manufacturing platforms.

Data Management Roadmap

Step 1: Stakeholders and User Requirements Specifications

Poorly handled evaluation efforts create serious issues during every step of your implementation. Proper conceptual preparation is key, since ill-defined requirements may delay completion. Get input from all stakeholders: IT, process development, manufacturing science, quality, and operations. Carefully check which legacy systems need to be integrated and which data analytics functionality is required. Systems interfacing with bioprocess data management and analytics platforms should also be examined. These systems include:

  • Process Control Software
  • Laboratory Information Management Systems (LIMS)
  • Electronic Lab Notebooks
  • Historians

The right implementation strategy ensures that you stay focused on the original scope of your project and that your employees are properly trained and prepared. Designing a user requirement specification involves not only mining the relevant data streams, but also classifying data streams as relevant or non-relevant with respect to your company’s KPIs. But don’t forget that data management is not an end in itself: carefully evaluate the data analytics requirements you envision to reach your goals.

Step 2: Executing Implementation

Implementation can be executed in a single step or step by step. Single-step implementation can be ideal for smaller operations and businesses with up to 100 users: all users migrate to the new system at once. This allows you to focus on your project scope and implementation parameters by offering a simple and straightforward way of handling your processes.

Migrating to a new system step by step allows you to implement key features earlier while ensuring that possible complications are isolated from working processes. While this approach is more flexible than the single-step implementation, it may take longer.

Step 3: Data Migration

The proper handling of data migration is another important aspect to consider. Evaluate data migration from all sources (legacy databases, spreadsheets, paper-based records, etc.). For paper-based documents, evaluate technologies such as optical character recognition (OCR) and document crawling. Migrating all available data into the new system fosters immediate user acceptance and ensures working efficiency.
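A minimal sketch of one such migration step, assuming a legacy system that exports semicolon-delimited batch records (column names, units, and values here are hypothetical):

```python
import csv
import io

# Hypothetical legacy export; real exports will differ in columns and delimiter
LEGACY_CSV = """batch_id;pH;temp_C
B001;7.1;37.0
B002;6.9;36.5
"""

def migrate_legacy_records(text):
    """Parse a semicolon-delimited legacy export into normalized records."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    records = []
    for row in reader:
        records.append({
            "batch_id": row["batch_id"],
            "ph": float(row["pH"]),               # cast to numeric for analytics
            "temperature_c": float(row["temp_C"]),  # normalize the column name
        })
    return records

records = migrate_legacy_records(LEGACY_CSV)
print(records[0])
```

In a real migration, the same normalization (consistent names, numeric types, explicit units) is what makes the data usable for analytics afterwards.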

Data Analytics Roadmap

The main workflow for process data analytics can be summarized in the following steps:

  • Data alignment: make all relevant data sources available for data analytics.
  • Data mining: mine relevant information to achieve your specific analytics goals.
  • Data consistency testing: test the data set for consistency.
  • Identify a hypothesis: identify correlations/models using univariate and multivariate statistics and build a hypothesis.
  • Implement change or design an experiment: prove your hypothesis by implementing a change or designing an experiment.
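The middle steps of this workflow can be sketched in a few lines. This is an illustrative example, not the inCyght implementation; the process parameter (feed rate) and quality attribute (titer) values are invented for the sketch:

```python
import math

# Hypothetical aligned process data: one parameter vs. one quality attribute
feed_rate = [1.0, 1.2, 1.4, 1.6, 1.8]
titer     = [2.1, 2.4, 2.8, 3.1, 3.5]

def consistency_check(values):
    """Consistency testing: reject data sets with missing or non-physical entries."""
    return all(v is not None and v >= 0 for v in values)

def pearson(x, y):
    """Univariate statistics: Pearson correlation as a simple hypothesis generator."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

assert consistency_check(feed_rate) and consistency_check(titer)
r = pearson(feed_rate, titer)
print(f"correlation: {r:.3f}")  # a strong correlation suggests a hypothesis to test
```

A high correlation here would only yield a hypothesis; per the last workflow step, it still has to be proven by a deliberate change or a designed experiment.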

Software system validation: Part 11 Compliance

21 CFR Part 11 is the guideline of the U.S. Food and Drug Administration (FDA) regarding electronic records, electronic signatures, and electronic copies of electronic records. The scope of Part 11 is limited to records that are maintained in an electronic format instead of a paper format.

Part 11 compliance is necessary if you would like to use data management and data analytics software in a GMP environment, e.g. a (bio-)pharmaceutical quality department, and especially for drug discovery.

Below you will find the seven main requirements for an FDA Part 11 audit:

1. Validation

All GMP-relevant computerized systems have to be validated to ensure system accuracy, reliability, and integrity as well as the availability and authenticity of required records and signatures.

2. Audit Trail

All changes, adaptations, events, and modifications of electronic records have to be monitored.
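Conceptually, an audit trail is an append-only log of who changed what, and when. A minimal sketch (not a validated system; user and record names are hypothetical):

```python
from datetime import datetime, timezone

class AuditTrail:
    """Append-only record of changes: timestamp, user, record, old and new value."""
    def __init__(self):
        self.entries = []

    def log(self, user, record_id, field, old, new):
        self.entries.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "record": record_id,
            "field": field,
            "old": old,
            "new": new,
        })

trail = AuditTrail()
trail.log("analyst1", "batch-42", "pH_setpoint", 7.0, 7.2)
print(trail.entries[0]["field"])
```

A compliant implementation would additionally make the log tamper-evident and tie entries to authenticated user sessions.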

3. Access Protection

System access must be controlled. It is mandatory to define the kind of system (open or closed).

4. Copies of Records

The system must provide a method to copy the audit trail in a human-readable format.

5. Record Retention

Electronic records should be stored, protected, archived, and provided by the system.

6. Electronic Signature

The system has to ensure that electronic signing can be performed only by trained and authorized users.
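The two ingredients of this requirement, an authorization check plus a signature bound to the user and record, can be sketched as follows. This uses a keyed hash (HMAC) purely for illustration; the user list and secret are assumptions of the sketch:

```python
import hashlib
import hmac

# Hypothetical authorized-user list and server-side secret (assumptions)
AUTHORIZED = {"qa_lead"}
SECRET = b"server-side-secret"

def sign_record(user, record_bytes):
    """Return a signature over user + record, but only for authorized users."""
    if user not in AUTHORIZED:
        raise PermissionError(f"{user} is not authorized to sign")
    return hmac.new(SECRET, user.encode() + record_bytes, hashlib.sha256).hexdigest()

sig = sign_record("qa_lead", b"batch-42 release record")
print(sig[:16])
```

A real Part 11 signature additionally requires the signer's printed name, the date and time, and the meaning of the signature to be part of the signed record.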

7. FDA Certification

A written certification has to be submitted to the regional FDA office, confirming that the electronic signatures in the system are intended to be the legally binding equivalent of handwritten signatures.

Case study

EXPUTEC and Intravacc are implementing a comprehensive platform for data management, (real-time) data analytics, multivariate statistics, and mechanistic modelling. The objective of the project is to help Intravacc optimize product quality, increase productivity, and ensure quality with robust and compliant bioprocesses. The Exputec inCyght software is being designed to automate workflows at the core and is being connected to almost anything at the edge of the network. The project manager says:

“We found an excellent realization partner in Exputec. The project team is very focused on our business needs and all interfaces are running smoothly and reliably. The collaboration is excellent.”

Exemplary EXPUTEC inCyght Architecture as executed in our projects

Data Governance

All data from upstream and downstream processes, bioreactors, spreadsheets, and analytical equipment is stored in a central database. In the Exputec inCyght software, all relevant bioprocess data, such as time series, quality measurements, spectra, and microscopic images, is managed in a single database. A consistent nomenclature for the variables is crucial for process management and data analytics, and is defined within the project. An intelligent database search for projects and products makes the work of scientists and operators more comfortable. Comprehensive user management and audit trails lead to full data traceability.
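A consistent nomenclature is easiest to keep consistent when it is enforced automatically. A minimal sketch of such a check, assuming a hypothetical convention of the form `<process>_<variable>_<unit>` in lowercase:

```python
import re

# Assumed naming convention for this sketch: <process>_<variable>_<unit>, lowercase
NAME_PATTERN = re.compile(r"^[a-z]+_[a-z0-9]+_[a-z0-9]+$")

def check_nomenclature(names):
    """Return the variable names that violate the project naming convention."""
    return [n for n in names if not NAME_PATTERN.match(n)]

violations = check_nomenclature(["usp_ph_1", "dsp_temp_c", "Titer-final"])
print(violations)
```

Running such a validator when data is imported catches inconsistent names before they fragment searches and analyses across the database.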


Data Analytics

To streamline the data analytics process, Intravacc implemented real-time data analytics. Real-time data is transferred to the inCyght Predictive Analytics Workbench, where scientists run visual data analytics within the web browser. All predictive algorithms are implemented using the inCyght Python and R integrations. There is no need to import or export any data: all data is available to perform any mathematical operation, visualization, or other analytical method.

Figure 1: Data analytics using the browser-based inCyght platform. Example of equivalence testing, which is used to evaluate biopharma tech transfers and scale-ups of upstream and downstream processes.
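Equivalence testing of the kind shown in Figure 1 asks whether two data sets (e.g. from sending and receiving site) differ by less than a pre-defined margin. A simplified large-sample sketch of the two one-sided tests (TOST) idea, with invented titer values and margin:

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical titers from the sending and receiving site (invented values)
site_a = [3.1, 3.3, 3.0, 3.2, 3.1, 3.2, 3.0, 3.3]
site_b = [3.2, 3.1, 3.3, 3.0, 3.2, 3.1, 3.3, 3.2]
MARGIN = 0.3  # pre-defined equivalence margin (an assumption of the sketch)

def equivalent(x, y, margin, alpha=0.05):
    """Large-sample TOST sketch: 90% CI of the mean difference inside +/- margin."""
    diff = mean(x) - mean(y)
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    z = NormalDist().inv_cdf(1 - alpha)  # one-sided normal critical value
    lo, hi = diff - z * se, diff + z * se
    return -margin < lo and hi < margin

print(equivalent(site_a, site_b, MARGIN))
```

A production analysis would use t-distributed critical values for small samples and a margin justified by process knowledge, but the decision rule, confidence interval fully inside the margin, is the same.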

Points to consider

  • Understanding the requirements of all stakeholders is the key to success of data management and data analytics in the process industry.
  • Data management and data analytics cannot be designed and executed independently of each other.
  • Data migration from legacy data sources and databases contributes to the success of data management and data analytics and should not be overlooked.
  • Your quality department is a stakeholder: carefully consider your compliance requirements from the start.

About Exputec

Exputec is a technology-driven software and consulting company delivering world-class services and solutions for the biotech and chemical process industries. Exputec uniquely combines software, data science, and engineering competencies to solve customers’ challenges. Scientists and engineers use the inCyght® software for data management and data analytics. The inCyght® bioprocess platform enables the streamlined realization of process characterization, process validation, and technology transfer projects.