Best Practices for Fermentation Data Analysis

  • Overview, challenges and solutions for bioprocess data management
  • Solutions for bioprocess data visualization & statistical analysis
  • Use of soft sensors in bioprocess analysis workflows

As a fermentation scientist or process development manager, you are responsible for developing high-performing bioprocesses efficiently and for scaling them up robustly and predictably for future large-scale manufacturing. Consistent data management and data analysis are critical to meeting your development and manufacturing goals. If you work in bioprocess development, scale-up, process validation or manufacturing excellence, you may frequently ask yourself: How do I get a complete and reliable overview of the process data with less effort? How do I structure the large amounts of data from different sensors in a single database? How do I aggregate data from small and large scale to qualify scale-down models? Which methods should I use for data analytics and the statistical evaluation of bioprocesses? This article gives a brief overview of bioprocess data management, bioprocess data visualization, and statistics for evaluating bioprocesses.

Data Management

If you work with fermentation processes, you deal with large amounts of sensor data (e.g. pH, temperature and dissolved oxygen measurements), product quality data (e.g. product concentrations, specific activity, relative potencies), “non-numerical” data such as pictures (e.g. scanned SDS-PAGE gels) and much more. Without a common database, every new analysis means manually reorganizing data from different sources, which is time consuming. The expectation, however, is that a fermentation scientist can search through batches quickly, screening by fermenter type, media name, product or project type. Best-practice database requirements are i) a suitable data model that stores all bioprocess-relevant data in one common database, ii) the ability to assign phase information to time-series data and iii) database filters that quickly identify the batches you want to analyze. Commonly used database filters in bioprocesses are:

  • Media Type
  • Process Type
  • Strain
  • Project
  • Customer
  • Product
  • Development stage
  • Start Date/ End Date
  • Operator/ User
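The three database requirements above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table layout, column names, the helper functions `find_batches` and `values_in_phase`, and all sample values are assumptions made for illustration; they are not the data model of any particular product.

```python
import sqlite3

# In-memory database with a hypothetical, deliberately small schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE batch (
    batch_id INTEGER PRIMARY KEY,
    strain TEXT, media_type TEXT, process_type TEXT, project TEXT
);
CREATE TABLE phase (
    batch_id INTEGER REFERENCES batch(batch_id),
    name TEXT, start_h REAL, end_h REAL
);
CREATE TABLE timeseries (
    batch_id INTEGER REFERENCES batch(batch_id),
    time_h REAL, variable TEXT, value REAL
);
""")

# Illustrative sample data (made up for this sketch).
conn.executemany("INSERT INTO batch VALUES (?, ?, ?, ?, ?)", [
    (1, "E. coli", "defined", "fed-batch", "P-001"),
    (2, "E. coli", "complex", "fed-batch", "P-001"),
    (3, "P. pastoris", "defined", "batch", "P-002"),
])
conn.executemany("INSERT INTO phase VALUES (?, ?, ?, ?)", [
    (1, "batch", 0.0, 8.0),
    (1, "fed-batch", 8.0, 24.0),
])
conn.executemany("INSERT INTO timeseries VALUES (?, ?, ?, ?)", [
    (1, 2.0, "pH", 7.01),
    (1, 10.0, "pH", 6.95),
    (1, 20.0, "pH", 6.98),
])

ALLOWED_FILTERS = {"strain", "media_type", "process_type", "project"}

def find_batches(conn, **filters):
    """Return the ids of batches matching all column=value filters."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")
    # Column names come from the whitelist; values are bound as parameters.
    where = " AND ".join(f"{col} = ?" for col in filters) or "1"
    rows = conn.execute(
        f"SELECT batch_id FROM batch WHERE {where} ORDER BY batch_id",
        tuple(filters.values()),
    )
    return [row[0] for row in rows]

def values_in_phase(conn, batch_id, variable, phase_name):
    """Return (time_h, value) pairs of one variable within a named phase."""
    rows = conn.execute(
        """
        SELECT t.time_h, t.value
        FROM timeseries t
        JOIN phase p ON p.batch_id = t.batch_id
        WHERE t.batch_id = ? AND t.variable = ? AND p.name = ?
          AND t.time_h >= p.start_h AND t.time_h < p.end_h
        ORDER BY t.time_h
        """,
        (batch_id, variable, phase_name),
    )
    return rows.fetchall()
```

For example, `find_batches(conn, strain="E. coli", media_type="defined")` selects batches by metadata, and `values_in_phase(conn, 1, "pH", "fed-batch")` returns only the pH readings recorded during the fed-batch phase of batch 1. Storing phases as time intervals keeps the phase assignment separate from the raw sensor data, so phases can be redefined without touching the measurements.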

Visualize Data

“You can see a lot just by looking” – Yogi Berra. Yogi Berra and, more recently, Nathan McNight from Genentech nailed it: for bioprocess scientists, visualization is the most important tool for detecting processing trends. See his presentation at the CMC Strategy Forum, which is available online here.
To detect trends quickly and report the results, you want to create your scientific visualizations directly from a common database, without exporting data or manipulating it manually in spreadsheets, which is prone to handling errors and not recommended. Visualizations such as multi-axis overlay plots for analyzing processing trends, and boxplots or histograms for relating quality and product attributes, are excellent ways to view your data. Commonly used visualization tools for bioprocesses are:

  • Multi-axis overlay plots
  • Bar-Graphs
  • Histograms

Statistical Data Analysis

Creating clear visualizations is a good way to draw conclusions from fermentation data. However, when communicating with management or regulatory authorities, you are required to verify the visualized trends statistically. The communication of process development results to regulatory authorities (process validation stage 1) has recently become very statistics-driven. This is the point where you want reliable, simple, yet powerful statistics based on a common database in Exputec’s inCyght. Below is a brief excerpt of statistical tools that are frequently used within the bioprocess lifecycle. See also an example below on how statistical equivalence testing can be realized.

Commonly used statistical tools

  • Statistical equivalence testing: Is the large scale process similar to my small scale process?
  • Statistical power analysis: How many experiments do I need for my experimental study?
  • Statistical tests/ regression modelling: Do I see a significant impact of one factor on my fermentation performance?
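As a rough illustration of the first two tools, the sketch below implements an equivalence check as two one-sided tests (TOST) and a sample-size estimate, using only the Python standard library. Both rely on a normal (z) approximation, which is an assumption made to keep the example dependency-free; with the small run numbers typical of bioprocess studies, a t-distribution based procedure from a statistics package would normally be preferred. The function names, the equivalence margin and the titer values are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, stdev

_STD_NORMAL = NormalDist()  # standard normal, used for all quantiles

def tost_equivalent(small_scale, large_scale, margin, alpha=0.05):
    """TOST via a confidence interval, using a normal approximation.

    The two scales are declared equivalent when the (1 - 2*alpha)
    confidence interval of the mean difference lies entirely inside
    (-margin, +margin). Returns (is_equivalent, (lower, upper)).
    """
    diff = mean(large_scale) - mean(small_scale)
    std_err = sqrt(
        stdev(small_scale) ** 2 / len(small_scale)
        + stdev(large_scale) ** 2 / len(large_scale)
    )
    z = _STD_NORMAL.inv_cdf(1 - alpha)  # one-sided quantile
    lower, upper = diff - z * std_err, diff + z * std_err
    return (-margin < lower and upper < margin), (lower, upper)

def runs_per_group(sigma, delta, alpha=0.05, power=0.8):
    """Approximate runs per group for a two-sided two-sample comparison.

    sigma: expected standard deviation of the response
    delta: smallest difference in means that should be detected
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)

# Made-up final titers (g/L) from small-scale and large-scale runs.
small = [10.1, 9.8, 10.3, 10.0, 9.9]
large = [10.2, 10.0, 10.4, 9.9, 10.1]
equivalent, interval = tost_equivalent(small, large, margin=0.5)
```

The equivalence margin carries the scientific judgment: it states how large a difference between scales is still considered practically irrelevant, and it has to be fixed before the data are evaluated. In the same spirit, `runs_per_group(sigma=1.0, delta=1.0)` estimates how many runs per condition are needed to detect a difference of one standard deviation with 80 % power.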

Soft Sensors for Information Mining

Soft sensors make the most of the data and signals you already collect in your bioprocess by deriving additional process information from them. Instead of buying more and more hardware, use soft sensors wherever possible, fed by the sensors already installed on your bioreactor. Learn more about the use of soft sensors here.
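To make the idea concrete, here is a minimal sketch of one possible soft sensor: an oxygen uptake rate (OUR) estimate computed from off-gas and gas-flow signals that many bioreactors already record. It uses a simplified gas balance that assumes equal inlet and outlet molar gas flows and a gas flow referenced to 0 °C and 1 atm; the function names, units and constant are choices made for this illustration only.

```python
# Molar volume of an ideal gas at 0 °C and 1 atm. Assumes the gas flow
# signal is reported at these reference conditions.
MOLAR_VOLUME_L_PER_MOL = 22.414

def oxygen_uptake_rate(gas_flow_l_per_h, y_o2_in, y_o2_out, liquid_volume_l):
    """Simplified OUR in mol O2 per litre of broth per hour.

    Assumes the molar gas flow out equals the molar gas flow in,
    i.e. no inert-gas balance correction is applied.
    """
    if liquid_volume_l <= 0:
        raise ValueError("liquid volume must be positive")
    molar_flow = gas_flow_l_per_h / MOLAR_VOLUME_L_PER_MOL  # mol gas / h
    return molar_flow * (y_o2_in - y_o2_out) / liquid_volume_l

def cumulative_uptake(times_h, our_values):
    """Integrate OUR over time with the trapezoidal rule (mol O2 / L)."""
    total = 0.0
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        total += 0.5 * (our_values[i] + our_values[i - 1]) * dt
    return total

# Example: 60 L/h aeration, 20.95 % O2 in, 19.95 % O2 out, 1 L of broth.
our = oxygen_uptake_rate(60.0, 0.2095, 0.1995, 1.0)
```

The inputs are signals that already exist (gas flow, off-gas oxygen fraction, broth volume), while the output is a physiological variable that no single installed sensor measures directly.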


How to realize data management, visualizations and must-have statistics is one of the most important questions a fermentation scientist or process development manager has to answer. One possibility is to use spreadsheets. Although this might sound like a quick fix, you will run into serious difficulties in both the short and the long run. You will have no intelligent way to search for batches and filter for the data you really need. The calculations are time consuming and rarely reproducible by another scientist or for management review. You will spend a lot of time trying to standardize your Excel sheets with formulas and macros, and in the end they are very likely to be inconsistent. In the long run, you will lose a lot of time copying and pasting data from spreadsheet to spreadsheet.

A second possibility is to set up a fermentation database for your company and connect it to statistical and visualization software. Various data sources with different data models and data formats have to be connected to the database. The data model will be very complex, because it must also store additional information such as process phases. You must maintain the software yourself, and you will end up with a costly patchwork of many different software components without direct database integration.

To increase efficiency in bioprocess data management and analytics, we have created inCyght, an out-of-the-box solution for bioprocess data management, visualization and statistical analysis. inCyght automates these tasks across the bioprocess lifecycle for upstream, downstream and quality data.

inCyght bioprocess data management and analysis platform