Bayesian Bioprocess Data Analytics in R&D and GMP manufacturing

Everywhere you turn, someone is talking about data analytics. Biotech companies, CMOs, and big pharma all seek competitive advantages through data analytics. Furthermore, regulators call for advanced data analytics and data integrity across the full bioprocess lifecycle. Applying the right data analytics in bioprocess development and GMP manufacturing is challenging, and very different from business analytics or clinical trial statistics: analytical method errors are high and the number of runs or observations is low. How do you deal with that?

Bayesian bioprocess data analytics offers a solution that should not be overlooked. Bayesian statistics is a different way to think about and model data. In this article, we cover the basics of this exciting approach and explore applications in bioprocess development and manufacturing.

Why Bayesian in Bioprocess R&D and Manufacturing?

Why is Bayesian data analytics great? In one word: belief. Bayesian methods allow you to bring your “belief”, also called “prior knowledge”, into the data analysis. The rules of probability (Bayes’ theorem) tell you how to revise that belief given the observed data, yielding updated knowledge called the “posterior”. Especially in areas where you have a lot of prior knowledge (bioprocess development, bioprocess manufacturing), this can give more precise estimates from the same limited data.
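To make the prior-to-posterior update concrete, here is a minimal sketch for the simplest textbook case: a normal prior on an unknown process mean, combined with measurements whose analytical standard deviation is assumed to be known. The function name, the titer values and the standard deviations are illustrative assumptions, not data from a real process.

```python
from statistics import mean

def update_normal_mean(prior_mean, prior_sd, data, noise_sd):
    """Posterior (mean, sd) of an unknown mean under a normal prior.

    Assumes normally distributed measurements with a known SD (noise_sd).
    """
    n = len(data)
    prior_precision = 1.0 / prior_sd ** 2      # weight of the prior belief
    data_precision = n / noise_sd ** 2         # weight of the observed runs
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * mean(data)) / post_precision
    post_sd = post_precision ** -0.5
    return post_mean, post_sd

# Hypothetical example: platform knowledge says titer is about 5.0 g/L (SD 0.5),
# and three new runs are measured with an assumed analytical SD of 0.4 g/L.
runs = [5.6, 5.4, 5.8]
post_mean, post_sd = update_normal_mean(5.0, 0.5, runs, 0.4)
print(f"posterior mean = {post_mean:.2f} g/L, posterior SD = {post_sd:.2f} g/L")
```

The posterior mean lands between the prior mean and the sample mean, and the posterior SD is smaller than the prior SD: the data sharpen the belief rather than replace it.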

Bayesian Scale Down Model Qualification

Bioprocess scale down model qualification is a routine task for bioprocess R&D and MS&T. The scale down model is at the core of your development and validation activities. More on (qualified) scale down models here.

In scale down model qualification, you usually have only a limited amount of experimental data to demonstrate equivalence between small and large scale. Frequently, the amount of data (especially the number of large or small scale runs) and the analytical precision are not sufficient to achieve narrow confidence intervals for scale down model qualification.

Using Bayesian methods, the Bayesian counterpart of a two-sample t-test can be implemented using prior knowledge about the mean and variance of the two groups. In theory, it is also possible to place priors directly on the difference in means and its variance to get a better estimate of the posterior.

In practice, you also have to deal with the following question: where do the priors for scale down model qualification come from? You can obtain your priors from two sources: 1) If the process is a platform process, the mean and variance of the large scale performance can be taken from historical data. 2) If you have already performed several scale down model qualifications, the difference in means and the variance of process related impurities from those studies can be used.
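The comparison described above can be sketched as follows, under simplifying assumptions that are ours and not prescribed by any guideline: normal priors on each scale's mean (for example from platform history), a known analytical SD shared by both scales, and independent groups. All numbers, the equivalence margin and the function names are hypothetical.

```python
from statistics import NormalDist, mean

def posterior_mean(prior_mean, prior_sd, data, noise_sd):
    """Normal-normal update: posterior (mean, sd) of a group mean."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = len(data) / noise_sd ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * mean(data)) / post_prec
    return post_mean, post_prec ** -0.5

def prob_equivalent(small, large, prior_small, prior_large, noise_sd, margin):
    """Posterior probability that |mean_small - mean_large| < margin."""
    m_s, sd_s = posterior_mean(*prior_small, small, noise_sd)
    m_l, sd_l = posterior_mean(*prior_large, large, noise_sd)
    # The difference of two independent normal posteriors is again normal.
    diff = NormalDist(m_s - m_l, (sd_s ** 2 + sd_l ** 2) ** 0.5)
    return diff.cdf(margin) - diff.cdf(-margin)

# Hypothetical data: three small scale runs, two large scale runs (g/L).
small_scale = [5.1, 5.3, 4.9]
large_scale = [5.2, 5.4]
platform_prior = (5.0, 0.5)   # assumed prior (mean, SD) from historical data

p_equiv = prob_equivalent(small_scale, large_scale,
                          platform_prior, platform_prior,
                          noise_sd=0.3, margin=0.5)
print(f"P(|difference in means| < 0.5 g/L) = {p_equiv:.2f}")
```

Instead of a confidence interval that may be too wide to be useful, the result is a direct probability statement about equivalence, which can be compared against a predefined acceptance threshold.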

Bayesian Bioprocess Design of Experiments (DoE)

Bioprocess Design of Experiments or “DoE” is a core technology for organizing bioprocess development. Arguably, DoE is one of the most valuable tools we have in process development and for manufacturing problem solving. You can find more on bioprocess DoE here.

The ultimate goal in DoE is to become more efficient with every experiment you run on your bioprocessing platform. But how do you achieve that? Incorporating prior knowledge is a valuable solution, and this can be done with Bayesian methods.

Bayesian design makes use of existing data such as development data or historical DoEs in order to choose the “optimal” design. As a result, you can extract the same expected information gain with fewer experiments using Bayesian optimal design.
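As an illustration of how prior knowledge can steer the choice of runs, the sketch below ranks candidate designs by expected information gain for a linear model with Gaussian noise and a Gaussian prior on the coefficients (often called Bayesian D-optimality). The two-factor model, the prior precisions and the exhaustive search over four-run designs are assumptions chosen to keep the example small; real design tools use more efficient search algorithms.

```python
import math
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion (adequate for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def expected_information_gain(design, prior_precision, noise_sd):
    """0.5 * log(det(posterior precision) / det(prior precision))."""
    p = len(prior_precision)
    post_precision = [
        [prior_precision[i][j]
         + sum(run[i] * run[j] for run in design) / noise_sd ** 2
         for j in range(p)]
        for i in range(p)
    ]
    return 0.5 * (math.log(det(post_precision)) - math.log(det(prior_precision)))

# Candidate runs on a 3x3 grid; each model row is [intercept, x1, x2].
candidates = [[1.0, x1, x2] for x1 in (-1.0, 0.0, 1.0) for x2 in (-1.0, 0.0, 1.0)]

# Hypothetical prior: the x1 effect is well known from historical DoEs
# (high precision), while the x2 effect is barely known (low precision).
prior_precision = [[1.0, 0.0, 0.0],
                   [0.0, 25.0, 0.0],
                   [0.0, 0.0, 0.1]]

best_design = max(combinations(candidates, 4),
                  key=lambda d: expected_information_gain(d, prior_precision, 1.0))
best_gain = expected_information_gain(best_design, prior_precision, 1.0)
print("best 4-run design (x1, x2):", [(r[1], r[2]) for r in best_design])
print(f"expected information gain: {best_gain:.2f} nats")
```

Because the prior already pins down one effect, the criterion rewards runs that are informative about the poorly known effect, which is the mechanism behind needing fewer experiments for the same information gain.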

Bayesian Estimation of the Number of Process Performance Qualification (PPQ) Runs

One of the most discussed applications of Bayesian methods is estimating how many qualification runs you need for your process performance qualification. This is a hot topic, since every GMP PPQ run can cost in the range of millions and the cost saving potential is huge. So, let’s first dive into why manufacturers have to do PPQ.

A manufacturer must successfully complete PPQ before commencing commercial distribution of the drug product [FDA Process Validation Guideline, 2011]

Moreover: The approach to PPQ should be based on sound science and the manufacturer’s overall level of product and process understanding and demonstrable control. The cumulative data from all relevant studies (e.g., designed experiments; laboratory, pilot, and commercial batches) should be used to establish the manufacturing conditions in the PPQ. [FDA Process Validation Guideline, 2011]

How Do Conventional Methods to Calculate the Number of PPQ Runs Work?

Today, industry mainly sets the number of PPQ runs using the following rationales:

  • Experience
  • Process capability calculations
  • Expected coverage calculations

ICH Q8, Q9, Q10, and the process validation guidelines provide an opportunity to use acquired knowledge to strengthen product and process understanding. However, all the conventional approaches lack a mathematical way to incorporate the prior knowledge acquired in bioprocess R&D and in earlier stages of process validation. Bayesian data analytics offers a solution for that.

Bayesian Methods & Digital Twins to Reduce the Number of PPQ Batches

Bayesian methods can be combined with digital bioprocess twins to estimate the number of PPQ runs for normally distributed critical quality attributes, e.g. by taking the prior for the mean and variance from a digital twin derived from R&D data and using it to derive the posterior distribution.

Therefore, a Bayesian approach in combination with digital twins can sharpen the knowledge about the future process distribution and thereby reduce the number of required PPQ batches.
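The effect of an informative prior on the required number of runs can be illustrated with a deliberately simplified sketch. It assumes a normally distributed quality attribute with a known batch-to-batch SD and asks how many runs are needed until the credible interval of the process mean is narrower than a target half-width. This precision criterion, the function name and all numbers are assumptions for illustration; an actual PPQ justification would use acceptance criteria agreed for the specific product and would also check for conflict between prior and data.

```python
from statistics import NormalDist

def runs_needed(prior_sd, process_sd, target_half_width,
                credibility=0.95, max_runs=100):
    """Smallest number of runs n for which the credible interval of the
    process mean has a half-width of at most target_half_width.

    Returns None if max_runs is not enough.
    """
    z = NormalDist().inv_cdf(0.5 + credibility / 2)
    for n in range(max_runs + 1):
        # Posterior SD of the mean after n runs (normal prior, known process SD).
        post_sd = (1.0 / prior_sd ** 2 + n / process_sd ** 2) ** -0.5
        if z * post_sd <= target_half_width:
            return n
    return None

process_sd = 0.5          # assumed batch-to-batch SD of the quality attribute
target = 0.35             # assumed acceptable half-width of the 95% interval

n_vague = runs_needed(prior_sd=10.0, process_sd=process_sd, target_half_width=target)
n_twin = runs_needed(prior_sd=0.3, process_sd=process_sd, target_half_width=target)
print(f"runs needed with a vague prior:        {n_vague}")
print(f"runs needed with a digital twin prior: {n_twin}")
```

With these assumed numbers the informative prior lowers the number of runs required to reach the same precision, which is the mechanism by which prior knowledge can reduce the number of PPQ batches.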

Bayesian Continued Process Verification (CPV)

How can we know that our process continues to maintain its original validated state, even years after the launch? The FDA’s modern approach to this is Continued Process Verification (CPV), Stage 3 of the process validation methodology. More information on CPV here.

In CPV, one of the main tasks is to define control limits, which conventionally requires a large number of batches (typically 20-25) to obtain reasonable ranges. Using Bayesian statistics in combination with bioprocess digital twin technology, you can estimate control limits from a reduced number of runs, so that even a limited number of manufacturing runs can yield a solid estimate of the control limits.
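One possible way to compute such limits from few batches is sketched below. It uses a conjugate normal-inverse-gamma prior on the process mean and variance (here imagined to come from a digital twin) and takes the control limits as quantiles of the posterior predictive distribution, approximated by Monte Carlo sampling. The prior settings, the batch values and the function names are hypothetical, and the 99.73% coverage simply mirrors conventional three-sigma limits.

```python
import random
from statistics import mean

def nig_update(data, mu0, kappa0, nu0, s0):
    """Conjugate update of a normal-inverse-gamma prior.

    mu0, kappa0: prior mean and its weight in pseudo-observations.
    s0, nu0:     prior SD and its weight in pseudo-degrees of freedom.
    """
    n = len(data)
    xbar = mean(data)
    ss = sum((x - xbar) ** 2 for x in data)
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = (nu0 + n) / 2
    beta_n = (nu0 * s0 ** 2 + ss) / 2 + kappa0 * n * (xbar - mu0) ** 2 / (2 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n

def predictive_limits(data, mu0, kappa0, nu0, s0,
                      coverage=0.9973, draws=100_000, seed=1):
    """Monte Carlo quantiles of the posterior predictive for the next batch."""
    rng = random.Random(seed)
    mu_n, kappa_n, alpha_n, beta_n = nig_update(data, mu0, kappa0, nu0, s0)
    samples = []
    for _ in range(draws):
        # Draw a variance, then a mean, then a future batch value.
        variance = 1.0 / rng.gammavariate(alpha_n, 1.0 / beta_n)
        mu = rng.gauss(mu_n, (variance / kappa_n) ** 0.5)
        samples.append(rng.gauss(mu, variance ** 0.5))
    samples.sort()
    tail = (1.0 - coverage) / 2
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper

# Hypothetical example: five commercial batches plus a digital twin prior
# worth about four batches of information on mean and variance.
batches = [5.2, 5.5, 5.1, 5.4, 5.3]
lcl, ucl = predictive_limits(batches, mu0=5.2, kappa0=4, nu0=4, s0=0.2)
print(f"lower control limit = {lcl:.2f}, upper control limit = {ucl:.2f}")
```

As more batches accumulate, the data increasingly dominate the prior, so the limits converge toward what a conventional calculation on 20-25 batches would give.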