Using multiple datasets to fit parameters simultaneously in SimBiology

I want to fit a PK model to multiple datasets, where each dataset contains concentration time courses for different species in the model. How do I do this? The time points are not the same across datasets, if that matters. I'm using MATLAB R2024b and the SimBiology Model Analyzer app.
My model has multiple compartments, and Compartment1 has two species called "RNA" and "PROTEIN".
The datasets look something like this:
Dataset1, which contains RNA values in the plasma:
Dataset2, which contains protein levels in the plasma:
I want to fit the model parameters to both datasets, mapping PLASMA_RNA from Dataset1 to "RNA" and SERUM_PROTEIN from Dataset2 to "PROTEIN".
Arthur Goldsipe on 15 Mar 2025
Are the model's initial conditions the same for both experiments? In other words, once you fit your model, would you need to do a single simulation or two separate simulations to predict these two concentrations?
Mukti on 17 Mar 2025
The initial conditions are the same for both experiments - I would just do one single simulation to predict these two concentrations.


Accepted Answer

Arthur Goldsipe on 15 Mar 2025
Edited: Arthur Goldsipe on 17 Mar 2025
You first need to decide whether these two concentration profiles should be treated as part of the same experiment/simulation.
If so, then you need to merge them into a single time course, using NaN to indicate missing measurements (presumably the same way you're using "." at time 0). If you want to do that programmatically, you can use MATLAB's join functions, such as outerjoin. Here's what the merged data might look like using the first 4 rows of your datasets:
rna = table([0;0.08;0.24;0.49], [nan;17.11;8.22;18.6], VariableNames=["Time", "Plasma_RNA"] );
protein = table([0;0.24;1.91;3.1], [nan;10;97.1;90.1], VariableNames=["Time", "Serium_protein"]);
joinedData = outerjoin(rna,protein,Keys="Time",MergeKeys=true)
joinedData = 6x3 table

    Time    Plasma_RNA    Serium_protein
    ____    __________    ______________
       0           NaN               NaN
    0.08         17.11               NaN
    0.24          8.22                10
    0.49          18.6               NaN
    1.91           NaN              97.1
     3.1           NaN              90.1
If they're different experiments, you just need to stack them and add a grouping variable indicating which measurement belongs to which experiment. Here's what that would look like using the first 4 rows of your datasets:
rna_id = [table(repmat(1,height(rna), 1), VariableNames="ID"), rna ];
protein_id = [table(repmat(2,height(protein),1), VariableNames="ID"), protein];
stackedData = outerjoin(rna_id,protein_id,Keys=["ID","Time"],MergeKeys=true)
stackedData = 8x4 table

    ID    Time    Plasma_RNA    Serium_protein
    __    ____    __________    ______________
     1       0           NaN               NaN
     1    0.08         17.11               NaN
     1    0.24          8.22               NaN
     1    0.49          18.6               NaN
     2       0           NaN               NaN
     2    0.24           NaN                10
     2    1.91           NaN              97.1
     2     3.1           NaN              90.1
Once you have the data in one of these forms, you can perform the fit in SimBiology using sbiofit or the Model Analyzer app.
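A minimal sbiofit sketch of both forms, assuming SimBiology is installed. Only Compartment1, RNA, and PROTEIN come from the question; the toy reactions, the parameters kdeg and ktl, and the initial amounts are hypothetical stand-ins for the real PK model, and the data are the four-row excerpts above. With the merged table, each objective evaluation runs one simulation. With the stacked table, sbiofit simulates each ID group separately, and 'Pooled', true requests one parameter set shared by both groups instead of a separate fit per group.

```matlab
% Requires SimBiology. Hypothetical toy model: kdeg, ktl, the reactions, and the
% initial amounts are placeholders; only Compartment1, RNA, PROTEIN are from the question.
model = sbiomodel('toyModel');
comp = addcompartment(model, 'Compartment1');
addspecies(comp, 'RNA', 20);
addspecies(comp, 'PROTEIN', 0);
addparameter(model, 'kdeg', 1);
addparameter(model, 'ktl', 10);
addreaction(model, 'Compartment1.RNA -> null', 'ReactionRate', 'kdeg*Compartment1.RNA');
addreaction(model, 'null -> Compartment1.PROTEIN', 'ReactionRate', 'ktl*Compartment1.RNA');

% First four rows of each dataset, as in the accepted answer
rna = table([0;0.08;0.24;0.49], [nan;17.11;8.22;18.6], VariableNames=["Time", "Plasma_RNA"]);
protein = table([0;0.24;1.91;3.1], [nan;10;97.1;90.1], VariableNames=["Time", "Serium_protein"]);

% Shared fit settings: species-to-column mapping, estimated parameters, optimizer options
responseMap = {'Compartment1.RNA = Plasma_RNA', 'Compartment1.PROTEIN = Serium_protein'};
estimInfo = estimatedInfo({'kdeg', 'ktl'}, 'InitialValue', [1 10]);
opts = optimset('Display', 'off');   % options struct for fminsearch

% Form 1: merged data (one experiment, one simulation); NaN entries are ignored
joinedData = outerjoin(rna, protein, Keys="Time", MergeKeys=true);
gMerged = groupedData(joinedData);
gMerged.Properties.IndependentVariableName = 'Time';
fitMerged = sbiofit(model, gMerged, responseMap, estimInfo, [], 'fminsearch', opts);
disp(fitMerged.ParameterEstimates)

% Form 2: stacked data (two experiments, one simulation per ID group)
rna_id = [table(ones(height(rna), 1), VariableNames="ID"), rna];
protein_id = [table(2*ones(height(protein), 1), VariableNames="ID"), protein];
stackedData = outerjoin(rna_id, protein_id, Keys=["ID", "Time"], MergeKeys=true);
gStacked = groupedData(stackedData);
gStacked.Properties.GroupVariableName = 'ID';
gStacked.Properties.IndependentVariableName = 'Time';
% 'Pooled', true: estimate one shared parameter set across both groups
fitStacked = sbiofit(model, gStacked, responseMap, estimInfo, [], 'fminsearch', opts, 'Pooled', true);
disp(fitStacked.ParameterEstimates)
```

fminsearch is used only so the sketch runs without Optimization Toolbox; a least-squares method such as lsqnonlin is often preferred when it is available. Without 'Pooled', true, the stacked fit would estimate parameters for each group on its own, and group 1 alone carries no protein data to inform ktl.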

More Answers (2)

Arthur Goldsipe on 14 Mar 2025
SimBiology users typically do this by merging the datasets into a single dataset and constructing an appropriate fit problem.
If you need more guidance on that, take a look at previous similar questions:
If you still have questions, I suggest you create a new MATLAB Answers question that provides more details. Ideally, share sample code (data and model) that illustrates your situation. Also, please clarify which version of MATLAB you're using and whether you are working in the SimBiology Model Analyzer app or writing your own MATLAB code.

Image Analyst on 15 Mar 2025
Maybe I'm misunderstanding what you want to do, but why not combine both time vectors into a single time vector and use it to interpolate the missing times in each set with something like interp1? Then you will have serum and plasma values at the same, common time points. Then, if you want to do the "mapping PLASMA_RNA from dataset1 to "RNA" and SERUM_PROTEIN from dataset2 to "PROTEIN"", you can use polyfit, fitnlm, or some other fitting algorithm (see the Regression Learner app on the Apps tab of the tool ribbon) to make a transform/model relating serum to plasma.
Arthur Goldsipe on 15 Mar 2025
SimBiology doesn't require measurements at the same times for all responses/species. You can just put NaN (not-a-number) in any place where you don't have a measurement.
Alternatively, SimBiology allows you to treat them as two separate time courses (requiring two different model simulations, with potentially different initial conditions or dosing). If they are different conditions, the two time courses just need to be "stacked" on top of each other, and another variable needs to be added to the data to indicate each time course. (I'll add a more complete answer for this shortly.)
Moreover, I strongly discourage interpolating values for at least two reasons:
First, interpolating could result in values that are not consistent with the underlying biology. Biological measurements are often quite noisy and highly nonlinear, so standard interpolation techniques are quite risky.
Second, adding interpolated "measurements" can bias the fit and produce incorrect statistics in the results. For example, many statistical calculations require the degrees of freedom for the error (dfe), which is the number of observations minus the number of estimated parameters. Artificially inflating the number of observations changes the dfe, potentially leading to very different parameter estimates, standard errors, and so forth.
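To make the degrees-of-freedom point concrete, here is a small base-MATLAB sketch using the four-row excerpts from the accepted answer. It counts the real (non-NaN) observations in the merged table and compares them with the count after filling each response onto the common time grid with interp1 (linear, no extrapolation). The two-parameter model is a hypothetical assumption used only to compute dfe.

```matlab
% First four rows of each dataset, as in the accepted answer
rna = table([0;0.08;0.24;0.49], [nan;17.11;8.22;18.6], VariableNames=["Time", "Plasma_RNA"]);
protein = table([0;0.24;1.91;3.1], [nan;10;97.1;90.1], VariableNames=["Time", "Serium_protein"]);

% Merged data: NaN means "not measured", so only real values are observations
merged = outerjoin(rna, protein, Keys="Time", MergeKeys=true);
nObsMerged = nnz(~isnan(merged.Plasma_RNA)) + nnz(~isnan(merged.Serium_protein));

% Interpolated data: fill each response onto the common time grid with interp1
% (default linear method, no extrapolation, so points outside each range stay NaN)
tCommon = merged.Time;
okR = ~isnan(rna.Plasma_RNA);
okP = ~isnan(protein.Serium_protein);
rnaInterp = interp1(rna.Time(okR), rna.Plasma_RNA(okR), tCommon);
proteinInterp = interp1(protein.Time(okP), protein.Serium_protein(okP), tCommon);
nObsInterp = nnz(~isnan(rnaInterp)) + nnz(~isnan(proteinInterp));

% dfe = observations - estimated parameters (hypothetical 2-parameter model)
nParams = 2;
dfeMerged = nObsMerged - nParams;
dfeInterp = nObsInterp - nParams;
fprintf("merged: %d observations, dfe = %d\n", nObsMerged, dfeMerged);
fprintf("interpolated: %d observations, dfe = %d\n", nObsInterp, dfeInterp);
```

With these excerpts, the merged table holds 6 real measurements (dfe = 4), while interpolation adds a protein value at t = 0.49 that was never measured (7 observations, dfe = 5). Enabling extrapolation would fill even more grid points and inflate the count further.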

