Optimizing Interpretability in Gaussian Process Regression Models: A Strategic Approach to Preprocessing and Testing Data
Hi,
I am using the Regression Learner App to develop a model that corrects my raw data so that it can make accurate predictions. My question is more about the general usage of the tool.
1. When setting my input data, there is an option to reserve a portion of the data for testing. Does this process allocate the learning and testing data randomly, or does it do so sequentially, e.g., using the first few weeks of data for training and the remaining for testing?
2. I have found that Gaussian Process Regression (GPR) models yield the best results for my dataset. However, this type of model lacks interpretability. My inputs are Signal Data, Temperature, and Humidity.
If I wish to assess the individual impact of each input on the overall signal, for example by applying a linear or polynomial correction before the GPR model processing, is this possible? By doing so, I can minimize the amount of data fed into the GPR model, which in turn might provide some interpretability for my overall modeling process.
Answers (1)
Drew
on 3 Nov 2023
- Regression Learner partitions the test data randomly. In Classification Learner, the partition is random and stratified (https://www.mathworks.com/help/stats/cvpartition.html). Stratification is based on the class labels; that is, an attempt is made to keep the class frequencies similar in the training and test sets. If you want to control your test partition, you could (1) first partition your data into train and test outside of the Learner app, (2) load the training data into the Learner app at the session start dialog, and (3) later load the separate test data into the Learner app.
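The manual partitioning described in steps (1)-(3) can be sketched at the command line. This is a minimal sketch assuming your data is in a table read from a file; the file name, the 20% hold-out fraction, and the 80/20 sequential split are illustrative assumptions, not values from the original question.

```matlab
% Assumed: a table T of predictors plus response, e.g. from a CSV file.
T = readtable('mydata.csv');

% Random hold-out split, as cvpartition would do inside the app.
rng(0);                                     % make the split reproducible
c = cvpartition(height(T), 'HoldOut', 0.2); % reserve 20% for testing

Ttrain = T(training(c), :);  % load this at the Learner app session start
Ttest  = T(test(c), :);      % import this later as separate test data

% If you instead want a sequential split (first weeks train, rest test):
nTrain    = floor(0.8 * height(T));
TtrainSeq = T(1:nTrain, :);
TtestSeq  = T(nTrain+1:end, :);
```

Because the split is done outside the app, you fully control whether it is random or time-ordered before any model is trained.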
- You can use model-agnostic interpretability techniques such as Partial Dependence Plots (PDP), Shapley values, and LIME on your GPR models. In R2023b, these techniques are available on the "Explain" tab inside the Learner app.
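The same interpretability tools are also available programmatically in Statistics and Machine Learning Toolbox, which can be convenient once you move beyond the app. A minimal sketch, assuming a table T whose response variable is named 'Signal' and whose predictors include 'Temperature' (these names are assumptions based on the question, not confirmed):

```matlab
% Assumed: table T with predictors (Temperature, Humidity, ...) and
% a response column named 'Signal'.
T   = readtable('mydata.csv');
mdl = fitrgp(T, 'Signal');          % GPR model, as trained by the app

% Partial dependence of the predicted response on one predictor
plotPartialDependence(mdl, 'Temperature');

% Shapley values for a single query point (here, the first row)
q    = T(1, mdl.PredictorNames);
expl = shapley(mdl, 'QueryPoint', q);
plot(expl);

% LIME explanation for the same query point, keeping 2 top predictors
limeExpl = fit(lime(mdl), q, 2);
plot(limeExpl);
```

Each plot answers a slightly different question: PDP shows the average effect of one predictor, while Shapley and LIME explain individual predictions.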
If this answer helps you, please remember to accept the answer.
Example screenshot from the Regression Learner app, within the Explain tab, for a GPR model on fisheriris data:

4 comments
Dharmesh Joshi
on 4 Nov 2023
Drew
on 4 Nov 2023
The partial dependence plot that you provided indicates that the predicted response generally rises as the temperature increases. Is this consistent with your expectations? Is this a sufficient model explanation for you, or are you looking for some other information from the model interpretability?
Can you provide more info about your overall goals for this analysis? It is not clear to me why you want to "adjust your RAW data" before passing it to the model. Perhaps this step is not needed. In general, if the model can produce a good regression result with the inputs as they are, perhaps that is simpler and more desirable. You mention two possible reasons: "By doing so, I can minimize the amount of data fed into the GPR model, which in turn might provide some interpretability for my overall modeling process."
(1) The first reason that you mention is to "minimize the amount of data fed into the GPR model". That sounds like dimensionality reduction or feature selection. For that, you could use the PCA and/or feature selection capabilities within Regression Learner. If some of the current input predictors are not needed, they can be removed using feature selection, and then you can rebuild, re-test, and re-interpret the model. If you choose to apply PCA, that reduces the dimensionality of the model (if you don't use all PCA components) in order to save computation, but it generally makes your model harder to interpret, because you would be feeding principal components into the model rather than easily interpretable predictors like "Temperature". If you go down that road, you could use the model without PCA for model explainability (for those wanting to understand how the original predictors generally affect the model output), and use the model with PCA for efficiency (if you are reducing to fewer PCA components).
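The PCA route above can also be sketched at the command line. This is a minimal sketch assuming a numeric predictor matrix with your three inputs as columns; the random placeholder data and the 95% variance threshold are illustrative assumptions.

```matlab
% Assumed: numeric predictor matrix X, columns = Signal, Temperature,
% Humidity. Random data stands in for the real measurements.
X = randn(100, 3);

% Standardize first so no predictor dominates PCA by scale alone.
Xs = normalize(X);
[coeff, score, ~, ~, explained] = pca(Xs);

% Keep enough components to explain, say, 95% of the variance.
k        = find(cumsum(explained) >= 95, 1);
Xreduced = score(:, 1:k);   % feed these components to the GPR model

disp(explained)             % percent variance explained per component
```

As noted above, the columns of Xreduced are linear mixtures of the original predictors, which is exactly why the PCA-reduced model is harder to explain in terms of "Temperature" or "Humidity" directly.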
(2) The second reason that you mention is that a "linear or polynomial correction" might "provide some interpretability for my overall modeling process." I don't think this goal is needed, because you can already interpret the GPR model as it is.
For more background on using PDP within Classification Learner, here is a MathWorks video (this shows an earlier version of Classification Learner, before model explainability was on a separate tab): "Use Classification Learner App to Interpret Machine Learning Models with Partial Dependence Plots"
Dharmesh Joshi
on 4 Nov 2023
Dharmesh Joshi
on 6 Nov 2023
