Density Preserving Sampling (DPS) - deterministic crossvalidation

Version 1.1.0.0 (2.62 KB) by Marcin
MATLAB implementation of the DPS method, which can save computation compared to cross-validation
616 Downloads
Updated 11 Dec 2012


The DPS method aims to produce representative splits of the data in terms of probability density function (PDF) similarity. DPS is deterministic, so the resulting split is always the same for the same input. The method can serve as a more computationally efficient alternative to performance estimation based on cross-validation (CV). For a CV estimate to be reliable, the whole CV procedure should be repeated multiple times. For example, rather than performing a single run of 8-fold CV, which can produce a very unreliable estimate (see the reference below), a number of runs should be performed (typically 10). This means that to obtain the performance estimate, you need to train and test your model 10x8=80 times! Using DPS, however, you would only need to do this 8 times. Assuming quadratic computational complexity of a typical learning algorithm, this can result in considerable savings of computational time. This is particularly useful when the performance estimation procedure (10x8 models in the above example) needs to be repeated multiple times in the course of parameter optimization (e.g. selecting the optimal order of a polynomial or the number of principal components to use).
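The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not part of the MATLAB submission: the helper names `n_fits` and `relative_cost`, the sample size of 1000, and the grid of 5 parameter values are all assumptions made for the example. It also assumes that both procedures train on sets of the same size, in which case the ratio of total cost equals the number of CV repeats, and the quadratic exponent only determines how expensive each individual fit is.

```python
def n_fits(folds, repeats=1, grid_size=1):
    """Number of train/test cycles for a k-fold procedure.

    repeats   -- how many times the whole procedure is rerun
    grid_size -- number of parameter settings being compared
    """
    return folds * repeats * grid_size


def relative_cost(n_samples, folds, repeats=1, grid_size=1, exponent=2):
    """Total training cost in arbitrary units.

    Assumes each fit trains on (folds - 1) / folds of the data and that
    the cost of a single fit grows as train_size ** exponent.
    """
    train_size = n_samples * (folds - 1) / folds
    return n_fits(folds, repeats, grid_size) * train_size ** exponent


# Example from the text: 10 runs of 8-fold CV versus one 8-fold DPS split.
cv_fits = n_fits(folds=8, repeats=10)
dps_fits = n_fits(folds=8)
speedup = relative_cost(1000, folds=8, repeats=10) / relative_cost(1000, folds=8)

print("repeated CV fits:", cv_fits)
print("DPS fits:", dps_fits)
print("cost ratio (CV / DPS):", speedup)

# Inside a parameter search over 5 candidate values (hypothetical grid size).
print("repeated CV fits over the grid:", n_fits(8, repeats=10, grid_size=5))
print("DPS fits over the grid:", n_fits(8, grid_size=5))
```

The gap grows linearly with the size of the parameter grid, which is why the saving matters most during parameter optimization.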

The attached figure depicts the decision boundaries of a simple parametric classifier trained on a single fold obtained using 8-fold CV (red) and 8-fold DPS (black), superimposed on a scatter plot of the cone-torus dataset. Note how stable the decision boundaries are in the case of DPS and how much they differ between folds in the case of CV.
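The CV half of this effect is easy to reproduce on toy data. The following is a minimal sketch and not the code behind the figure: it uses a synthetic one-dimensional two-class Gaussian problem instead of cone-torus, a nearest-mean rule as the "simple parametric classifier", and hypothetical helper names (`make_data`, `stratified_random_folds`, `boundary`). It does not implement DPS. It only shows how much the decision boundary moves when the classifier is trained on each of 8 randomly drawn folds in turn.

```python
import random
import statistics


def make_data(n_per_class=80, seed=0):
    """Synthetic 1-D data: class 0 ~ N(0, 1), class 1 ~ N(2, 1)."""
    rng = random.Random(seed)
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(2.0, 1.0), 1) for _ in range(n_per_class)]
    return data


def stratified_random_folds(data, k, seed):
    """Randomly deal the points of each class into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        members = [p for p in data if p[1] == label]
        rng.shuffle(members)
        for i, point in enumerate(members):
            folds[i % k].append(point)
    return folds


def boundary(fold):
    """Nearest-mean decision boundary: midpoint of the two class means."""
    mean0 = statistics.mean(x for x, y in fold if y == 0)
    mean1 = statistics.mean(x for x, y in fold if y == 1)
    return (mean0 + mean1) / 2.0


data = make_data()
folds = stratified_random_folds(data, k=8, seed=1)

# Train on one fold at a time, as in the figure, and record the boundary.
boundaries = [boundary(fold) for fold in folds]
spread = max(boundaries) - min(boundaries)

for i, b in enumerate(boundaries, start=1):
    print(f"fold {i}: boundary at x = {b:.3f}")
print(f"spread across folds: {spread:.3f}")
```

Changing the `seed` passed to `stratified_random_folds` gives a different set of boundaries, whereas a deterministic splitting method such as DPS returns the same folds every time for the same input.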

The method has been extensively tested using datasets from the UCI Machine Learning Repository. It was also part of the winning solution of the recent ISMIS 2011 competition and has been included in the current release of the PRTools toolbox. I would, however, welcome any feedback on the performance of DPS when applied to other problems you might be working on.

More information on DPS can be found in the following paper:
Budka, M. and Gabrys, B., 2012.
Density Preserving Sampling: Robust and Efficient Alternative to Cross-validation for Error Estimation.
IEEE Transactions on Neural Networks and Learning Systems, DOI: 10.1109/TNNLS.2012.2222925.

Cite As

Marcin (2025). Density Preserving Sampling (DPS) - deterministic crossvalidation (https://la.mathworks.com/matlabcentral/fileexchange/39390-density-preserving-sampling-dps-deterministic-crossvalidation), MATLAB Central File Exchange. Retrieved .

MATLAB Release Compatibility
Created with R2007b
Compatible with any release
Platform Compatibility
Windows macOS Linux

Version Published Release Notes
1.1.0.0

Cleaned up the code a little bit.

1.0.0.0