How to implement kNN imputation on test set without data leakage?

4 visualizaciones (últimos 30 días)
Sebastian Weber
Sebastian Weber el 23 de Jul. de 2022
Comentada: Zexi Yang el 17 de Ag. de 2022
I am using knnimpute to handle missing data for machine learning. My data is subdivided into a training and test set (mTrain and mTest). The usage of knnimpute for the training set is easy. For the test set, however, I need the algorithm to impute missing values by using the nearest neighbor from the training set to prevent data leakage. Now I am wondering how to implement knnimpute on the test set in this way. Does anybody have an idea how to code that?
  1 comentario
Zexi Yang
Zexi Yang el 17 de Ag. de 2022
Why do you have to impute test set using nearest neighbor from training set? You can just use nearest neighours from test set without having any data leakage. Data leakage is where you impute training set using data from test set.

Iniciar sesión para comentar.

Respuestas (0)

Categorías

Más información sobre Hypothesis Tests en Help Center y File Exchange.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by