I think I found a relevant MATLAB example ("Train Network on Image and Feature Data") that could help me. The URL is here: https://www.mathworks.com/help/deeplearning/ug/train-network-on-image-and-feature-data.html
In the example, each training array is converted into a datastore with arrayDatastore, and the datastores are then combined into dsTrain, as shown in the screenshot below:
![](https://www.mathworks.com/matlabcentral/answers/uploaded_files/1260365/image.png)
It seems that the order of the datastores in the combined data matches the order of the inputs the network expects, as shown below:
![](https://www.mathworks.com/matlabcentral/answers/uploaded_files/1260370/image.png)
```matlab
dsTrain = combine(dsX1Train,dsX2Train,dsTTrain);
```
Here dsX1Train holds the images (image input), dsX2Train the rotation angles (feature input), and dsTTrain the responses (output).
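To check this myself, here is a minimal sketch. It assumes R2021a or later (for the Name=Value syntax) and uses a simplified network that only loosely follows the example; the layer names "images", "features" and "cat" are my own choices. It reads one observation from the combined datastore and prints the network's InputNames, which, as I understand the trainNetwork documentation, define the order in which the predictor columns of a combined datastore are assigned to the inputs, with the response in the last column.

```matlab
% Load example data: 28x28x1xN digit images, categorical labels, rotation angles
[X1Train,TTrain,X2Train] = digitTrain4DArrayData;

% One datastore per array; the images are iterated along the 4th dimension
dsX1Train = arrayDatastore(X1Train,IterationDimension=4);
dsX2Train = arrayDatastore(X2Train);
dsTTrain  = arrayDatastore(TTrain);

% Each read of the combined datastore returns one row: {image, angle, label}
dsTrain = combine(dsX1Train,dsX2Train,dsTTrain);
sample = read(dsTrain);
reset(dsTrain);                 % rewind so training starts from the first observation
disp(size(sample))
disp(class(sample{3}))          % the response column

% Simplified two-input network (layer names are my own, not from the example)
numClasses = numel(categories(TTrain));
layers = [
    imageInputLayer([28 28 1],Normalization="none",Name="images")
    convolution2dLayer(5,16,Padding="same")
    reluLayer
    fullyConnectedLayer(50)
    flattenLayer
    concatenationLayer(1,2,Name="cat")
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
lgraph = layerGraph(layers);    % flatten output goes to cat/in1
lgraph = addLayers(lgraph,featureInputLayer(1,Name="features"));
lgraph = connectLayers(lgraph,"features","cat/in2");

% The predictor columns of dsTrain should follow this order
disp(lgraph.InputNames)
```

If the printed InputNames list the image input before the feature input, then combine(dsX1Train,dsX2Train,dsTTrain) lines up with the network, and swapping dsX1Train and dsX2Train in the combine call would feed the angles to the image input.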
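Am I correct that the order of the datastores passed to combine must match the order of the network's inputs, with the response last?
Either way, an answer from an experienced user or a MathWorker would help a lot. :D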