MATLAB Answers

Regression/ Ordinary Least squares on a custom equation

Roop_T on 24 Jul 2021
Commented: Roop_T on 26 Jul 2021
I am trying to model the relationship between Load and the variables X and T1 to T6 according to the following equation:
Load = alpha(X) + B1*T1 + B2*T2 + B3*T3 + B4*T4 + B5*T5 + B6*T6, for X = 1 to 672
1) I have Load in the form of 15 minute interval data for a few months
2) X is a variable that is defined like this based on time:
Monday 00.00 am to 00.15 am = 1
Monday 00.15 am to 00.30 am = 2
Sunday 11.45 pm to 00.00 am = 672
This repeats again from 1 to 672 for the next week and is not a running number
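The indexing described above could be computed from timestamps roughly as follows. This is only a sketch: the variable names (t, dayIdx) and the example week are made up for illustration, and it assumes the timestamps are MATLAB datetime values aligned to 15-minute boundaries.

```matlab
% Hypothetical example week: Monday 26 Jul 2021 00:00 to Sunday 1 Aug 2021 23:45
t = (datetime(2021,7,26,0,0,0) : minutes(15) : datetime(2021,8,1,23,45,0))';

% weekday() returns 1 for Sunday ... 7 for Saturday; shift so Monday = 0 ... Sunday = 6
dayIdx = mod(weekday(t) - 2, 7);

% 96 intervals per day, 4 intervals per hour, plus 1 so the index starts at 1
X = dayIdx*96 + hour(t)*4 + floor(minute(t)/15) + 1;
```

Because the index depends only on the day of week and the time of day, it wraps back to 1 every Monday at 00.00 instead of running on.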
T1 T2 T3 T4 T5 T6 are temperatures at each 15 min interval
Additional Info :
I can feed in Load, X, and T1 to T6. How can I perform regression on my equation to get the coefficients alpha and B1 to B6? Note that B1 to B6 do not change with X, but alpha does. So my regression output needs to be a vector of coefficients for alpha, one for each X from 1 to 672, and a single value each for B1, B2, B3, B4, B5 and B6, since they don't change with X. I tried various ways and looked online. All of the examples only show how to fit this:
Load = Alpha*X + B1*T1 + B2*T2 + B3*T3 + B4*T4 + B5*T5 + B6*T6
I have attached a subset of the data - about 8 weeks
  • OK! Let me go into detail. I have several months of load data for a chiller at 15-minute intervals. The assumption is that chiller load depends not only on temperature but also on time of week.
  • For example, let's say that on a Wednesday at 10.00 - 10.15 am there is generally less occupancy, so the chiller load might be lower than on some other day with a similar outside air temperature. So the chiller load depends not purely on outside temperature but also on time of week.
  • The temperature at each interval is broken down into 6 components to get a piecewise continuous linear equation (not important). So that's the T1 to T6 you see.
  • Then, to incorporate time of week, we break a week into 672 15-minute intervals, with X = 1 starting at Monday 00.00 am to 00.15 am and so on until X = 672.
  • So the chiller load equation is modelled as:
Load = Alpha(X) + B1*T1 + B2*T2 + B3*T3 + B4*T4 + B5*T5 + B6*T6, for X = 1 to 672,
where Alpha is a function of the time-of-week variable X, and Alpha and B1 to B6 are regression coefficients.
A week contains 7 days * 24 hours * 60 minutes / 15 minutes = 672 fifteen-minute intervals.
  • So I want to feed Load, X, T1 to T6 using several months of data. In the sample file we have 8 weeks of data.
  • In 8 weeks we will have 8 instances (data points) of Monday 00.00 to 00.15 am (X = 1), and so on. These are used to estimate alpha at X = 1, and similarly for X = 2 through 672. This is just a sample set: with only 8 data points for each 15-minute interval X, estimating a separate coefficient alpha for each X will likely overfit. I am not sure of this, just FYI.
  • The 8 weeks of data provide many more data points for estimating B1 to B6, since these have no dependence on time of week (X).
  • The load curve over time will look roughly like the positive half of a sine curve.
It's based on this paper - if anyone is interested, you can look into it -
Again, Thank you all !
Matt J on 25 Jul 2021
we have 8 weeks of data.
If the same parameters are to be used every week, then you can equivalently just average together Load data samples that were taken at the same time-of-week, reducing the fitting problem to just one week of data.
Load= mean( reshape(Load,672,[]) ,2);
Again, though, without further constraints on alpha, it is a trivial result. Just set all the B variables to zero and alpha(X)=Load.
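The point in the comment above can be checked numerically. The following is a rough sketch on made-up data (the random Load and T values and all variable names are placeholders, not the attached file); it assumes the series starts at X = 1 and covers a whole number of weeks.

```matlab
rng(1);                                   % reproducible placeholder data
nWeeks = 8;
nSlots = 672;
Load = 50 + 10*rand(nWeeks*nSlots, 1);    % hypothetical load samples
T    = 10*rand(nWeeks*nSlots, 6);         % hypothetical stand-ins for T1..T6

% Average samples that share the same time-of-week index
LoadAvg = mean(reshape(Load, nSlots, []), 2);
TAvg = zeros(nSlots, 6);
for k = 1:6
    TAvg(:,k) = mean(reshape(T(:,k), nSlots, []), 2);
end

% One averaged week: 672 equations, but 672 alphas + 6 B's = 678 unknowns
A = [eye(nSlots), TAvg];

% The trivial solution alpha(X) = Load, B = 0 fits the averaged data exactly
trivial  = [LoadAvg; zeros(6,1)];
residual = norm(A*trivial - LoadAvg);
```

After averaging there are fewer equations than unknowns, so an exact fit always exists regardless of the temperatures. With the un-averaged multi-week data there are more equations than unknowns, so the fit is no longer forced to this solution.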


Answers (2)

Scott MacKenzie on 25 Jul 2021
Edited: Scott MacKenzie on 25 Jul 2021
This is probably too simple to be correct, but I'll toss it out there anyway. Admittedly, I haven't considered anything you've written about time intervals and such, because I think this is already present in the time variable, but I might be wrong.
Bottom line: You've got empirical data for eight variables (load, X or time, T1, T2, T3, T4, T5, and T6) and you want to build a model with one of the variables as the response variable and the other seven as predictors. Here's your model:
load = alpha*X+ b1*T1 + b2*T2 + b3*T3 + b4*T4 + b5*T5 + b6*T6
The script below generates a regression model using mvregress (which requires the Statistics and Machine Learning Toolbox):
f = '';
T = readtable(f);
% dependent/response variable
X = T.load;
% predictor variables (Note: time is 'X' in the question)
Y = [T.time, T.t1, T.t2, T.t3, T.t4, T.t5, T.t6];
format longg;
beta = mvregress(X,Y)
beta = 1×7
4.85059311004729e-06 9.23431675549545e-05 4.65695454655777e-05 3.94649009791285e-05 2.61216493640209e-05 4.93472971474542e-06 1.84313279577997e-06
The seven model coefficients (alpha, b1, b2, etc.) are above. Visit the documentation for mvregress for other options you might want to explore. Good luck.
Roop_T on 26 Jul 2021
True, but not entirely. The number of data points available for estimating alpha equals the number of weeks of data you have. I have presented a subset here; I have about a year's worth of data, so that's 52 weeks, or 52 data points to estimate alpha at each interval X. But you can use data from all X for B1 to B6. The problem is that alpha at X = 1 uses only the load data where X = 1, while B1 to B6 use all the data, yet both must be estimated simultaneously. That is the underlying programming complexity in this problem.
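One way to estimate alpha(X) and B1 to B6 simultaneously, as described above, is ordinary least squares with one indicator (dummy) column per value of X plus six shared temperature columns. The sketch below is not taken from the thread: it runs on synthetic data, and every name in it (alphaTrue, betaTrue, D, A, and so on) is hypothetical. It assumes each X value occurs in several weeks and that the temperatures vary from week to week at the same X.

```matlab
rng(0);                                        % reproducible synthetic data
nWeeks = 8;
nSlots = 672;
n = nWeeks*nSlots;

X = repmat((1:nSlots)', nWeeks, 1);            % time-of-week index, repeating 1..672
T = 10*rand(n, 6);                             % hypothetical stand-ins for T1..T6
alphaTrue = 50 + 20*sin(2*pi*(1:nSlots)'/96);  % made-up time-of-week effect
betaTrue  = [1.0; 0.5; -0.3; 0.8; 0.2; -0.1];  % made-up B1..B6
Load = alphaTrue(X) + T*betaTrue + 0.5*randn(n, 1);

% Indicator matrix: row i has a single 1 in column X(i)
D = sparse((1:n)', X, 1, n, nSlots);

% Joint least-squares fit: 672 alphas and 6 shared B coefficients in one solve
A = [D, sparse(T)];
coef = A \ Load;

alphaHat = full(coef(1:nSlots));               % one alpha per X
betaHat  = full(coef(nSlots+1:end));           % single B1..B6
```

Each alpha is informed only by the rows with its own X, while the B coefficients draw on every row, and the single backslash solve handles both at once. Passing X as a categorical predictor to fitlm should give an equivalent fit, parameterized as an intercept plus offsets, if standard errors are also wanted.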


the cyclist on 25 Jul 2021
It's definitely an interesting modeling problem. Here is a plot of your data, where I used errorbar to plot the mean and error of the mean.
chillerData = readtable('');
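chillerData = chillerData(1:6048,:); % Keep 9 whole weeks (9*672 = 6048 rows); only doing this out of laziness, to get a multiple of 672
chillerLoad = chillerData.load;
chillerLoad = reshape(chillerLoad,672,9).'; % reshape fills column-wise, so fill 672-by-9 and transpose: rows = weeks, columns = time of week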
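The plotting step itself is not shown in the code above. A sketch of how the mean and the standard error of the mean might be plotted with errorbar follows; it uses synthetic data in place of the attached file, and the names loadByWeek, meanLoad and semLoad are hypothetical.

```matlab
rng(2);                                        % reproducible synthetic data
nWeeks = 9;
nSlots = 672;
slot = 1:nSlots;                               % time-of-week index

% Hypothetical load matrix: one row per week, one column per time-of-week slot
weekly     = 50 + 20*sin(2*pi*slot/96);        % made-up daily cycle
loadByWeek = weekly + 3*randn(nWeeks, nSlots);

% Mean over weeks, and standard error of that mean, for each slot
meanLoad = mean(loadByWeek, 1);
semLoad  = std(loadByWeek, 0, 1) / sqrt(nWeeks);

figure;
errorbar(slot, meanLoad, semLoad);
xlabel('Time-of-week index X');
ylabel('Load');
```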
This does look close to sinusoidal (but I don't think only the positive portion?), so I think my first pass at a model would be one that varies sinusoidally in your X variable (scaled so that one cycle is 24 hours). And of course include the other terms.
I would not recommend averaging over the days, because you will then lose the ability to estimate the error. Just include all the data.
I would use fitnlm to do the fit. I can give more guidance on fitting the model if you need it.
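As a rough illustration of what such a fit could look like, here is a sketch using fitnlm (Statistics and Machine Learning Toolbox) on synthetic data. The model form (a constant, one sinusoid with a 24-hour period of 96 intervals, and six linear temperature terms), the starting values and all variable names are assumptions made for this example, not something specified in the thread.

```matlab
rng(3);                                        % reproducible synthetic data
nWeeks = 8;
n = nWeeks*672;
x = repmat((1:672)', nWeeks, 1);               % time-of-week index
T = 10*rand(n, 6);                             % hypothetical stand-ins for T1..T6
betaTrue = [1.0; 0.5; -0.3; 0.8; 0.2; -0.1];   % made-up B1..B6
y = 50 + 15*sin(2*pi*x/96 - pi/2) + T*betaTrue + randn(n, 1);

% Predictor matrix: column 1 is X, columns 2..7 are T1..T6
P = [x, T];

% b(1) offset, b(2) amplitude, b(3) phase, b(4:9) temperature coefficients
modelfun = @(b, p) b(1) + b(2)*sin(2*pi*p(:,1)/96 + b(3)) ...
                 + p(:,2:7)*reshape(b(4:9), [], 1);

beta0 = [mean(y); std(y); 0; zeros(6, 1)];     % crude starting guess
mdl = fitnlm(P, y, modelfun, beta0);
est = mdl.Coefficients.Estimate;
```

Amplitude and phase are not unique (flipping the sign of the amplitude and shifting the phase by pi gives the same curve), so it is safer to judge the fit by its residuals than by those two numbers alone. A single sinusoid also replaces the 672 free alpha values with three parameters, which is a much stronger assumption than the original model.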
