Use a function that returns multiple values as input argument to another function

Hi,
I have a function CutOffForPctile, given below, in which the input argument "InData" is a column in a dataset/table and Pctile is a scalar value between 0 and 100. It is used to calculate the mean and standard deviation of the data points up to a specific percentile.
function [CutOffMean, CutOffStd] = CutOffForPctile(InData, Pctile)
CutOff = prctile(InData,Pctile);
CutOffMean = mean(InData(InData<=CutOff));
CutOffStd = std(InData(InData<=CutOff));
end
I use the output of the above function as an input to the grpstats function, as shown below. Here C is a dataset and I'm trying to find the mean and std of the data points up to the 75th percentile within each defined group.
Cstats_Mod = grpstats(C,{'OptDesPt1', 'OptDesPt2'}...
,{'min','mean','max', @(C)CutOffForPctile(C,75),},'DataVars','DEffBasedOnMtlbFun');
My issue is that Cstats_Mod contains only one set of values (the mean) from the function CutOffForPctile; the standard deviation is missing from the output.
How can I return both mean and std through the function handle passed to grpstats? Note, I don't want to make two separate functions, as I have a fairly big data set and finding the percentile cutoff is a costly process, so I would like to compute it once and use it for both the mean and the sd.
Any ideas?
Thanks Hari

Accepted Answer

dpb on 25 Feb 2015
Edited: dpb on 25 Feb 2015
For reference, when a MATLAB function call is used as an argument or inside an expression, only its first output is returned; there is no syntax for passing the additional outputs along.
Calling the function and saving the outputs in variables before calling grpstats is the only option.
BTW, in your function, I'd compute a logical vector and use it instead of doing the test twice...
function [CutOffMean, CutOffStd] = CutOffForPctile(InData, Pctile)
CutOff = prctile(InData,Pctile);
idx=InData<=CutOff;
CutOffMean = mean(InData(idx));
CutOffStd = std(InData(idx));
end
Not sure if the JIT optimizer can find the common expression or not...
ADDENDUM
OK, I admit my lack of familiarity with grpstats led me to not fully consider the use of the function handle; I was thinking one could simply substitute a set of values; clearly that isn't so, agreed...
Checking the documentation for grpstats, a function passed as a function handle for additional statistics must return either a column vector or an nvals-by-ncols array. So, I think you need to rewrite your function as
function [CutOffStats] = CutOffForPctile(InData, Pctile)
CutOff = prctile(InData,Pctile);
idx=InData<=CutOff;   % logical mask, computed once and reused
CutOffStats = [mean(InData(idx));
std(InData(idx))];    % mean and std returned together in one array
end
You'll need to ensure proper orientation of InData, of course, so that variables are by column and the statistics are row vectors.
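To illustrate why packing helps, here is a minimal sketch using only base MATLAB. It assumes, consistent with the behavior reported in the question, that the caller asks each function handle for a single output. The names firstOnly, statsOf and packed are hypothetical, and the cutoff is passed in directly rather than computed with prctile so the example does not need the Statistics Toolbox. The row orientation of the packed result is also an assumption; adjust it to whatever the grpstats documentation requires for your input type.

```matlab
x = [1; 2; 3; 4; 100];

% max can return two outputs (value and position), but a caller that
% asks for only one output receives just the first of them.
firstOnly = @(v) max(v);
oneOut = firstOnly(x);          % only the maximum value
[val, pos] = firstOnly(x);      % both outputs, because two were requested

% Packing both statistics into one array makes them a single output.
statsOf = @(w) [mean(w), std(w)];          % hypothetical helper
packed  = @(v, cutoff) statsOf(v(v <= cutoff));  % mask evaluated once
both = packed(x, 4);            % 1-by-2 array: [mean, std] of 1..4
```

With this pattern the costly cutoff is still evaluated only once per call, and the two statistics travel together as the columns of one result.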

5 Comments

dpb on 25 Feb 2015
Edited: dpb on 25 Feb 2015
As a procedural note, please use the "comments" button for conversation; retain the "Answer" box for actual answers to the question...
See updated answer for the solution (I think; untested) to your quandary...
Thanks for the neat trick. I never thought that function output could have mean and std bunched together like this. I was thinking of sending the output as a concatenated string and later splitting it, similar to the "Text to Columns" feature in Excel, but what you have suggested is much more elegant. I'm also surprised by your reading of the grpstats documentation about returning a column vector etc. I hadn't paid close attention to this.
I have one final request. The data that gets returned to Cstats_Mod in this case has its 6th column (based on output from mean/std function handle) bunched as one column even though visually I can see it as 2 columns.
For example,
Cstats_Mod(1,6)
ans =
Fun5_DEffBasedOnMtlbFun
1.4024 0.32011
size(Cstats_Mod.Fun4_DEffBasedOnMtlbFun)
ans =
4950 2
On the other hand, when I query first row of the 6th column it returns only the initial value as seen below
Cstats_Mod.Fun4_DEffBasedOnMtlbFun(1)
ans =
1.4024
How do I split the 6th column into two different columns? I guess the 6th column is actually a cell array (is that true?) but I am not able to see a clean method.
Right now, I am using the double function on Cstats_Mod and then converting back to a dataset.
Thanks Hari
PS: I used the "Answers" box by mistake. I was typing my response using a mobile browser and didn't realize which box I was typing into.
dpb on 26 Feb 2015
Edited: dpb on 27 Feb 2015
It's standard Matlab syntax/style to "smoosh" stuff of consistent size into a single array. As for the documentation, there's a lesson there--always study it thoroughly, especially those little details in the discussion of inputs/outputs.(*)
As for the question on the output, no, it's not a cell array, it's a dataset array. Try
whos Cstats_Mod
and then dig down within it the same way to explore.
A one-element address of (1) is a linear address into a 2D array. Since MATLAB uses column-major storage order, the second element would be the second row, first column, while the first row, second column (first std dev value) would be the (N+1)th element, where N is the number of rows.
To get the first and/or second column use the array notation (:) for the rows with the second address of 1 or 2.
Cstats_Mod.Fun4_DEffBasedOnMtlbFun(:,1)
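The addressing rules described above can be checked on a small array. This is only a sketch on made-up data; the matrix A stands in for the N-by-2 statistics variable and is not taken from the original dataset.

```matlab
N = 4;
A = [(1:N)', 10*(1:N)'];   % N-by-2: column 1 is 1..4, column 2 is 10..40

a1  = A(1);       % linear index 1   -> row 1, column 1
a2  = A(2);       % linear index 2   -> row 2, column 1 (column-major order)
aN1 = A(N+1);     % linear index N+1 -> row 1, column 2
row1 = A(1,:);    % both values for the first row
col2 = A(:,2);    % the whole second column
```

So a single subscript walks down the first column before it reaches the second, which is why querying element (1) returned only the mean.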
If it were my code, I'd likely not separate into two variables but use shorter names and keep the indexing expressions. Sometimes for clarity I'll define an integer variable as
SD=2; % column index for std dev
and then use it, so I can see at a glance later on which column is being referred to when I may have forgotten the order or some such.
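If separate variables are wanted after all, a sketch along these lines may do it. It uses a base MATLAB table with made-up values; the variable names Group, Stats, CutOffMean and CutOffStd are hypothetical, and it is an assumption (untested here) that a dataset array accepts the same dot-indexing assignments.

```matlab
MEAN = 1;   % column index for the mean
SD   = 2;   % column index for std dev

% Toy table whose second variable holds two columns, like the
% two-column variable produced by the packed function handle.
T = table([1; 2], [1.4 0.32; 2.0 0.5], ...
    'VariableNames', {'Group', 'Stats'});

T.CutOffMean = T.Stats(:, MEAN);   % copy out the first column
T.CutOffStd  = T.Stats(:, SD);     % copy out the second column
T.Stats = [];                      % drop the original two-column variable
```

This avoids the round trip through double and back, and the named indices keep the column order readable.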
(*) Warning--geezer tale follows; ignore if wish... :) On the use of documentation--I started in the days of the mainframe, punched cards and massive input card decks of four and five full boxes of cards as input to large nuclear design codes. The input preparation for these codes was intricate and often required a fair amount of ingenuity to code such that the models implemented actually were representative of the physical problem to be solved. The ability to read the documentation in depth and glean all the nuances intended by the author thereof was a critical piece in learning to be able to use the codes effectively. As time passed and with the advent of the personal computer and visual tools, it became very obvious that newer graduates simply had not developed the skillset to be able to understand how to use these tools without extensive training in using the documentation. Learning to use the documentation and to look for the details other than just the top level is a key element in developing real expertise in any tool, particularly a complex and feature-rich one such as Matlab.
That you got as far as you did is a strong plus...
Thank you for the detailed explanation. Very helpful.
I see your point regarding documentation. SAS was my best friend for more than a decade when I was working in the professional world and now after coming to academics for further studies I have to use Matlab. During the early years of learning SAS I used to have lot of enthusiasm for trying to learn the nooks and corners, but over the years the attention shifted away from tools/programming to analysis of data itself (Stats/OR).
My take on "newer graduates" is that what were sub-fields or small areas at one point in time have now become stand-alone subjects, and with rapid advancements more and more of that will happen in years to come; so a committed individual (who is ready to slog) ends up making choices on where/what to focus on considering their own strengths/aspirations etc. There are so many interesting things to learn that I keep negotiating with myself not to get side-tracked; still, it is an ongoing struggle.
Btw, I never thought that after working with SAS I would ever like any other software for data analysis..With Matlab I have been pleasantly surprised and this will definitely be a long term friend..
dpb on 27 Feb 2015
Edited: dpb on 27 Feb 2015
Indeed, I agree w/ the comments re: the field explosion; it's real, for sure. The key point I was trying to make in response to your comment on being surprised I gleaned the form need is that it's important to get to the end objective which is, I agree, the actual results of the analysis, not the code per se; in the end Matlab (or SAS or whatever) is just a glorified pencil/calculator. In order to do that expeditiously, one needs to learn to use all the facilities that are provided and these details in the documentation are some of the most key to not overlook (yet almost every question asked here has an answer obtainable by such study if the poster would really read and study such).(*)
Sounds like we may have had a lot in common; I started in the reactor engineering area; discovered statistics in the process of working with incore instrumentation systems and evolved into consulting with an emphasis towards utilizing probabilistic tools for engineering problems not amenable otherwise.
SAS was also a constant companion for years; you're fortunate to be coming to Matlab at the present time; until quite recently the features for such were quite weak in comparison. There are still issues with the integration of things into a comprehensive package but it's much improved, indeed. Matlab is somewhat more flexible for exploratory computational work; SAS still has some advantages for packaged standard data analyses in my view as it was, from the git-go, intended for and implemented as such, whereas TMW has had to try to graft that on top of the general programming matrix language and try to keep the open nature as well. It's a tough mix to do well...
(*) Another sidebar-- :) I've complained over the years to TMW that particularly for the base language the documentation, while extensive, is lacking in that there is no definitive definition of the details of syntax but it is all written as narrative/by example instead. This does lead to areas in which there are "holes" or ambiguities or just oversights. I believe the idea was always to try to make it more accessible and therefore "easy" as compared to standard languages such as its initial pattern, Fortran, and I understand the intent from that standpoint. But, I also think it has evolved over the years to be somewhat deliberate to retain flexibility and what is deemed proprietary knowledge from public release.

More Answers (1)

Hi, can you kindly provide an example of what you are proposing? I'm not able to make it work, as I need to find the mean/std for specific levels of the grouping variable's value, as indicated in my grpstats code.
I have almost 5000 unique grouping levels in my data set and am not sure whether I am required to do the saving process that you recommended for each of those levels in advance.
PS: thanks for the tip on index, idx.
Thanks Hari

Asked: 25 Feb 2015

Edited: dpb on 27 Feb 2015
