How to preallocate memory for building this structure, indexing fieldnames?
I have a structure called "Result" in several files and would like to merge all of them into one structure. My difficulty is that the field names directly below "Result." are built from a string identifying an experiment name. Since neither the experiment names nor their number are known at this point, I have to address them by indexing.
So far this indexing works, it merges my data correctly, but preallocation of memory is missing:
% Loop over many files starts here; id identifies the next file
NewData = load(id); % the file referenced in id contains a structure called "Result"
casename = fieldnames(NewData.Result);
cases = size(casename,1);
% preallocation of memory could fit in here, in this line
for caseIndex = 1:cases
    Result.(casename{caseIndex}).MyValue = ...
        NewData.Result.(casename{caseIndex}).MyValue;
end
% Loop over many files ends here
I then tried to preallocate memory with the following attempt, which fails:
Result.(casename{1:cases}).MyValue = zeros(cases,1);
This one also failed:
Result.(casename{[1:cases]}).MyValue = zeros(cases,1);
Do you have any idea what the correct syntax should look like?
2 comments
James Tursa
on 9 Mar 2015
How many files are you talking about? Are the case names in each file unique, or is there potential overlap of names amongst files? There may be a way to do some meaningful pre-allocation for your proposed struct organization, but are we talking about a Result struct with 100's or 1000's (or more) of field names?
Accepted Answer
More Answers (1)
Adam
on 9 Mar 2015
1 vote
Why do you need to pre-allocate? Aren't you simply copying values from one struct to another without any dynamic resizing going on of any individual field of the new struct? I don't see that pre-allocating zeros and then over-writing them with the same size of your actual data will gain you anything.
10 comments
Marco
on 9 Mar 2015
Adam
on 9 Mar 2015
From my understanding of your code (admittedly I have only glanced over it), you are dynamically creating a new field of your structure in each iteration of the for loop, not dynamically expanding an array within an existing structure field.
In this case I don't see a need for any pre-allocating as mentioned above. Dynamically created fields don't require presizing when you create the struct (and they can't be since a field can contain anything).
You could try to create the struct upfront with all its fields already containing pre-allocated arrays, but as mentioned this is unnecessary, and slower rather than faster, if you are simply going to copy data over the top of those pre-sized arrays anyway.
The problem is though that a field can contain anything so memory cannot be pre-allocated anyway based only on the fact that a field with some name will exist.
Dynamically expanding a structure with new fields is, as far as I am aware, not a problem (though Stephen's answer gives you plenty of more solid content on that and I too am a little unsure as to what Loren Shure's quote that you queried means so I may be wrong).
Does your code have a problem with speed or memory usage or something else that is causing you to feel you need to do this?
Marco
on 9 Mar 2015
One valuable piece of advice across any programming language and project is not to try to pre-optimise code.
Obviously where you know one method is faster than another and it does not take any more effort to use the faster version (e.g. vectorised code instead of for loops) then clearly you should do this. Sometimes you may also want to try to speed up a bit of code purely as a learning process. This is also fine, but is still subject to the following advice:
Before you start doing any code optimisation:
- Determine whether your code actually needs optimising (is it running too slowly or are you just optimising it because you think it can be optimised therefore you assume it should be?)
- Use the MATLAB Profiler on your code to tell you which parts of the code are the bottlenecks. It is very easy to use compared to profilers for many other languages. There is no point spending ages trying to speed up a piece of code if that piece of code only contributes 1% of your program's total time. Even if you could make it instantaneous, your overall program still won't improve its speed by more than that 1%.
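As a rough sketch of the profiling step, the snippet below profiles a small loop that adds fields dynamically. The loop body and the name S are placeholders chosen for illustration, not code from this thread; substitute the code you actually want to measure.

```matlab
% Placeholder workload: add 2000 dynamically named fields to a struct.
profile on
S = struct();
for k = 1:2000
    S.(sprintf('Case%d', k)) = k;
end
profile off

% Collect the results programmatically ...
info = profile('info');
% ... or open the interactive report instead:
% profile viewer
```

The report lists time per function and per line, which shows whether the field assignments matter at all compared with, say, loading the files.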
One thing I do quite often is to create quick test scripts comparing different ways of achieving the same thing. I wrap up each of these with the timeit function
doc timeit
and then decide which I should use, having already determined if the particular piece of code needs speeding up at all of course.
This can be very useful when you are considering, for example, comparing bsxfun to using reshape or a standard for loop or whether to use arrayfun rather than a for loop. If nothing else it furthers your understanding of these constructs and where they are useful.
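A quick test script of that kind might look like the sketch below. It times two ways of storing the same values: a scalar struct with one field per case, and a non-scalar struct array. The field names, the count n, and the values are made up for illustration.

```matlab
% Hypothetical test data: n case names and one value per case.
n = 500;
names = arrayfun(@(k) sprintf('Case%d', k), (1:n)', 'UniformOutput', false);
vals = num2cell(rand(n, 1));

% Variant 1: scalar struct, one dynamically named field per case.
makeScalar = @() cell2struct(vals, names, 1);
% Variant 2: n-by-1 struct array with fixed field names.
makeArray = @() struct('Name', names, 'MyValue', vals);

% timeit calls each handle repeatedly and returns a typical run time in seconds.
tScalar = timeit(makeScalar);
tArray = timeit(makeArray);
fprintf('scalar struct: %.3g s, struct array: %.3g s\n', tScalar, tArray);
```

The numbers depend on the machine and MATLAB release, so treat them as a comparison between the two variants rather than as absolute figures.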
Instead of nesting structures for each experiment like this:
Result.(casename{caseIndex}).MyValue
why not just create a non-scalar structure like this:
Result(caseIndex).MyValue
which gives you access to use lots of neat inbuilt tools and functions, and would probably be a lot easier!
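A minimal sketch of that non-scalar layout, assuming three hypothetical experiment names and placeholder values. Storing the name in its own field (called Name here, an assumption) keeps it available as data instead of encoding it in a field name.

```matlab
casename = {'ExpA', 'ExpB', 'ExpC'};   % hypothetical experiment names
values = [1.5, 2.5, 3.5];              % placeholder data

% Preallocate a 1-by-3 struct array; MyValue starts empty in every element.
Result = struct('Name', casename, 'MyValue', []);

for caseIndex = 1:numel(casename)
    Result(caseIndex).MyValue = values(caseIndex);
end

% Collect all values at once with a comma-separated list.
allValues = [Result.MyValue];

% Look up one experiment by name.
idx = strcmp({Result.Name}, 'ExpB');
valueB = Result(idx).MyValue;
```

With this layout the struct array can be preallocated directly, and ordinary indexing replaces dynamic field names.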
Marco
on 9 Mar 2015
Adam
on 9 Mar 2015
Stephen's answer is the more complete so the right one to accept, but if you gained something useful from my answer too then that is good :)
James Tursa
on 9 Mar 2015
Edited: James Tursa
on 9 Mar 2015
Some clarification about comments above:
"... Dynamically created fields don't require presizing when you create the struct (and they can't be since a field can contain anything)."
This assumes we are only talking about the field names here (not the field elements themselves). While they don't require pre-allocation, there is a benefit, and how large it is depends on the number of fields to be added. Adding field names dynamically (e.g. in a loop) causes MATLAB to re-allocate memory for the field names and add more value addresses iteratively as well. It is the equivalent of assigning to a cell array index in a loop without pre-allocating the cell array first (cells and structs are stored very similarly internally). Since you are only copying field variable addresses each iteration, the copying overhead isn't likely to be much, but it is extra overhead that could potentially be avoided (if one knows all the field names up front).
"... You could try to create the struct upfront with all its fields already containing pre-allocated arrays, but as mentioned this is un-necessary and slower rather than faster if you are simply going to copy data over the top of those pre-sized arrays anyway."
Yes and no. If one is talking only about creating a struct with the proper field names up front, then pre-allocation does make sense and will be faster ... although the overhead savings could be quite small and negligible depending on the number of fields in question (and in fact the extra code to do this may wipe out the small savings altogether). If one is talking about pre-allocating the field elements themselves with variables (e.g., zeros), then this doesn't typically make sense as the references discuss (they get overwritten downstream anyway so the pre-allocation can be a waste of time and resources).
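For the first case, creating the field names up front, one possible sketch uses cell2struct. The attempts in the question fail because casename{1:cases} expands to a comma-separated list, while a dynamic field name in parentheses must be a single name. The names and values below are hypothetical.

```matlab
casename = {'ExpA'; 'ExpB'; 'ExpC'};   % hypothetical experiment names
cases = numel(casename);

% Create all field names in one step. Each field starts as a struct
% with an empty MyValue, so no per-field data is preallocated.
template = repmat({struct('MyValue', [])}, cases, 1);
Result = cell2struct(template, casename, 1);

% Fill the fields afterwards; no new field names are added in the loop.
for caseIndex = 1:cases
    Result.(casename{caseIndex}).MyValue = caseIndex * 10;   % placeholder data
end
```

This only helps when all field names are known before the loop. In the multi-file merge from the question that would require collecting the names from every file first.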
DISCLAIMER: I add these comments for clarification only. The fact is I am in agreement with others who have already posted that there are better ways to organize the data for easier and more efficient access (using dynamic field names in code is notoriously slow and limits how you can access and manipulate the data).