Is it possible to avoid copy-on-write behavior in functions yet?
As I understand it, MATLAB uses a system called 'copy-on-write' for function calls. So if you have a function of the form
function [out] = myfunction(out,in1,in2)
in1 = rand(1);
in2 = rand(1);
out = in1+in2;
MATLAB will create new space in memory for copies of the variables out, in1, and in2, perform the given operations on these copies, and then copy the modified arrays back onto the original variables' memory if they are output variables. This happens for the variable 'out', and would happen even for in1 and in2 if the function were written as
function [out,in1,in2] = myfunction(out,in1,in2)
in1 = rand(1);
in2 = rand(1);
out = in1+in2;
Obviously, this behavior wastes time if you know that the old variable should be replaced by the new variable. I have long avoided using functions for this reason, resulting in messy code.
Is it possible to pass variables to functions by reference? If no, will this be possible in a future MATLAB?
EDIT:
A commenter noted that since the inputs in1 and in2 are defined in the function they do not need to be passed through the function. Perhaps the following better describes the problem:
function [out,ind1,ind2] = myfunction(out,in1,in2)
ind = 5;
in1(ind) = rand(1);
in2(ind) = rand(1);
out = in1+in2;
so the function modifies only one element of each array, yet the entire arrays are copied before being modified.
12 comments
KSSV
on 3 Oct 2017
function [out] = myfunction(out,in1,in2)
in1 = rand(1);
in2 = rand(1);
Passing in1 and in2 is not required: you do not need to send them to the function, because you create them inside it.
function [out] = myfunction(out)
in1 = rand(1);
in2 = rand(1);
The above is enough.
OCDER
on 3 Oct 2017
Here's a good discussion about this. I would say use functions. Compartmentalizing tasks into functions can save you hours of debugging and can even improve memory usage, because temporary variables are discarded when the function returns (you don't have to clear them, which is slow). Readability is VERY important: what use is saving 2 ms if you spend 20 hours debugging "messy" code? That's 20 hours you could spend hanging out with friends instead of looking for the error.
On modern computers, many functions can run just as fast as one function if the code is structured correctly. If this weren't true, complex software like MATLAB would be a very slow program...
Matt J
on 3 Oct 2017
Note that "passing by reference" is not the same as "avoiding copy-on-write". Copy-on-write means (among other things) that if you pass a variable to a function which doesn't modify the variable, then it will be passed by reference.
For example, in the following version of your example, in1 and in2 are passed by reference. No copies are made.
function [out,in1,in2] = myfunction(in1,in2)
out = in1+in2;
" I have long avoided using functions for this reason, resulting in messy code."
That was a really bad design decision. You are going to waste a lot more time writing/testing/fixing/... messy code, compared to if you had used testable, neatly encapsulated functions. Scripts are fun for playing around with, but any reliable, repeatable, testable, efficient code should be written using functions or classes. If you read this forum and the MATLAB help you will find plenty of discussions and reasons for using functions, including that they are faster than scripts. They are certainly less buggy, easier to test, and easier to maintain.
"MATLAB will create a new space in memory for a new copy of variables out, in1, and in2,..."
Quite unlikely. You never write to the input variables inside the function, so no copying will occur. Simply passing an argument to a function does not copy that variable; the variable is copied only when it is written to. You do, however, assign entirely new values to variables that happen to have the same names, and these require their own memory, but this is unrelated to the topic at hand.
This is why premature optimization is a bad idea, because it leads people to write messy, unclear, impossible-to-maintain code. It is also a good example of why code should be designed to be clear and readable, rather than designed based on some esoteric ideas of efficiency: good code practices will always be correct, no matter what MATLAB version, and can be identified by the JIT compiler and optimized internally.
Unless I am mistaken, copy-on-write works just the same way for variables in scripts:
a = rand( 1000 );
b = a;
should not use extra memory (beyond the tiny amount for a reference). Only if you then change something in 'b':
b(1) = 27;
will it then make a copy of the data. So if you take a copy of something and don't then edit it, it will not use more memory, but if you do edit it, it will. As far as I am aware, whether it goes through a function or not should not matter.
I may be mistaken though. I only really care about these things when I am optimising with large inputs and then I am always using functions and classes anyway.
James Tursa
on 3 Oct 2017
@Adam: Correct. The general behavior for passing arguments to m-file and p-code functions is: Shared data copies of all arguments are created and it is those shared data copies that are actually passed into the function. Then the normal copy-on-write rules apply just as they would at any other point in the code.
The exceptions are:
(1) In some circumstances deep copies of scalars are passed instead of shared data copies.
(2) In special syntax cases where the output variable has the same name as the input variable and the function is called from within another function in a "make-changes-in-place" manner. See Loren's Blog on this for more info.
(3) Classdef variables derived from handle behave differently.
(4) Arguments to mex routines are passed by reference.
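The difference between the normal rule and exception (3) can be seen from the values alone (the memory behavior itself is not visible in this snippet). A minimal sketch, assuming anonymous functions as stand-ins for real m-file functions; setFirst and dropKey are hypothetical names, while containers.Map is a built-in handle class:

```matlab
% Normal rule: the callee receives a shared data copy of a. Writing to it
% inside the callee triggers copy-on-write, so the caller's a is untouched.
a = [1 2 3];
setFirst = @(v, val) subsasgn(v, substruct('()', {1}), val);  % hypothetical callee
b = setFirst(a, 99);

% Exception (3): containers.Map is a handle class, so the callee and the
% caller refer to the same object and the removal is visible to the caller.
m = containers.Map({'x', 'y'}, {1, 2});
dropKey = @(map, k) remove(map, k);                            % hypothetical callee
dropKey(m, 'x');
```

The numeric array behaves as if it were passed by value, whereas the change made through the handle is seen by the caller without returning anything.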
James Tursa
on 3 Oct 2017
@Christopher: In your latest edit, what is the issue? That deep copies are made of in1 and in2, even though only one element changed, and then that these deep copies are thrown away after the function returns? You want a way for this to happen in-place? Did you mean for your output arguments to be in1 and in2 instead of ind1 and ind2? Or ...?
Guillaume
on 4 Oct 2017
With regards to the latest edit, and assuming that ind1 and ind2 are meant to be in1 and in2 then, according to Loren's blog linked in Jan's answer, no copy is made.
But as I said in my answer, whether or not it does should be way down the list of priorities until it has been proven to be a bottleneck.
James Tursa
on 4 Oct 2017
"... no copy is made ..."
Sort of. No copy is made IF this function is called from within another function, and IF the calling routine uses syntax where the input and output variables match, and IF the original variables are not shared data copies of something else to begin with. If any of those conditions is not met, then a data copy will be made.
Alec
on 13 Jun 2025
What does "IF the original variables are not shared data copies of something else to begin with." mean?
Benjamin Kraus
on 13 Jun 2025
@Alec: Consider this situation:
a = rand(100);
a = somefunction(a);
In the scenario above, because a is being overwritten by the output from somefunction, then somefunction can reuse the memory allocated by a and (depending on the implementation of somefunction) may be able to avoid making a copy of a to pass into the function.
Now consider this:
a = rand(100);
b = a;
a = somefunction(a);
The second line of code creates a "shared data copy" of a and stores it in b. No new memory is allocated: a and b have the same value, MATLAB knows this, and so both share the same memory. However, this means that somefunction can no longer overwrite that memory, because it is shared with b. If somefunction changed the value of a, the change must not be reflected in the value of b.
Now consider this:
a = rand(100);
b = a;
b(1) = 1; % This triggers the "copy-on-write" mechanism.
a = somefunction(a);
The second line of code creates a "shared data copy" of a, but then the third line of code modifies b. This means that a and b can no longer share the same storage. This forces MATLAB to create a duplicate copy of a and then update it for storage in b. Once you've done this, a is no longer being shared with another variable, so somefunction can go back to reusing the storage for a.
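The second and third scenarios can be checked by value (not by memory use) with a small script. A sketch, assuming somefunction is a hypothetical anonymous function that overwrites the first element; it does not show whether the in-place optimization applies, only which variables end up holding which data:

```matlab
% Hypothetical stand-in for somefunction: overwrite the first element.
somefunction = @(v) subsasgn(v, substruct('()', {1}), -1);

% Scenario 2: b shares a's data, so the write inside somefunction must
% operate on a copy; b keeps the original values.
a = rand(100);
b = a;                 % shared data copy, no new array allocated yet
a = somefunction(a);

% Scenario 3: writing to d first unshares c and d (copy-on-write),
% after which c is no longer shared with another variable.
c = rand(100);
d = c;
d(1) = 1;
c = somefunction(c);
```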
Accepted Answer
More Answers (2)
See also Loren's very useful article: https://blogs.mathworks.com/loren/2007/03/22/in-place-operations-on-data/ .
You are right: when the algorithm is very efficient and runs on a multi-core machine, memory copies can become the bottleneck. I had the same problem in an optimization tool written in C, which called a FORTRAN library for solving a huge matrix equation with a known pattern. The two deep data copies when entering and leaving the library took 40% of the total run time. Fortunately, we had the FORTRAN source code and could modify it to process the matrices in place.
But now imagine we had avoided using functions from the start. As you wrote, the code would then have been too messy to optimize.
You can avoid deep data copies sometimes:
x = zeros(10000, 10000);
n = 1e6;
tic;
for k = 1:n
x = addInSubFcn(x);
end
toc
tic;
for k = 1:n
[xx, index] = addInCaller(x);
x(index) = xx;
end
toc
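function x = addInSubFcn(x)
% Modify one element inside the subfunction and return the whole array
index = randi(numel(x));
x(index) = x(index) + rand;
end
function [xx, index] = addInCaller(x)
% Return only the new value and its index; the caller writes it into x
index = randi(numel(x));
xx = x(index) + rand;
end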
R2016b/64, Win7:
Elapsed time is 2.583763 seconds. % In subfunction
Elapsed time is 1.884192 seconds. % In caller
Keep this in mind when you create functions that modify arrays.
1 comment
Tyler Warner
on 24 May 2018
Excellent response. Very insightful!
I agree with most of what is said in the comments/answers. Yet, if you really needed to avoid copies for good reasons in a context far more complex and/or specific than the example that you give, you could create a handle class and always work on a single copy of whatever you pass to functions/methods.
Again, there is no point in doing this for simple data structures unless you have proven that you cannot afford the copy-on-write, so don't jump on this solution if you don't fully understand what you are doing.
Yet, skimming the history of your questions, I think that you know what you are doing and that people reacted a bit too quickly to your comment about "not using functions and getting messy code to avoid copy-on-write", but you have to admit that in most cases this is almost a heretical statement/approach ;-)
Anyhow, assuming that you need this for valid reasons, here is an example:
classdef VeryVeryLargeArray < handle
properties
array
end
methods
function obj = VeryVeryLargeArray( builder, varargin )
obj.array = builder( varargin{:} ) ;
end
% Possibly some overload of e.g. SUBSREF/SUBSASGN/SIZE and operators.
end
end
Using it for building e.g. a 5GB random array (so you can see something in the task manager):
>> n = floor( sqrt( 5e9/8 )) ;
>> vvla = VeryVeryLargeArray( @rand, n ) ;
you see a 5GB jump in memory usage. Now consider a function, e.g. setRow:
function setRow( vvla, rowId, value )
vvla.array(rowId,:) = value ;
end
If you set a breakpoint on its third line (the end statement) and call:
setRow( vvla, 1, 0 ) ;
you won't see a second jump due to a copy-on-write and your array will have been updated (even in the base workspace, because handles work "a bit like pointers").
EDIT 10/4 @ 12:41UTC: Here is a quick example of overloading SUBSREF, in case you want to forward block indexing of the object(s) to the internal array(s):
function out = subsref( obj, S )
if S(1).type(1) ~= '.'
out = subsref( obj.array, S ) ;
else
out = builtin( 'subsref', obj, S ) ;
end
end
This method could be added after my comment in the methods block of the class definition. The same would have to be done for SUBSASGN and possibly SIZE. The advantage is that most functions could operate on the object the way they operate on any numeric array:
>> vvla(2:4, 10:13)
ans =
0.5108 0.1707 0.3188 0.3955
0.8176 0.2277 0.4242 0.3674
0.7948 0.4357 0.5079 0.9880
This accesses vvla.array(2:4,10:13) and has the advantage of making the internal structure transparent to the user (at least for what is managed by SUBSREF).
Note that testing S(1).type(1)~='.' (and not just S(1).type(1)=='(') forwards any () or {} indexing to the array property, so you can also use builders of cell arrays:
vvlca = VeryVeryLargeArray( @cell, 4, 5 ) ;
BUT you cannot easily (if at all) manage comma-separated list (CSL) outputs properly, especially when you want to nest these objects, so there is a limit to what you can achieve by overloading indexing methods. [If you try, you will likely spend hours wondering why nargout is defined through a call to your overloaded NUMEL and not to the builtin, and trying to find workarounds.]
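For readers unfamiliar with the term: a comma-separated list is what MATLAB produces when brace indexing of a cell array, or dot indexing of a struct array, returns several values at once. The snippet below only shows the built-in behavior that an overloaded SUBSREF would have to reproduce; it does not use the class above:

```matlab
c = {10, 20, 30};
[p, q, r] = c{:};              % c{:} expands into three separate values
v = [c{:}];                    % the same list, concatenated inside brackets

s = struct('val', {1, 2, 3});  % 1x3 struct array
w = [s.val];                   % s.val is also a comma-separated list
```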
EDIT 10/5 @ 12:32UTC: As mentioned, you can overload specific operations or functions that are relevant to the use that you make of these arrays. If you want to be able to use DIFF transparently for example:
function df = diff( obj, varargin )
df = diff( obj.array, varargin{:} ) ;
end