sprintf vs. compose performance for large arrays on macOS
I would expect sprintf and compose to exhibit similar performance, yet on my Mac, sprintf is considerably faster. Consider the following example.
M = 32;
N = 32;
A = randn(M,N);
[m,n] = ndgrid(1:M,1:N);
Nrep = 100;
%% compose and no for loops
t1 = zeros(Nrep,1);
for kr=1:Nrep
tic;
s1 = compose("%d,%d,%.3g",m(:),n(:),A(:));
s1 = reshape(s1,M,N);
t1(kr) = toc;
end
%% sprintf and two nested for loops
t2 = zeros(Nrep,1);
for kr=1:Nrep
s2 = repelem("",M,N);
tic;
for j=1:N
for i=1:M
s2(i,j) = sprintf("%d,%d,%.3g",i,j,A(i,j));
end
end
t2(kr) = toc;
end
%% compose and two nested for loops
t3 = zeros(Nrep,1);
for kr=1:Nrep
tic;
s3 = repelem("",M,N);
for j=1:N
for i=1:M
s3(i,j) = compose("%d,%d,%.3g",i,j,A(i,j));
end
end
t3(kr) = toc;
end
fprintf(" min | mean | max\n");
fprintf("compose, no loops: %.6f | %.6f | %.6f\n",min(t1),mean(t1),max(t1));
fprintf("sprintf, 2 loops: %.6f | %.6f | %.6f\n",min(t2),mean(t2),max(t2));
fprintf("compose, 2 loops: %.6f | %.6f | %.6f\n",min(t3),mean(t3),max(t3));
This code produces the following output on my machine.
min | mean | max
compose, no loops: 1.008151 | 1.035783 | 1.076671
sprintf, 2 loops: 0.004265 | 0.004384 | 0.008990
compose, 2 loops: 1.018900 | 1.050923 | 1.464713
Going by the numbers above, sprintf is more than 200x faster in this example (a mean of about 1.04 s for compose versus about 0.0044 s for sprintf). Any idea why that is?
Here is the output of ver on my machine:
-----------------------------------------------------------------------------------------
MATLAB Version: 25.1.0.2943329 (R2025a)
MATLAB License Number:
Operating System: macOS Version: 15.5 Build: 24F74
Java Version: Java 11.0.27+6-LTS with Amazon.com Inc. OpenJDK 64-Bit Server VM mixed mode
-----------------------------------------------------------------------------------------
Answers (2)
The execution below shows very different results from those reported in the question. Note that I did some refactoring so as not to overwrite the n and m variables.
Considerations:
- A single tic/toc measurement is usually not useful. Execution time varies considerably, especially at these sub-second durations. A more reliable approach is to time many repetitions and then compare the distributions to see whether they differ significantly.
- Note how much faster #2 below is compared to #1, both of which use compose. Reshaping is not necessary if you preallocate the output variable.
- #5 is the fastest
M = 32;
N = 32;
A = randn(M,N);
[m,n] = ndgrid(1:M,1:N);
%% #1 compose & reshape
tic;
s1 = compose("%d,%d,%.3g",m(:),n(:),A(:));
s1 = reshape(s1,M,N);
toc;
%% #2 compose using preallocation
tic;
s1b = strings(size(A));
s1b(:) = compose("%d,%d,%.3g",m(:),n(:),A(:));
toc
isequal(s1b, s1)
%% #3 sprintf and two nested for loops
s2 = strings(size(A));
tic;
for i=1:N
for j=1:M
s2(j,i) = sprintf("%d,%d,%.3g",j,i,A(j,i));
end
end
toc
isequal(s2, s1)
%% #4 compose and two nested for loops
s3 = strings(size(A));
tic;
for i=1:N
for j=1:M
s3(j,i) = compose("%d,%d,%.3g",j,i,A(j,i));
end
end
toc
isequal(s3, s1)
%% #5 String construction
tic
s4 = string(m) + "," + string(n) + "," + compose('%.3g',A);
toc
isequal(s4, s1)
version()
Proper measurement with repetition
I'll compare methods #2 and #5 above using a more robust measure of execution time. Each method is timed 1000 times, storing the duration of each iteration. The two distributions are plotted to evaluate whether they are clearly distinct and to assess their variance.
clear % remove existing variables created above
rng default
N = 32;
A = randn(N);
[m,n] = ndgrid(1:N);
% Preallocate outputs before timing
repetitions = 1000;
duration = nan(repetitions, 2);
% method 2
for i = 1:repetitions
t0 = tic;
s1b = strings(size(A));
s1b(:) = compose("%d,%d,%.3g",m(:),n(:),A(:));
duration(i,1) = toc(t0);
end
% method 5
for i = 1:repetitions
t0 = tic;
s4 = string(m) + "," + string(n) + "," + compose('%.3g',A);
duration(i,2) = toc(t0);
end
% Sanity check
isequal(s1b, s4)
% Show distributions as boxcharts (alternative: histograms)
bxh = boxchart(duration);
xticklabels({'compose','string+string'});
ylabel('duration (sec)')
title(repetitions+" repetitions")
Clearly, the string+string method is significantly faster and less variable than compose.
meanDuration = mean(duration);
fprintf('%g times faster\n', max(meanDuration)/min(meanDuration))
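As an alternative to hand-rolled tic/toc loops, the same comparison could also be run with timeit, which calls a function handle several times and returns the median time. This is only a minimal sketch, assuming the same 32-by-32 input as above; the handle names fCompose and fConcat are made up for illustration.

```matlab
rng default
N = 32;
A = randn(N);
[m,n] = ndgrid(1:N);

% Hypothetical handle names; each wraps one of the two methods compared above
fCompose = @() compose("%d,%d,%.3g", m(:), n(:), A(:));                 % method 2, without the reshape
fConcat  = @() string(m) + "," + string(n) + "," + compose("%.3g", A);  % method 5

% timeit runs each handle repeatedly and returns the median time in seconds
tCompose = timeit(fCompose);
tConcat  = timeit(fConcat);

fprintf("compose: %.6f s | string+string: %.6f s\n", tCompose, tConcat);
```

timeit gives one number per method, so it complements the boxchart view above but does not replace it: the spread of the individual runs is not visible.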
16 comments
Jose
on 17 Jun 2025
Here's a version where compose is not slower than sprintf. It uses preallocation which allows you to vectorize the inputs and avoids the loops.
The compose function is not based on sprintf. From the compose documentation page: "The compose function can return multiple pieces of formatted text as a string array or a cell array of character vectors, unlike sprintf. The sprintf function returns only a string scalar or a character vector." Thus, compose can often replace for-loops with vectorized inputs.
By removing the loops and using the vectorized form, you reduce overhead and allow compose to run optimally.
I'd have to get into implementation details to offer a deeper comparison but the main takeaway is that these are separate functions with separate implementations and purposes.
To speed up your program while using compose, use the preallocated form. To improve performance further, consider using method #5 from my answer.
M = 32;
N = 32;
A = randn(M,N);
[m,n] = ndgrid(1:M,1:N);
Nrep = 100;
t = nan(Nrep,3);
% sprintf in 2 loops
s2 = strings(size(A));
for kr=1:Nrep
t0 = tic;
for j=1:N
for i=1:M
s2(i,j) = sprintf("%d,%d,%.3g",i,j,A(i,j));
end
end
t(kr,1) = toc(t0);
end
% compose in 2 loops
s3 = strings(size(A));
for kr=1:Nrep
t0 = tic;
for j=1:N
for i=1:M
s3(i,j) = compose("%d,%d,%.3g",i,j,A(i,j));
end
end
t(kr,2) = toc(t0);
end
isequal(s2,s3)
% compose + preallocation, vectorized
s2b = strings(size(A));
for kr=1:Nrep
t0 = tic;
s2b(:) = compose("%d,%d,%.3g",m(:),n(:),A(:));
t(kr,3) = toc(t0);
end
isequal(s2, s2b)
boxchart(t)
xticklabels({'sprintf','compose','compose + preallocation'});
grid on
Walter Roberson
on 18 Jun 2025
s2 = repelem("",M,N) is less efficient than
s2 = strings(M,N);
What I'm calling the preallocation form is the section of code under the following comment.
% compose + preallocation, vectorized
In that section, the string array is created by a single line of code using vectorized inputs. Using vectorized inputs is usually faster and more efficient than computing iteratively within a loop.
> Is repelem("",M,N) not adequate preallocation?
repelem is fine for preallocating an array but the strings method in my version is optimal for preallocating an empty string array. The more important part is how that array is used (loop vs vectorized inputs).
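For what it's worth, the two preallocation calls should give the same array; only the way it is created differs. A small sketch with made-up sizes (sA and sB are illustrative names):

```matlab
M = 3;
N = 4;

sA = repelem("", M, N);   % replicate an empty string scalar into an M-by-N array
sB = strings(M, N);       % create the M-by-N array of empty strings directly

% Both are M-by-N string arrays whose elements all have zero length
sameArray = isequal(sA, sB);
allEmpty  = all(strlength(sB) == 0, "all");
```

So the choice between them affects how the array is built, not what you get back.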
> Could [the difference] be a macOS issue?
It could be, although I do not have access to a Mac right now to verify those results.
I just repeated the measurements using R2025a version 25.1.0.2943329, which matches the version you mentioned in your question, and the results led to the same conclusion as what I reported earlier (Windows OS). I also repeated it in R2024b Update 3, which gave very similar results. MATLAB in this forum runs on Linux, which produced the results shown above.
isunix
ismac
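When comparing timings across machines, it may help to print the release and platform next to the numbers. A minimal sketch (the variable name info is just for illustration):

```matlab
% Collect release, architecture, and platform flags in one line.
% isunix is also true on macOS, so ismac is the flag that singles out a Mac.
info = sprintf("MATLAB %s on %s (ispc=%d, ismac=%d, isunix=%d)", ...
    version, computer, ispc, ismac, isunix);
disp(info)
```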
Note that in the section "compose and two nested for loops" in your question, repelem is called within tic/toc but for the sprintf section above it, repelem is called outside of tic/toc which is the right choice since you're focusing on sprintf/compose performance, not compose + repelem.
Walter Roberson
on 18 Jun 2025
On my Intel Mac system,
[Figures not reproduced: timing results for R2024b and R2025a]
The compose() part is quite slow either way, but it is about 1.5 for R2024b and about 1.4 for R2025a.
Adam Danz
on 18 Jun 2025
This was a really enlightening discussion. I'll share internally.
Jose
on 18 Jun 2025
Jose
on 18 Jun 2025
Walter Roberson
on 18 Jun 2025
Because the post has at least one Answer, you would be unable to delete the post.
It is better to leave the post up, as other people might encounter the same situation.
Jose
on 18 Jun 2025
Jose
on 18 Jun 2025
goc3
on 13 Aug 2025
I just found this thread and am quite interested in it. I have an Intel Mac Pro, so I figured it should perform fairly well, but it was even slower than both results posted above for other Mac configurations. The first figure (in light) is for R2024b while the second (in dark) is for R2025a. The two median values that include compose are 2.052 and 2.045 (essentially the same) for R2024b and 1.896 and 1.843 for R2025a.


I have always sensed that the Mac version of MATLAB did not receive all the optimization attention of the Windows version (especially in graphics, at least, before the New Desktop). But, I was not able to participate in a direct performance comparison across OS's and provide data for it before.
For clarity, even though both releases were open simultaneously, I ran the script in each version one at a time. My computer has a 16-core 3.2 GHz processor and 160 GB of RAM (only 36 GB used during the tests).
This is rather frustrating, as my results are quite a bit slower than the other two sets of results and the expected benefit of preallocation (which works so well in so many contexts) simply does not work for Macs in this situation. I hope that this gets fixed soon and that the Mac version of MATLAB gets more attention. (Perhaps that is already the case for the Apple Silicon version since Intel Macs are no longer being sold...)
J. Alex Lee
on 13 Aug 2025
Edited: J. Alex Lee
on 13 Aug 2025
I landed here not because of the OS, but because of the performance of compose. What fascinates me is that calling compose() several times and +'ing the results together is much faster than a single compose call.
In my application I have data with order hundred thousand rows from a db, and I need to construct a human-readable id from several fields.
Profiling showed my bottleneck to be the compose call (method 1 below). I played around and found that sprintf is much faster, but I couldn't get it to output a vector of strings (it can only produce one string with line breaks), hence landing on this post.
Now I find that I can be counterintuitively inefficient with calls to compose and will speed up my code (~26x faster on my computer with relevant number of rows, but showing only 15x faster on the web platform here).
Adding a route using "join" instead of "+" is similarly faster.
NReps = 10;
NData = 100000;
prfx1 = char(randi(26,[NData,1])+64);
prfx2 = char(randi(26,[NData,1])+64);
prfx3 = char(randi(26,[NData,1])+64);
sufx1 = randi(999,[NData,1]);
sufx2 = randi(999,[NData,1]);
tdurat = zeros(NReps,3);
% intuitive single call to compose
for k = 1:NReps
tic
CatCodeA = compose("%s%s%s%03d%03d",prfx1,prfx2,prfx3,sufx1,sufx2);
tdurat(k,1) = toc;
end
% unintuitive multiple composes, then +'d together
for k = 1:NReps
tic
CatCodeB = compose("%s",prfx1)+compose("%s",prfx2)+compose("%s",prfx3)+compose("%03d",sufx1)+compose("%03d",sufx2);
tdurat(k,2) = toc;
end
% multiple composes join()'d together
for k = 1:NReps
tic
CatCodeC = join([...
compose("%s",prfx1),...
compose("%s",prfx2),...
compose("%s",prfx3),...
compose("%03d",sufx1),...
compose("%03d",sufx2)],...
"" ...
);
tdurat(k,3) = toc;
end
% excerpt inputs
[cellstr(prfx1(1:10)),cellstr(prfx2(1:10)),cellstr(prfx3(1:10)),num2cell(sufx1(1:10)),num2cell(sufx2(1:10))]
% excerpt outputs
[CatCodeA(1:10),CatCodeB(1:10),CatCodeC(1:10)]
% sanity checks
isequal(CatCodeA,CatCodeB)
isequal(CatCodeB,CatCodeC)
% timers
tA = mean(tdurat(:,1));
tB = mean(tdurat(:,2));
tC = mean(tdurat(:,3));
fprintf([ ...
'Single compose() takes: %g sec\n' ...
'compose()+compose() takes: %g sec (~%.2f x speedup)\n' ...
'join of composes() takes: %g sec (~%.2f x speedup)\n'], ...
tA,tB,tA/tB,tC,tA/tC);
% boxchart
bxh = boxchart(tdurat);
xticklabels({'Single compose()','compose()+compose()','join of composes()'});
ylabel('duration (sec)')
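On the remark above that sprintf only yields a single string with line breaks: one way to get a string vector out of it is to end the format with a newline and split the result afterwards. This is a minimal sketch with tiny made-up inputs (p1, p2, p3, n1, n2 and ids are illustrative names). Whether it beats the compose variants timed above is not measured here, so treat it as something to profile, not as a recommendation.

```matlab
% Made-up inputs: three single-letter prefixes and two numeric suffixes per row
p1 = ['A'; 'B'];
p2 = ['C'; 'D'];
p3 = ['E'; 'F'];
n1 = [7; 42];
n2 = [123; 5];

% sprintf consumes its arguments column by column, so store one record per column.
% The letters are converted to character codes and printed back with %c.
data = [double(p1), double(p2), double(p3), n1, n2].';   % 5-by-2 numeric matrix
txt  = sprintf('%c%c%c%03d%03d\n', data);                % one char vector, newline after each record

% Drop the trailing newline, then split into a column of strings
ids = splitlines(string(strtrim(txt)));
disp(ids)
```

For these inputs, ids is the 2-by-1 string array ["ACE007123"; "BDF042005"]. The %c trick only works for single-character fields; longer text fields would need a different layout.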
> My understanding is that compose is based on sprintf
That is not correct, at least not directly.
compose() selects elements "across" the optional data parameters; sprintf() selects elements "down" the optional data parameters.
sprintf('%d : %d', [1; 2], [3; 4])
compose('%d : %d', [1; 2], [3; 4])
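To make the "across" versus "down" distinction concrete, here is what the two calls above should return, written out with variable names of my own (a, b, s, c):

```matlab
a = [1; 2];
b = [3; 4];

% sprintf flattens all arguments in column order (1, 2, 3, 4) and recycles the
% format, so the pieces are concatenated into a single character vector
s = sprintf('%d : %d', a, b);   % '1 : 23 : 4'

% compose pairs the k-th element of a with the k-th element of b and returns
% one piece of text per row (a cell array here, because the format is a char vector)
c = compose('%d : %d', a, b);   % {'1 : 3'; '2 : 4'}
```

This difference in how arguments are consumed is why sprintf needs a loop (or a delimiter and a split) to produce one text per row, while compose does it in one call.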
3 comments
Jose
on 17 Jun 2025
Walter Roberson
on 17 Jun 2025
We do not know this to be true.
For example, for historical reasons, sprintf() might have been coded in C, but the newer compose() might be coded in C++.
Jose
on 18 Jun 2025