sprintf vs. compose performance for large arrays on macOS

I would expect sprintf and compose to exhibit similar performance, yet on my Mac, sprintf is considerably faster. Consider the following example.
M = 32;
N = 32;
A = randn(M,N);
[m,n] = ndgrid(1:M,1:N);
Nrep = 100;
%% compose and no for loops
t1 = zeros(Nrep,1);
for kr=1:Nrep
tic;
s1 = compose("%d,%d,%.3g",m(:),n(:),A(:));
s1 = reshape(s1,M,N);
t1(kr) = toc;
end
%% sprintf and two nested for loops
t2 = zeros(Nrep,1);
for kr=1:Nrep
s2 = repelem("",M,N);
tic;
for j=1:N
for i=1:M
s2(i,j) = sprintf("%d,%d,%.3g",i,j,A(i,j));
end
end
t2(kr) = toc;
end
%% compose and two nested for loops
t3 = zeros(Nrep,1);
for kr=1:Nrep
tic;
s3 = repelem("",M,N);
for j=1:N
for i=1:M
s3(i,j) = compose("%d,%d,%.3g",i,j,A(i,j));
end
end
t3(kr) = toc;
end
fprintf(" min | mean | max\n");
fprintf("compose, no loops: %.6f | %.6f | %.6f\n",min(t1),mean(t1),max(t1));
fprintf("sprintf, 2 loops: %.6f | %.6f | %.6f\n",min(t2),mean(t2),max(t2));
fprintf("compose, 2 loops: %.6f | %.6f | %.6f\n",min(t3),mean(t3),max(t3));
This code produces the following output on my machine.
min | mean | max
compose, no loops: 1.008151 | 1.035783 | 1.076671
sprintf, 2 loops: 0.004265 | 0.004384 | 0.008990
compose, 2 loops: 1.018900 | 1.050923 | 1.464713
Judging by the mean times above, sprintf is roughly 240x faster than compose in this example. Any idea why that is?
Here is the output of ver on my machine:
-----------------------------------------------------------------------------------------
MATLAB Version: 25.1.0.2943329 (R2025a)
MATLAB License Number:
Operating System: macOS Version: 15.5 Build: 24F74
Java Version: Java 11.0.27+6-LTS with Amazon.com Inc. OpenJDK 64-Bit Server VM mixed mode
-----------------------------------------------------------------------------------------

Answers (2)

Adam Danz on 17 Jun 2025
Edited: Adam Danz on 17 Jun 2025
The run below shows very different results from those reported in the question. Note that I did some refactoring so as not to overwrite the n and m variables.
Considerations:
  1. A single tic/toc measurement is usually not useful. There is considerable variation in execution time, especially at these sub-second durations. The more responsible way to time this is to include many repetitions and then compare the distributions to see whether they differ significantly.
  2. Note how much faster #2 below is compared to #1, both of which use compose. Reshaping is not necessary if you preallocate the output variable.
  3. #5 is the fastest.
M = 32;
N = 32;
A = randn(M,N);
[m,n] = ndgrid(1:M,1:N);
%% #1 compose & reshape
tic;
s1 = compose("%d,%d,%.3g",m(:),n(:),A(:));
s1 = reshape(s1,M,N);
toc;
Elapsed time is 0.016196 seconds.
%% #2 compose using preallocation
tic;
s1b = strings(size(A));
s1b(:) = compose("%d,%d,%.3g",m(:),n(:),A(:));
toc
Elapsed time is 0.013167 seconds.
isequal(s1b, s1)
ans = logical
1
%% #3 sprintf and two nested for loops
s2 = strings(size(A));
tic;
for i=1:N
for j=1:M
s2(j,i) = sprintf("%d,%d,%.3g",j,i,A(j,i));
end
end
toc
Elapsed time is 0.013522 seconds.
isequal(s2, s1)
ans = logical
1
%% #4 compose and two nested for loops
s3 = strings(size(A));
tic;
for i=1:N
for j=1:M
s3(j,i) = compose("%d,%d,%.3g",j,i,A(j,i));
end
end
toc
Elapsed time is 0.028691 seconds.
isequal(s3, s1)
ans = logical
1
%% #5 String construction
tic
s4 = string(m) + "," + string(n) + "," + compose('%.3g',A);
toc
Elapsed time is 0.008618 seconds.
isequal(s4, s1)
ans = logical
1
version()
ans = '25.1.0.2952844 (R2025a)'
Proper measurement with repetition
I'll compare methods #2 and #5 above using a more robust measure of execution time. Each method is timed 1000x, storing the duration of each iteration. The two distributions are plotted to evaluate whether they are clearly distinct distributions and to evaluate the variance.
clear % remove existing variables created above
rng default
N = 32;
A = randn(N);
[m,n] = ndgrid(1:N);
% Preallocate outputs before timing
repetitions = 1000;
duration = nan(repetitions, 2);
% method 2
for i = 1:repetitions
t0 = tic;
s1b = strings(size(A));
s1b(:) = compose("%d,%d,%.3g",m(:),n(:),A(:));
duration(i,1) = toc(t0);
end
% method 5
for i = 1:repetitions
t0 = tic;
s4 = string(m) + "," + string(n) + "," + compose('%.3g',A);
duration(i,2) = toc(t0);
end
% Sanity check
isequal(s1b, s4)
ans = logical
1
% Show distributions as boxcharts (alternative:histograms)
bxh = boxchart(duration);
xticklabels({'compose','string+string'});
ylabel('duration (sec)')
title(repetitions+" repetitions")
Clearly the string+string method is significantly faster and less variable than compose.
meanDuration = mean(duration);
fprintf('%g times faster\n', max(meanDuration)/min(meanDuration))
9.38987 times faster
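As an alternative to hand-written tic/toc loops, MATLAB's timeit function calls a function handle several times and returns a typical (median) run time. The following is a minimal sketch, assuming the same 32-by-32 setup as above. The handle names fCompose and fConcat are illustrative and not part of the original answer, and the timings will differ by platform.

```matlab
% Same setup as in the answer above
rng default
N = 32;
A = randn(N);
[m,n] = ndgrid(1:N);

% Wrap each method in a zero-argument function handle (names are illustrative)
fCompose = @() compose("%d,%d,%.3g", m(:), n(:), A(:));
fConcat  = @() string(m) + "," + string(n) + "," + compose("%.3g", A);

% timeit runs each handle repeatedly and returns the median time in seconds
tCompose = timeit(fCompose);
tConcat  = timeit(fConcat);

fprintf("compose: %.6f s | string+string: %.6f s\n", tCompose, tConcat);
```

Because timeit handles warm-up and repetition itself, it gives a single summary number per method; the manual loop above remains the better choice when the full distribution is of interest.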

16 comments

Jose on 17 Jun 2025
Edited: Jose on 17 Jun 2025
My example was meant to convey this idea, without incurring more development effort on my end.
I agree that tic and toc will give you a lot of variance, but clearly sprintf is much faster than compose even when taking into account timing jitter, at least on my machine.
Would you happen to know why? I love the flexibility that compose gives you, but it made my program 80x slower (0.5s vs. 40s). It's a bit disappointing that this is the case.
I edited my question to include repetitions. I still observe the same discrepancy in performance.
Here's a version where compose is not slower than sprintf. It uses preallocation which allows you to vectorize the inputs and avoids the loops.
The compose function is not based on sprintf. From the compose documentation page: "The compose function can return multiple pieces of formatted text as a string array or a cell array of character vectors, unlike sprintf. The sprintf function returns only a string scalar or a character vector." Thus, compose can often avoid for-loops in place of vectorized inputs.
By removing the loops and using the vectorized form, you reduce overhead and allow compose to run optimally.
I'd have to get into implementation details to offer a deeper comparison but the main takeaway is that these are separate functions with separate implementations and purposes.
To speed up your program while using compose, use the preallocated form. To improve performance further, consider using method #5 from my answer.
M = 32;
N = 32;
A = randn(M,N);
[m,n] = ndgrid(1:M,1:N);
Nrep = 100;
t = nan(Nrep,3);
% sprintf in 2 loops
s2 = strings(size(A));
for kr=1:Nrep
t0 = tic;
for j=1:N
for i=1:M
s2(i,j) = sprintf("%d,%d,%.3g",i,j,A(i,j));
end
end
t(kr,1) = toc(t0);
end
% compose in 2 loops
s3 = strings(size(A));
for kr=1:Nrep
t0 = tic;
for j=1:N
for i=1:M
s3(i,j) = compose("%d,%d,%.3g",i,j,A(i,j));
end
end
t(kr,2) = toc(t0);
end
isequal(s2,s3)
ans = logical
1
% compose + preallocation, vectorized
s2b = strings(size(A));
for kr=1:Nrep
t0 = tic;
s2b(:) = compose("%d,%d,%.3g",m(:),n(:),A(:));
t(kr,3) = toc(t0);
end
isequal(s2, s2b)
ans = logical
1
boxchart(t)
xticklabels({'sprintf','compose','compose + preallocation'});
grid on
Jose on 18 Jun 2025
Edited: Jose on 18 Jun 2025
I did use the preallocated form! Is repelem("",M,N) not adequate preallocation?
Jose on 18 Jun 2025
Edited: Jose on 18 Jun 2025
I ran your code on my machine. I get drastically different results. sprintf is still way better. Could it be a macOS issue?
s2 = repelem("",M,N) is less efficient than
s2 = strings(M,N);
What I'm calling the preallocation form is the section of code under the following comment.
% compose + preallocation, vectorized
In that section, the string array is created by a single line of code using vectorized inputs. Using vectorized inputs is usually faster and more efficient than computing iteratively within a loop.
> Is repelem("",M,N) not adequate preallocation?
repelem is fine for preallocating an array but the strings method in my version is optimal for preallocating an empty string array. The more important part is how that array is used (loop vs vectorized inputs).
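As a quick check, the two preallocation forms discussed here yield identical arrays, so the choice affects only how the array is created, not its contents. A minimal sketch (the variable names sRepelem and sStrings are illustrative):

```matlab
M = 32;
N = 32;

% Two ways to preallocate an M-by-N array of empty strings
sRepelem = repelem("", M, N);   % form used in the question
sStrings = strings(M, N);       % form used in the answer

% Both are M-by-N string arrays whose elements are all ""
disp(isequal(sRepelem, sStrings))
disp(size(sStrings))
```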
> Could [the difference] be a macOS issue?
It could be, although I do not have access to a Mac right now to verify those results.
I just repeated the measurements using R2025a version 25.1.0.2943329, which matches the version you mentioned in your question, and the results led to the same conclusion as what I reported earlier (Windows OS). I also repeated them in R2024b Update 3, with very similar results. MATLAB in this forum runs on Linux, which produced the results I showed above.
isunix
ans = logical
1
ismac
ans = logical
0
Note that in the section "compose and two nested for loops" in your question, repelem is called within tic/toc, whereas in the sprintf section above it, repelem is called outside of tic/toc. The latter is the right choice, since you're measuring sprintf/compose performance, not compose + repelem.
On my Intel Mac system:
[Figure: timing results, R2024b]
[Figure: timing results, R2025a]
The compose() part is quite slow either way, but it is about 1.5 for R2024b and about 1.4 for R2025a.
Thanks @Walter Roberson. That confirms the effect of OS.
This was a really enlightening discussion. I'll share internally.
Great!
Since it is confirmed to be an issue possibly related to the OS, shall I close/delete the post? Or would you prefer to leave it up?
Because the post has at least one Answer, you would be unable to delete the post.
It is better to leave the post up, as other people might encounter the same situation.
Sounds good. Thanks, Walter!
I changed the title to include "macOS" and added a few more key words to help the algorithms.
I just found this thread and am quite interested in it. I have an Intel Mac Pro, so I figured it should perform fairly well, but it was even slower than both results posted above for other Mac configurations. The first figure (in light) is for R2024b while the second (in dark) is for R2025a. The two median values that include compose are 2.052 and 2.045 (essentially the same) for R2024b and 1.896 and 1.843 for R2025a.
I have always sensed that the Mac version of MATLAB did not receive all the optimization attention of the Windows version (especially in graphics, at least, before the New Desktop). But, I was not able to participate in a direct performance comparison across OS's and provide data for it before.
For clarity, even though both releases were open simultaneously, I ran the script in each version one at a time. My computer has a 16-core 3.2 GHz processor and 160 GB of RAM (only 36 GB used during the tests).
This is rather frustrating, as my results are quite a bit slower than the other two sets of results and the expected benefit of preallocation (which works so well in so many contexts) simply does not work for Macs in this situation. I hope that this gets fixed soon and that the Mac version of MATLAB gets more attention. (Perhaps that is already the case for the Apple Silicon version since Intel Macs are no longer being sold...)
I landed here not because of the OS, but because of the performance of compose. What fascinates me is that calling compose() several times and +'ing the results together is much faster than a single compose call.
In my application I have data on the order of a hundred thousand rows from a db, and I need to construct a human-readable id from several fields.
Profiling showed my bottleneck to be the compose call (method 1 below). I played around and found that sprintf is much faster, but I couldn't get it to output a vector of strings (it only produces one string with line breaks), hence landing on this post.
Now I find that I can be counterintuitively inefficient with calls to compose and will speed up my code (~26x faster on my computer with relevant number of rows, but showing only 15x faster on the web platform here).
Adding a route using "join" instead of "+" is similarly faster.
NReps = 10;
NData = 100000;
prfx1 = char(randi(26,[NData,1])+64);
prfx2 = char(randi(26,[NData,1])+64);
prfx3 = char(randi(26,[NData,1])+64);
sufx1 = randi(999,[NData,1]);
sufx2 = randi(999,[NData,1]);
tdurat = zeros(NReps,3);
% intuitive single call to compose
for k = 1:NReps
tic
CatCodeA = compose("%s%s%s%03d%03d",prfx1,prfx2,prfx3,sufx1,sufx2);
tdurat(k,1) = toc;
end
% unintuitive multiple composes, then +'d together
for k = 1:NReps
tic
CatCodeB = compose("%s",prfx1)+compose("%s",prfx2)+compose("%s",prfx3)+compose("%03d",sufx1)+compose("%03d",sufx2);
tdurat(k,2) = toc;
end
% multiple composes join()'d together
for k = 1:NReps
tic
CatCodeC = join([...
compose("%s",prfx1),...
compose("%s",prfx2),...
compose("%s",prfx3),...
compose("%03d",sufx1),...
compose("%03d",sufx2)],...
"" ...
);
tdurat(k,3) = toc;
end
% excerpt inputs
[cellstr(prfx1(1:10)),cellstr(prfx2(1:10)),cellstr(prfx3(1:10)),num2cell(sufx1(1:10)),num2cell(sufx2(1:10))]
ans = 10×5 cell array
{'M'} {'E'} {'H'} {[668]} {[843]}
{'T'} {'U'} {'S'} {[179]} {[853]}
{'H'} {'B'} {'T'} {[928]} {[538]}
{'N'} {'P'} {'R'} {[410]} {[336]}
{'X'} {'F'} {'O'} {[155]} {[310]}
{'S'} {'P'} {'J'} {[341]} {[405]}
{'H'} {'I'} {'I'} {[546]} {[838]}
{'O'} {'O'} {'B'} {[708]} {[510]}
{'Y'} {'M'} {'U'} {[476]} {[679]}
{'E'} {'E'} {'T'} {[113]} {[426]}
% excerpt outputs
[CatCodeA(1:10),CatCodeB(1:10),CatCodeC(1:10)]
ans = 10×3 string array
"MEH668843" "MEH668843" "MEH668843"
"TUS179853" "TUS179853" "TUS179853"
"HBT928538" "HBT928538" "HBT928538"
"NPR410336" "NPR410336" "NPR410336"
"XFO155310" "XFO155310" "XFO155310"
"SPJ341405" "SPJ341405" "SPJ341405"
"HII546838" "HII546838" "HII546838"
"OOB708510" "OOB708510" "OOB708510"
"YMU476679" "YMU476679" "YMU476679"
"EET113426" "EET113426" "EET113426"
% sanity checks
isequal(CatCodeA,CatCodeB)
ans = logical
1
isequal(CatCodeB,CatCodeC)
ans = logical
1
% timers
tA = mean(tdurat(:,1));
tB = mean(tdurat(:,2));
tC = mean(tdurat(:,3));
fprintf([ ...
'Single compose() takes: %g sec\n' ...
'compose()+compose() takes: %g sec (~%.2f x speedup)\n' ...
'join of composes() takes: %g sec (~%.2f x speedup)\n'], ...
tA,tB,tA/tB,tC,tA/tC);
Single compose() takes: 2.44884 sec
compose()+compose() takes: 0.179393 sec (~13.65 x speedup)
join of composes() takes: 0.197341 sec (~12.41 x speedup)
% boxchart
bxh = boxchart(tdurat);
xticklabels({'Single compose()','compose()+compose()','join of composes()'});
ylabel('duration (sec)')
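Regarding the remark above that sprintf only returns a single string with line breaks: one possible workaround, sketched here under assumptions, is to let sprintf emit one record per line and then split the result with splitlines. Because sprintf consumes its data arguments column by column, the fields are stacked so that each column holds one record. The tiny inputs below are made up for illustration, and no claim is made that this is faster than compose on any given platform; that would need to be timed.

```matlab
% Hypothetical small inputs: one character prefix and one integer suffix per record
prfx = ['A'; 'B'; 'C'];           % 3x1 char
sufx = [1; 22; 333];              % 3x1 double

% Stack the fields so that each COLUMN is one record.
% Characters are converted to their numeric codes so the array is numeric.
data = [double(prfx).'; sufx.'];  % 2x3 numeric

% %c converts each character code back to a character; \n ends each record
joined = sprintf('%c%03d\n', data);

% Drop the trailing newline, then split into a 3x1 string array
ids = splitlines(string(joined(1:end-1)));
disp(ids)
```

This relies on the "down the columns" traversal of sprintf described in the second answer, so the transpose when building data is essential; without it the fields of different records would be interleaved.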


> My understanding is that compose is based on sprintf
That is not correct, at least not directly.
compose() selects elements "across" the optional data parameters; sprintf() selects elements "down" the optional data parameters.
sprintf('%d : %d', [1; 2], [3; 4])
ans = '1 : 23 : 4'
compose('%d : %d', [1; 2], [3; 4])
ans = 2×1 cell array
{'1 : 3'}
{'2 : 4'}

3 comments

What I meant to say is that at the lowest level, compose uses sprintf, despite the two functions operating differently on arrays.
We do not know this to be true.
For example, for historical reasons, sprintf() might have been coded in C, but the newer compose() might be coded in C++.
A fair point.


Release

R2025a

Asked:

17 Jun 2025

Edited:

13 Aug 2025
