sprintf vs. compose performance for large arrays on macOS

Question

1 vote

I would expect sprintf and compose to exhibit similar performance, yet on my Mac, sprintf is considerably faster. Consider the following example.

M = 32;
N = 32;
A = randn(M,N);
[m,n] = ndgrid(1:M,1:N);
Nrep = 100;
%% compose and no for loops
t1 = zeros(Nrep,1);
for kr=1:Nrep
    tic;
    s1 = compose("%d,%d,%.3g",m(:),n(:),A(:));
    s1 = reshape(s1,M,N);
    t1(kr) = toc;
end
%% sprintf and two nested for loops
t2 = zeros(Nrep,1);
for kr=1:Nrep
    s2 = repelem("",M,N);
    tic;
    for j=1:N
        for i=1:M
            s2(i,j) = sprintf("%d,%d,%.3g",i,j,A(i,j));
        end
    end
    t2(kr) = toc;
end
%% compose and two nested for loops
t3 = zeros(Nrep,1);
for kr=1:Nrep
    tic;
    s3 = repelem("",M,N);
    for j=1:N
        for i=1:M
            s3(i,j) = compose("%d,%d,%.3g",i,j,A(i,j));
        end
    end
    t3(kr) = toc;
end
fprintf("                   min      |  mean     |  max\n");
fprintf("compose, no loops: %.6f |  %.6f  |  %.6f\n",min(t1),mean(t1),max(t1));
fprintf("sprintf,  2 loops: %.6f |  %.6f  |  %.6f\n",min(t2),mean(t2),max(t2));
fprintf("compose,  2 loops: %.6f |  %.6f  |  %.6f\n",min(t3),mean(t3),max(t3));

This code produces the following output on my machine.

                   min      |  mean      |  max
compose, no loops: 1.008151 |  1.035783  |  1.076671
sprintf,  2 loops: 0.004265 |  0.004384  |  0.008990
compose,  2 loops: 1.018900 |  1.050923  |  1.464713

It seems that sprintf is roughly 240x faster in this example (comparing the mean times above). Any idea why that is?

Here is the output of ver on my machine:

-----------------------------------------------------------------------------------------
MATLAB Version: 25.1.0.2943329 (R2025a)
MATLAB License Number: 
Operating System: macOS  Version: 15.5 Build: 24F74 
Java Version: Java 11.0.27+6-LTS with Amazon.com Inc. OpenJDK 64-Bit Server VM mixed mode
-----------------------------------------------------------------------------------------


Answer 1

Adam Danz on 17 Jun 2025

Edited: Adam Danz on 17 Jun 2025

1 vote

The execution below shows very different results from those reported in the question. Note that I did some refactoring so as not to overwrite the n and m variables.

Considerations:

A single tic/toc measurement is usually not useful. Execution time varies considerably, especially at these sub-second durations. A more responsible approach is to time many repetitions and then compare the distributions to see whether they differ significantly.
Note how much faster #2 below is compared to #1, both of which use compose. Reshaping is not necessary if you preallocate the output variable.
#5 is the fastest.

M = 32;
N = 32;
A = randn(M,N);
[m,n] = ndgrid(1:M,1:N);
%% #1  compose & reshape
tic;
s1 = compose("%d,%d,%.3g",m(:),n(:),A(:));
s1 = reshape(s1,M,N);
toc;
Elapsed time is 0.016196 seconds.
%% #2  compose using preallocation
tic;
s1b = strings(size(A));
s1b(:) = compose("%d,%d,%.3g",m(:),n(:),A(:));
toc
Elapsed time is 0.013167 seconds.
isequal(s1b, s1)
ans = logical
   1
%% #3  sprintf and two nested for loops
s2 = strings(size(A));
tic;
for i=1:N
    for j=1:M
        s2(j,i) = sprintf("%d,%d,%.3g",j,i,A(j,i));
    end
end
toc
Elapsed time is 0.013522 seconds.
isequal(s2, s1)
ans = logical
   1
%% #4  compose and two nested for loops
s3 = strings(size(A));
tic;
for i=1:N
    for j=1:M
        s3(j,i) = compose("%d,%d,%.3g",j,i,A(j,i));
    end
end
toc
Elapsed time is 0.028691 seconds.
isequal(s3, s1)
ans = logical
   1
%% #5  String construction
tic
s4 = string(m) + "," + string(n) + "," + compose('%.3g',A);
toc
Elapsed time is 0.008618 seconds.
isequal(s4, s1)
ans = logical
   1
version()
ans = '25.1.0.2952844 (R2025a)'

Proper measurement with repetition

I'll compare methods #2 and #5 above using a more robust measure of execution time. Each method is timed 1000 times, storing the duration of each iteration. The two distributions are plotted to evaluate whether they are clearly distinct and to compare their variance.

clear % remove existing variables created above
rng default
N = 32;
A = randn(N);
[m,n] = ndgrid(1:N);
% Preallocate outputs before timing
repetitions = 1000;
duration = nan(repetitions, 2);
% method 2
for i = 1:repetitions
    t0 = tic;
    s1b = strings(size(A));
    s1b(:) = compose("%d,%d,%.3g",m(:),n(:),A(:));
    duration(i,1) = toc(t0);
end
% method 5
for i = 1:repetitions
    t0 = tic;
    s4 = string(m) + "," + string(n) + "," + compose('%.3g',A);
    duration(i,2) = toc(t0);
end
% Sanity check
isequal(s1b, s4)
ans = logical
   1
% Show distributions as boxcharts (alternative: histograms)
bxh = boxchart(duration);
xticklabels({'compose','string+string'});
ylabel('duration (sec)')
title(repetitions+" repetitions")

The string+string method is clearly faster than compose, and its timings vary less.

meanDuration = mean(duration);
fprintf('%g times faster\n', max(meanDuration)/min(meanDuration))
9.38987 times faster
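
As an alternative to hand-written repetition loops, MATLAB's timeit function can take the measurements. It calls a zero-argument function handle several times and returns a typical (median) run time. The sketch below applies it to methods #2 and #5; the handle names fCompose and fConcat are illustrative and not part of the answer above, and the timings it prints will differ between machines and releases.

```matlab
% Minimal sketch: compare two string-building approaches with timeit.
M = 32;
N = 32;
A = randn(M,N);
[m,n] = ndgrid(1:M,1:N);

% Zero-argument handles wrapping each approach (illustrative names)
fCompose = @() compose("%d,%d,%.3g",m(:),n(:),A(:));
fConcat  = @() string(m) + "," + string(n) + "," + compose("%.3g",A);

% timeit runs each handle repeatedly and returns a median time in seconds
tCompose = timeit(fCompose);
tConcat  = timeit(fConcat);

fprintf("compose:       %.6f s\n", tCompose);
fprintf("string+string: %.6f s\n", tConcat);
fprintf("ratio:         %.2f\n", tCompose/tConcat);
```

timeit returns a single summary value per method, so the explicit loop with boxchart remains the better choice when the spread of the timings matters.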


Adam Danz on 17 Jun 2025

Edited: Adam Danz on 17 Jun 2025

Here's a version where compose is not slower than sprintf. It uses preallocation, which lets you vectorize the inputs and avoid the loops.

The compose function is not based on sprintf. From the compose documentation page: "The compose function can return multiple pieces of formatted text as a string array or a cell array of character vectors, unlike sprintf. The sprintf function returns only a string scalar or a character vector." Thus, compose can often replace for-loops with vectorized inputs.

By removing the loops and using the vectorized form, you reduce overhead and allow compose to run optimally.

I'd have to get into implementation details to offer a deeper comparison, but the main takeaway is that these are separate functions with separate implementations and purposes.

To speed up your program while using compose, use the preallocated form. To improve performance further, consider using method #5 from my answer.

M = 32;
N = 32;
A = randn(M,N);
[m,n] = ndgrid(1:M,1:N);
Nrep = 100;
t = nan(Nrep,3);
% sprintf in 2 loops
s2 = strings(size(A));
for kr=1:Nrep
    t0 = tic;
    for j=1:N
        for i=1:M
            s2(i,j) = sprintf("%d,%d,%.3g",i,j,A(i,j));
        end
    end
    t(kr,1) = toc(t0);
end
% compose in 2 loops
s3 = strings(size(A));
for kr=1:Nrep
    t0 = tic;
    for j=1:N
        for i=1:M
            s3(i,j) = compose("%d,%d,%.3g",i,j,A(i,j));
        end
    end
    t(kr,2) = toc(t0);
end
isequal(s2,s3)
ans = logical
   1
% compose + preallocation, vectorized
s2b = strings(size(A));
for kr=1:Nrep
    t0 = tic;
    s2b(:) = compose("%d,%d,%.3g",m(:),n(:),A(:));
    t(kr,3) = toc(t0);
end
isequal(s2, s2b)
ans = logical
   1
boxchart(t)
xticklabels({'sprintf','compose','compose + preallocation'});
grid on

Adam Danz on 18 Jun 2025

Edited: Adam Danz on 18 Jun 2025

What I'm calling the preallocation form is the section of code under the following comment.

% compose + preallocation, vectorized

In that section, the string array is created by a single line of code using vectorized inputs. Using vectorized inputs is usually faster and more efficient than computing iteratively within a loop.

> Is repelem("",M,N) not adequate preallocation?

repelem is fine for preallocating an array but the strings method in my version is optimal for preallocating an empty string array. The more important part is how that array is used (loop vs vectorized inputs).
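
To illustrate the point that both calls preallocate equally usable arrays, here is a small sketch comparing their results. The sizes and the variable names pA and pB are chosen only for illustration, and the sketch checks equality of the arrays, not the speed of either call.

```matlab
% Minimal sketch: strings(M,N) and repelem("",M,N) build the same array.
M = 4;
N = 3;

pA = strings(M, N);        % M-by-N string array, every element is ""
pB = repelem("", M, N);    % the scalar "" replicated into an M-by-N array

sameSize    = isequal(size(pA), size(pB));
sameContent = isequal(pA, pB);
fprintf("same size: %d, same content: %d\n", sameSize, sameContent);
```

Since the two preallocations are interchangeable as far as content goes, any timing difference between the loop versions has to come from how the array is filled afterwards.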

> Could [the difference] be a macOS issue?

It could be, although I do not have access to a Mac right now to verify those results.

I just repeated the measurements using R2025a version 25.1.0.2943329, which matches the version you mentioned in your question, and the results led to the same conclusion as what I reported earlier (Windows OS). I also repeated them in R2024b Update 3, which gave very similar results. MATLAB in this forum runs on Linux, which produced the results shown above.

isunix
ans = logical
   1
ismac
ans = logical
   0

Note that in the "compose and two nested for loops" section of your question, repelem is called inside tic/toc, whereas in the sprintf section above it, repelem is called outside tic/toc. Calling it outside is the right choice, since you are measuring sprintf/compose performance, not compose + repelem.

goc3 on 13 Aug 2025

I just found this thread and am quite interested in it. I have an Intel Mac Pro, so I figured it should perform fairly well, but it was even slower than both results posted above for other Mac configurations. The first figure (in light) is for R2024b while the second (in dark) is for R2025a. The two median values that include compose are 2.052 and 2.045 (essentially the same) for R2024b and 1.896 and 1.843 for R2025a.

I have always sensed that the Mac version of MATLAB did not receive all the optimization attention of the Windows version (especially in graphics, at least, before the New Desktop). But, I was not able to participate in a direct performance comparison across OS's and provide data for it before.

For clarity, even though both releases were open simultaneously, I ran the script in each version one at a time. My computer has a 16-core 3.2 GHz processor and 160 GB of RAM (only 36 GB used during the tests).

This is rather frustrating, as my results are quite a bit slower than the other two sets of results and the expected benefit of preallocation (which works so well in so many contexts) simply does not work for Macs in this situation. I hope that this gets fixed soon and that the Mac version of MATLAB gets more attention. (Perhaps that is already the case for the Apple Silicon version since Intel Macs are no longer being sold...)

J. Alex Lee on 13 Aug 2025

Edited: J. Alex Lee on 13 Aug 2025

I landed here not because of the OS, but because of the performance of compose. What fascinates me is that calling compose() several times and +'ing the results together is much faster than a one-shot compose.

In my application I have data on the order of a hundred thousand rows from a database, and I need to construct a human-readable ID from several fields.

Profiling showed my bottleneck to be the compose call (method 1 below). I played around and found that sprintf is much faster, but I couldn't get it to output a vector of strings (it can only produce one string with line breaks), hence landing on this post.

Now I find that being counterintuitively "inefficient" with extra calls to compose speeds up my code (~26x faster on my computer with the relevant number of rows, but only ~15x faster on the web platform here).

Adding a route using "join" instead of "+" is similarly faster.
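
On the remark that sprintf only produces one string with line breaks: one possible workaround, sketched here under the assumption that every field is numeric, is to let sprintf emit a newline per row and then split the result with splitlines. The variable names are illustrative, the mixed character and numeric fields in the benchmark below would need extra handling, and no timing claim is made for this route.

```matlab
% Minimal sketch: build a string vector from one sprintf call (numeric fields only).
x = [1; 2; 3];
y = [10; 20; 30];

% sprintf consumes its data column by column, so stack the fields as rows:
% [x y].' is 2-by-3 and is read in the order 1,10,2,20,3,30.
joined = sprintf("%d,%d\n", [x y].');

% Split at the newlines; the trailing newline leaves one empty last element.
parts = splitlines(joined);
parts(end) = [];

disp(parts)
```

The result has one element per input row, which makes it directly comparable to the output of compose("%d,%d", x, y).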

NReps = 10;
NData = 100000;
prfx1 = char(randi(26,[NData,1])+64);
prfx2 = char(randi(26,[NData,1])+64);
prfx3 = char(randi(26,[NData,1])+64);
sufx1 = randi(999,[NData,1]);
sufx2 = randi(999,[NData,1]);
tdurat = zeros(NReps,3);
% intuitive single call to compose
for k = 1:NReps
    tic
    CatCodeA = compose("%s%s%s%03d%03d",prfx1,prfx2,prfx3,sufx1,sufx2);
    tdurat(k,1) = toc;
end
% unintuitive multiple composes, then +'d together
for k = 1:NReps
    tic
    CatCodeB = compose("%s",prfx1)+compose("%s",prfx2)+compose("%s",prfx3)+compose("%03d",sufx1)+compose("%03d",sufx2);
    tdurat(k,2) = toc;
end
% multiple composes join()'d together
for k = 1:NReps
    tic
    CatCodeC = join([...
        compose("%s",prfx1),...
        compose("%s",prfx2),...
        compose("%s",prfx3),...
        compose("%03d",sufx1),...
        compose("%03d",sufx2)],...
        "" ...
        );
    tdurat(k,3) = toc;
end

% excerpt inputs
[cellstr(prfx1(1:10)),cellstr(prfx2(1:10)),cellstr(prfx3(1:10)),num2cell(sufx1(1:10)),num2cell(sufx2(1:10))]
ans = 10×5 cell array
    {'M'}    {'E'}    {'H'}    {[668]}    {[843]}
    {'T'}    {'U'}    {'S'}    {[179]}    {[853]}
    {'H'}    {'B'}    {'T'}    {[928]}    {[538]}
    {'N'}    {'P'}    {'R'}    {[410]}    {[336]}
    {'X'}    {'F'}    {'O'}    {[155]}    {[310]}
    {'S'}    {'P'}    {'J'}    {[341]}    {[405]}
    {'H'}    {'I'}    {'I'}    {[546]}    {[838]}
    {'O'}    {'O'}    {'B'}    {[708]}    {[510]}
    {'Y'}    {'M'}    {'U'}    {[476]}    {[679]}
    {'E'}    {'E'}    {'T'}    {[113]}    {[426]}
% excerpt outputs
[CatCodeA(1:10),CatCodeB(1:10),CatCodeC(1:10)]
ans = 10×3 string array
    "MEH668843"    "MEH668843"    "MEH668843"
    "TUS179853"    "TUS179853"    "TUS179853"
    "HBT928538"    "HBT928538"    "HBT928538"
    "NPR410336"    "NPR410336"    "NPR410336"
    "XFO155310"    "XFO155310"    "XFO155310"
    "SPJ341405"    "SPJ341405"    "SPJ341405"
    "HII546838"    "HII546838"    "HII546838"
    "OOB708510"    "OOB708510"    "OOB708510"
    "YMU476679"    "YMU476679"    "YMU476679"
    "EET113426"    "EET113426"    "EET113426"

% sanity checks
isequal(CatCodeA,CatCodeB)
ans = logical
   1
isequal(CatCodeB,CatCodeC)
ans = logical
   1
% timers
tA = mean(tdurat(:,1));
tB = mean(tdurat(:,2));
tC = mean(tdurat(:,3));
fprintf([ ...
    'Single compose() takes: %g sec\n' ...
    'compose()+compose() takes: %g sec (~%.2f x speedup)\n' ...
    'join of composes() takes: %g sec (~%.2f x speedup)\n'], ...
    tA,tB,tA/tB,tC,tA/tC);
Single compose() takes: 2.44884 sec
compose()+compose() takes: 0.179393 sec (~13.65 x speedup)
join of composes() takes: 0.197341 sec (~12.41 x speedup)
% boxchart
bxh = boxchart(tdurat);
xticklabels({'Single compose()','compose()+compose()','join of composes()'});
ylabel('duration (sec)')


Answer 2

Walter Roberson on 17 Jun 2025

1 vote

> My understanding is that compose is based on sprintf

That is not correct, at least not directly.

compose() selects elements "across" the optional data parameters; sprintf() selects elements "down" the optional data parameters.

sprintf('%d : %d', [1; 2], [3; 4])
ans = '1 : 23 : 4'
compose('%d : %d', [1; 2], [3; 4])
ans = 2×1 cell array
    {'1 : 3'}
    {'2 : 4'}


Walter Roberson on 17 Jun 2025

We do not know this to be true.

For example, for historical reasons, sprintf() might have been coded in C, but the newer compose() might be coded in C++.

Jose on 18 Jun 2025

A fair point.
