Main Content

Compute Binomial Coefficients Exactly

This example shows how to get precise values for binomial coefficients and find probabilities in coin-tossing experiments using the Symbolic Math Toolbox™.

Define the symbolic function, P(n,k), that computes the probability for the heads to come up exactly k times out of n tosses.

syms P(n,k)
P(n,k) = nchoosek(n,k)/2^n
P(n, k) = 

(nk)2n

Suppose you toss a coin 2000 times. The probability that heads comes up in half of the tosses is P(n, n/2), where n = 2000. The result is a rational expression with large numbers in both the numerator and denominator. Symbolic Math Toolbox returns the exact result.

n = 2000;
central = P(n, n/2)
central = 

320023691717107767864869141090753913186941388747093286534434787136210654094075586848270780341032844718978869737499691201939919030938206998752819929651447121650470765099112601116954925617580705043388419705530541025231105279169475241319583777521515740399490335296296986416422587436741374826165947451122678987232111208072306646349645444817081649045390224780439737117235599681944329390758068320042910611589088468568382563818033485652790542246479008106992991688671397929568442239221591755153140107136654000394609234559552383364790451240932289636703829891579702548735354741015983305611110777614681873617051793954211366022694113801876840128100034871409513586250746316776290259783425578615401030447369541046747571819748417910583511123376348523955353017744010395602173906080395504375010762174191250701116076984219741972574712741619474818186676828531882286780795390571221287481389759837587864244524002565968286448146002639202882164150037179450123657170327105882819203167448541028601906377066191895183769810676831353109303069033234715310287563158747705988305326397404720186258671215368588625611876280581509852855552819149745718992630449787803625851701801184123166018366180137512856918294030710215034138299203584

Approximate this rational number with 10-digit accuracy using digits and vpa.

previous_digits = digits(10);
vpa(central)
ans = 0.01783901115

Compute the probability that the number of "heads" differs from the expected value by no more than two standard deviations.

sigma = sqrt(n/4);
withinTwoSigma = symsum(P(n, k), k, ceil(n/2 - 2*sigma), floor(n/2 + 2*sigma))
withinTwoSigma = 

1368352466395056520289440683203474916723919590470093450966749985815252789703206921185078166194364368628241604751390276770010366566502905435584053306977090361373888807166934911669058359617267999440891473946928814790334474319243690058420836636313463906041227215461588805899687850446915983749011992904463517362389956795960668716644110185293844046419746984562010264700409766370663868950458143724613258018264237350867641134477342417401137040307238557519525647433371411711016304972400159139383774198454413097509455103606581317850616759661664811032314997464091016929165118720278926779247759469497371411534346514351633690928181552910415014721024800278971276108690005970534210322078267404628923208243578956328373980574557987343284668088987010788191642824141952083164817391248643164035000086097393530005608928615873757935780597701932955798545493414628255058294246363124569770299851118078700702913956192020527746291585168021113623057313200297435600989257362616847062553625339588328228815251016529535161470158485414650824874424552265877722482300505269981647906442611179237761490069369722948709004895010244652078822844422553197965751941043598302429006813614409472985328146929441100102855346352245681720273106393628672

Approximate the result with a floating-point number.

vpa(withinTwoSigma)
ans = 0.9534471795

Compare this result with the following two estimates derived from the cumulative distribution function (cdf) of the normal distribution. It turns out that taking the midpoint between the first integer outside and the first integer inside the two-sigma interval gives a more precise result than using the two-sigma interval itself.

syms cdf(x)
cdf(x) = 1/2 * (1 + erf((x - n/2)/sqrt(sym(n/2))))
cdf(x) = 

erf(10x-1000100)2+12

estimate1 = vpa(cdf(n/2 + 2*sigma) - cdf(n/2 - 2*sigma))
estimate1 = 0.9544997361
estimate2 = vpa(cdf(floor(n/2 + 2*sigma) + 1/2) - ...
                cdf(ceil(n/2 - 2*sigma) - 1/2))
estimate2 = 0.9534201342
error1 = vpa(estimate1 - withinTwoSigma)
error1 = 0.001052556595
error2 = vpa(estimate2 - withinTwoSigma)
error2 = -0.00002704525904

Restore the default precision for floating-point computations.

digits(previous_digits);

Plot this distribution for k within the 2σ-interval. The plot is a Gaussian curve.

k = n/2 + (-2*sigma:2*sigma);
plot(k, P(n,k), '-+');
title('2000 coin tosses');
xlabel('Number of heads');
ylabel('Probability');