Calculating mean and SD of histogram
Hi All,
I have a histogram in CSV form consisting of 1000 bars representing energy (GeV). The vertical scale is the number of particles in the energy range of each bar. The expected distribution is Gaussian, and I need to determine the mean and SD. I tried using fitdist, but it only accepts one vector and I should like to preserve both axes. How might I do this?
Regards
Tim
Shae Morgan
on 31 Jul 2020
Do you have some sample code to share that illustrates the problem?
the cyclist
on 31 Jul 2020
It's much easier to help if you upload the data, rather than just describe the data.
But do you just have an "x" vector of bin centers, and "y" values of bin counts? Not the underlying data that the histogram was calculated using?
Computing the mean and std of bar-heights carries a different interpretation than computing the mean and std of the raw data.
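When only bin centers and counts are available, one common approach is to weight each bin center by its count. The sketch below assumes x holds bin centers and y holds nonnegative counts, both as column vectors; the numbers are made up for illustration and are not the poster's data. It also shows that mean(y) describes the bar heights, which is a different quantity.

```matlab
% Hypothetical example: bin centers (x) and particle counts per bin (y)
x = (1:5)';            % bin centers, e.g. energy in GeV
y = [1; 4; 6; 4; 1];   % number of particles in each bin

% Count-weighted mean: each bin center contributes in proportion to its count
N  = sum(y);
mu = sum(x .* y) / N;

% Count-weighted (population) SD about that mean; divide by (N - 1)
% instead of N for the sample SD that std() would give on raw data
sigma = sqrt(sum(y .* (x - mu).^2) / N);

% For comparison: this describes the bar heights, NOT the energy distribution
heightMean = mean(y);

fprintf('weighted mean = %.4f, weighted SD = %.4f\n', mu, sigma);
fprintf('mean of bar heights = %.4f (a different quantity)\n', heightMean);
```

For this toy histogram the weighted mean is 3 and the weighted SD is 1, while the mean of the bar heights is 3.2.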
The comment "I should like to preserve both axes" makes me wonder if you're trying to plot a distribution curve on top of your bar plot. Is that the case?
If so, and if the x-axis (energy range) is continuous, you should use a histogram instead of a bar plot. Then you can use histfit(data,nbins,'normal') to plot the histogram and the fitted normal distribution.
If the x-axis is not on a continuous scale, I'm not sure how to interpret a fitted distribution curve.
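If the counts are whole numbers, another option is to expand the histogram back into one vector of pseudo-observations by repeating each bin center as many times as its count; a single vector is what histfit and fitdist expect. This is a sketch with made-up numbers. The pseudo-data place every particle exactly at its bin center, so the binning caveats discussed in this thread still apply, and with very large counts the expanded vector can use a lot of memory. Separately, the Statistics and Machine Learning Toolbox documentation for fitdist describes a 'Frequency' name-value argument that takes integer counts directly; check whether your release supports it.

```matlab
% Hypothetical bin centers and integer counts (column vectors)
x = [1; 2; 3];
y = [1; 2; 1];

% Repeat bin center x(k) y(k) times -> one observation per particle
counts = round(y);                 % repelem needs nonnegative integer counts
pseudo = repelem(x, counts);       % here: [1; 2; 2; 3]

% Ordinary mean/std now act on the reconstructed observations
mu    = mean(pseudo);              % equals the count-weighted mean
sigma = std(pseudo);               % sample SD (normalized by N - 1)

% With the Statistics and Machine Learning Toolbox one could then call, e.g.:
%   histfit(pseudo, numel(x), 'normal')
```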
Tim Fulcher
on 2 Aug 2020
Edited: Tim Fulcher
on 2 Aug 2020
Image Analyst
on 2 Aug 2020
Edited: Image Analyst
on 2 Aug 2020
Again, like Adam asked, do you want the mean and stdev of the actual data (like you'd get with mean and std functions), or of the fitted analytical Gaussian equation?
Tim Fulcher
on 2 Aug 2020
Answers (1)
Guessing that column 1 of the data holds the x-values of the bar plot and column 2 holds the bar heights, you can fit a Gaussian curve to the (x,y) data with three parameters: mean (mu), standard deviation (sigma), and amplitude.
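Written out, the model fitted in the code and the quantity that lsqnonlin minimizes are as follows, with x_i the bin centers, y_i the bar heights, and A the amplitude (p(1), p(2), p(3) in the code correspond to μ, σ, A):

```latex
\hat{y}(x) = A \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right),
\qquad
(\hat{\mu}, \hat{\sigma}, \hat{A}) = \arg\min_{\mu,\sigma,A} \sum_{i} \bigl( \hat{y}(x_i) - y_i \bigr)^2
```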
The solution below uses a bar plot but depending on what the x-values mean, a histogram may be more appropriate.
It produces a plot containing the original distribution, the fit, and the fit parameters in the title.
See inline comments for details.
% Read in the 1000x3 matrix
data = readmatrix('200_10.0_A-150_8mm_Steps_1-analysis_Primary_Energy.csv');
% Define the (x,y) values to fit
x = data(:,1); % must match x var in gausFcn
y = data(:,2); % must match y var in gausFcn
% Define the Gaussian function with 3 params: [mean, std, amplitude]
% The anonymous functions capture the values of "x" and "y" when they are
% created, so both MUST be defined with those names before these two lines!
gausFcn = @(p)p(3)*exp(-(((x-p(1)).^2)/(2*p(2).^2)));
gausFitFcn = @(p)gausFcn(p)-y; % function to fit
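% Initial guesses: count-weighted mean and SD of x (in the units of x),
% and the tallest bar for the amplitude
mu0 = sum(x.*y)/sum(y);
initGuess = [mu0, sqrt(sum(y.*(x-mu0).^2)/sum(y)), max(y)];
% Fit the curve. See documentation for lsqnonlin for additional constraints.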
b = lsqnonlin(gausFitFcn, initGuess);
% Plot results.
clf()
h = bar(data(:,1),data(:,2));
yHat = gausFcn(b);
hold on
plot(x, yHat, 'r--', 'LineWidth', 2)
title(sprintf('\\mu=%.3f \\sigma=%.3f amp=%.3f', b))
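lsqnonlin is part of the Optimization Toolbox. If that toolbox is not available, one possible substitute is base MATLAB's fminsearch applied to the sum of squared residuals. The sketch below uses synthetic, noise-free data (not the poster's file) so the recovered parameters can be checked; the names gaussModel and sse and all numbers are illustrative assumptions. fminsearch is an unconstrained simplex search, so starting from the count-weighted moments helps it converge.

```matlab
% Synthetic histogram: 1000 bin centers and noise-free Gaussian bar heights
x = linspace(0, 10, 1000)';
trueP = [4, 0.8, 50];                                  % [mu, sigma, amplitude]
gaussModel = @(p, x) p(3) * exp(-((x - p(1)).^2) / (2 * p(2)^2));
y = gaussModel(trueP, x);

% Starting guess from count-weighted moments (same units as x)
mu0    = sum(x .* y) / sum(y);
sigma0 = sqrt(sum(y .* (x - mu0).^2) / sum(y));
p0     = [mu0, sigma0, max(y)];

% Minimize the sum of squared residuals with the Nelder-Mead simplex method
sse  = @(p) sum((gaussModel(p, x) - y).^2);
opts = optimset('TolX', 1e-8, 'TolFun', 1e-8, 'MaxFunEvals', 5000, 'MaxIter', 5000);
b = fminsearch(sse, p0, opts);
b(2) = abs(b(2));      % sigma only enters squared, so report its magnitude

fprintf('mu = %.4f, sigma = %.4f, amp = %.4f\n', b);
```

With real, noisy counts the fitted values will of course not match any "true" parameters exactly.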

Cautionary word #1: This assumes your x values are the centers of the bins, and that assumption matters. If the x values are bin edges, the estimate of the mean is off by up to half the bin width (diff(x)/2), which for this data is +/-0.00025. If the x values are bin edges and x(1) is the left edge of the first bin, add half the bin width to each x value to approximate the bin centers.
Cautionary word #2: a fit to the binned distribution is not the same as a fit to your raw data. Bin size affects bar heights and spread, so the fitted parameters do not directly give you the parameters of the underlying population.
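A minimal sketch of the edge-to-center conversion, assuming the first column holds bin edges. If all N+1 edges are known, the centers are the midpoints of consecutive edges; if only the N left edges are known and every bin has the same width, add half that width. The variable names and values are illustrative.

```matlab
% Case 1: all N+1 bin edges are known (widths may differ)
edges   = [0; 0.5; 1.0; 2.0];
centers = edges(1:end-1) + diff(edges) / 2;     % [0.25; 0.75; 1.5]

% Case 2: only the N left edges are known and all bins share one width
leftEdges = [0; 0.5; 1.0; 1.5];
w = leftEdges(2) - leftEdges(1);                % assumes a uniform bin width
centersFromLeft = leftEdges + w / 2;            % [0.25; 0.75; 1.25; 1.75]
```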
The plot below shows 5 sets of histograms at 5 different bin-widths using the same raw data (dots).
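The same effect can be checked numerically. The sketch below bins one fixed set of values at three bin widths and computes the count-weighted mean, the count-weighted SD, and the tallest bar each time. The data are deterministic standard-normal quantiles generated with erfinv (an assumption made so the example is reproducible without random numbers), not the poster's data. For smooth densities, grouping inflates the variance by roughly w^2/12 (Sheppard's correction), and bar heights grow with bin width, which is why a fitted amplitude depends on the binning.

```matlab
% Deterministic "raw data": 5001 quantiles of a standard normal distribution
u     = linspace(-0.999, 0.999, 5001)';
data  = sqrt(2) * erfinv(u);
sdRaw = std(data, 1);                          % population SD of the raw values

widths    = [0.1, 0.5, 1.0];
muBinned  = zeros(size(widths));
sdBinned  = zeros(size(widths));
peakCount = zeros(size(widths));

for k = 1:numel(widths)
    w   = widths(k);
    lo  = floor(min(data) / w) * w;            % left edge of the first bin
    idx = max(floor((data - lo) / w) + 1, 1);  % bin index of each value
    counts  = accumarray(idx, 1);              % number of values in each bin
    centers = lo + w * ((1:numel(counts))' - 0.5);

    N = sum(counts);
    muBinned(k)  = sum(centers .* counts) / N;
    sdBinned(k)  = sqrt(sum(counts .* (centers - muBinned(k)).^2) / N);
    peakCount(k) = max(counts);
end

for k = 1:numel(widths)
    fprintf('w = %.1f: mean = %+.4f, SD = %.4f, tallest bar = %d\n', ...
            widths(k), muBinned(k), sdBinned(k), peakCount(k));
end
fprintf('raw SD = %.4f\n', sdRaw);
```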
