Calculating mean and SD of histogram

Hi All,
I have a histogram in CSV form consisting of 1000 bars representing energy (GeV). The vertical axis is the number of particles in each bar's energy range. The expected distribution is Gaussian, and I need to determine the mean and SD. I tried using fitdist, but it only accepts one vector and I would like to preserve both axes. How might I do this?
Regards
Tim
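When only bin centers and counts are available, the count-weighted mean and SD of the bin centers estimate the mean and SD of the underlying data directly, without any curve fitting. A minimal sketch, assuming column 1 of the CSV holds bin centers and column 2 holds counts; the synthetic `x` and `y` below are hypothetical stand-ins for those columns.

```matlab
% Hypothetical example data: bin centers (x) and counts (y) for a
% Gaussian with mean 5 and SD 0.5, sampled on a fine grid.
x = (3:0.01:7)';
y = round(1000*exp(-(x-5).^2/(2*0.5^2)));

% Count-weighted moments of the bin centers.
N = sum(y);                              % total number of particles
mu = sum(x.*y)/N;                        % weighted mean
sigma = sqrt(sum(y.*(x-mu).^2)/N);       % weighted (population) SD
fprintf('mean = %.4f, SD = %.4f\n', mu, sigma)
```

The results are in the units of `x`, so both axes are preserved. Each particle is treated as if it sat at its bin center, which is accurate when the bins are narrow compared with the SD.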

7 comments

Shae Morgan on 31 Jul 2020
Do you have some sample code to share that illustrates the problem?
the cyclist on 31 Jul 2020
It's much easier to help if you upload the data, rather than just describe the data.
But do you just have an "x" vector of bin centers and "y" values of bin counts, and not the underlying data the histogram was calculated from?
Adam Danz on 31 Jul 2020
Edited: Adam Danz on 2 Aug 2020
Computing the mean and std of bar-heights carries a different interpretation than computing the mean and std of the raw data.
The comment "I should like to preserve both axes" makes me wonder if you're trying to plot a distribution curve on top of your bar plot. Is that the case?
If so, and if the x-axis (energy range) is continuous, you should use a histogram instead of a bar plot. Then you can use histfit(data,nbins,'normal') to plot the histogram and the fitted normal distribution.
If the x-axis is not on a continuous scale, I'm not sure how to interpret a fitted distribution curve.
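histfit needs the raw data, but when the counts are integers fitdist can still fit the binned data: its 'Frequency' name-value argument treats each bin center as if it were observed as many times as its count. This is a sketch, assuming the Statistics and Machine Learning Toolbox is installed and that every particle sits at its bin center; the `x` and `y` vectors are hypothetical stand-ins for the CSV columns.

```matlab
% Hypothetical bin centers (x) and integer counts (y).
x = (3:0.01:7)';
y = round(1000*exp(-(x-5).^2/(2*0.5^2)));

% Fit a normal distribution, weighting each bin center by its count.
pd = fitdist(x, 'Normal', 'Frequency', y);
fprintf('mu = %.4f, sigma = %.4f\n', pd.mu, pd.sigma)

% Equivalent without the Frequency option: expand the counts into
% pseudo-raw data (memory use grows with the total count).
xRaw = repelem(x, y);
pd2 = fitdist(xRaw, 'Normal');
```

Both calls should give the same parameters; the 'Frequency' form avoids building the expanded vector.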
Tim Fulcher on 2 Aug 2020
Edited: Tim Fulcher on 2 Aug 2020
Hi guys, thanks for the replies.
Data file attached. Third column is the error.
Re: preserving both axes, what I mean is that I'd like the mean and SD to be returned in MeV.
What I'm trying to work out is what will happen to the mean if I set values below x MeV to zero (rather than truncating them). If I had the raw data it would be a piece of cake, but the attached data are extracted from a ROOT file and I'm not sure I can get to the raw data.
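The effect of a threshold can be computed from the binned data alone. A sketch, assuming "set to zero, not truncate" means the particles below the threshold stay in the total count but their energy becomes 0; truncation (dropping them) is shown for comparison. The data and the `threshold` value are hypothetical.

```matlab
% Hypothetical bin centers (x, MeV) and counts (y); replace with the CSV columns.
x = (3:0.01:7)';
y = round(1000*exp(-(x-5).^2/(2*0.5^2)));
threshold = 4.5;   % hypothetical cut, in the same units as x

N = sum(y);
muOrig = sum(x.*y)/N;

% Option A (assumed reading): particles below the threshold still count,
% but their energy is set to 0.
xA = x;
xA(x < threshold) = 0;
muZeroed = sum(xA.*y)/N;           % same N, smaller numerator

% Option B (truncation, for comparison): drop those particles entirely.
keep = x >= threshold;
muTrunc = sum(x(keep).*y(keep))/sum(y(keep));

fprintf('original %.4f, zeroed %.4f, truncated %.4f\n', muOrig, muZeroed, muTrunc)
```

Zeroing lowers the mean while truncation raises it. After zeroing, the distribution has a spike at 0 and is no longer Gaussian, so the weighted mean is a more meaningful summary than a Gaussian fit.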
Image Analyst on 2 Aug 2020
Edited: Image Analyst on 2 Aug 2020
Again, as Adam asked, do you want the mean and stdev of the actual data (like you'd get with the mean and std functions), or of the fitted analytical Gaussian equation?
Tim Fulcher on 2 Aug 2020
Hi Image Analyst,
In this case, the fitted Gaussian.
Thanks and regards
Tim
Adam Danz on 2 Aug 2020
Edited: Adam Danz on 5 Aug 2020
The csv file contains 3 columns of 100 numeric values.
What are we supposed to do with that? What's MeV?
When you say the 3rd col is error, what does that mean?


Answers (1)

Adam Danz on 2 Aug 2020
Edited: Adam Danz on 3 Aug 2020
Assuming that column 1 of the data holds the x-values of the bar plot and column 2 holds the bar heights, you can fit a Gaussian to the (x,y) data with three parameters: mean (mu), standard deviation (sigma), and amplitude.
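Written out, the three-parameter model fitted in the code is shown below. If the bars are counts in bins of constant width $w$ and $N$ is the total count, the amplitude and sigma are roughly linked, which is a useful sanity check on the fitted values:

```latex
y(x) = A \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
A \approx \frac{N\,w}{\sigma\sqrt{2\pi}}
```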
The solution below uses a bar plot but depending on what the x-values mean, a histogram may be more appropriate.
It produces a plot containing the original distribution, the fit, and the fit parameters in the title.
See inline comments for details.
% Read in the 1000x3 matrix (col 1: x, col 2: bar heights, col 3: error)
data = readmatrix('200_10.0_A-150_8mm_Steps_1-analysis_Primary_Energy.csv');
% Define the (x,y) values to fit
x = data(:,1); % must match x var in gausFcn
y = data(:,2); % must match y var in gausFcn
% Define the Gaussian function with 3 params: [mean, std, amplitude].
% Anonymous functions capture x and y when they are created, so the
% variables "x" and "y" MUST be defined with those names before
% these two lines!
gausFcn = @(p)p(3)*exp(-(((x-p(1)).^2)/(2*p(2).^2)));
gausFitFcn = @(p)gausFcn(p)-y; % residuals to minimize
% Initial guesses: count-weighted mean and std of x, and the peak height.
% (The spread guess must come from x, not from the bar heights in y.)
mu0 = sum(x.*y)/sum(y);
sigma0 = sqrt(sum(y.*(x-mu0).^2)/sum(y));
initGuess = [mu0, sigma0, max(y)];
% Fit the curve (requires the Optimization Toolbox). See the lsqnonlin
% documentation for additional constraints such as bounds.
b = lsqnonlin(gausFitFcn, initGuess);
% Plot results.
clf()
bar(x, y);
yHat = gausFcn(b);
hold on
plot(x, yHat, 'r--', 'LineWidth', 2)
title(sprintf('\\mu=%.3f \\sigma=%.3f amp=%.3f', b))
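If the Optimization Toolbox is not available, the same model can be fitted with base MATLAB's fminsearch by minimizing the sum of squared residuals. This is a sketch of that swapped-in optimizer, not the method used above; the noisy data are hypothetical stand-ins for the CSV columns.

```matlab
% Hypothetical noisy Gaussian bar heights standing in for the CSV data.
rng(0)
x = linspace(0, 10, 200)';
y = 80*exp(-(x-4).^2/(2*1.2^2)) + randn(size(x));

% Same 3-parameter model [mean, std, amplitude] as in the answer.
gausFcn = @(p) p(3)*exp(-((x-p(1)).^2)/(2*p(2).^2));
sse = @(p) sum((gausFcn(p) - y).^2);     % scalar objective for fminsearch

% Initial guesses from the count-weighted moments and the peak height.
mu0 = sum(x.*y)/sum(y);
sigma0 = sqrt(sum(y.*(x-mu0).^2)/sum(y));
b = fminsearch(sse, [mu0, sigma0, max(y)]);
b(2) = abs(b(2));                        % sigma enters squared, so its sign is arbitrary
fprintf('mu=%.3f sigma=%.3f amp=%.3f\n', b)
```

fminsearch is derivative-free and slower than lsqnonlin, but for three parameters and a reasonable starting point it converges quickly.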
Cautionary word #1: This assumes your x values are the centers of the bins, and that assumption matters. If the x values are bin edges instead, the estimated mean is off by up to half a bin width, which here is +/-0.00025 ( diff(x)/2 ). If the x values are bin edges and x(1) is the left edge of the first bin, add half the bin width to each x value to approximate the bin centers.
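Converting left bin edges to centers takes one line for uniform bins. The sketch below assumes hypothetical left edges with a uniform width of 0.0005 (chosen to match the half-width quoted above), and also shows the general form for a full edge vector with possibly non-uniform widths.

```matlab
% Hypothetical left bin edges with a uniform width of 0.0005.
xLeft = (0:0.0005:0.4995)';
w = diff(xLeft);                 % bin widths
centers = xLeft + w(1)/2;        % assumes all bins share the same width

% General form: a full edge vector (one more edge than bins).
% Here the last edge is built assuming the final bin has the previous width.
edges = [xLeft; xLeft(end) + w(end)];
centers2 = edges(1:end-1) + diff(edges)/2;
```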
Cautionary word #2: parameters fitted to the binned distribution are not the same as parameters fitted to your raw data. Bin width affects bar heights and spread, so what you're fitting does not directly give you the parameters of the underlying population.
The plot below shows 5 sets of histograms at 5 different bin-widths using the same raw data (dots).
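The bin-width effect can also be checked numerically. This sketch bins the same hypothetical raw data at several widths and compares the count-weighted SD of the bin centers with the raw SD. It also applies Sheppard's correction (subtracting w^2/12 from the binned variance), a standard approximation that assumes a smooth density and bins that are not too wide relative to sigma.

```matlab
rng(1)
raw = 2 + 0.5*randn(1e5, 1);             % hypothetical raw data: mean 2, SD 0.5
widths = [0.05 0.1 0.25 0.5 1];
for w = widths
    % Bin edges covering all the data, with width w.
    edges = floor(min(raw)/w)*w : w : ceil(max(raw)/w)*w + w;
    counts = histcounts(raw, edges)';
    centers = edges(1:end-1)' + w/2;
    % Count-weighted moments of the bin centers.
    mu = sum(centers.*counts)/sum(counts);
    sd = sqrt(sum(counts.*(centers-mu).^2)/sum(counts));
    sdSheppard = sqrt(sd^2 - w^2/12);    % Sheppard's correction
    fprintf('w=%.2f  binned SD=%.4f  corrected SD=%.4f  raw SD=%.4f\n', ...
        w, sd, sdSheppard, std(raw))
end
```

The binned SD grows with the bin width, and the correction pulls it back toward the raw value.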

Asked: 31 Jul 2020

Edited: 5 Aug 2020
