Am I computing cross entropy incorrectly?

Matthew Eicholtz on 20 Aug 2014
Commented: Greg Heath on 15 Oct 2017
I am working on a neural network and would like to use cross entropy as my error function. I noticed from a previous question that MATLAB added this functionality starting with R2013b. I decided to test the crossentropy function by running the simple example provided in the documentation. The code is reprinted below for convenience:
[x,t] = iris_dataset;
net = patternnet(10);
net = train(net,x,t);
y = net(x);
perf = crossentropy(net,t,y)
When I run this code, I get perf = 0.0367. To verify this result, I ran the code:
ce = -mean(sum(t.*log(y)+(1-t).*log(1-y)))
which resulted in ce = 0.1100. Why are perf and ce unequal? Do I have an error in my calculation?

Accepted Answer

Greg Heath on 22 Aug 2014
Edited: Greg Heath on 22 Aug 2014
If c classes are mutually exclusive, classifier target probability values should be the certain probability values of 0 or 1 and must sum to 1.
If the corresponding classifier uses a softmax output transfer function, output estimates are bounded by the open range (0,1) and sum to 1.
If classes are not mutually exclusive (e.g., tall, dark, handsome), 0 or 1 classifier target probability values do not have to sum to 1.
If the corresponding classifier uses a logsig output transfer function, output estimates are bounded by the open range (0,1) but are not constrained to have a unit sum.
A useful performance function is the crossentropy between outputs and targets.
For mutually exclusive targets and a softmax output, the corresponding form for crossentropy is
Xent1 = -sum(t.*log(y))
For non-mutually exclusive targets and a logsig output, the corresponding form for crossentropy is
Xent2 = -sum(t.*log(y) + (1-t).*log(1-y))
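To see how the two forms differ, here is a small NumPy sketch (not MATLAB; the one-hot targets and softmax-like outputs below are made up purely for illustration):

```python
import numpy as np

# Toy data: 3 mutually exclusive classes, 4 samples.
# Columns are samples, rows are classes (same layout as iris_dataset).
t = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)
# Softmax-like outputs: each column sums to 1, all values in (0, 1)
y = np.array([[0.8, 0.1, 0.2, 0.7],
              [0.1, 0.8, 0.1, 0.2],
              [0.1, 0.1, 0.7, 0.1]])

# Xent1: form for mutually exclusive targets with a softmax output
xent1 = -np.sum(t * np.log(y), axis=0).mean()

# Xent2: form for non-exclusive targets with a logsig output
xent2 = -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y), axis=0).mean()

print(xent1, xent2)  # xent2 > xent1, since it adds the (1-t) penalty terms
```

Xent2 is always at least Xent1 on the same data, because the extra `(1-t).*log(1-y)` terms contribute additional non-negative penalties.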
For your example I get
clear all, clc
[ x, t ] = iris_dataset;
[ O N ] = size(t) % [ 3 150 ]
minmax0 = repmat([0 1],3,1)
checkt1 = max(abs( minmax(t)- minmax0))%[0 0]
checkt2 = max(abs(sum(t)-ones(1,N))) % 0
net = patternnet(10);
rng(0)
[ net tr y ] = train(net,x,t);
checky1 = max(abs( minmax(y)- minmax0))
% checky1 = [ 2.4214e-4 1.8807e-3 ]
checky2 = max(abs(sum(y)-ones(1,N))) % 2.2204e-16
perf = crossentropy(net,t,y) % 0.033005
Xent1 = mean(-sum(t.*log(y))) % 0.049552
Xent3 = mean(-sum((1-t).*log(1-y))) % 0.049464
Xent2 = mean(-sum(t.*log(y)+ (1-t).*log(1-y))) % 0.099015
Unfortunately, none of the following gives a formula
help crossentropy
doc crossentropy
type crossentropy
and the example in the website documentation incorrectly uses Xent2, which is only valid for non-exclusive classes.
If you search on crossentropy in the comp.ai.neural-nets newsgroup, you should find many posts on the topic.
Bottom Line: Xent2 is the correct answer. However, your calculations of crossentropy and Xent3 are not too far from mine. If you use rng(0), they should match.
Hope this helps.
Thank you for formally accepting my answer
Greg
  2 comments
Greg Heath on 23 Aug 2014
Notice that
[O N] = size(target) % [3 150]
and
1. 3*0.0367 = 0.1101
2. 3*0.033005 = 0.099015
Robert McKellar on 15 Oct 2014
Hi Greg
Your measure of Xent2 (for non-mutually exclusive targets) should give exactly the same result as crossentropy(net,t,y), so should the code not be:
perf = crossentropy(net,t,y) % 0.033005042210726
Xent1 = -sum(sum(t.*log(y)))/numel(t) % 0.016517184907364
Xent3 = -sum(sum((1-t).*log(1-y)))/numel(t) % 0.016487857303362
Xent2 = -sum(sum(t.*log(y)+ (1-t).*log(1-y)))/numel(t) % 0.033005042210726
This way, perf and Xent2 agree.
Regards
Bob
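The normalization difference Robert identifies can be checked numerically. In this NumPy sketch (not MATLAB; the targets and outputs are randomly generated, and `crossentropy` is emulated by dividing the summed cross-entropy by the total number of target elements, which is what the comparison in the thread suggests MATLAB does):

```python
import numpy as np

rng = np.random.default_rng(0)

O, N = 3, 150  # classes x samples, as in iris_dataset
# Hypothetical one-hot targets and softmax outputs, for illustration only
t = np.eye(O)[rng.integers(0, O, N)].T
logits = rng.normal(size=(O, N)) + 3 * t
y = np.exp(logits) / np.exp(logits).sum(axis=0)

# Per-element cross-entropy terms (the Xent2 form)
ce = -(t * np.log(y) + (1 - t) * np.log(1 - y))

perf_like = ce.sum() / ce.size   # divide by numel(t) = O*N, as crossentropy appears to
xent2 = ce.sum(axis=0).mean()    # mean over samples only: divide by N

# The two normalizations differ by exactly the number of classes O
assert np.isclose(xent2, O * perf_like)
```

This is also why Greg's factor-of-3 observation holds: dividing the same total by N instead of by 3*N inflates the result by exactly the number of rows in the target matrix.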


More Answers (3)

Greg Heath on 21 Aug 2014
You are using the Xent form for outputs and targets that do not have to sum to 1. The corresponding output transfer function is logsig.
For targets that are constrained to sum to 1, use softmax and the first term of the sum.
For extensive discussions search in comp.ai.neural-nets using
greg cross entropy
Hope this helps.
Thank you for formally accepting my answer
Greg
  2 comments
Matthew Eicholtz on 21 Aug 2014
Edited: Matthew Eicholtz on 21 Aug 2014
Thanks for the reply, but this does not quite answer my question, so let me pose it another way: what line of code using y and t will reproduce the 0.0367 I get when I run the crossentropy function?
Greg Heath on 21 Aug 2014
You are welcome for the reply. It did answer your question.
The next time you check, make sure that you initialize the RNG before you train so that you can duplicate your calculation.



Or Shamir on 23 Sep 2017
Following the explanation here, you do:
ce = -t .* log(y);
perf = sum(ce(:))/numel(ce);
  1 comment
Greg Heath on 26 Sep 2017
Isn't that the same as
perf = mean(ce(:)); % ?
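Greg's point is that the two normalizations are algebraically identical. A quick NumPy check, using made-up values in place of the cross-entropy matrix:

```python
import numpy as np

# Hypothetical per-element cross-entropy values, just for the comparison
ce = np.abs(np.random.default_rng(1).normal(size=(3, 150)))

a = ce.sum() / ce.size   # sum(ce(:))/numel(ce) in MATLAB terms
b = ce.mean()            # mean(ce(:)) in MATLAB terms
assert np.isclose(a, b)  # same value: mean is sum divided by element count
```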



Tian Li on 13 Oct 2017
ce = -t .* log(y);
perf = sum(ce(:))/numel(ce);
This is the right answer for the multi-class classification error problem.
  1 comment
Greg Heath on 15 Oct 2017
Why do you think that is different from the last two answers?

