What's the difference between a table (new in R2013b) and a dataset (Stats Toolbox)?

As an enthusiast for the dataset class, I noticed with interest a new class, table, in the latest MATLAB release (in the promo video). This sounds very similar to the existing dataset class in the Statistics Toolbox, which I have been using since its release.
When I search the documentation/help for "table dataset", all I find are the converter functions dataset2table and table2dataset, but my question is: what is the difference in intention between these? When is it appropriate to use a dataset and when to use a table? What is the difference between the design of these two classes?
What about the "new" categorical class? Has this moved from the Stats Toolbox into base MATLAB?
Should we expect the dataset and categorical classes in the Statistics Toolbox to be deprecated in the future?

Accepted Answer

Peter Perkins on 10 Sep 2013
Julian, as you noticed, MATLAB R2013b includes two new array types known as tables and categorical arrays. These are very similar to the dataset, nominal, and ordinal array types that have been part of the Statistics Toolbox for about six years. Like a dataset array, a table is a container that holds mixed-type tabular data, the sort of column-oriented data you would often import from a CSV file or a spreadsheet. And like nominal and ordinal arrays, a categorical array represents discrete non-numeric data, the sort of data you might otherwise have used strings or "coded integers" to store.
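As a rough illustration of the two new types described above, here is a minimal sketch that builds a small table with a categorical column. It assumes R2013b or later; the data and the variable names (Name, Age, Group) are made up for the example.

```matlab
% Hypothetical sample data; names and values are illustrative only.
Name  = {'Ann'; 'Bob'; 'Cy'};                           % cell array of strings
Age   = [30; 20; 40];                                   % numeric column
Group = categorical({'control'; 'treated'; 'control'}); % discrete labels

% A table holds mixed-type, column-oriented data. Variable names are
% taken from the workspace names of the inputs.
T = table(Name, Age, Group);

meanAge  = mean(T.Age);                % dot indexing returns the whole column
cats     = categories(T.Group);        % category names as a cell array
nControl = sum(T.Group == 'control');  % compare against a category name
```

The same layout with a dataset array would use dataset(...) and nominal(...) from the Statistics Toolbox; the indexing syntax is intentionally very similar.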
Generally speaking, these new data types should look and feel very familiar to anyone who has used the ones in the Statistics Toolbox. One obvious difference is that they are included as part of core MATLAB, and you don't need to install the Statistics Toolbox to use them. In addition, their design and terminology make them a bit more accessible for non-statistical uses, though they remain just as useful for statistics.
Tables and categorical arrays are ultimately intended as replacements for dataset, nominal, and ordinal arrays, and we recommend that MATLAB users adopt them for new work. We also recommend that, over time, users update any of their existing code that uses dataset/nominal/ordinal, but we don't expect that changeover to happen immediately. Upcoming releases will provide more details and strategies for making the transition.
In R2013b, all of the Statistics Toolbox functionality that uses nominal and ordinal arrays also supports the new categorical arrays. In R2013b, you'll still need to use dataset arrays in the Statistics Toolbox for things like LinearModel and (new in R2013b) LinearMixedModel, but you might consider creating tables and converting to dataset only when needed, using table2dataset.
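The "work in tables, convert only when needed" approach might look something like the sketch below. It assumes the Statistics Toolbox is installed (table2dataset and dataset2table are the converters mentioned in the question); the data and the names x and y are hypothetical.

```matlab
% Hypothetical data; x and y are illustrative variable names.
x = (1:5)';
y = [2.1; 3.9; 6.2; 7.8; 10.1];

% Do the general data handling with a table (core MATLAB).
T = table(x, y);

% Convert at the point where a Statistics Toolbox function still
% expects a dataset array, then hand ds to that function.
ds = table2dataset(T);

% Convert back if later steps should go on working with tables.
T2 = dataset2table(ds);
```

Keeping the conversion at the boundary means most of the code already uses tables, so less has to change once the toolbox functions accept them directly.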
5 Comments
Julian on 12 Sep 2013
Steve, sure, I agree that not mentioning the existing classes in the main MATLAB doc is clearer for a new (or non-Stats-Toolbox) MATLAB user, and the doc is cleaner that way. But release notes (and videos) speak mainly to existing users rather than new ones... and the new converter functions dataset2table and table2dataset should be mentioned in the release notes for the Statistics Toolbox. TMW modified the head doc page for Dataset Arrays http://www.mathworks.com/help/stats/dataset-arrays.html to reference table2dataset & dataset2table, but there is no remark at all about the relation between dataset and table, or the future implications for Statistics Toolbox users. The head page for Categorical Arrays http://www.mathworks.co.uk/help/stats/categorical-arrays.html fails even to mention its new non-abstract namesake.
I am sure the design of table and categorical leaned heavily on experience with dataset and categorical. TMW didn't forget about datasets or categoricals when it launched their replacements with a big fanfare, but it did forget about their users when it updated the Statistics Toolbox documentation and release notes.
Thank you for your other remark regarding efficiency. BTW, it's great to see "datasets" and "categoricals" get a wider audience; I really like them. It will be quite a while before I get to try the new ones (my company just upgraded to R2013a from R2011a). I hope a migration guide will be published by then?
Julian on 12 Sep 2013
BTW, I just found out that changes made to the dataset() constructor between R2011a and R2013a broke my code...
