Readtable with mixed variable types - 2021a version behaving differently than 2019a

Hello,
I'm using readtable to read an Excel file. The file I'm reading is not strictly organized by columns. For example, row 1 can have a string in the second column, row 2 a number, and row 3 an empty cell.
In R2019a this was not an issue: I would simply get a table with empty strings, or with numbers read as strings, which I could easily convert.
In R2021a some columns are detected as numeric, and if a cell in such a column contains a string, it is simply read as "NaN". If I force the variable type to 'string', I still get missing values rather than empty strings, which breaks my subsequent code.
Which set of options can I pass to readtable so that:
- cells containing strings are read as strings
- empty cells (in columns that are otherwise populated) are read as empty strings
- the number of columns read equals the maximum number of columns containing data in any row?
Thanks,
Martin
Martin Melcher on 20 Aug 2021
Hello everyone,
I was not using any specific options in readtable, other than specifying the worksheet and setting "ReadVariableNames" to 0.
Thanks to the previous comment, I used readcell followed by cell2table. This worked with minimal adjustments, the main one being that empty cells are now reported as missing values, whereas they were previously read as empty strings.
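The readcell + cell2table workaround can be sketched roughly as follows. This is a minimal sketch, not the poster's actual code: the function name `readMixedSheet` is hypothetical, and replacing `<missing>` cells with empty char vectors (`''`) is an assumption about what the downstream code expects (it mimics the pre-R2020a output described in the question).

```matlab
function T = readMixedSheet(filename, sheet)
% readMixedSheet  Hypothetical helper: read an irregular spreadsheet into a
% table cell by cell, using readcell followed by cell2table.
%   sheet can be a sheet name or a sheet index.

    % readcell keeps each cell's own type (char, double, ...) instead of
    % forcing one type per column, so text is not turned into NaN.
    C = readcell(filename, 'Sheet', sheet);

    % Empty cells come back as 1x1 missing objects; replace them with ''
    % (assumption: the downstream code expects empty char vectors).
    isMissingCell = cellfun(@(x) isa(x, 'missing'), C);
    C(isMissingCell) = {''};

    % Each column of C becomes one table variable; columns with mixed
    % contents stay cell arrays.
    T = cell2table(C);
end
```

Usage might look like `T = readMixedSheet('data.xlsx', 'Sheet1');`, where the file and sheet names are placeholders. Because readcell does not impose a single type per column, text cells stay text and numeric cells stay numeric; converting the mixed columns is then left to the calling code, as it was before the upgrade.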
Thanks for the good suggestion. I know there are probably much more foolproof and clean ways of coding, but I'm still surprised that backwards compatibility cannot be taken for granted after an upgrade. I have never seen that in any other programming language.
Cheers,
Martin
dpb on 22 Aug 2021 (edited 23 Aug 2021)
Unlike Fortran or C/C++, MATLAB is a proprietary product not bound by a standards committee. So, while there is an attempt at maintaining backwards compatibility at some level, it is not at all unusual for MathWorks to change the behavior of various functions; higher-level abstractions like readtable in particular are regularly improved. The enhanced scanning, a relatively recent introduction, is most often a benefit because it assesses and imports irregular files more accurately, at the cost of some extra overhead that is occasionally noticeable. Unfortunately, "there is no free lunch!", and once in a while a revision such as this one causes a hiccup in existing code, as you've noticed here.
In general, in a case like this it's probably more reliable to spend a little more time with the import options and rely less on the default processing. That is, again, somewhat of a conundrum, since the whole point is to make the function more of an "easy-to-use, no intervention" tool. Sometimes it succeeds; occasionally, it ends up going the other way. There is no perfect solution: the only alternative is the status quo, which isn't a viable development model either.
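As a rough illustration of setting the import options explicitly for the case in the question, the sketch below reads every column as text. Assumptions: the helper name `readSheetAsText` is hypothetical; the data are assumed to start in cell A1 with no header row; it relies on detectImportOptions having found the full used range of the sheet; and it assumes numeric cells are converted to text when the variable type is `char`, which is worth checking on your own release. Using `'char'` rather than `'string'` matters here: an empty cell in a string variable comes back as `<missing>`, whereas a char variable takes its `FillValue`, which is set to `''` below.

```matlab
function T = readSheetAsText(filename, sheet)
% readSheetAsText  Hypothetical helper: import every column of a sheet as
% text, using explicit import options instead of automatic type detection.

    % Start from the detected options for the requested sheet.
    opts = detectImportOptions(filename, 'Sheet', sheet);

    % Assume no header row: read data from cell A1 and take no variable
    % names from the file (similar in intent to 'ReadVariableNames', 0).
    opts.DataRange = 'A1';
    opts.VariableNamesRange = '';

    % Force all variables to char so that text in a "numeric" column is
    % not turned into NaN.
    opts = setvartype(opts, 'char');

    % Empty cells in char variables are filled with FillValue; make it ''.
    opts = setvaropts(opts, 'FillValue', '');

    T = readtable(filename, opts);
end
```

Compared with the readcell route, this keeps readtable in the loop but pins down the decisions (types, fill values, data range) that the default detection would otherwise make on its own.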
TMW (The MathWorks) is pretty good about documenting changes; this one was introduced in R2020a:
readtable Function: Uses results of detectImportOptions function by default
Starting in R2020a, the readtable function uses the results of the detectImportOptions function to import tabular data. In essence, these two readtable function calls behave identically:
T = readtable(filename)
T = readtable(filename,detectImportOptions(filename))
Compatibility Considerations
There are several differences between the default behavior of readtable and its default behavior in previous releases. To call readtable with the default behavior it had up to R2019b, use the 'Format','auto' name-value pair argument.
T = readtable(filename,'Format','auto')
...
The whole skinny is at <release-notes-link>, although you have to navigate to the R2020a section and then to the Data Import subsection.
Of course, if one doesn't update every release, there's a lot to go through every six months to have any hope of staying abreast...one of the disadvantages of such an active development cycle as compared to the advantage of new features and bug fixes...it's a tradeoff everybody has to make for themselves.
For mission-critical code, it really is a conundrum...one almost has to redo the whole validation exercise on each release, which may be a very expensive and time-consuming effort.

Version: R2021a