How should XPath be set in TableSelector for htmlImportOptions so readtable( ) can output the first three tables in an html file?
I would like to read the first three tables in an HTML file with a single call to readtable( ), in order to reduce the file reading time. However, the readtable( ) function from the Database Toolbox seems to read only one table at a time. I have tried setting TableSelector to several XPath expressions, but they either return an error message or only one table. For example, the one below returns the first table, but not table 2 or 3.
opts = detectHtmlImportOptions(htmlfile);
opts.TableSelector = "//TABLE[position()<4]";
readtable(htmlfile, opts)
I wonder whether, because the output argument of readtable( ) is a single table, it can read only one table at a time.
Another related question:
% Why does lowercase 'table' not work?
opts.TableSelector = "//table[1]";
readtable(htmlfile, opts)
% ans=
% 0x1 empty table
% When TABLE[1] is written in uppercase, readtable( ) returns the first table correctly.
opts.TableSelector = "//TABLE[1]";
readtable(htmlfile, opts)
Accepted Answer
More Answers (1)
Christopher Creutzig
on 30 Sep 2022
readtable currently only returns a single table. There has been talk about a function returning multiple tables, but I don't know of any concrete plans. It may be worth letting support@mathworks.com know you are looking for something like that – given the time things can take from inception to release, it may not always be obvious, but user demand does influence priorities.
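Since each call returns one table, one possible workaround is to call readtable once per table and collect the results in a cell array. The following is only a sketch, not an official recommendation: the sample file, the variable names nTables and tables, and the XPath form "(//TABLE)[k]" (the k-th TABLE element in document order) are assumptions made for illustration. It also assumes all three tables share the layout that detectHtmlImportOptions detects, and it does not avoid re-reading the file on each call.

```matlab
% Build a small HTML file with three tables so the sketch is self-contained.
htmlfile = [tempname, '.html'];
fid = fopen(htmlfile, 'w');
fprintf(fid, '<html><body>\n');
for k = 1:3
    fprintf(fid, ['<table><tr><th>A</th><th>B</th></tr>', ...
                  '<tr><td>%d</td><td>%d</td></tr></table>\n'], k, 10*k);
end
fprintf(fid, '</body></html>\n');
fclose(fid);

% Detect import options once, then reuse them for every table.
opts = detectHtmlImportOptions(htmlfile);

nTables = 3;                 % hypothetical: number of tables wanted
tables = cell(1, nTables);   % one readtable result per cell
for k = 1:nTables
    % Assumption: "(//TABLE)[k]" picks the k-th TABLE in document order.
    opts.TableSelector = "(//TABLE)[" + k + "]";
    tables{k} = readtable(htmlfile, opts);
end

delete(htmlfile);            % clean up the temporary file
```

After the loop, tables{1}, tables{2}, and tables{3} hold the three results. If the tables have different layouts, the options would probably need to be detected or adjusted separately for each one.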
As for lowercase table selectors … table selectors are XPath expressions, and XPath is, well, case-sensitive. Most HTML versions/variants (maybe in practice all of them) are case agnostic, although their standards differ in what they regard as the “right” casing to use. htmlTree normalizes to uppercase. But that doesn't mean we could simply treat the XPath expression as case agnostic, as it can contain parts where case matters. I'm not sure if your question is simple curiosity or if this is actually a bump in the road to solving your problems. If the latter, please let us know more.
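To illustrate why the whole expression cannot simply be treated as case agnostic, here is a small sketch in which the element name is written in uppercase while a string literal inside the predicate still has to match the cell text exactly. The sample file and its contents are made up for illustration.

```matlab
% Build a small HTML file with two tables whose text differs only in case.
htmlfile = [tempname, '.html'];
fid = fopen(htmlfile, 'w');
fprintf(fid, '<html><body>\n');
fprintf(fid, ['<table><tr><th>Item</th><th>Count</th></tr>', ...
              '<tr><td>subtotal</td><td>1</td></tr></table>\n']);
fprintf(fid, ['<table><tr><th>Item</th><th>Count</th></tr>', ...
              '<tr><td>Total</td><td>2</td></tr></table>\n']);
fprintf(fid, '</body></html>\n');
fclose(fid);

opts = detectHtmlImportOptions(htmlfile);

% The element name is uppercase (TABLE), but the literal 'Total' is compared
% case-sensitively, so a table containing only 'subtotal' should not match.
opts.TableSelector = "//TABLE[contains(., 'Total')]";
T = readtable(htmlfile, opts);

delete(htmlfile);            % clean up the temporary file
```

Here the selector is expected to skip the first table, because contains( ) in XPath compares strings case-sensitively, which is the kind of "part where case matters" that rules out lowercasing or uppercasing the entire expression.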
Nitpick: readtable does not require Database Toolbox, it is in core MATLAB.