How to know the range automatically identified by readtable when importing an Excel file?

I am using readtable in an app to import data from an Excel spreadsheet, which contains several lines of heading and then the actual table with information: something like the example below
readtable successfully manages to skip the headers and to pull out the data. My problem however is that I also need to copy all the information in the header section because then I need to create a second file with the same header and only some of the elements of the table below. I cannot simply copy a given range (say A1:G6 in the example above) because the number of rows/columns of the header changes depending on the file.
Is there a way to know which range was automatically identified as a table by readtable, so that I could then deduct the range of the header that I need?

 Respuesta aceptada

It will be only as good as the ability of the detectImportOptions bot's ability to recognize the start of the data, but
opt=detectImportOptions(yourfile);
disp(opt.DataRange)
disp(opt.VariableNamesRange)
disp(opt.VariableNames)
will show you what it determined automagically...the two ranges are the starting cell (upper LH corner of the range); the number of elements in the VariableNames array will let you deduce the number of columns. It does not return the Excel UsedRange range address; that could be a useful adjunct to ask for as an enhancement; as is, all you can do is then read the table and see how many rows were returned. That size, of course, is dependent upon the setting of the 'ImportError' and 'Missing' rules and the data content of the file itself.
IF (the proverbial "big if") the format is always similar to the shown table, it's probably going to be pretty reliable; less well formatted or more eclectic data collections may confuse the scanning and not always get the same answer for what appear to be very similar worksheets. In particular, missing or nonnumeric data in columns that should be recognized as numeric often will cause the whole column to be treated as a cell or may change the 'DataRange' value returned.

7 comentarios

Very helpful, thanks a lot!
" In particular, missing or nonnumeric data in columns that should be recognized as numeric often will cause .."
The key in the above is whether those nonnumeric entries are in the first few rows of the data file; the 'bot bases its analysis on an unknown number of lines read/analyzed; a value well down in the spreadsheet will likely only be recognized/returned as missing value but not affect the identification process...
calling detectImportOptions() yourself causes it to scan more rows than if you do not call it yourself.
Yeah, but that's an undocumented number and for quite a time was subject to change from release to release..not sure how many it is nor when last changes occurred, but behavior used to change between releases fairly frequently although it was never made clear just what/how changes were made -- like Tide, it was only advertised as "new and improved". :)
I do like the idea of returning the Excel AreaUsed range as an enhancement to the user, although perhaps it should be a separate function a la sheetnames. I did add it to the excel_utils package locally...now I'm not recalling who actually posted it some time back -- @Image Analyst or @Star Strider, maybe?
The amount scanned by detectImportOptions was documented at one point... but not well documented, and seems to have changed since then...
dpb
dpb el 14 de Abr. de 2023
Editada: dpb el 14 de Abr. de 2023
I don't recall ever seeing it documented in the doc itself; I remember it being discussed here in the forum when behavior was changing fairly frequently and questions arose by poster's prior code not acting the same...but, I don't download/install every release and don't visit the release notes on other releases very often, either, so well could have missed what there might have been.
function usedRange=GetUsedRange(Excel, sheetNameOrNumber)
% Returns used range address string for input sheet.
% Can pass in the number of the sheet (1,2,3,etc.) or name ('Results').
try
usedRange=Excel.Worksheets.Item(sheetNameOrNumber).UsedRange; % get range reference
usedRange=usedRange.Address; % return the address string
catch ME
errorMessage = sprintf('Error in function GetUsedRange.\n\nError Message:\n%s', ME.message);
fprintf('%s\n', errorMessage);
end
return;
end
To use, first create the Excel actx engine with actxserver and open the file; these were designed to be used to do various things to a given workbook, opening the workbook only once, do the work, and then closing and cleaning up the object rather than as complete standalone, create a new instance every time.
>> excel = actxserver('Excel.Application');
>> d=dir('TestAnnualReportScholarshipData.xls');
>> fn=fullfile(d.folder,d.name);
>> Workbook = excel.Workbooks.Open(fn)
>> GetUsedRange(excel,'Endowed Funds')
>> ans =
'$A$1:$P$74'
>> excel.ActiveWorkbook.Close(false);
>> excel.Quit
>> clear excel

Iniciar sesión para comentar.

Más respuestas (1)

No. However you should first do a detectImportOptions on the file, and then pass the options to readtable() . The options object will indicate the range over which data was imported.

1 comentario

" The options object will indicate the range over which data was imported."
Not exactly the range, Walter. It will return the ULH corner of the data and you can deduce the number of columns, but it does not return a length indication; the 'UsedRange' from Excel could be a useful addition to the returned output; it's got to be used internally but isn't reported to the user.

Iniciar sesión para comentar.

Productos

Versión

R2023a

Preguntada:

el 13 de Abr. de 2023

Editada:

dpb
el 15 de Abr. de 2023

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by