MATLAB is unable to parse a Numeric field when I use the gather function on a tall array.
14 views (last 30 days)
Show older comments
Ninad
on 21 Aug 2025
Commented: Jeremy Hughes
on 25 Sep 2025
I have a CSV file with a large number of data points that I want to run a particular algorithm on. I created a tall array from the file, intending to import a small chunk of the data at a time. However, when I try to use gather to bring a small chunk into memory, I get the following error.

"Board_Ai0" is the header of the CSV file. It is not in present in row 15355 as can be seen below where I opened the csv file in MATLAB's import tool.

The same algorithm works perfectly fine when I don't use a tall array and instead import the whole file into memory. However, I have other, larger CSV files that I also want to analyze, and those won't fit in memory.
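For reference, the failing workflow looks roughly like this (a sketch; the file name and variable name are the ones that appear in the answers below):
ds = tabularTextDatastore("1kwogndrd1.csv");
t = tall(ds);
chunk = gather(t.Board0_Ai0(1:200000));  % this is where the "unable to parse a Numeric field" error appears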
UPDATE: Apparently the images were illegible, but someone else edited the question to make them larger, so I guess it should be fine now. Also, I can't attach the data files to this question, because the files that give me this problem are all larger than 5 GB.
12 comments
Jeremy Hughes
on 25 Sep 2025
FYI, a TALL array is meant to allow you to operate on the entire table, even if it doesn't fit into memory. If you want to work on chunks of the file, don't use TALL.
Using datastore directly will let you read chunks of data.
ds = tabularTextDatastore(files);  % set additional name-value options as needed
while hasdata(ds)
    T = read(ds);
    % Do stuff with T.
end
However, that's not going to solve the problem because you have rows that don't contain numeric data. tabularTextDatastore doesn't allow for that.
I like @Harald's solution, but with some modification: I'd avoid calling detectImportOptions on every iteration. For a datastore to work, the schema should be the same each time.
function [data, startrow, done] = readdata(filename, startrow)
persistent opts
if isempty(opts)
    opts = detectImportOptions(filename);  % detect once and reuse; the schema must match on every call
end
nRows = 10000000;
if isempty(startrow)
    startrow = 2;  % skip the header line on the first call
end
opts.DataLines = [startrow, startrow + nRows - 1];
data = readtimetable(filename, opts);
done = height(data) < nRows;  % decide before rmmissing, or a chunk with missing rows looks like the end of the file
data = rmmissing(data);
startrow = startrow + nRows;
end
There is still a problem with this: using opts.DataLines to manage the chunks still forces the reader to scan every line up to startrow just to find where to start, which means each subsequent read will be slower.
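One way around that (a sketch, untested against the real file, and it assumes the TreatAsMissing fix from the accepted answer below): let the datastore track the file position itself via its ReadSize property, so each read continues where the previous one stopped instead of re-scanning from the top.
ds = tabularTextDatastore("1kwogndrd1.csv", TreatAsMissing={'Time','Board0_Ai0'});
ds.ReadSize = 1000000;  % rows per chunk
while hasdata(ds)
    T = read(ds);       % continues where the previous read ended
    T = rmmissing(T);   % drop the rows that held the repeated header text
    % process T here
end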
Accepted Answer
Daniele Sportillo
on 12 Sep 2025
Hi @Ninad, thanks for sharing the file. I see that your .csv includes the Variable Names in some data rows.


To handle this, you can use the TreatAsMissing property with tabularTextDatastore to treat those rows as NaN:
data = tabularTextDatastore("1kwogndrd1.csv",TreatAsMissing={'Time','Board0_Ai0'},SelectedVariableNames={'Board0_Ai0'});
Then you can gather slices of the data without errors:
ds = tall(data);
slice = ds.Board0_Ai0(1:200000);
slice = gather(slice);
If you want to calculate the mean of the entire column, I recommend computing it first and then gathering the result. Use "omitnan" to exclude the NaN rows introduced by TreatAsMissing:
m = mean(ds.Board0_Ai0,"omitnan");
gather(m)
Hope this helps!
1 comment
Jeremy Hughes
on 25 Sep 2025
I"d just use the datastore directly instead of using TALL. Assuming you want the mean of each chunk.
ds = tabularTextDatastore("1kwogndrd1.csv",TreatAsMissing={'Time','Board0_Ai0'},SelectedVariableNames={'Board0_Ai0'});
M = {};
while hasdata(ds)
    data = read(ds);
    M{end+1} = mean(data.Board0_Ai0, "omitnan");  % "omitnan" skips the rows TreatAsMissing turned into NaN
end
If you want the mean of the entire variable, TALL doesn't need to be chunked.
ds = tabularTextDatastore("1kwogndrd1.csv",TreatAsMissing={'Time','Board0_Ai0'},SelectedVariableNames={'Board0_Ai0'});
data = tall(ds);
M = mean(data.Board0_Ai0, "omitnan");  % "omitnan" skips the NaN rows here too
M = gather(M)
More Answers (2)
dpb
on 6 Sep 2025
Edited: dpb
on 6 Sep 2025
It appears it is detectImportOptions that is having the problem; apparently it tries to read the whole file into memory before it does its forensics.
I don't think you need an import options object anyway; use the 'Range' named parameter in the call to readtimetable.
Something like
function [data, startrow, done] = readdata(filename, startrow)
nRows = 10000000;
if isempty(startrow)
    startrow = 2;  % this looks unlikely to be right; from the earlier image there are 3(?) header rows
end
range = sprintf('%d:%d', startrow, startrow + nRows - 1);  % build row range expression for exactly nRows rows
data = readtimetable(filename, 'Range', range);
done = height(data) < nRows;  % decide before rmmissing, or a chunk with missing rows ends the loop early
data = rmmissing(data);
startrow = startrow + nRows;
end
This may still have some issues with the timetable, however, if readtimetable first reads variable names from a header line, since that header line isn't there in the subsequent sections of the file. I don't know what trouble you'll run into with such large files if you try to read 100K lines into the file but also tell it to read the variable names from the second or third line. Probably the best bet is to ignore variable names, let MATLAB use defaults, and then either set Properties.VariableNames after reading or just accept the defaults, as in the sketch below.
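A minimal sketch of that last idea (untested; 'Range' support in readtable for text files and the variable names are assumptions based on the screenshots):
range = sprintf('%d:%d', startrow, startrow + nRows - 1);
data = readtable(filename, 'Range', range, 'ReadVariableNames', false);  % ignore any header text, use default names
data.Properties.VariableNames = {'Time', 'Board0_Ai0'};  % assumed names from the screenshots
data = table2timetable(data);  % rebuild the timetable, assuming the first variable is datetime/duration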
5 comments
Harald
on 9 Sep 2025
dpb
on 9 Sep 2025
Edited: dpb
on 9 Sep 2025
@Harald, no problem, just commenting on why I hadn't poked harder, earlier...
If @Ninad would attach a short section of a file it would make it simpler, indeed. It's not convenient at the moment to stop a debugging session and try to create a local copy of a similar file to play with/poke at.
The documentation isn't all that helpful; the only examples I can find using tables/timetables with tall arrays use tiny data files and don't use fileDatastore, so they don't have a callback function with a table. I don't believe there is an example of the combination.
Stephen23
on 8 Sep 2025
Edited: dpb
on 8 Sep 2025
Providing the RANGE argument does not prevent READTABLE from calling its automatic format detection, which might involve loading all or a significant part of the file into memory. The documented solution is to provide an import options object yourself (e.g., generate it on a known good file of a smaller size and then store it), or alternatively to use a low-level file reading command, e.g. FSCANF, FREAD, etc.
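A minimal sketch of that approach (the file names here are placeholders):
% One-off: build the import options on a small file with the same layout, then save them.
opts = detectImportOptions("small_sample.csv");
save("csv_import_opts.mat", "opts");
% In the processing code: reuse the stored options so the large file is never re-scanned for format detection.
S = load("csv_import_opts.mat");
data = readtable("big_file.csv", S.opts);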
4 comments
dpb
on 11 Sep 2025
I was suggesting to attach a piece of the file (perhaps zipped to include a little more). That would give folks enough to test with something that duplicates the actual format.
What, precisely, does "MATLAB crashed" mean? Did it actually abort MATLAB itself, or was it another out-of-memory error, or ...?