efficient ways to read and write complex valued data

Question

Chris Gelrlich el 29 de Jun. de 2020

0
Enlazar

Enlace directo a esta pregunta

https://la.mathworks.com/matlabcentral/answers/556855-efficient-ways-to-read-and-write-complex-valued-data

Comentada: Chris Gelrlich el 10 de Jul. de 2020

I have large files that contain interleaved complex data. I reach chunks of it at a time for processing. It seams that there is no way to read this efficiently. I can't find out how to do it without a copy that shouldn't really be necessary. Here are some things that I tried (y.fid points to a file opened with fopen(), y.type is a valid type):

% method 1: read in to a vector, interleave with complex
tic
cData = fread(y.fid,numSamp.*2, y.type);
t1 = toc;
tic
cData = complex(cData(1:2:end),cData(2:2:end));
t2 = toc;

The above method is slower than it has to be. I think the indexing really slows down the copy into cData.

% method 2: read into 2xN array, use complex and indexing
tic
cData = fread(y.fid,[2 numSamp], y.type);
t3 = toc;
tic
cData = complex(cData(1,:),cData(2,:)).';
t4 = toc;

The above method is indeed faster. It still requires a copy.

% method 3: read into 2xN array, use complex muliply
tic
cData = fread(y.fid,[2 numSamp], y.type);
t5 = toc;
tic
cData = cData.' * [1 ;1j];
t6 = toc;

The above is the slowest of them all.

I tried memmap, but it's inappropriate for very large files. I looked into MEX files. It requires that you create a new complex array for every read, but you could probably fill the array efficiently. The problem is that this is worse than an fread to an existing array of the correct size because one has to allocate new memory for every call.

Thanks in advance.

5 comentarios
Mostrar 3 comentarios más antiguosOcultar 3 comentarios más antiguos

Chris Gelrlich el 30 de Jun. de 2020

Abrir en MATLAB Online

dbp -

Sorry about being unclear with valid type. I wrote some wrappers around the fread to handle errors and compute number of data samples given data type. In this case f.type is a string, usually 'int16', but after processing I will save the data as 'float64' to a different file.

Mr. Robertson -

I agree, this should be easy to do because it's useful. Perhaps I should mark this answered and request it to MATLAB as a feature.

Mr. Tursa -

If I understand using mex files when you go by the book with the MATLAB API, all output variabls have to be created by something like

plhs[0] = mxCreateDoubleMatrix( (mwSize)rows, (mwSize)cols, mxCOMPLEX);

I presumed (perhaps incorrectly) that space for the matrix is created similar to a malloc(). I thought the array would be copied to the left hand side argument when the mex functio completed and that the array in the function would be freed.

Chris

James Tursa el 30 de Jun. de 2020

Editada: James Tursa el 30 de Jun. de 2020

"... I thought the array would be copied to the left hand side argument when the mex function completed and that the array in the function would be freed ..."

No, that is not what happens.. What happens is a shared data copy of plhs[0] is created and sent back to the caller, not a deep copy. Then when the mex fuction returns and plhs[0] is destroyed, only its header is destroyed, not the data. So no extra data copy is done in this case.

The mex funtion would be pretty simple to write, using all official API functions.

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Answer 1

James Tursa el 30 de Jun. de 2020

0
Enlazar

Enlace directo a esta respuesta

https://la.mathworks.com/matlabcentral/answers/556855-efficient-ways-to-read-and-write-complex-valued-data#answer_459196

Editada: James Tursa el 1 de Jul. de 2020

A mex routine to accomplish this that doesn't use any hacks can be found here:

https://www.mathworks.com/matlabcentral/fileexchange/77530-freadcomplex

It uses the MATLAB fopen( ) function in the background, so the syntax is the same. It first reads the interleaved complex data file as a real variable with twice the first dimension, then internally converts this to a complex variable with the requested dimensions using a sleight of hand pointer manipulation and no extra data copy. Uses only official API functions and does not require any mxArrary hacks, so should be OK for any MATLAB version R2018a or later.

There should probably be an enhancement request submitted for this, since I think the MATLAB fopen( ) function should have this capability directly without the need for a mex routine.

EDIT

And I just updated this submission to include a related fwritecomplex mex routine for writing interleaved complex variables. It creates a temporary pseudo shared data copy of the interleaved complex variable as a real and then calls the MATLAB fwrite function. All using official API functions so no mxArray hacks involved.

And, again, I think the MATLAB fwrite( ) function should have this capability directly without resorting to mex routines.

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Chris Gelrlich el 8 de Jul. de 2020

I tried out all the ways to read discussed here. As one might expect, Mr Tursa's mex file here smokes them all. Below are 11 trials of reading in about a million complex numbers. The "skip" and "skip rewind" are off the charts (slower). This was done reading an SSD on my laptop.

I've attached .m files that I used for testing and timing. I see a speedup of more than 50 percent for all the data types I tried.

Iniciar sesión para comentar.

Answer 2

dpb el 29 de Jun. de 2020

0
Enlazar

Enlace directo a esta respuesta

https://la.mathworks.com/matlabcentral/answers/556855-efficient-ways-to-read-and-write-complex-valued-data#answer_458776

Editada: dpb el 30 de Jun. de 2020

Abrir en MATLAB Online

Well, dunno how much faster/slower it might be; didn't try to time it, but trying something different--

Iffen by "interleaved" you mean the data were written something like

c=complex(rand(10,1),rand(10,1));      % make a dummy array for playing around with
fid=fopen('complex.bin','w');
for i=1:10,fwrite(fid,[real(c(i)) imag(c(i))],'double');end
fid=fclose(fid);
fidr=fopen('complex.bin'); fidi=fopen('complex.bin');  % handle for real, complex parts
fseek(fidi,8,'bof');                                   % position file pointer for first complex
complex(fread(fidr,inf,'double',8), fread(fidi,inf,'double',8))
fclose all; clear fidr fidi

Reproduced the original c array here...

Alternatively, you could rewind() the file and then read the second.

Whether the i/o would be quicker than the indexing operation I dunno...

Fortran handles it with reading into a complex variable--you might consider mex using Fortran instead of C

2 comentarios
Mostrar NingunoOcultar Ninguno

James Tursa el 30 de Jun. de 2020

Fortran is simply reading interleaved complex file data into interleaved complex variable, which can effectively be done with fread( ) directly.. This doesn't get the real & complex separated.

Chris Gelrlich el 30 de Jun. de 2020

dbp -

Thanks for the suggestions. I am going to try it out. I'm beginning to get worried that I am being fooled by the operating system being clever. I repeat the reads a number of times, using fseek to set the file pointer back to the start. Perhaps there is a filesystem-to-memory transfer only the first time and my conlcusion that the copy is eating my lunch is wrong.

I'll post again when my test is more sophisticated.

Mr. Tursa -

I read your solution (https://www.mathworks.com/matlabcentral/fileexchange/65842-sharedchild-creates-a-shared-data-copy-of-contiguous-subset) which would work. The problem is maintenance. The code would have to be updated because of the reasons you clearly stated in the comments.

Chris

Iniciar sesión para comentar.

Answer 3

James Tursa el 30 de Jun. de 2020

0
Enlazar

Enlace directo a esta respuesta

https://la.mathworks.com/matlabcentral/answers/556855-efficient-ways-to-read-and-write-complex-valued-data#answer_458812

Editada: James Tursa el 30 de Jun. de 2020

This may not apply to you, but if you have R2018a or later you can just fread( ) into a real variable directly the interleaved data and then use this FEX submission that reinterprets the real variable as a complex variable with half the number of elements using the supplied real2complex mex routine. This returns a shared data copy so it is memory efficient. It uses unofficial hacks to accomplish this because MATLAB does not supply official API functions for this. You need to have a C compiler installed to compile it.

https://www.mathworks.com/matlabcentral/fileexchange/65842-sharedchild-creates-a-shared-data-copy-of-contiguous-subset

Hopefully MATLAB will eventually supply an official function for this and the reverse function, since it is a natural capability that many users will want.

5 comentarios
Mostrar 3 comentarios más antiguosOcultar 3 comentarios más antiguos

James Tursa el 9 de Jul. de 2020

Editada: James Tursa el 9 de Jul. de 2020

Well, your questions on how this works are certainly legitimate and not out of line, so no apology needed. Particularly in light of the fact that these details are not published in MATLAB documentation, so how could you even know?

But getting back to your point of memory allocation, your original supposition is true ... a new memory allocation is in fact required by the mex routine each time the file is read. My points were that (1) this is also true of m-file code so no disadvantage of the mex routine here, and (2) no extra copy is made by the mex routine when returning plhs[0] back to the caller. The freadcomplex performance gains are purely the result of being able to convert a real variable to a complex variable via pointer manipulation without requiring a data copy.

The mex routine posted in the FEX plays by the rules and allocates new memory each time it is called, because it calls MATLAB fread( ) in the background and that is what MATLAB fread( ) will do. There is a way to avoid that memory allocation each time, but you would have to abandon MATLAB fread( ) and write the code using the C fread( ) function, and you would have to modify a MATLAB variable inplace (reading data directly into its data area) which would violate the rules (you could inadvertently write into other variables sharing data with the one you modified). As long as you didn't care about potential side effects (e.g., you never use those other shared variables downstream in your code) you can get away with this. Or you could hold the varialbe inside the mex routine and return a shared data copy of it (which also violates the official rules). It might be interesting to see what kind of speedup this would get. But given your aversion to unofficial methods, I didn't write that.

Chris Gelrlich el 10 de Jul. de 2020

Abrir en MATLAB Online

Thanks for the clarfication. Yes, I am trying to strike a balance between efficiency and maintainable code.

By the way, I ended up writing a wrapper for fopen() to avoid repeating the checks for fopen() failure and to compute the number of samples in these files based on file size and data type. I ended up using a trick of yours:

numel(typecast(x(1),'uint8'))

Thanks for that, too.

Iniciar sesión para comentar.

efficient ways to read and write complex valued data

5 comentarios
Mostrar 3 comentarios más antiguosOcultar 3 comentarios más antiguos

Respuesta aceptada

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Más respuestas (2)

2 comentarios
Mostrar NingunoOcultar Ninguno

5 comentarios
Mostrar 3 comentarios más antiguosOcultar 3 comentarios más antiguos

Ver también

Categorías

Etiquetas

Productos

Community Treasure Hunt

efficient ways to read and write complex valued data

5 comentarios Mostrar 3 comentarios más antiguosOcultar 3 comentarios más antiguos

Respuesta aceptada

1 comentario Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Más respuestas (2)

2 comentarios Mostrar NingunoOcultar Ninguno

5 comentarios Mostrar 3 comentarios más antiguosOcultar 3 comentarios más antiguos

Ver también

Categorías

Etiquetas

Productos

Community Treasure Hunt

5 comentarios
Mostrar 3 comentarios más antiguosOcultar 3 comentarios más antiguos

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

2 comentarios
Mostrar NingunoOcultar Ninguno

5 comentarios
Mostrar 3 comentarios más antiguosOcultar 3 comentarios más antiguos