Reading in large binary file with multiple data types (uint8, double, etc.)
Problem: I am trying to read in a binary data file. The format of the file is a series of data "blocks" that each contain various data types in a repeating pattern. Example: A file contains a series of data "blocks" that are 17 bytes each, and there are 20 entries (for a total of 340 bytes). The first 8 bytes (64 bits) of a given "block" represent a double, the next byte is an unsigned 8-bit integer, the next 4 bytes represent a single, and the final 4 bytes are a 32-bit signed integer. This pattern is then repeated for all 20 entries.
I currently am using two nested "for" loops to read in these data types one at a time with the "fread" function. This works, but is very slow, especially for large files. What I'm attempting to do now is read in all the data at once as a series of uint8 values (which is much faster), then reshape it to a matrix (in this example, 20x17) and convert the values to the data type that I desire (e.g., take the first 8 columns of each row (20x8) and convert them into a double (20x1)).
I don't know of an easy way to do this. I could write the data to a temporary binary file, read it back in as a new data type, and then remove the file, but I would rather not bother with the file I/O; there should be a way to do it in the workspace. The only other option I can think of is to convert the uint8 values into a binary string and manually reconstruct the new data type from the bits, but if there were a simpler built-in (and faster, and less error-prone) way to accomplish this I would prefer it.
Any suggestions are appreciated, thanks.
Answers (1)
Geoff Hayes
on 29 Mar 2016
Edited: Geoff Hayes
on 29 Mar 2016
Adam - you may be able to use memmapfile to read the data from your file. According to its description: memory-mapping is a mechanism that maps a portion of a file, or an entire file, on disk to a range of memory addresses within the MATLAB® address space. Then, MATLAB can access files on disk in the same way it accesses dynamic memory, accelerating file reading and writing. Memory-mapping allows you to work with data in a file as if it were a MATLAB array.
For example, suppose we use the following code to create a file of 25 blocks, each in your format:
function createBinaryFile
    fid = fopen('myBinaryData.dat','w');
    % fopen returns -1 on failure, so test for that explicitly
    if fid == -1
        error('Could not open myBinaryData.dat for writing.');
    end
    numBlocks = 25;
    for k=1:numBlocks
        % write the double
        fwrite(fid,pi*k,'double'); % varA of block k will be pi*k
        % write the unsigned integer
        fwrite(fid,k,'uint8');     % varB of block k will be k
        % write the single
        fwrite(fid,pi/k,'single'); % varC of block k will be pi/k
        % write the signed integer
        fwrite(fid,k*k,'int32');   % varD of block k will be k*k
    end
    fclose(fid);
end
We can now use memmapfile to create the memory map to this file as
m = memmapfile('myBinaryData.dat',...
    'Format',{'double',[1,1],'varA';...
              'uint8', [1,1],'varB';...
              'single',[1,1],'varC';...
              'int32', [1,1],'varD'},'Repeat',25);
Note how the format lists the fields in exactly the order and types in which each block is written. We can then access any block as
m.Data(1)
ans =
varA: 3.1416
varB: 1
varC: 3.1416
varD: 1
or
m.Data(2)
ans =
varA: 6.2832
varB: 2
varC: 1.5708
varD: 4
If you don't know the number of blocks, then you can specify Inf in place of 25.
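As a rough sketch of the Inf case, the snippet below writes a small file in the same 17-byte layout, maps it without stating the block count, and then gathers one field from every block into an ordinary vector. The temporary file name, the choice of 5 blocks, and the variable names blocks and numBlocks are made up for this illustration. Copying m.Data into a struct array first is a cautious choice here, not the only way to index the map.

```matlab
% Sketch: map a file of 17-byte blocks without knowing the block count.
% The temporary file and its 5 blocks exist only for this illustration.
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
if fid == -1
    error('Could not create %s.', fname);
end
for k = 1:5
    fwrite(fid, pi*k, 'double');
    fwrite(fid, k,    'uint8');
    fwrite(fid, pi/k, 'single');
    fwrite(fid, k*k,  'int32');
end
fclose(fid);

% 'Repeat', Inf lets memmapfile apply the format as many times as the file allows.
m = memmapfile(fname, ...
    'Format', {'double', [1,1], 'varA'; ...
               'uint8',  [1,1], 'varB'; ...
               'single', [1,1], 'varC'; ...
               'int32',  [1,1], 'varD'}, ...
    'Repeat', Inf);

blocks    = m.Data;              % struct array, one element per block
numBlocks = numel(blocks);       % block count inferred from the file size
varA = [blocks.varA].';          % all doubles as a column vector
varD = [blocks.varD].';          % all int32 values as a column vector

clear m                          % release the map before deleting the file
delete(fname);
```

Note that copying m.Data pulls the mapped values into memory, so for very large files it may be preferable to index a range of blocks, such as m.Data(1:1000), instead.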
Try the above and see what happens!
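The in-workspace route described in the question (read everything as uint8, reshape, then reinterpret the bytes) can also be sketched with the built-in typecast, which reinterprets the bytes of a vector as another numeric type without changing them. This is only a sketch under some assumptions: the temporary file and its 3 blocks are invented for the example, and the file's byte order is assumed to match the machine's, since typecast does no byte swapping. The data is reshaped to 17-by-N rather than N-by-17 so that each field's bytes stay contiguous in MATLAB's column-major order.

```matlab
% Sketch: decode 17-byte blocks in memory with typecast.
% The temporary file and its 3 blocks exist only for this illustration.
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
if fid == -1
    error('Could not create %s.', fname);
end
for k = 1:3
    fwrite(fid, pi*k, 'double');
    fwrite(fid, k,    'uint8');
    fwrite(fid, pi/k, 'single');
    fwrite(fid, k*k,  'int32');
end
fclose(fid);

blockSize = 17;                              % 8 + 1 + 4 + 4 bytes per block

fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s for reading.', fname);
end
raw = fread(fid, Inf, '*uint8');             % whole file as a uint8 column
fclose(fid);
delete(fname);

numBlocks = numel(raw) / blockSize;
raw = reshape(raw, blockSize, numBlocks);    % one block per column

% Pull out each field's rows, flatten block by block, and reinterpret the bytes.
varA = typecast(reshape(raw(1:8,   :), [], 1), 'double');
varB = raw(9, :).';                          % already uint8, no cast needed
varC = typecast(reshape(raw(10:13, :), [], 1), 'single');
varD = typecast(reshape(raw(14:17, :), [], 1), 'int32');
```

Each result is an N-by-1 vector with one value per block, so varA(2) is the double from the second block.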