How to read an array of a set number of characters from a binary file while skipping bytes in between.
Andre Aroyan
on 9 Dec 2024
Commented: Walter Roberson
on 10 Dec 2024
I'm trying to read a binary file, and I'm wondering whether there is a better way to read the characters. One portion of the binary file repeats the same sequence x times:
32 characters, integer, integer, integer, integer
If that portion contained only the 32 characters repeated x times, I could use:
names = convertCharsToStrings(fread(fileID, [ 32 x], '*char'));
Since there are bytes in between, I would like to do something like this instead:
names = convertCharsToStrings(fread(fileID, [ 32 x], '*char', 32 + 8*4));
That is not working the way I hoped. I assume it reads one byte and then skips 32 + 8*4 bytes before reading the next byte.
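A possible one-call alternative, sketched under assumptions: fread's precision argument also accepts a repeat count of the form 'N*source=>output'. With that form the skip is applied after each block of N values rather than after every value, so '32*uint8=>char' with a skip of 8*4 reads one 32-byte name and then jumps over the four integers. The sketch assumes 8-byte integers (as the 8*4 in the question suggests) and writes its own small test file; the file contents and the names nameLen and intBytes are hypothetical.

```matlab
% Assumed layout (from the question): x records, each a 32-byte name
% followed by four 8-byte integers, so 32 + 8*4 = 64 bytes per record.
x = 3;            % number of records (hypothetical test value)
nameLen = 32;     % bytes per name
intBytes = 8*4;   % bytes of integers to skip after each name

% Write a small hypothetical test file so the sketch runs on its own.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
for k = 1:x
    name = sprintf('record_%d', k);
    fwrite(fid, [name, blanks(nameLen - numel(name))], 'uint8');  % space-padded name
    fwrite(fid, int64([k; 10*k; 100*k; 1000*k]), 'int64');       % four integers
end
fclose(fid);

% Read every name in one call: read 32 bytes, skip 8*4 bytes, repeat.
fid = fopen(fname, 'r');
names = fread(fid, [nameLen x], sprintf('%d*uint8=>char', nameLen), intBytes);
fclose(fid);
delete(fname);

names = names.';  % x-by-32 char matrix, one name per row
disp(names)
```

In the question's setting, with fileID already positioned at the first name, this would reduce to fread(fileID, [32 x], '32*uint8=>char', 8*4).' under the same assumptions.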
I have a workaround in which I read the first character of each of the x sequences, then the second character, and so on:
names(1:x, 1:32) = ' ';
for i = 1:32
    % Read character i of every record: 1 char, then skip the remaining 31 + 8*4 bytes of the record
    names(:, i) = fread(fileID, [ 1 x ], '*char', 31 + 8*4);
    % Step back to character i+1 of the first record
    fseek(fileID, -8*(x*4) - 32*x + 1, 0);
end
This accomplishes what I need, but is there an easier, better, or faster way to do this?
Walter Roberson
on 9 Dec 2024
You can reduce the load a bit by reading uint64 values, typecasting them to uint8, and applying char() to the result. You would then need to loop only 4 times instead of 32, and the I/O would be more efficient.
Note that you might need to fopen() with 'ieee-be' to get the right byte order when you do the above.
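A sketch of the uint64 idea, under assumptions: it uses a hypothetical 64-byte record (32-byte name plus four 8-byte integers) and a generated test file. Each of the 4 passes reads one 8-byte slice of every name as a uint64, skipping the other 56 bytes of the record, and then typecasts the values back to bytes. The file is opened with fopen's default native byte order, under which fread does no byte swapping, so typecast returns the bytes in file order; treat the byte-order handling as an assumption to check for your own files.

```matlab
% Assumed layout: 32-byte name + four 8-byte integers = 64 bytes per record.
x = 3;                 % number of records (hypothetical test value)
recordSize = 32 + 8*4;

% Write a small hypothetical test file so the sketch runs on its own.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
for k = 1:x
    name = sprintf('record_%d', k);
    fwrite(fid, [name, blanks(32 - numel(name))], 'uint8');
    fwrite(fid, int64([k; 10*k; 100*k; 1000*k]), 'int64');
end
fclose(fid);

fid = fopen(fname, 'r');          % native byte order (fopen's default)
startPos = ftell(fid);            % start of the repeated section
nameBytes = zeros(x, 32, 'uint8');
for j = 1:4
    % Jump to the j-th 8-byte slice of the first record's name.
    fseek(fid, startPos + 8*(j - 1), 'bof');
    % Read one uint64 per record, skipping the other 56 bytes of each record.
    chunk = fread(fid, x, 'uint64=>uint64', recordSize - 8);
    % Reinterpret the 64-bit values as bytes: 8 consecutive bytes per record.
    bytes = typecast(chunk, 'uint8');
    nameBytes(:, 8*(j - 1) + (1:8)) = reshape(bytes, 8, x).';
end
fclose(fid);
delete(fname);

names = char(nameBytes);   % x-by-32 char matrix, one name per row
disp(names)
```

Seeking to an absolute position with ftell/fseek in each pass avoids depending on where fread leaves the file pointer after the final skip.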
Accepted Answer
Arjun
on 10 Dec 2024
Hi @Andre,
I see that you are wondering if there is an efficient way to pull out strings from a binary file which are mixed with integers in a certain pattern.
In this case the pattern is a 32-byte string followed by 4 integers. You can open the file in binary read mode and compute the size of each sequence: 32 bytes for the string plus 32 bytes for the integers (four 8-byte integers, matching the 8*4 in your code). By iterating over the number of sequences, you can read each complete block of data with a single ‘fread’ call. The first 32 bytes of each block can be extracted, converted into a character string, and stored in a preallocated string array. After processing all sequences, close the file and display the extracted names.
This approach reduces the number of input/output operations, because each sequence is read in one call instead of one call per field. Preallocating the ‘names’ array also avoids the overhead of resizing it inside the loop.
Kindly refer to the code below for better understanding:
% Open the file for reading
fileID = fopen('dummyfile.bin', 'rb');
% x (the number of sequences) is assumed to be known, as in the question.
% If the repeated section does not start at the beginning of the file, fseek to it first.
% Size of each complete sequence: 32 characters + four 8-byte integers
sequenceSize = 32 + 4 * 8;
% Preallocate the array for names
names = strings(x, 1);
for i = 1:x
    % Read one sequence (32 characters + 4 integers)
    data = fread(fileID, sequenceSize, '*uint8');
    % Extract the 32 characters
    nameChars = char(data(1:32))';
    % Convert the characters to a string
    names(i) = convertCharsToStrings(nameChars);
end
% Close the file
fclose(fileID);
% Display the names
disp(names);
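If x is known and the sequences are contiguous, the loop can also be replaced by a single fread call that reads every sequence as one column of a uint8 matrix; the names are then a block of rows. This is a sketch under assumptions, not part of the answer above: it assumes 8-byte integers (64-byte sequences) and builds a hypothetical test file, and it returns a char matrix plus a cellstr list instead of a string array.

```matlab
% Assumed layout: 32-byte name + four 8-byte integers = 64 bytes per record.
x = 3;                        % number of records (hypothetical test value)
recordSize = 32 + 8*4;

% Write a small hypothetical test file so the sketch runs on its own.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
for k = 1:x
    name = sprintf('record_%d', k);
    fwrite(fid, [name, blanks(32 - numel(name))], 'uint8');
    fwrite(fid, int64([k; 10*k; 100*k; 1000*k]), 'int64');
end
fclose(fid);

% One fread for the whole section: each column of raw is one 64-byte record.
fid = fopen(fname, 'r');
raw = fread(fid, [recordSize x], 'uint8=>uint8');
fclose(fid);
delete(fname);

names = char(raw(1:32, :).');  % rows 1:32 hold the names; transpose to x-by-32
nameList = cellstr(names);     % x-by-1 cell array, trailing spaces removed
disp(nameList)
```

This holds the whole section in memory at once (64*x bytes) in exchange for a single I/O call; string(nameList) converts the result to a string array if preferred.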
I hope this helps!
Walter Roberson
on 10 Dec 2024
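Because typecast() accepts only scalars and vectors, not matrices, you need to loop one way or another (possibly using arrayfun()). The alternative is to calculate the values directly. Here data is assumed to be an x-by-64 uint8 matrix with one record per row, and the integers are stored big-endian:
d64 = uint64(data);
% Bytes 33:40 of each record hold the first integer, most significant byte first
integerArray1 = d64(:,33) * 2^56 + d64(:,34) * 2^48 + d64(:,35) * 2^40 + d64(:,36) * 2^32 ...
    + d64(:,37) * 2^24 + d64(:,38) * 2^16 + d64(:,39) * 2^8 + d64(:,40);
% Reinterpret the unsigned bits as signed 64-bit integers
integerArray1 = typecast(integerArray1, 'int64');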
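Since typecast does accept vectors, another way to avoid both the loop and the per-byte arithmetic is to reshape the integer bytes of all records into one column before typecasting. A sketch under assumptions: raw is a 64-by-x uint8 matrix with one record per column (here from fread(fid, [64 x], 'uint8=>uint8')), bytes 33:64 hold four 8-byte integers, and, as in the arithmetic above, the file is big-endian. swapbytes corrects the order when the machine's byte order differs from the file's. The test file, its values, and the flag fileIsBigEndian are hypothetical.

```matlab
% Assumed layout: 32-byte name + four big-endian int64 values = 64 bytes per record.
x = 3;                     % number of records (hypothetical test value)
fileIsBigEndian = true;    % assumption, matching the byte arithmetic above

% Write a small hypothetical big-endian test file so the sketch runs on its own.
fname = [tempname '.bin'];
fid = fopen(fname, 'w', 'ieee-be');
for k = 1:x
    name = sprintf('record_%d', k);
    fwrite(fid, [name, blanks(32 - numel(name))], 'uint8');
    fwrite(fid, int64([k; -10*k; 100*k; 1000*k]), 'int64');
end
fclose(fid);

fid = fopen(fname, 'r');
raw = fread(fid, [64 x], 'uint8=>uint8');   % one record per column
fclose(fid);
delete(fname);

% Bytes 33:64 of each column are the integers; the column-major reshape keeps
% each record's 32 bytes contiguous, so typecast yields 4 int64 values per record.
vals = typecast(reshape(raw(33:64, :), [], 1), 'int64');

% typecast uses the machine's byte order; swap if the file's order differs.
[~, ~, endian] = computer;
if fileIsBigEndian ~= (endian == 'B')
    vals = swapbytes(vals);
end

ints = reshape(vals, 4, x).';   % x-by-4, one row of integers per record
disp(ints)
```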
Categories: Low-Level File I/O