Why am I getting an invalid file identifier error when using "parfor" but the same function works fine (albeit slow) with "for"?
I am trying to run the code below on a file of binary data. To speed up the file reading, I replaced the for loop over the data points with a parfor loop, but I keep receiving the error below on the line with the parfor loop. The code seems to work with a plain for loop (I have never run it to completion because it takes too long), so the error appears to be specific to parfor. Any help would be greatly appreciated!
Error I receive:
Invalid file identifier.  Use fopen to generate a valid file identifier.
Code:
%% Creating File ID and finding File attributes
fid = fopen(full_file_path, 'r', 'l');
file_info = dir(full_file_path);
%% Reading and making the Header
obj.header.identifier = fread(fid, 6, 'uchar');
obj.header.return_mode = dec2hex(fread(fid, 1, 'uint8'));
obj.header.boresighting_quaternion = fread(fid, 4, 'double');
obj.header.scanner_linear_offset = fread(fid, 3, 'double');
obj.header.resepi_orientation_angles = fread(fid, 3, 'float32');
obj.header.reserved_1 = fread(fid, 1, 'uint64');
obj.header.azimuth_offset = fread(fid, 32, 'float32');
obj.header.elevation_offset = fread(fid, 32, 'float32');
obj.header.device_id = fread(fid, 1, 'uint32');
obj.header.extra_parameters = fread(fid, 32, 'float32');
obj.header.reserved_2 = fread(fid, 40, 'uchar');
%% Data Parsing
switch obj.header.return_mode
    case '13'
        data_packet_size = 668; %bytes
        number_of_points = (file_info.bytes - 512) / data_packet_size;
        microseconds_since_gps_epoch = zeros(number_of_points, 1);
        theta_angle = zeros(number_of_points, 1);
        phi_angle = zeros(number_of_points, 1);
        range_1 = zeros(number_of_points, 1);
        reflectivity_1 = zeros(number_of_points, 1);
        tag_1 = zeros(number_of_points, 1);
        range_2 = zeros(number_of_points, 1);
        reflectivity_2 = zeros(number_of_points, 1);
        tag_2 = zeros(number_of_points, 1);
        range_3 = zeros(number_of_points, 1);
        reflectivity_3 = zeros(number_of_points, 1);
        tag_3 = zeros(number_of_points, 1);
        parfor i = 1:number_of_points %Line I receive error on
            if i == any(1:30:1000000000)
                microseconds_since_gps_epoch(i) = fread(fid, 1, 'uint64');
            end
            theta_angle(i) = fread(fid, 1, 'uint16');
            phi_angle(i) = fread(fid, 1, 'uint16');
            range_1(i) = fread(fid, 1, 'uint32');
            reflectivity_1(i) = fread(fid, 1, 'uint8');
            tag_1(i) = fread(fid, 1, 'uint8');
            range_2(i) = fread(fid, 1, 'uint32');
            reflectivity_2(i) = fread(fid, 1, 'uint8');
            tag_2(i) = fread(fid, 1, 'uint8');
            range_3(i) = fread(fid, 1, 'uint32');
            reflectivity_3(i) = fread(fid, 1, 'uint8');
            tag_3(i) = fread(fid, 1, 'uint8');
        end
        obj.microseconds_since_gps_epoch = microseconds_since_gps_epoch;
        obj.theta_angle = theta_angle;
        obj.phi_angle = phi_angle;
        obj.range_1 = range_1;
        obj.reflectivity_1 = reflectivity_1;
        obj.tag_1 = tag_1;
        obj.range_2 = range_2;
        obj.reflectivity_2 = reflectivity_2;
        obj.tag_2 = tag_2;
        obj.range_3 = range_3;
        obj.reflectivity_3 = reflectivity_3;
        obj.tag_3 = tag_3;
end
Answers (1)
  Joseph Cheng on 17 Jun 2021
      Edited: Joseph Cheng on 17 Jun 2021
From the looks of it, you've opened a file and are then trying to use parfor to grab items out of that open file. That is where things get problematic: inside the parfor loop there are too many "hands" trying to grab things from a single file, possibly at the same time. And with the loop written this way, nothing guarantees that the data would be parsed correctly even if the parfor ran.
This is because the for loop reads the file sequentially, with each fread picking up where the previous one stopped, whereas once parallelized you're letting each processor grab at the file at the same time. You're better off reading in the whole file and then parsing it out into the variables, or checking whether your data fits what is described here https://www.mathworks.com/matlabcentral/answers/276010-reading-in-large-binary-file-with-multiple-data-types-uint8-double-etc and using memmapfile().
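As a rough illustration of the "read the whole file, then parse" suggestion, here is a minimal sketch. The 8-byte record layout (uint16, uint16, uint32), the file name, and the variable names are hypothetical stand-ins, not the asker's actual packet format, and typecast assumes the machine's native byte order matches the file (little-endian here).

```matlab
% Build a small demo file. Hypothetical layout: each record is
% uint16 theta, uint16 phi, uint32 range (8 bytes, little-endian).
filename = fullfile(tempdir(), 'bulk_read_demo.bin');
n = 5;
fid = fopen(filename, 'w', 'l');
for k = 1:n
    fwrite(fid, [k, 10*k], 'uint16');
    fwrite(fid, 1000*k, 'uint32');
end
fclose(fid);

% One fread for the whole body: each column of raw holds one record's bytes.
fid = fopen(filename, 'r', 'l');
raw = fread(fid, [8, Inf], '*uint8');
fclose(fid);

% Slice each field's bytes out of the matrix and reinterpret them.
% typecast uses native byte order (assumed little-endian).
theta_angle = double(typecast(reshape(raw(1:2, :), [], 1), 'uint16'));
phi_angle   = double(typecast(reshape(raw(3:4, :), [], 1), 'uint16'));
range_1     = double(typecast(reshape(raw(5:8, :), [], 1), 'uint32'));
```

A single large fread followed by vectorized slicing avoids the per-field fread calls, which is likely where most of the time in the original loop goes.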
2 Comments
  Lance on 10 Jan 2022
It never works for me either.
My code is as simple as this:
filename = fullfile('temp','parfor_test.txt');
fid = fopen(filename,'W');
parfor i = 1:50
    fprintf(fid,[num2str(i),'\n']);
end
fclose(fid);
Is there any way parfor can work with fprintf and fread?
  Walter Roberson on 10 Jan 2022
filename = fullfile(tempdir(),'parfor_test.txt');
fclose(fopen(filename, 'w'));   %empty file
spmd
    fids = fopen(filename,'a');
end
fid = [fids{:}];
parfor i = 1:50
    FID = fid(getCurrentTask().ID);
    fprintf(FID,'iter #%d FID %d\n', i, FID);
end
parfevalOnAll(@() fclose('all'), 0);   %close the files opened on each worker
In my tests, the fid came out the same on all workers, but one should not assume that will always be the case.
Notice that the file must be opened in append mode, and must be opened on each worker individually.
Each individual fprintf() is guaranteed to be written out "atomically" (at least up to 8192 bytes), not mixed up with the output of any other process.
If you were to open a file for reading inside a worker, then each fopen() has its own independent position. Reading information in one process does not "consume" it for the other processes.
If you wanted to have each different worker process a line (or block) of information from the same file, then I would recommend that you have a single worker that reads a line (or block) and then uses parfeval() to queue processing of the content.
If you have a binary file, then each worker can fseek() to a different location.
Caution: the "sweet spot" for performance for spinning hard drives is usually two processes per hard drive, and only one hard drive per controller (with slower drives and faster controllers, having more per controller can be okay).
If you are looking for high performance and you are not using a "server", then my understanding is that current optimum performance for spinning media is with a direct-attached Thunderbolt 4 connector to a RAID controller that is striping between at least two fast drives. But a high quality SSD can do significantly better, provided you have it attached to a good quality controller, which can start to involve Thunderbolt 4 based enclosures.
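The single-reader pattern described above might look roughly like the sketch below. The file name and the per-line work (squaring a number) are placeholders, and it assumes parfeval() is available, either through Parallel Computing Toolbox or through a release where parfeval() without a pool argument falls back to running serially.

```matlab
% Demo input: one number per line (placeholder data).
filename = fullfile(tempdir(), 'queue_demo.txt');
fid = fopen(filename, 'w');
fprintf(fid, '%d\n', 1:6);
fclose(fid);

% Only this process touches the file; workers receive the content, not the fid.
fid = fopen(filename, 'r');
k = 0;
txt = fgetl(fid);
while ischar(txt)
    k = k + 1;
    % Queue the (placeholder) processing of this line.
    futures(k) = parfeval(@(s) str2double(s)^2, 1, txt);
    txt = fgetl(fid);
end
fclose(fid);

% Collect the results in the order the lines were queued.
results = fetchOutputs(futures);
```

Because the file identifier never leaves the reading process, the "Invalid file identifier" problem does not arise; only the line contents are sent to the workers.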
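A minimal sketch of that idea, combined with the earlier point that a file opened on a worker has its own position: each iteration opens the file itself, seeks to its own block, and reads it. The file name, block size, and the use of plain doubles as records are assumptions for illustration only.

```matlab
% Demo file: 8 doubles, treated as 4 blocks of 2 records (hypothetical layout).
filename = fullfile(tempdir(), 'seek_demo.bin');
n_blocks = 4;
block_len = 2;
bytes_per_record = 8;   % one double
fid = fopen(filename, 'w', 'l');
fwrite(fid, (1:n_blocks*block_len) * 1.5, 'double');
fclose(fid);

vals = zeros(block_len, n_blocks);
parfor b = 1:n_blocks
    f = fopen(filename, 'r', 'l');   % opened where the iteration runs
    % Jump straight to the start of this iteration's block.
    fseek(f, (b - 1) * block_len * bytes_per_record, 'bof');
    vals(:, b) = fread(f, block_len, 'double');
    fclose(f);
end
```

Giving each iteration a whole block rather than a single value keeps the fopen/fclose overhead small relative to the data read; the block size here is arbitrary and would need tuning for a real file.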