Help on parallelizing code - Nested files
Hello friends!
I am working on parsing a large library (tens of thousands) of XML files. My intention is to parse all of them and collect the information I need in a single variable for post-processing (and to save it in another format that isn't as nested).
The files were not made by me or my team, and they have a rather convoluted nested structure. In pseudocode, my current parfor loop looks like this:
data_out = ["column_header_1", "column_header_2", "column_header_n"];
parfor z = 1:numFiles
    file = xml2struct(fullFileNames{z});
    for i = 1:length(file.logfile.scan)   % (*) loop over the scans
        % Header info
        var_1 = convertCharsToStrings(file.logfile.Attributes.var1);
        var_2 = convertCharsToStrings(file.logfile.Attributes.var2);
        var_n = convertCharsToStrings(file.logfile.Attributes.var3);
        % Sometimes there is a single scan and sometimes there are several, so I have an if case to handle both and avoid an indexing error. Omitted here; only the multiple-scan case is shown.
        first_section_file = file.logfile.scan{1,i};
        try
            for j = 1:length(first_section_file)   % (**) inner loop
                % Here I need some data from, let's say, first_section_file.info_1.Attributes. There is also another structure at this level, let's say info_2, that I have to get data out of. However, as with scan, it can hold a single reading or multiple readings, hence the if/else.
                second_var_1 = first_section_file.info_1.Attributes.var1;
                second_var_2 = first_section_file.info_1.Attributes.var2;
                second_var_3 = first_section_file.info_1.Attributes.var3;
                if reading == 1
                    third_var_1 = convertCharsToStrings(first_section_file.info_2.Attributes.var1);
                    third_var_2 = convertCharsToStrings(first_section_file.info_2.Attributes.var2);
                    third_var_n = convertCharsToStrings(first_section_file.info_2.Attributes.var3);
                else
                    % Same code, except that I pull the information out of the given reading and iterate over the readings.
                end
                data_out = variables;
            end % End of the (**) for loop
        catch
            fprintf('No data\n');
        end % End of the try
    end % End of the (*) for loop (end of the scans)
end % End of the parfor loop
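The single-versus-multiple scan check mentioned in the comments could be folded into one normalization step, so that the loop body only deals with one case. A minimal sketch, assuming the parser returns a plain struct for a single scan and a cell array for several (as the scan{1,i} indexing above suggests); the stand-in data and the name counts are made up for illustration:

```matlab
% Stand-in data: one "file" with a single scan (struct), one with two scans (cell).
one_scan  = struct('id', 'a');
two_scans = {struct('id', 'b'), struct('id', 'c')};
inputs = {one_scan, two_scans};

counts = zeros(1, numel(inputs));
for k = 1:numel(inputs)
    scans = inputs{k};
    if ~iscell(scans)
        scans = {scans};      % wrap the single-scan case in a cell
    end
    counts(k) = numel(scans);
    for i = 1:numel(scans)
        scan = scans{i};      % same indexing for both cases
    end
end
```

The same wrapping could be applied to the readings inside each scan, assuming they follow the same single/multiple convention.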
My intention is to make the outer loop the parfor loop, so that each file is handled by a different worker. The problem is setting up the "data_out" variable so that the data I need actually ends up in it. As the code stands, parfor itself runs without errors, but I think there is a "race condition" of sorts: since each iteration resets the value, nothing is ever saved.
I tried data_out = [data_out ; variables], but parfor rejects that with an error, and using cat doesn't work either. I also tried moving the loops into a separate function, but that gave more problems than solutions (granted, I could have made a couple of mistakes trying it). Another issue is that the iteration indices are not a good way of indexing into data_out: the inner indices restart for every file as I iterate over all the scans, so they would overwrite values already stored, and z moves very slowly because it is the file counter.
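One pattern that might sidestep the indexing problem is to give each file its own output slot, indexed only by z, and to concatenate everything once the parfor loop has finished. This is only a sketch under assumptions: the row contents and rows_per_file are invented stand-ins for the values parsed from each file, and per_file is a hypothetical name.

```matlab
% Stand-in for the parsed files: each "file" yields a different number of rows.
numFiles = 4;
rows_per_file = [2 0 3 1];        % made-up counts, for illustration only
per_file = cell(numFiles, 1);     % one output slot per file

parfor z = 1:numFiles
    rows = strings(0, 3);         % reset every iteration, so private to it
    for i = 1:rows_per_file(z)
        % In the real code these values would come from the parsed struct.
        new_row = ["file" + z, "scan" + i, "value"];
        rows = [rows; new_row];
    end
    per_file{z} = rows;           % indexed only by the parfor variable z
end

% Combine the per-file blocks once, after the parallel loop.
header = ["column_header_1", "column_header_2", "column_header_n"];
data_out = [header; vertcat(per_file{:})];
```

Because the inner indices never touch the shared output, they can restart for every file without overwriting anything, and files that produce no rows simply contribute an empty block.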
Maybe someone has dealt with an issue like this and can shed some light? In case anyone has heard of them, I am working with OpenBMap files. I have a working for loop version, but as it stands, each iteration takes longer as more data is appended to the data_out array (as one would expect). I could preallocate, but I don't know how many data points there will be until all the files have been read.
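For the plain for loop version, one common workaround when the final size is unknown is to preallocate in blocks: start from a guessed capacity, double it whenever it fills up, and trim at the end. A minimal sketch, assuming three string columns per row; capacity, n and the row contents are illustrative only:

```matlab
capacity = 4;                       % initial guess, deliberately small here
data = strings(capacity, 3);
n = 0;                              % number of rows actually filled

for k = 1:10                        % stands in for "every data point found"
    if n == capacity
        data = [data; strings(capacity, 3)];   % double the allocation
        capacity = 2 * capacity;
    end
    n = n + 1;
    data(n, :) = ["point" + k, "a", "b"];
end

data = data(1:n, :);                % drop the unused rows
```

With doubling, the array is reallocated only a handful of times instead of once per appended row. This applies to the serial loop; inside parfor the rows would still need to be collected per file.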
Oh, as a side note: in another test I made a separate parfor loop that only converts all the XML files into structures, since the profiler pointed to that function as the bottleneck of the code, but the performance gain in the long run wasn't as great as expected.