Read specific hex data in CSV file

Question

0 votes

I've looked through the posts on StackOverflow and on MATLAB Answers and can't seem to find the answer I am looking for. I have a large CSV file (450 MB) with hex data that looks like this:

63C000CF,6000002F,603000AF,6000C06F,617300EF,6C7C001F,6000009F,0%,63C000CF...

That is a very truncated example, but basically I have approximately 78 different hex values separated by commas, then there will be the '0%', then 78 more hex values. This will continue for a very long time. I've been using textscan like this:

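% Read one block: everything up to the next '%' flag.
data = textscan(fid, '%s', 1, 'delimiter', '%');
% Split that block into its comma-separated hex values.
data = textscan(data{1}{1}, '%s', 'delimiter', ',');
data = data{1};
% Write one value per line, preceded by a '%' marker line.
outstring = sprintf('%%\n');
for idx = 1:numel(data)
    token = data{idx};
    % Skip empty tokens and the lone '0' left over from the '0%' flag.
    if numel(token) > 1
        outstring = [outstring, token, sprintf('\n')];
    end
end
fprintf(output_fid, '%s', outstring);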

This let me reformat the CSV file so that I could use fgetl() to check whether I was looking at the data I needed. Because the data repeats itself, I can use fseek() to jump to the next occurrence before calling fgetl() again.

What I need is a way to skip to the ending. I want to be able to use something like fgetl() but have it return only the first hex value it encounters. I will know how many bytes to shift through the file. Then I need to make sure I can read other hex values. Is what I'm asking possible? My code using textscan above takes far too long on a CSV file that is 90 MB, let alone 450 MB.


dpb on 4 Jun 2014

I know you know what you're after, but we can only go by what is revealed here. Don't be so terse; over-explain rather than under-...

...Each set of hex values represents a label

What's a "set" in this context? A single value or all the values of a given offset relative to the beginning/the flag value? Or is it the entire group between the flag values?

Is "one label" above a single 16-bit hex value or again all of the same offset or the group at the location of the indicated flag value? Have to have a precise definition of what it really is you're after.

How is/are the one(s) wanted identified?

What is the function of the indicator

Adam Kaas on 4 Jun 2014

Edited: Adam Kaas on 4 Jun 2014

I apologize for not being thorough in my explanation.

A set of hex values represents an 8 character hex value (the values separated by commas), i.e. 63C000CF. One label is one set. We define them by the last two characters in the hex value, i.e. 63C000CF is label CF.

The labels chosen by the user are selected from a list of all available labels. This list is populated into a GUI in a separate function. Using the values from my example that list would be labels CF, 2F, AF, 6F, EF, 1F, and 9F. The user can, just as an example, select labels CF and AF and then I would need to go through the CSV file, find my first CF label and store the data contained, then move to the next CF (which will be a set number of bytes away in the file) and record that data until the end of the file is reached. Then I would repeat the process for the AF label.

If it is relevant, we do have names associated with the labels and don't actually refer to them as label CF. The label number is calculated in a strange way due to the way the data is transmitted, but essentially label CF would be label 363 (change CF to binary, flip it, that is the octal label). The user will know what kind of data is represented by that label.
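The conversion described above (hex to binary, reverse the bits, read the result as octal) can be sketched as follows. This is only an illustration of that description; the variable names are made up for the example and are not part of the actual code.

```matlab
% Sketch: convert the two-character hex label to its octal label number.
hexLabel   = 'CF';
bits       = dec2bin(hex2dec(hexLabel), 8);    % 8-bit binary string, '11001111'
flipped    = fliplr(bits);                     % reverse the bit order, '11110011'
octalLabel = dec2base(bin2dec(flipped), 8);    % octal string, '363'
```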


Answer 1

Cedric on 4 Jun 2014

Edited: Cedric on 4 Jun 2014


3 votes

NEW solution

Here is a more efficient solution. I tested it on a 122 MB file, to give you an idea of the timing:

 % One line for reading the whole file. To perform once only.
 tic ;
 content = fileread( 'adam_1.txt' ) ;
 fprintf( 'Time for reading the file    : %.2fs\n', toc ) ;
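 % One line for defining an extraction function. To perform once only.
 % strfind locates the first character of the label in each 'XX,' match;
 % the 6 characters just before it are the data for that label.
 extract = @(label) content(bsxfun( @plus, ...
                                    strfind( content, [label,','] ).' - 6, ...
                                    0 : 5 )) ;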
 % Then it is one call per label to extract data.
 tic ;
 data = extract( 'CF' ) ;
 fprintf( 'Time for extracting one label: %.2fs\n', toc ) ;

Running this, I obtain

 Time for reading the file    : 0.52s
 Time for extracting one label: 0.62s
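To see what the extraction function does step by step, here is a small sketch on a short made-up string instead of a file. The sample content and the names pos, idx, payload and values are only for illustration, and the final hex2dec conversion is an assumption about what you may want to do with the data afterwards.

```matlab
% Sketch on a tiny sample instead of the real file (sample data is made up).
content = '63C000CF,6000002F,603000AF,0%,63C111CF,6000002F,603000AF,0%,';
label   = 'CF';

pos     = strfind(content, [label, ',']);   % index of the 'C' in each 'CF,'
idx     = bsxfun(@plus, pos.' - 6, 0:5);    % one row of 6 indices per match
payload = content(idx);                     % N-by-6 char array, one row per match

% Optional: convert each 6-character row to a number.
values  = hex2dec(payload);
```

Each row of payload holds the 6 characters that precede one occurrence of the label, so the data for successive occurrences ends up stacked in file order.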

FORMER solution

Would the following work for you?

 % Read file content. To do once only.
 content = fileread( 'myFile.txt' ) ;
 % Define regexp-based extraction function. To do once only.
 getByLabel = @(label) regexp( content, sprintf( '\\w{6}(?=%s)', label ), ...
                               'match' ) ;
 % Get all entries for e.g. label 'CF'.
 entries_CF = getByLabel( 'CF' ) ;
 % Get all entries for e.g. label '6F'.
 entries_6F = getByLabel( '6F' ) ;
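As a quick check of what the pattern matches, here is a sketch on a short made-up string. The sample content is an assumption for illustration only; the pattern is the same one built by getByLabel.

```matlab
% Sketch: the lookahead (?=CF) requires 'CF' right after the 6 matched
% characters, but does not include it in the match itself.
content = '63C000CF,6000002F,603000AF,0%,63C111CF,6000002F,603000AF,0%,';
label   = 'CF';
pattern = sprintf('\\w{6}(?=%s)', label);       % gives '\w{6}(?=CF)'
entries = regexp(content, pattern, 'match');    % cell array of 6-char strings
```

Because commas and '%' are not matched by \w, the 6 characters plus the label have to fall inside a single 8-character value, so the match stays aligned with the values as long as they are all 8 characters wide.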

I am not completely clear on what you ultimately need to achieve. If I had to design a GUI where users choose a label and get the corresponding data, I would process the data much further during the init phase, e.g. by grouping it by label in a cell array. Regexp is probably not the most efficient approach in this case, but the principle would be:

 labels  = {'CF', '6F', 'AF'} ;   % .. plus any further labels
 nLabels = numel( labels ) ;
 data    = cell( 1, nLabels ) ;
 for lId = 1 : nLabels
    data{lId} = getByLabel( labels{lId} ) ;
 end

and then, when a user selects 'CF':

 lId = strcmpi( label, labels ) ;
 dataForThisLabel = data{lId} ;


Adam Kaas on 5 Jun 2014

Thanks Cedric! I've been playing with the regexp and it has been proving to be faster. I'll work on implementing your new solution. I appreciate your help!

Cedric on 5 Jun 2014

My pleasure!
