Replace values in matrix (text to value)

Hi,
I have a matrix with 44 columns with the following headings:
id trip time objID status1 status2 status3 ... status40
I used textscan to read the data into one block. The problem is the "status" columns have "NULL" values as their first few values, but the columns are all different, ie. status1 has NULL for its first 5 rows, status2 for its first 3 rows, status3 for its first 10 rows, etc. Textscan has the "headerlines" argument that removes a specified number of rows from the beginning of the matrix, but is there anything I can do to remove a varying amount of rows? To ensure all the columns have the same number of values, I was going to replace all of the NULL values with a 0. Any idea on how to do that?
Thanks for the help.

4 comentarios

Cedric
Cedric el 21 de Oct. de 2013
Editada: Cedric el 21 de Oct. de 2013
Attach the file to your question by clicking on the icon
in the editor, so we can have a look.
Muneer
Muneer el 21 de Oct. de 2013
My m-file? or the data file?
Cedric
Cedric el 21 de Oct. de 2013
Editada: Cedric el 21 de Oct. de 2013
The data file. Or a sample with 20-50 lines if it's too large.
Muneer
Muneer el 21 de Oct. de 2013
Here's an image of the data because to get a better idea. Unfortunately, I can't give you a file because of security issues. But hopefully this helps.
I appreciate the help.

Iniciar sesión para comentar.

 Respuesta aceptada

Cedric
Cedric el 21 de Oct. de 2013
Editada: Cedric el 21 de Oct. de 2013
Here is a small example using the following content
vehicle trip_time A B C
C23432 1234556 NULL NULL NULL
C23432 1234557 1 NULL NULL
C23432 1234558 2 NULL 100
C23432 1234559 3 10 200
C23432 1234560 4 20 300
the question that remains is what you want/need NULL entries to be converted to.
fid = fopen( 'myFile.txt', 'r' ) ;
content = textscan( fid, '%s %f %s %s %s', 'headerlines', 1 ) ;
fclose( fid ) ;
for c = 3 : numel( content ), content{c} = str2double( content{c} ) ; end
vehicle = content{1} ;
data = [content{2:end}] ;
With that you get
>> vehicle
vehicle =
'C23432'
'C23432'
'C23432'
'C23432'
'C23432'
which is a cell array of strings, and a numeric array for the rest of the data, with NaN entries for NULL values (seems appropriate)..
>> data
data =
1234556 NaN NaN NaN
1234557 1 NaN NaN
1234558 2 NaN 100
1234559 3 10 200
1234560 4 20 300
I'd personally work with that, but if you wanted zeros instead of NaNs, you could proceed as follows:
>> data(isnan(data)) = 0
data =
1234556 0 0 0
1234557 1 0 0
1234558 2 0 100
1234559 3 10 200
1234560 4 20 300
It's more handy for e.g. plotting, but you loose the information about NULL entries, because you can't say anymore which were initially NULL and which were true zeros.
Let me know if it's not exactly what you wanted to achieve, and we can refine the answer.

10 comentarios

Muneer
Muneer el 21 de Oct. de 2013
I think this will work. My problem was that the "NULL" values didn't allow the columns of data to be read in as numbers (%n) with textscan, I had to read the whole column in as strings (%s). Changing NULL to NaN seems to take care of that. If I wanted to, could I change the NaN value to a 0, for example?
Thanks for your help
Cedric
Cedric el 21 de Oct. de 2013
Editada: Cedric el 21 de Oct. de 2013
Yes, I updated my answer with an example showing hoe to replace NaNs with zeros, probably right after you saw it. Please have a look at the update.
Muneer
Muneer el 21 de Oct. de 2013
Thanks for the help. I loaded a section of my data in and ran the for loop. After, I called one of the columns (entering "data" to display the matrix was much too large) to look for the NaN values and didn't see them. I called status2. Shouldn't I see NaN in the column of data? It seems to have cut off the first second or two
Muneer
Muneer el 21 de Oct. de 2013
Editada: Muneer el 21 de Oct. de 2013
here are the first 5 columns (time, id, status1, status2, status3). the data should look like the first picture I attached.
Cedric
Cedric el 21 de Oct. de 2013
My solution should be adapted to your case; in my example, column 1 contains characters, so I read it a strings, column 2 contains only numbers so I read it as floats, and the next column contain numbers and NULL entries. I read them as strings as well before post-processing them with STR2DOUBLE. The latter converts to double what can be converted and to NaN the rest. At the end of this process, columns 2 to end are numeric arrays and column 1 is a cell array of strings. I concatenate columns 2 to end into an numeric array named data and I save column 1 into its own cell array named vehicle.
Now if the content is a bit different, the solution should be adapted accordingly.
Is the data file too confidential for sending it to me by email? If not, send it to me and I'll have a look.
Muneer
Muneer el 22 de Oct. de 2013
Sure I can email you. I'll send you a portion of the file and my code. Is the email on your page the right one? It says "remove this" after it.
Muneer
Muneer el 22 de Oct. de 2013
I just experimented with it and found that the headerlines argument with textscan is what is deleting the first few points of data. The actual data has time starting at 0, but when introducing 'Headerlines', '1' with textscan all of the data before time 2046 is deleted. Not sure why that is...
Cedric
Cedric el 22 de Oct. de 2013
Yes, just remove the part ".REMOVETHIS" .. it's a cheap trick to avoid receiving too much spam.
Muneer
Muneer el 22 de Oct. de 2013
Maybe the for loop is doing the headerlines operation multiple times and deleting the top row each time?
Thanks for your help
Cedric
Cedric el 22 de Oct. de 2013
I just replied to your email.

Iniciar sesión para comentar.

Más respuestas (0)

Categorías

Más información sobre Characters and Strings en Centro de ayuda y File Exchange.

Preguntada:

el 21 de Oct. de 2013

Comentada:

el 22 de Oct. de 2013

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by