I have imported a large database using textscan(). Now I have data with 12 variables. Each observation looks like this:
5,573346285,746540138,NA,1341119065,NA,7,0,2,1341111281,"-1,-1,-1,0,-1",-0.8
The data are stored in a cell array and I would like to convert them to the dataset type. My problem is that the 11th variable is a quoted string that may contain several numbers separated by commas. I cannot use something like regexp(datacell{1,1}{6,1}, ',\s*', 'split'), because it would also split the 11th variable into many separate parts. Could you please suggest some code that handles this? Thank you.

2 Comments

Stephen23 on 21 May 2016
Edited: Stephen23 on 21 May 2016
@Sebastiano delre: is the number of values within the double quotes always the same? In your example there are five numbers: are there always five?
Sebastiano delre on 21 May 2016
No, actually that can vary. This is exactly what creates my problem...

Accepted Answer

Stephen23 on 21 May 2016
Edited: Stephen23 on 21 May 2016
fmt = [repmat('%f',1,10),'%q','%f']; % ten numbers, one quoted string, one number
opt = {'CollectOutput',true, 'Delimiter',',', 'TreatAsEmpty','NA'}; % 'NA' is read as NaN
fid = fopen('test.txt','rt');
C = textscan(fid,fmt,opt{:});
fclose(fid);
D = cellfun(@(s)sscanf(s,'%f,'),C{2},'UniformOutput',false); % quoted strings to numeric vectors
This returns all of the numeric values in C{1} and C{3}:
>> C{1}
ans =
5 573346285 746540138 NaN 1341119065 NaN 7 0 2 1341111281
6 573346286 746540139 NaN 1341119066 NaN 8 1 3 1341111282
7 573346287 746540140 NaN 1341119067 NaN 9 0 4 1341111283
and those quoted strings are in C{2}:
>> C{2}
ans =
'-1,-1,-1,0,-1'
'-1,0,-1'
'-1'
The quoted strings are simply converted to numeric using sscanf (no regexp is required):
>> D{:}
ans =
-1
-1
-1
0
-1
ans =
-1
0
-1
ans =
-1
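As a minimal sketch of what the sscanf call above does to a single quoted string (the variable names here are only illustrative), the format '%f,' is applied repeatedly, so strings with any number of comma-separated values are handled:

```matlab
% sscanf reuses the format '%f,' until the input is exhausted:
% read one number, then skip a comma if there is one.
s = '-1,-1,-1,0,-1';
v = sscanf(s,'%f,');      % column vector with five elements
w = sscanf('-1','%f,');   % a single value works too (no trailing comma needed)
e = sscanf('','%f,');     % an empty string gives an empty array
```

This is why no fixed count of values per row has to be known in advance.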
The sample file that I used was attached to this answer (I had to create my own, as you did not provide us with a sample file to work with).
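The question asks for a dataset-like container. The following is only a sketch of one possible way to get there, assuming a MATLAB release that has the table type (R2013b or later) and using made-up variable names v1 to v12. It reads two sample rows from a character vector instead of a file so that it runs on its own:

```matlab
% Two sample rows in the question's format (illustrative data, not the real file).
txt = sprintf('%s\n', ...
    '5,573346285,746540138,NA,1341119065,NA,7,0,2,1341111281,"-1,-1,-1,0,-1",-0.8', ...
    '6,573346286,746540139,NA,1341119066,NA,8,1,3,1341111282,"-1,0,-1",-0.5');

% Same format and options as in the answer; textscan also accepts text input.
fmt = [repmat('%f',1,10),'%q','%f'];
C = textscan(txt,fmt,'CollectOutput',true,'Delimiter',',','TreatAsEmpty','NA');
D = cellfun(@(s)sscanf(s,'%f,'),C{2},'UniformOutput',false);

% Hypothetical variable names v1..v10 for the first ten numeric columns.
names = arrayfun(@(k)sprintf('v%d',k),1:10,'UniformOutput',false);
T = array2table(C{1},'VariableNames',names);
T.v11 = D;       % cell column: one numeric vector per row, lengths may differ
T.v12 = C{3};    % last numeric column
```

The 11th variable stays a cell column because its vectors can have different lengths, so it cannot be stored as an ordinary numeric column.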

6 Comments

Sebastiano delre on 21 May 2016
Thank you very much, this helps me a lot!
Stephen23 on 23 May 2016
Edited: Stephen23 on 23 May 2016
@Sebastiano delre: so you accepted my answer one month ago, and just now you unaccepted it. Would you care to explain what has changed since one month ago, when this answer worked for you? Maybe if you actually explain what you tried and what happened, then we can make it work for you. In particular, you need to upload a sample file using the paperclip button.
Sebastiano delre on 23 May 2016
Thanks again for your suggestion. I understand much more about how textscan works now. However, it does not fully work for me yet. I get this error: "Error using textscan. Invalid file identifier. Use fopen to generate a valid file identifier."
To be more specific about my question: I have a very large database named data.csv. The data look like those in the attached file. Thank you very much for your help. Can you please tell me how to import that?
Sebastiano delre on 23 May 2016
Edited: Sebastiano delre on 23 May 2016
@StephenCobeldick: Sorry, Stephen, I did not mean to be unfair. I can accept the answer again. It is just that it does not work 100%. I have attached the file. I hope you can help me once more. Best.
By the way, it was not one month ago, it was just two days ago, right?
Stephen23 on 23 May 2016
Edited: Stephen23 on 23 May 2016
@Sebastiano delre: the error you are getting, "Error using textscan. Invalid file identifier. Use fopen to generate a valid file identifier.", has nothing to do with my algorithm at all.
That error occurs when MATLAB cannot open the file that you requested, most likely because you are passing a wrong filepath to fopen.
This commonly occurs when beginners:
  • try to access a file in some folder that is not the current directory, but pass only the filename without the filepath.
  • define a filepath to a file that does not exist.
  • spell the filename incorrectly.
The solution is (almost always) to pass the correct filepath. You should make this change and tell me what the error message msg is:
[fid,msg] = fopen('test.txt','rt'); % for your filename and filepath!
You will also find hundreds of threads on this forum that explain this exact error message, if you want to read more information about it.
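The check described above can be sketched as follows. This is only an illustration: the file name is generated with tempname so that it is guaranteed not to exist, and in practice you would build the full path to your own file, for example with fullfile:

```matlab
% Deliberately use a file that does not exist (tempname returns an unused name).
% For a real file, something like fullfile(folder,'data.csv') gives the full path.
fname = [tempname,'.csv'];
[fid,msg] = fopen(fname,'rt');
if fid < 0
    % fopen returns -1 when it cannot open the file; msg explains why.
    fprintf('Could not open "%s": %s\n',fname,msg);
else
    firstLine = fgetl(fid);   % the file opened, so it is safe to read from it
    fclose(fid);
end
```

Testing fid before calling textscan turns the "Invalid file identifier" error into a message that names the file and the reason.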
However you have also changed the file format from what you explained in your question, which will then cause my code to not work. Your question did not mention that the file has a header! You can fix this by adding 'HeaderLines',1 to the textscan options.
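A minimal sketch of the 'HeaderLines' option, under the assumption of a file with exactly one header row. The header names a to l and the temporary file are made up for this example and are not taken from the real data.csv:

```matlab
% Create a small temporary file: one header row (hypothetical names) and two data rows.
fname = [tempname,'.csv'];
fid = fopen(fname,'wt');
fprintf(fid,'%s\n','a,b,c,d,e,f,g,h,i,j,k,l');
fprintf(fid,'%s\n','5,573346285,746540138,NA,1341119065,NA,7,0,2,1341111281,"-1,-1,-1,0,-1",-0.8');
fprintf(fid,'%s\n','6,573346286,746540139,NA,1341119066,NA,8,1,3,1341111282,"-1,0,-1",-0.5');
fclose(fid);

% Same format as before; 'HeaderLines',1 makes textscan skip the first line.
fmt = [repmat('%f',1,10),'%q','%f'];
opt = {'CollectOutput',true,'Delimiter',',','TreatAsEmpty','NA','HeaderLines',1};
[fid,msg] = fopen(fname,'rt');
assert(fid >= 3,msg)          % stop with the fopen message if the file did not open
C = textscan(fid,fmt,opt{:});
fclose(fid);
delete(fname);                % remove the temporary file again
```

With this option the header text is simply discarded, so it suits the case where the column names are not needed.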
Or you could try this code, which generates a structure using those header names, which lets you access the data using the fieldnames:
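fmt = [repmat('%f',1,10),'%q','%f'];
opt = {'CollectOutput',true, 'Delimiter',',', 'TreatAsEmpty','NA'};
[fid,msg] = fopen('test.txt','rt'); % for your filename and filepath!
H = regexp(fgetl(fid),'[^,"]+','match'); % header names from the first line
C = textscan(fid,fmt,opt{:}); % the remaining lines hold the data
fclose(fid);
M = strrep(H,'.',''); % remove dots so that the names are valid fieldnames
C{1} = num2cell(C{1});
C{3} = num2cell(C{3});
C{2} = cellfun(@(s)sscanf(s,'%f,'),C{2},'UniformOutput',false); % quoted strings to numeric vectors
M(2,:) = num2cell(horzcat(C{:}),1); % one cell of column values under each name
S = struct(M{:}); % structure array with one element per data row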
and access the data like this:
>> S(9).Sentiment
ans =
-1.3333
>> S(2).Sentiment
ans =
-0.5000
>> S(2).startingtime
ans =
1.3411e+09
PS: Sorry about the date mix-up! PPS: This new code was tested on your sample file.
Sebastiano delre on 23 May 2016
Yes, I see. Now it works, thanks.

More Answers (2)

Walter Roberson on 21 May 2016

If you are using one of the more recent versions of textscan then you can use the %q format to read the double-quoted string as a single item.
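As a small illustration of the %q format (a sketch on a made-up three-field line, read from a character vector rather than a file), the commas inside the double quotes are not treated as delimiters:

```matlab
% One line with a quoted field that itself contains a comma.
str = '1,"2,3",4';
C = textscan(str,'%f%q%f','Delimiter',',');
q = C{2}{1};   % the quoted field as one string, with the surrounding quotes removed
```

Here C{1} and C{3} hold the numbers on either side of the quoted field.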
Azzi Abdelmalek on 21 May 2016
Edited: Azzi Abdelmalek on 21 May 2016
a='5,573346285,746540138, NA ,1341119065,NA,7,0,2,1341111281,"-1,-1,-1,0,-1",-0.8'
b=regexp(a,'\<".+\>"\,','match');
c=strrep(a,b,'');
data1=regexp(c,'[\s\,]+','split');
data2=regexp(b{1}(2:end-2),'[\s\,]+','split');
data=[data1{:} data2{:}]

Asked: 21 May 2016
Commented: 23 May 2016